Huggingface Jobs (NOW HIRING)

LLM Infrastructure Engineer

Houston, TX · On-site

$97K - $127K/yr

Build and deploy LLM inference services using HuggingFace Transformers and PyTorch * Optimize GPU workloads and CUDA memory usage * Implement streaming inference APIs for real-time model responses

Experience with model deployment using PyTorch, Hugging Face, vLLM, SGLang, TensorRT-LLM, or similar * Experience with queues, scheduling, traffic control, and fleet management at scale * Experience with ...

Proficiency in Python and libraries such as PyTorch and Hugging Face Transformers. * Experience with LLM providers (OpenAI, Cohere, Anthropic, Hugging Face, or related) is preferred. * Familiarity ...

... HuggingFace Transformers. • Experience with LLM fine-tuning techniques: LoRA, QLoRA, RLHF, or instruction-tuning. • Hands-on with vector search, embeddings, and semantic retrieval (RAG ...

Huggingface information

Hourly pay scale: $8, $26 (average), $61

How much do Hugging Face jobs pay per hour?

As of Jun 16, 2026, the average hourly pay for Hugging Face jobs in the United States is $26.34, according to ZipRecruiter salary data. Most workers in these roles earn between $15.14 and $30.77 per hour, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive in a Hugging Face position, and why are they important?

To thrive in a role at Hugging Face, you typically need strong skills in machine learning, natural language processing (NLP), and software development, supported by a relevant degree in computer science or a related field. Familiarity with frameworks like PyTorch or TensorFlow and experience with version control systems such as Git are often required; open-source contributions and cloud platform knowledge are a plus. Excellent communication, collaborative teamwork, and problem-solving abilities help candidates stand out. These strengths matter because they enable individuals to develop high-impact AI tools, work effectively in interdisciplinary teams, and contribute to open-source communities.

What does a typical day look like for an engineer working at Hugging Face?

As an engineer at Hugging Face, your day typically involves collaborating with team members to design, develop, and improve state-of-the-art machine learning models and tools, with a strong focus on open-source NLP projects. You’ll participate in code reviews, experiment with new technologies, engage with the community through forums or GitHub, and help support user questions or issues. Expect a fast-paced, collaborative environment where cross-functional teamwork with product managers, researchers, and other engineers is common. The work is project-driven, with plenty of opportunities to contribute ideas, learn from experts, and advance your technical skills.

What is a Hugging Face job?

A Hugging Face job typically refers to a role at Hugging Face, a company specializing in machine learning and natural language processing (NLP). Employees at Hugging Face work on developing and maintaining open-source AI tools, including the popular Transformers library. Roles range from research and engineering to product and community development, often focusing on advancing state-of-the-art AI models.

Infographic showing Huggingface job openings in the United States as of June 2026, with employment types broken down into 25% Full Time, 25% Part Time, and 50% Contract. It highlights a job distribution of 75% In-person and 25% Remote, with an average salary of $54,791 per year, or $26.3 per hour.

LLM Infrastructure Engineer

AMSYS Talent

Houston, TX • On-site

$97K - $127K/yr

Full-time

This job post has expired today. Applications are no longer accepted.


Job description

We are looking for a Senior Python / AI API Engineer to build and deploy production-grade services powering Large Language Model (LLM) applications. This role focuses on developing high-performance APIs for model inference, optimizing GPU workloads, and deploying AI services in cloud environments.
This is an engineering-focused role, not research. We are looking for someone who has built and shipped AI systems into production and understands the challenges of scalable inference and model serving.
Key Responsibilities
  • Develop high-performance APIs using Python (3.10+) and FastAPI
  • Build and deploy LLM inference services using HuggingFace Transformers and PyTorch
  • Optimize GPU workloads and CUDA memory usage
  • Implement streaming inference APIs for real-time model responses
  • Containerize and deploy services using Docker and GPU-enabled infrastructure
  • Deploy AI workloads in Azure environments (AKS, ACI, or Container Apps)
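
For readers unfamiliar with the streaming responsibility above, here is a minimal sketch of what a streaming inference API might emit. It is an illustration only and is not taken from the employer's codebase. It uses only the Python standard library; `fake_token_stream` is a hypothetical stand-in for a real model's token generator, and the `[DONE]` end marker is an assumed convention borrowed from OpenAI-style APIs.

```python
import json
from typing import Iterable, Iterator


def fake_token_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields the prompt back word by word."""
    for word in prompt.split():
        yield word


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a Server-Sent Events frame, then signal completion."""
    for index, token in enumerate(tokens):
        payload = json.dumps({"index": index, "token": token})
        # An SSE frame is a "data:" line followed by a blank line.
        yield f"data: {payload}\n\n"
    # Assumed end-of-stream marker; the client stops reading when it sees this.
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    for frame in sse_events(fake_token_stream("streaming keeps latency low")):
        print(frame, end="")
```

In a FastAPI service, a generator like `sse_events` could plausibly be passed to `StreamingResponse` with `media_type="text/event-stream"`, so that clients receive tokens as they are produced instead of waiting for the full completion.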

Required Skills
  • Strong Python development experience (3.10+)
  • Hands-on experience building production APIs with FastAPI
  • Experience with HuggingFace Transformers and PyTorch
  • Solid understanding of REST API design
  • Experience deploying containerized applications with Docker

Nice to Have
  • Experience with OpenAI-compatible APIs, vLLM, or Text Generation Inference (TGI)
  • Experience deploying AI workloads on Azure GPU infrastructure
  • Familiarity with LoRA / PEFT fine-tuning
  • Exposure to legal or financial NLP use cases

Ideal Candidate: A hands-on engineer who understands how LLM systems run in production, from model loading and tokenization to GPU deployment and scalable APIs.