HuggingFace Hub * Experience with large model deployments (open-source LLMs preferred): LLaMA, * Mistral, Falcon, Mixtral * Comfortable with tuning libraries (HuggingFace Trainer, DeepSpeed, FSDP ...
OpenAI, HuggingFace models, Azure OpenAI Service * Prompt engineering, embeddings (e.g., FAISS, Pinecone) * Fine-tuning and model adaptation for domain-specific datasets * Python, RESTful APIs ...
$187K - $395K/yr
Experience with model deployment using PyTorch, Huggingface, vLLM, SGLang, tensorRT-LLM, or similar * Experience with queues, scheduling, traffic-control, fleet management at scale * Experience with ...
... g., Huggingface, NLTK, spaCy, transformer-based models); with largescale data pipelines using distributed processing frameworks (e.g., Spark, Databricks); and collaborating with cross-functional ...
... HuggingFace Transformers. • Experience with LLM fine-tuning techniques: LoRA, QLoRA, RLHF, or instruction-tuning. • Hands-on with vector search, embeddings, and semantic retrieval (RAG ...
... Huggingface), Emad Mostaque (Stability AI) and many others. Tell us what excites you about PrimeIntellect, something impressive that you've built, and how you'd accelerate open and decentralized ...
Deep experience fine-tuning open-source LLMs using HuggingFace Transformers, DeepSpeed, vLLM, FSDP, LoRA/QLoRA * Worked with both base and instruction-tuned models; familiar with SFT, RLHF, DPO ...
Required : • 5+ years of industry experience in Machine Learning, Infrastructure or related fields • Experience with deep learning framework such as Pytorch or Huggingface or LLM serving ...
San Jose, CA · On-site
$240K - $252K/yr
... g., Huggingface, NLTK, spaCy, transformer-based models); with largescale data pipelines using distributed processing frameworks (e.g., Spark, Databricks); and collaborating with cross-functional ...
San Francisco, CA · On-site
$150K - $225K/yr
Have significant experience with PyTorch, HuggingFace, or similar libraries * Familiar with a modern RL training framework * Are comfortable working long hours in a high-intensity, early-stage ...
San Francisco, CA · On-site
D. in computer science, machine learning, or a related field, with 5+ years of related research experience. • Familiar with relevant frameworks and libraries (e.g., pytorch and huggingface). • ...
San Francisco, CA · On-site
$150K - $225K/yr
Have basic familiarity with PyTorch, HuggingFace, or similar libraries * Can spin up a GPU cluster and train/evaluate a model * Are comfortable working long hours in a high-intensity, early-stage ...
Deepspeed, Huggingface TGI, FSDP) * Experience in projects involving LLMs
Leverage a broad stack of technologies - Pytorch, AWS Ultraclusters, Huggingface, Lightning, VectorDBs, and more - to reveal the insights hidden within huge volumes of numeric and textual data.
... and HuggingFace Transformers, with a good understanding of statistical analysis and shell programming • Must be fluent in English. • The duration of the internship is at least 4 months but ...
Deepspeed, Huggingface TGI) * Experience in turning applied research results into product components
Torrance, CA · On-site
Experience with ML/LLM libraries such as vLLM, LangChain, PyTorch, and HuggingFace. * Practical experience developing, deploying, and scaling AI Agents in a production environment. * Ability to ...
Hourly wage distribution (share of jobs per range):
$8.71 - $13.42: 16% of jobs
$13.42 - $18.13: 29% of jobs
$18.13 - $22.83: 19% of jobs
$22.83 - $27.54: 12% of jobs
$27.54 - $32.25: 8% of jobs
$32.25 - $36.96: 5% of jobs
$36.96 - $41.66: 4% of jobs
$41.66 - $46.37: 2% of jobs
$46.37 - $51.08: 2% of jobs
$51.08 - $55.79: 1% of jobs
$55.79 - $60.50: 1% of jobs
$14.86 is the 25th percentile. Wages below this are outliers.
The median wage is $19.30 / hr.
$27.01 is the 75th percentile. Wages above this are outliers.
To thrive in a role at Hugging Face, you typically need strong skills in machine learning, natural language processing (NLP), and software development, supported by a relevant degree in computer science or a related field. Familiarity with frameworks like PyTorch or TensorFlow, plus experience using version control systems such as Git, are often required; open-source contributions and cloud platform knowledge are a plus. Excellent communication, collaborative teamwork, and problem-solving abilities help candidates stand out in this dynamic, innovation-driven environment. These strengths are crucial because they enable individuals to develop high-impact AI tools, work effectively in interdisciplinary teams, and contribute to open-source communities.
As an engineer at Hugging Face, your day typically involves collaborating with team members to design, develop, and improve state-of-the-art machine learning models and tools, with a strong focus on open-source NLP projects. You’ll participate in code reviews, experiment with new technologies, engage with the community through forums or GitHub, and help support user questions or issues. Expect a fast-paced, collaborative environment where cross-functional teamwork with product managers, researchers, and other engineers is common. The work is project-driven, with plenty of opportunities to contribute ideas, learn from experts, and advance your technical skills.
A Hugging Face job typically refers to a role at Hugging Face, a company specializing in machine learning and natural language processing (NLP). Employees at Hugging Face work on developing and maintaining open-source AI tools, including the popular Transformers library. Roles range from research and engineering to product and community development, often focusing on advancing state-of-the-art AI models.
Full-time
Posted 9 days ago
ML Ops Engineer — Agentic AI Lab (Founding Team)
Location: San Francisco Bay Area
Type: Full-Time
Compensation: Competitive salary + meaningful equity (founding tier)
Backed by 8VC, we're building a world-class team to tackle one of the industry’s most critical infrastructure problems.
About the Role
Our AI Lab is pioneering the future of intelligent infrastructure through open-source LLMs, agent-native pipelines, retrieval-augmented generation (RAG), and knowledge-graph-grounded models.
We’re hiring an ML Ops Engineer to be the glue between ML research and production systems — responsible for automating the model training, deployment, versioning, and observability pipelines that power our agents and AI data fabric.
You’ll work across compute orchestration, GPU infrastructure, fine-tuned model lifecycle management, model governance, and security e
Responsibilities
Build and maintain secure, scalable, and automated pipelines for:
LLM fine-tuning, SFT, LoRA, RLHF, DPO training
RAG embedding pipelines with dynamic updates
Model conversion, quantization, and inference rollout
Manage hybrid compute infrastructure (cloud, on-prem, GPU clusters) for training and inference workloads using Kubernetes, Ray, and Terraform
Containerize models and agents using Docker, with reproducible builds and CI/CD via GitHub Actions or ArgoCD
Implement and enforce model governance: versioning, metadata, lineage, reproducibility, and evaluation capture
Create and manage evaluation and benchmarking frameworks (e.g. OpenLLM-Evals, RAGAS, LangSmith)
Integrate with security and access control layers (OPA, ABAC, Keycloak) to enforce model policies per tenant
Instrument observability for model latency, token usage, performance metrics, error tracing, and drift detection
Support deployment of agentic apps with LangGraph, LangChain, and custom inference backends (e.g. vLLM, TGI, Triton)
Model Infrastructure:
4+ years in MLOps, ML platform engineering, or infra-focused ML roles
Deep familiarity with model lifecycle management tools: MLflow, Weights & Biases, DVC, HuggingFace Hub
Experience with large model deployments (open-source LLMs preferred): LLaMA, Mistral, Falcon, Mixtral
Comfortable with tuning libraries (HuggingFace Trainer, DeepSpeed, FSDP, QLoRA)
Familiarity with inference serving: vLLM, TGI, Ray Serve, Triton Inference Server
Automation + Infra:
Proficient with Terraform, Helm, K8s, and container orchestration
Experience with CI/CD for ML (e.g. GitHub Actions + model checkpoints)
Managed hybrid workloads across GPU cloud (Lambda, Modal, HuggingFace Inference, SageMaker)
Familiar with cost optimization (spot instance scaling, batch prioritization, model sharding)
Agent + Data Pipeline Support:
Familiarity with LangChain, LangGraph, LlamaIndex or similar RAG/agent orchestration tools
Built embedding pipelines for multi-source documents (PDF, JSON, CSV, HTML)
Integrated with vector databases (Weaviate, Qdrant, FAISS, Chroma)
Security & Governance:
Implemented model-level RBAC, usage tracking, audit trails
Integrated with API rate limits, tenant billing, and SLA observability
Experience with policy-as-code systems (OPA, Rego) and access layers
Preferred Stack
LLM Ops: HuggingFace, DeepSpeed, MLflow, Weights & Biases, DVC
Infra: Kubernetes (GKE/EKS), Ray, Terraform, Helm, GitHub Actions, ArgoCD
Serving: vLLM, TGI, Triton, Ray Serve
Pipelines: Prefect, Airflow, Dagster
Monitoring: Prometheus, Grafana, OpenTelemetry, LangSmith
Security: OPA (Rego), Keycloak, Vault
Languages: Python (primary), Bash, optionally Rust or Go for tooling
Mindset & Culture Fit
Builder's mindset with startup autonomy: you automate what slows you down
Obsessive about reproducibility, observability, and traceability
Comfortable with a hybrid team of AI researchers, DevOps, and backend engineers
Interested in aligning ML systems to product delivery, not just papers
Bonus: experience with SOC2, HIPAA, or GovCloud-grade model operations
Experience:
5+ years as a full stack or backend engineer
Experience owning and delivering production systems end-to-end
Prior experience with modern frontend frameworks (React, Next.js)
Familiarity with building APIs, databases, cloud infrastructure, or deployment workflows at scale
Comfortable working in early-stage startups or autonomous roles; prior experience as a founder, as a founding engineer, or at a 0-1 pre-seed startup is a big plus
Mindset:
Comfortable with ambiguity, eager to prototype and iterate quickly
Strong sense of ownership — prefers to build systems rather than wait for tickets
Enjoys thinking about architecture, performance, and tradeoffs at every level
Clear communicator and pragmatic team player
Values equity and impact over prestige or hierarchy
Prior startup or founding team experience
Your work will enable models and agents to be trained, evaluated, deployed, and governed at scale, across many tenants, models, and tasks. This is the backbone of a secure, reliable, and scalable AI-native enterprise system. If you dream about using AI to solve some really hard real-world problems, we would love to hear from you.