
LLM ML RAG Jobs (NOW HIRING)

Staff AI/ML Engineer

San Francisco, CA · On-site

$250K - $350K/yr

Hands-on LLM/agent building (e.g., LangChain/Graph, CrewAI) and tuning to quality benchmarks ... Experience with semantic search, RAG, and vector databases. * Experience with prompt engineering ...

EU AI Act, MITRE ATLAS, OWASP Top 10 for LLM/ML * Hands-on AI security experience: RAG, MCP, agentic systems (design → production) * Security architecture & engineering: Proven track record ...

Staff AI/ML Engineer

San Francisco, CA · On-site +1

$250K - $350K/yr

Hands-on LLM/agent building (e.g., LangChain/Graph, CrewAI) and tuning to quality benchmarks ... Experience with semantic search, RAG, and vector databases. * Experience with prompt engineering ...

Own and optimize pipelines that combine classical ML and LLM-based systems (RAG, scoring, summarization, etc.) * Fine-tune and evaluate LLMs using both open-source and proprietary data * Collaborate ...

AI/ML Architect

Irvine, CA · On-site

$68.50 - $88/hr

LLM & Generative AI Development * Agent-Based Systems * Retrieval-Augmented Systems (RAG) * Enterprise AI Integration * AI/ML, Data Science, AI Architecture, Python, LLM, CI/CD Must Have Skills: • ...

Python/Gen AI Developer

Reston, VA · On-site

$52.25 - $72/hr

SQL * AWS Data Services * LLM * ML * Sagemaker * Bedrock Top 3 Soft Skills: * Confidence in ... Familiarity with vector databases (e.g., FAISS, Pinecone) and retrieval-augmented generation (RAG)

Gen AI/Python Developer

Reston, VA · On-site

$52.25 - $72/hr

SQL * AWS Data Services * LLM * ML * Sagemaker * Bedrock Top 3 Soft Skills: * Confidence in ... Familiarity with vector databases (e.g., FAISS, Pinecone) and retrieval-augmented generation (RAG)


LLM ML RAG information


Salary range: $45K (low) · $75.3K (average) · $110K (high)

How much do LLM ML RAG jobs pay per year?

As of Jun 25, 2026, the average yearly pay for LLM ML RAG jobs in the United States is $75,300, according to ZipRecruiter salary data. Most workers in this role earn between $62,000 and $87,000 per year, depending on experience, location, and employer.

What are some typical challenges faced when working on Retrieval-Augmented Generation (RAG) systems in large language model (LLM) machine learning roles?

Professionals working on LLM ML RAG systems often encounter challenges such as ensuring the accuracy and relevancy of retrieved documents, managing latency for real-time queries, and seamlessly integrating retrieval mechanisms with generation models. Additionally, keeping up with evolving datasets and maintaining high-quality knowledge bases can be demanding. Collaboration with data engineers and domain experts is common to refine retrieval pipelines and optimize the end-to-end system.
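The "accuracy and relevancy of retrieved documents" mentioned above is usually tracked with ranking metrics. The following is a minimal sketch, not a standard from any particular library: it assumes each query has a hand-labeled set of relevant document IDs, and the function names and IDs are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical document IDs for a single query (illustrative data only).
retrieved_ids = ["doc7", "doc2", "doc9", "doc4"]
relevant_ids = {"doc2", "doc4", "doc5"}

# One of the three relevant documents is in the top 3.
print(recall_at_k(retrieved_ids, relevant_ids, k=3))
# The first relevant document sits at rank 2.
print(reciprocal_rank(retrieved_ids, relevant_ids))
```

Averaging these scores over a fixed query set gives a number that can be compared before and after a change to the retrieval pipeline.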

What is the difference between an LLM ML RAG role and a Data Scientist?

Aspect | LLM ML RAG | Data Scientist
Required Credentials | Master's or PhD in ML, AI, or related fields; certifications in ML frameworks | Degree in Computer Science, Statistics, or related; certifications in data analysis or ML
Work Environment | Research labs, AI development teams, tech companies | Business analytics, research, product development teams
Employer & Industry Usage | Tech firms, AI startups, research institutions | Finance, healthcare, tech, consulting firms
Common Search & Comparison | Often compared for ML specialization and research focus | Compared for data analysis, modeling, and business insights

While both roles involve working with machine learning, LLM ML RAG roles typically focus on research and development of large language models and require advanced ML expertise. Data Scientists often work on analyzing data, building predictive models, and deriving insights for business decisions. The roles overlap in skills but differ in focus and application areas.

What are the key skills and qualifications needed to thrive as an LLM ML RAG (Retrieval-Augmented Generation) Engineer, and why are they important?

To excel as an LLM ML RAG Engineer, you need a strong background in machine learning, natural language processing, and large language models, typically supported by a degree in computer science or a related field. Proficiency with tools and frameworks like Python, PyTorch/TensorFlow, Hugging Face Transformers, and vector databases (e.g., FAISS, Pinecone) is essential, along with experience in deploying and fine-tuning LLMs and integrating retrieval systems. Strong problem-solving skills, attention to detail, and the ability to collaborate with cross-functional teams distinguish top performers in this role. These skills ensure the effective development and deployment of advanced AI solutions that combine generative and retrieval capabilities for high-impact applications.

What are LLM ML RAG jobs?

LLM ML RAG jobs involve working with Large Language Models (LLMs), Machine Learning (ML), and Retrieval-Augmented Generation (RAG) systems. Professionals in these roles typically design, develop, and optimize AI systems that combine language models with retrieval techniques to improve accuracy, relevance, and factual grounding in generated outputs. These jobs often require expertise in natural language processing, deep learning, data engineering, and information retrieval. Key responsibilities might include integrating RAG pipelines, fine-tuning LLMs, and ensuring high-quality responses from AI applications.
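As a rough illustration of how a RAG system "combines language models with retrieval techniques", here is a minimal sketch using only the Python standard library. It is an assumption-laden toy: word-count cosine similarity stands in for a real embedding model and vector database, the function names are hypothetical, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages in the prompt so the answer can be grounded in them."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Illustrative knowledge base; a production system would store embeddings instead.
docs = [
    "FAISS is a library for similarity search over dense vectors.",
    "Pinecone is a managed vector database service.",
    "SageMaker is an AWS service for training and deploying models.",
]
question = "Which library does similarity search over vectors?"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this prompt would then be sent to an LLM of your choice
```

The retrieval step narrows the knowledge base to a few passages, and the prompt instructs the model to answer from them, which is where the "factual grounding" comes from.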
More about LLM ML RAG jobs
Infographic: LLM ML RAG job openings in the United States as of June 2026. Employment types: 97% full time, 1% part time, 2% contract. Work arrangement: 81% physical, 4% hybrid, 15% remote. Average salary: $75,300 per year, or $36.2 per hour.

LLM / RAG Evaluation Engineer

Prophecy Technologies

Austin, TX • On-site

Full-time

Posted 23 days ago


Job description

Job Summary
We are seeking an experienced LLM / RAG Evaluation Engineer to design, implement, and scale evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and agentic AI workflows. This role focuses on assessing quality, reliability, safety, robustness, and performance of production-grade Generative AI systems used in real-world applications.
Key Responsibilities
  • Design and execute LLM response evaluation pipelines, including automated and human-in-the-loop approaches
  • Evaluate RAG systems for retrieval accuracy, grounding, relevance, and hallucination detection
  • Build and apply evaluation metrics for agentic AI systems, including:
      • Multi-step reasoning
      • Tool usage
      • Planning and memory workflows
  • Develop Python-based evaluation frameworks, benchmarks, and testing utilities
  • Analyze model outputs, identify failure modes, and provide actionable insights to improve system performance
  • Define and track KPIs for Generative AI systems, covering quality, safety, robustness, and trustworthiness
  • Collaborate with ML engineers, researchers, and product teams to improve GenAI architectures
  • Validate and compare prompt strategies, retrieval strategies, and system designs
  • Clearly document evaluation methodologies, results, and recommendations for stakeholders
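
For candidates wondering what a "Python-based evaluation framework" for grounding and hallucination detection might look like at its simplest, here is a hedged sketch. It is not this employer's method: the token-overlap score, the 0.6 threshold, and all names and test cases are assumptions chosen for illustration, and real pipelines typically add model-based judges and human review.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of the text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(answer: str, context: str) -> float:
    """Share of distinct answer tokens that also occur in the retrieved context."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(context)) / len(answer_tokens)


def evaluate(cases: List[Dict[str, str]], threshold: float = 0.6) -> List[Dict[str, object]]:
    """Score each case and flag answers whose score falls below the threshold."""
    report = []
    for case in cases:
        score = grounding_score(case["answer"], case["context"])
        report.append({"id": case["id"], "score": round(score, 2), "flagged": score < threshold})
    return report


# Hypothetical test cases: q1 restates the context, q2 contradicts it.
cases = [
    {"id": "q1", "context": "The refund window is 30 days from delivery.",
     "answer": "The refund window is 30 days."},
    {"id": "q2", "context": "The refund window is 30 days from delivery.",
     "answer": "Refunds are accepted for up to one year."},
]
for row in evaluate(cases):
    print(row)
```

Here q1 scores 1.0 and passes, while q2 shares no tokens with its context and is flagged for review. Lexical overlap is a crude proxy, so a flagged answer is a candidate failure for a human or a stronger metric to confirm, not a verdict.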

Required Skills & Experience
  • Strong proficiency in Python
  • Proven experience in LLM response evaluation (quality, coherence, accuracy, bias, hallucinations)
  • Hands-on experience with RAG systems and retrieval-based architectures
  • Understanding of agentic AI systems and multi-step reasoning workflows
  • Experience evaluating Generative AI systems in real or near-production environments
  • Knowledge of NLP fundamentals and LLM behavior
  • Experience with prompt engineering, prompt testing, and prompt evaluation

Preferred Skills
  • Experience with LLM orchestration frameworks (LangChain, LlamaIndex, etc.)
  • Familiarity with automated evaluation tools, benchmarks, and scoring frameworks
  • Experience designing or managing human evaluation workflows
  • Understanding of AI safety, reliability, bias, and trustworthiness principles
  • Prior experience evaluating production-grade GenAI systems

Nice to Have
  • Experience with vector databases and retrieval pipelines
  • Exposure to cloud-based AI platforms
  • Research or experimentation background in LLM evaluation and benchmarking