Evaluation Engineer Jobs (NOW HIRING)

LTS is seeking a RAG & Evaluation Engineer to join a small, senior engineering team applying frontier AI to one of the most consequential legacy systems still running in production today. The mission ...

Test & Evaluation Engineer (DHS/CBP Systems) Company: Think Evolutionary (on behalf of our client, C2C) Location: Remote (U.S.) with travel to DHS/CBP sites; potential future hybrid in Washington, DC ...

Evaluation Engineer information

Annual pay: $55.5K (low), $128.4K (average), $166K (high)

How much do evaluation engineer jobs pay per year?

As of Jun 23, 2026, the average yearly pay for an evaluation engineer in the United States is $128,450.00, according to ZipRecruiter salary data. Most workers in this role earn between $118,500.00 and $150,000.00 per year, depending on experience, location, and employer.

What does an evaluation engineer do?

An evaluation engineer assesses the performance, reliability, and safety of products, systems, or processes through testing, analysis, and data collection. They often use specialized tools and techniques to ensure standards are met and may prepare reports for stakeholders. This role typically requires strong analytical skills and knowledge of engineering principles.

What engineers make $300,000 a year?

Senior engineers in specialized fields such as petroleum, aerospace, or software engineering can earn $300,000 or more annually, especially with extensive experience, advanced skills, and leadership roles. High compensation often involves working in high-demand industries, holding managerial or executive positions, or possessing rare technical expertise and certifications.

What engineers make $500,000?

Senior engineers in specialized fields such as petroleum, aerospace, or software engineering with extensive experience and advanced skills can earn $500,000 or more annually. High compensation often involves leadership roles, bonuses, stock options, or working in high-demand industries with complex projects.

What is an Evaluation Engineer?

An Evaluation Engineer is a professional who assesses products, systems, or processes to ensure they meet specified standards and performance criteria. They are responsible for designing and conducting tests, analyzing results, and recommending improvements or changes. Evaluation Engineers work in various industries, including manufacturing, electronics, software, and automotive, to support product development and quality assurance. Their work helps companies deliver reliable and effective products to the market.

What are some common challenges faced by Evaluation Engineers when assessing new products or systems?

Evaluation Engineers often encounter challenges such as tight project deadlines, rapidly evolving technology, and the need to balance thorough testing with efficiency. They may also face difficulties in obtaining comprehensive data or replicating real-world scenarios during evaluations. Collaborating closely with cross-functional teams—like design, manufacturing, and quality assurance—is essential to address these challenges and ensure accurate, actionable results.

What is the difference between an Evaluation Engineer and a Test Engineer?

| Aspect | Evaluation Engineer | Test Engineer |
|---|---|---|
| Required Credentials | Bachelor's in Engineering, certifications in testing or evaluation methods | Bachelor's in Engineering, certifications in testing or quality assurance |
| Work Environment | Research labs, product development, quality assessment | Manufacturing plants, testing labs, product validation |
| Industry Usage | Used in electronics, aerospace, and automotive for evaluating performance | Used across industries for testing products and systems |

Evaluation Engineers focus on assessing product performance, reliability, and compliance through detailed analysis, often in research or development settings. Test Engineers primarily execute testing procedures to identify defects and ensure quality during manufacturing or pre-release stages. While both roles require technical skills and certifications, Evaluation Engineers emphasize evaluation and analysis, whereas Test Engineers concentrate on testing execution and defect detection.

What engineer is in highest demand?

Evaluation engineers are in high demand in industries such as manufacturing, aerospace, and electronics, especially those with skills in testing, data analysis, and quality assurance. Their expertise in assessing product performance and compliance makes them valuable as companies prioritize reliability and safety, often requiring certifications and proficiency with testing tools. The demand for evaluation engineers continues to grow with advancements in technology and quality standards.

What are the key skills and qualifications needed to thrive as an Evaluation Engineer, and why are they important?

To thrive as an Evaluation Engineer, you need a solid background in engineering principles, analytical problem-solving, and experience with product testing, often supported by a degree in engineering or a related field. Familiarity with testing equipment, data analysis tools (such as MATLAB or LabVIEW), and industry-specific standards or certifications is typically required. Strong attention to detail, effective communication, and collaboration skills help Evaluation Engineers accurately assess products and share findings with cross-functional teams. These skills are crucial for ensuring product quality, safety, and compliance with regulatory and customer requirements.

LLM / RAG Evaluation Engineer

Prophecy Technologies

Austin, TX • On-site

Full-time

Posted 20 days ago


Job description

Job Summary
We are seeking an experienced LLM / RAG Evaluation Engineer to design, implement, and scale evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and agentic AI workflows. This role focuses on assessing quality, reliability, safety, robustness, and performance of production-grade Generative AI systems used in real-world applications.
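As a rough illustration of what an "evaluation framework" means at its simplest, the sketch below runs each test case through a model, scores the output, and aggregates a couple of KPIs (mean score, pass rate, failing prompts). `EvalCase`, `run_eval`, `normalized_exact_match`, and `stub_model` are hypothetical names invented for this example; a real pipeline would likely plug in richer scorers (semantic similarity, LLM-as-judge, or human review) in place of exact match.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class EvalCase:
    """One test case: a prompt and a reference answer (hypothetical schema)."""
    prompt: str
    reference: str


def normalized_exact_match(output: str, reference: str) -> float:
    """Score 1.0 if output equals reference after trimming and lowercasing, else 0.0."""
    return float(output.strip().lower() == reference.strip().lower())


def run_eval(
    model: Callable[[str], str],
    cases: list[EvalCase],
    scorer: Callable[[str, str], float],
    threshold: float = 0.5,
) -> dict:
    """Run every case through the model, score it, and aggregate simple KPIs."""
    scores = [scorer(model(case.prompt), case.reference) for case in cases]
    n = len(scores)
    return {
        "n": n,
        "mean_score": mean(scores) if scores else 0.0,
        "pass_rate": sum(s >= threshold for s in scores) / n if n else 0.0,
        "failures": [c.prompt for c, s in zip(cases, scores) if s < threshold],
    }


# Stand-in for a real LLM call (hypothetical): one right answer, one wrong.
def stub_model(prompt: str) -> str:
    return {"Capital of France?": "Paris", "2 + 2?": "5"}.get(prompt, "")


cases = [EvalCase("Capital of France?", "paris"), EvalCase("2 + 2?", "4")]
report = run_eval(stub_model, cases, normalized_exact_match)
print(report)
```

Keeping the scorer as a plain callable is a design choice that lets the same harness compare prompt or retrieval variants by swapping only the model or scorer argument.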
Key Responsibilities
  • Design and execute LLM response evaluation pipelines, including automated and human-in-the-loop approaches
  • Evaluate RAG systems for retrieval accuracy, grounding, relevance, and hallucination detection
  • Build and apply evaluation metrics for agentic AI systems, including:
      • Multi-step reasoning
      • Tool usage
      • Planning and memory workflows
  • Develop Python-based evaluation frameworks, benchmarks, and testing utilities
  • Analyze model outputs, identify failure modes, and provide actionable insights to improve system performance
  • Define and track KPIs for Generative AI systems, covering quality, safety, robustness, and trustworthiness
  • Collaborate with ML engineers, researchers, and product teams to improve GenAI architectures
  • Validate and compare prompt strategies, retrieval strategies, and system designs
  • Clearly document evaluation methodologies, results, and recommendations for stakeholders
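One of the agentic metrics listed above, tool usage, can be approximated by comparing the tools an agent actually called against a reference trajectory. The sketch below is an assumption about how such a metric might be defined, not a standard: it treats calls as a multiset for precision/recall/F1 and separately checks for an exact, ordered match. `tool_call_metrics` and the tool names are hypothetical.

```python
from collections import Counter


def tool_call_metrics(expected: list[str], actual: list[str]) -> dict:
    """Compare an agent's tool calls against a reference trajectory.

    Precision/recall/F1 treat calls as a multiset (order ignored);
    exact_sequence additionally requires the same calls in the same order.
    """
    # Multiset intersection: a repeated call only counts as many times as expected.
    overlap = sum((Counter(expected) & Counter(actual)).values())
    precision = overlap / len(actual) if actual else 0.0
    recall = overlap / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "exact_sequence": expected == actual,
    }


# Hypothetical trajectory: the agent repeated a search and skipped the calculator.
expected = ["search", "fetch_page", "calculator"]
actual = ["search", "search", "fetch_page"]
metrics = tool_call_metrics(expected, actual)
print(metrics)
```

Here two of the three actual calls match expected calls, so precision, recall, and F1 all come out to 2/3, and the ordered check fails.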

Required Skills & Experience
  • Strong proficiency in Python
  • Proven experience in LLM response evaluation (quality, coherence, accuracy, bias, hallucinations)
  • Hands-on experience with RAG systems and retrieval-based architectures
  • Understanding of agentic AI systems and multi-step reasoning workflows
  • Experience evaluating Generative AI systems in real or near-production environments
  • Knowledge of NLP fundamentals and LLM behavior
  • Experience with prompt engineering, prompt testing, and prompt evaluation
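Hallucination detection in a RAG answer usually starts with asking whether each claim is supported by the retrieved context. The sketch below is a deliberately crude lexical heuristic, offered only as an illustration: it flags answer sentences whose content words mostly do not appear in the context. The stopword list, the `min_support` threshold, and the function names are assumptions; production checks more commonly rely on NLI models or LLM-as-judge grading, which catch paraphrases this approach misses.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "in", "on", "to", "and", "it", "its"}


def content_tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens minus the stopword list."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def flag_unsupported_sentences(answer: str, context: str, min_support: float = 0.6) -> list[str]:
    """Return answer sentences whose content tokens are mostly absent from the context."""
    context_tokens = content_tokens(context)
    flagged = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        tokens = content_tokens(sentence)
        if not tokens:
            continue
        support = len(tokens & context_tokens) / len(tokens)
        if support < min_support:
            flagged.append(sentence)
    return flagged


context = "The Eiffel Tower is in Paris. It was completed in 1889."
answer = "The Eiffel Tower is in Paris. It was designed by Leonardo da Vinci."
print(flag_unsupported_sentences(answer, context))
# → ['It was designed by Leonardo da Vinci.']
```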

Preferred Skills
  • Experience with LLM orchestration frameworks (LangChain, LlamaIndex, etc.)
  • Familiarity with automated evaluation tools, benchmarks, and scoring frameworks
  • Experience designing or managing human evaluation workflows
  • Understanding of AI safety, reliability, bias, and trustworthiness principles
  • Prior experience evaluating production-grade GenAI systems
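Human evaluation workflows typically need a check that annotators agree with each other before their labels are trusted. Cohen's kappa is one standard measure for two raters: it compares observed agreement with the agreement expected by chance from each rater's label frequencies. The sketch below implements that formula in plain Python; the labels and the `cohens_kappa` name are illustrative, and the choice to return 1.0 when both raters used a single identical label (where kappa is mathematically undefined) is a convention assumed here.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined (0/0), so this sketch treats it as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


rater_a = ["good", "good", "bad", "good", "bad", "bad"]
rater_b = ["good", "bad", "bad", "good", "bad", "good"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))  # → 0.333
```

The raters agree on 4 of 6 items (0.667), but with balanced labels chance alone gives 0.5, so kappa is only (0.667 - 0.5) / 0.5 = 1/3.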

Nice to Have
  • Experience with vector databases and retrieval pipelines
  • Exposure to cloud-based AI platforms
  • Research or experimentation background in LLM evaluation and benchmarking
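To connect vector retrieval with the retrieval-accuracy evaluation mentioned in the responsibilities, the sketch below does brute-force cosine-similarity search over a toy in-memory index (roughly what a vector database approximates at scale) and then scores the ranking with recall@k and reciprocal rank. The 3-dimensional "embeddings", document IDs, and relevance labels are made up for illustration; real systems use model-generated vectors and a dedicated index.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Exact nearest-neighbour search: rank every document by cosine similarity."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]


def recall_at_k(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that appear in the retrieved list."""
    return len(set(retrieved) & relevant) / len(relevant) if relevant else 0.0


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Toy 3-dimensional "embeddings" (hypothetical values).
index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.7, 0.3, 0.1],
}
query = [0.2, 0.8, 0.0]
relevant = {"doc_c"}  # hypothetical ground-truth label for this query

retrieved = top_k(query, index, k=2)
print(retrieved)                               # → ['doc_b', 'doc_c']
print(recall_at_k(retrieved, relevant))        # → 1.0
print(reciprocal_rank(retrieved, relevant))    # → 0.5
```

Averaging `reciprocal_rank` over a set of labelled queries gives mean reciprocal rank (MRR); here the relevant document is found within the top 2 but only at rank 2, which recall@2 alone would not reveal.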