
AI Rater Jobs in Reston, VA (Now Hiring)

Implement evaluation frameworks for AI models, including accuracy, robustness, relevance, bias, hallucination rate, and safety metrics. * Build and maintain automated evaluation scripts, tests, and ...

AI Evaluation Scientist

McLean, VA · On-site

$105K - $145K/yr

Meta is looking for an AI Policy Manager to join our AI Policy team. In this role, you will work ... Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual ...

Use data to optimize messaging, channels, and conversion rates * Continuously test, iterate, and improve Required Qualifications * 5+ years of marketing experience in AI, Cybersecurity, Cloud, or ...

AI Requirements Engineer

Arlington, VA · On-site

$123K - $162K/yr

Overview/ Job Responsibilities We are seeking an accomplished AI Requirements Engineer to lead the ... Proven ability to deliver measurable improvements in user experience, adoption rates, and process ...

Implement LLMOps to monitor model performance, detect hallucination rates, and manage model versioning and drift. 4. Public Sector Advisory & Governance * Collaborate with the customer's AI Center of ...

We are seeking an accomplished AI Requirements Engineer to lead the end-to-end design, development ... Proven ability to deliver measurable improvements in user experience, adoption rates, and process ...

... rates, analyst productivity gains, cycle time reductions, and product quality improvements. • Provide leadership with data-driven evidence supporting review board decisions to expand AI tool access ...

Senior AI Defense Engineer

Washington, DC · On-site

$129K - $177K/yr

... rate limiting, abuse detection). Enable detections and monitoring for AI-specific attack patterns using logs, telemetry, and model signals. Work with platform teams to secure the integration and ...

AI Rater information

What is an AI Rater job?

An AI Rater evaluates and provides feedback on artificial intelligence models, typically to improve search engines, chatbots, or recommendation systems. They assess the relevance, accuracy, and quality of AI-generated content against specific guidelines. The role requires strong analytical skills, attention to detail, and familiarity with the subject matter being reviewed. AI Raters often work remotely and on a flexible schedule.

What are the key skills and qualifications needed to thrive in the AI Rater position, and why are they important?

To thrive as an AI Rater, you generally need strong attention to detail, analytical thinking, and proficiency in English, often supported by formal education such as a high school diploma or higher. Familiarity with web browsers, online research, and company-specific rating platforms or guidelines is essential. Excellent time management, adaptability, and effective written communication help individuals excel in this position. These skills and qualities ensure accurate and consistent evaluations of AI-generated content, directly impacting the improvement of artificial intelligence systems.

What does a typical day look like for an AI Rater?

A typical day for an AI Rater involves reviewing and evaluating various types of content, such as search engine results, social media posts, advertisements, or chatbot responses, to ensure they meet quality and relevancy standards. You may follow detailed guidelines to rate or annotate content, complete assigned tasks in a web-based platform, and provide feedback to help improve AI performance. Most positions are remote and offer flexible schedules, allowing you to plan your workload around personal commitments. Collaboration is generally limited, as most work is performed independently, but periodic communication with team leads for training or updates is common.

Infographic: AI Rater job openings in Reston, VA as of June 2026. By employment type: 12% internship, 42% full-time, 39% part-time, and 7% contract. By work arrangement: 68% in-person and 32% remote.

$105K - $145K/yr

Full-time

Posted 24 days ago


Job description

We are looking for an AI Evaluation Scientist to design and execute evaluation processes that ensure our predictive and generative AI systems are accurate, reliable, safe, and aligned with mission requirements. This role is essential for establishing trust in AI solutions and supporting continuous improvement across the AI lifecycle. The AI Evaluation Scientist will work closely with engineers, data scientists, governance analysts, and product teams to develop evaluation metrics, build test harnesses, analyze model behavior, and support responsible deployment. 


Responsibilities

  • Implement evaluation frameworks for AI models, including accuracy, robustness, relevance, bias, hallucination rate, and safety metrics.
  • Build and maintain automated evaluation scripts, tests, and pipelines that assess AI model outputs and detect performance drift over time.
  • Develop benchmark datasets, challenge sets, and scenario-based test cases tailored to mission and user needs.
  • Perform structured error analysis and behavioral audits of LLMs, retrieval-augmented generation (RAG) systems, and predictive models, documenting findings and improvement recommendations.
  • Collaborate with AI Developers, LLMOps Engineers, and Data Scientists to support iterative experimentation, model hardening, and quality improvements.
  • Contribute to the design of human-in-the-loop evaluation workflows, integrating qualitative and quantitative insight into evaluation reports.
  • Assist in mapping evaluation outcomes to responsible AI principles such as fairness, transparency, reliability, and safety.
  • Partner with AI Governance Analysts to ensure evaluation outputs support compliance, documentation, and risk assessments.
  • Stay current with emerging evaluation tools, frameworks, metrics, and research related to LLM assessment and generative AI reliability.
  • Document evaluation processes, criteria, and results for both technical and non-technical audiences.
  • You will contribute to the growth of our AI & Data Exploitation Practice! 

Qualifications

  • Ability to hold a position of public trust with the U.S. government.
  • Bachelor’s or Master’s degree in Computer Science, Statistics, Machine Learning, Cognitive Science, Human-Computer Interaction, Data Science, or a related field.
  • 2+ years of experience evaluating machine learning models, NLP systems, or generative AI models (LLMs preferred).
  • Familiarity with evaluation metrics, statistical testing, dataset creation, and experimental design for AI systems.
  • Proficiency in Python and relevant libraries such as PyTorch, Hugging Face, scikit-learn, and LangChain.
  • Proficiency in AI evaluation frameworks such as Ragas.
  • Experience analyzing structured and unstructured data, including text, documents, and embeddings.
  • Understanding of LLM behavior, prompt evaluation, retrieval pipelines, or RAG architectures.
  • Exposure to responsible AI concepts and governance-aligned evaluation criteria (e.g., fairness, transparency, reliability).
  • Strong analytical skills with the ability to interpret model weaknesses, extract insights, and recommend actionable improvements.
  • Excellent written and verbal communication skills, with the ability to present evaluation findings clearly to technical and non-technical stakeholders.
  • Experience working in agile or iterative development environments is a plus.
  • Familiarity with the OWASP Top 10 risks for LLM applications.
  • NIH (National Institutes of Health) experience.
  • Relevant certifications (helpful but not required): 
    • NIST AI RMF (AISIC)
    • INFORMS CAP
    • AWS, Azure, or Google ML certifications
  • Local to the Washington, DC metro area preferred.

Steampunk relies on several factors to determine salary, including but not limited to geographic location, contractual requirements, education, knowledge, skills, competencies, and experience. The projected compensation range for this position is $105,000 to $145,000. The estimate displayed represents a typical annual salary range for this position. Annual salary is just one aspect of Steampunk's total compensation package for employees.

Identity Statement

As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.

Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health, and DoD sectors. Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges. As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers and rewarding them for outstanding contributions to our growth. To learn more about our story, visit http://www.steampunk.com.