AI Evaluation Scientist
$105K - $145K/yr
Implement evaluation frameworks for AI models, including accuracy, robustness, relevance, bias, hallucination rate, and safety metrics. * Build and maintain automated evaluation scripts, tests, and ...
The Legal Engineer - AI and Automation will lead the design, implementation, and governance of AI ... Evaluate tools not just on features but on governance capability, hallucination rates, integration ...
Washington, DC · On-site
$140K - $210K/yr
Founded in 2015, Shield AI is a venture-backed defense-tech company with the mission of protecting ... Improved deal velocity and win rates through disciplined qualification and positioning
McLean, VA · On-site
$105K - $145K/yr
Implement evaluation frameworks for AI models, including accuracy, robustness, relevance, bias, hallucination rate, and safety metrics. * Build and maintain automated evaluation scripts, tests, and ...
Washington, DC · On-site +1
$122K/yr
Meta is looking for an AI Policy Manager to join our AI Policy team. In this role, you will work ... Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual ...
Arlington, VA · On-site
AI Governance SME Job Category: Information Technology Time Type: Full time Minimum Clearance ... rates, relevant prior work experience, specific skills and competencies, education, and ...
Use data to optimize messaging, channels, and conversion rates * Continuously test, iterate, and improve Required Qualifications * 5+ years of marketing experience in AI, Cybersecurity, Cloud, or ...
Arlington, VA · On-site
$123K - $162K/yr
Overview/ Job Responsibilities We are seeking an accomplished AI Requirements Engineer to lead the ... Proven ability to deliver measurable improvements in user experience, adoption rates, and process ...
Washington, DC · Remote
Implement LLMOps to monitor model performance, detect hallucination rates, and manage model versioning and drift. 4. Public Sector Advisory & Governance * Collaborate with the customer's AI Center of ...
We are seeking an accomplished AI Requirements Engineer to lead the end-to-end design, development ... Proven ability to deliver measurable improvements in user experience, adoption rates, and process ...
... rates, analyst productivity gains, cycle time reductions, and product quality improvements. • Provide leadership with data-driven evidence supporting review board decisions to expand AI tool access ...
New
Washington, DC · On-site
$129K - $177K/yr
... rate limiting, abuse detection). Enable detections and monitoring for AI-specific attack patterns using logs, telemetry, and model signals. Work with platform teams to secure the integration and ...
Washington, DC · On-site +1
$133K - $210K/yr
AI Safety Index - an objective rating of AI companies on key safety and security domains, as judged by experts in the field. > FLI is a largely virtual organization, with a team of >30 distributed ...
An AI Rater evaluates and provides feedback on artificial intelligence models, typically to improve search engines, chatbots, or recommendation systems. They assess the relevance, accuracy, and quality of AI-generated content based on specific guidelines. This role requires strong analytical skills, attention to detail, and familiarity with the subject matter being reviewed. AI Raters often work remotely and on a flexible schedule.
To thrive as an AI Rater, you generally need strong attention to detail, analytical thinking, and proficiency in English, often supported by formal education such as a high school diploma or higher. Familiarity with web browsers, online research, and company-specific rating platforms or guidelines is essential. Excellent time management, adaptability, and effective written communication help individuals excel in this position. These skills and qualities ensure accurate and consistent evaluations of AI-generated content, directly impacting the improvement of artificial intelligence systems.
A typical day for an AI Rater involves reviewing and evaluating various types of content, such as search engine results, social media posts, advertisements, or chatbot responses, to ensure they meet quality and relevancy standards. You may follow detailed guidelines to rate or annotate content, complete assigned tasks in a web-based platform, and provide feedback to help improve AI performance. Most positions are remote and offer flexible schedules, allowing you to plan your workload around personal commitments. Collaboration is generally limited, as most work is performed independently, but periodic communication with team leads for training or updates is common.

$105K - $145K/yr
Full-time
Posted 24 days ago
We are looking for an AI Evaluation Scientist to design and execute evaluation processes that ensure our predictive and generative AI systems are accurate, reliable, safe, and aligned with mission requirements. This role is essential for establishing trust in AI solutions and supporting continuous improvement across the AI lifecycle. The AI Evaluation Scientist will work closely with engineers, data scientists, governance analysts, and product teams to develop evaluation metrics, build test harnesses, analyze model behavior, and support responsible deployment.
Steampunk relies on several factors to determine salary, including but not limited to geographic location, contractual requirements, education, knowledge, skills, competencies, and experience. The projected compensation range for this position is $105,000 to $145,000. The estimate displayed represents a typical annual salary range for this position. Annual salary is just one aspect of Steampunk’s total compensation package for employees. Learn more about additional Steampunk benefits here.
Identity Statement
As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.
Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health and DoD sectors. Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges. As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers, and on rewarding them for outstanding contributions to our growth. If you want to learn more about our story, visit http://www.steampunk.com.