AI Evaluation / Testing (Evaluation Engineer) Fort Worth, TX (Remote) Long Term Contract Job ... Validate prompt engineering changes, model updates, and retrieval strategy modifications through ...
Lead AI Engineer, Business Operations (Hybrid or Remote)
Dallas, TX · On-site +1
$98K - $129K/yr
Experience with AI development practices - model selection, fine-tuning, prompt engineering ... Remote work environment (US-based). * Travel: Occasional travel (domestic) as needed. Equal ...
AI Automation Engineer -Remote
Prairie View, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
El Paso, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Commerce, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Arlington, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Beaumont, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Dallas, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
McKinney, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
San Antonio, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Huntsville, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
San Marcos, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Laredo, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Corpus Christi, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
The Woodlands, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Nacogdoches, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Austin, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
AI Automation Engineer -Remote
Houston, TX · On-site +1
$202K - $234K/yr
Experience creating LLM-backed tools involving prompt engineering and automated evals * 5+ years of ... remote work reimbursement, paid time off, employee assistance programs, and more. Benefits are ...
Staff Machine Learning Engineer - Content and Contributor Intelligence (Remote - United States)
Austin, TX · Remote
Summary Yelp engineering culture is driven by our values: we're a cooperative team that values ... We highly value experience of working with LLMs, utilizing LLM APIs (OpenAI, Bedrock, etc), prompt ...
Agentic AI Software Developer
Garner, NC · On-site +1
Our ideal candidate is a senior engineer with hands-on spec-driven development experience who has ... Prompt Engineering - Develop and refine prompts, constraints, and AI workflows for better outcomes.
Remote Prompt Engineer information
Hourly wage distribution (share of jobs per wage range):
$12.09 - $17.84: 2% of jobs
$17.84 - $23.58: 6% of jobs
$23.58 - $29.32: 10% of jobs
$29.32 - $35.06: 6% of jobs
$35.06 - $40.80: 3% of jobs
$40.80 - $46.54: 6% of jobs
$46.54 - $52.28: 9% of jobs
$52.28 - $58.02: 12% of jobs
$58.02 - $63.77: 12% of jobs
$63.77 - $69.51: 14% of jobs
$69.51 - $75.25: 20% of jobs
$36.02 is the 25th percentile. Wages below this are outliers.
The median wage is $55.94 / hr.
$67.52 is the 75th percentile. Wages above this are outliers.
What is a Remote Prompt Engineer job?
A Remote Prompt Engineer designs, refines, and optimizes prompts to improve interactions between users and AI models. They work with natural language processing (NLP) systems to enhance response accuracy and relevance. This role often involves testing different prompts, analyzing AI outputs, and collaborating with developers or researchers to fine-tune language models. Since the position is remote, engineers use online tools and communication platforms to collaborate with teams and stay updated on AI advancements.
What does a typical workday look like for a Remote Prompt Engineer?
As a Remote Prompt Engineer, your workday often involves designing and testing prompts for various AI models, collaborating with product managers and developers, and analyzing model outputs to refine interactions. You may also participate in virtual team meetings to discuss project goals, provide feedback on AI performance, and stay updated on new advancements in prompt engineering. Routine responsibilities include documenting prompt structures, troubleshooting model behavior, and integrating feedback from client or user testing. This dynamic and collaborative environment enables you to contribute creative solutions and drive continuous improvement in AI-driven applications.
What are the key skills and qualifications needed to thrive in the Remote Prompt Engineer position, and why are they important?
To thrive as a Remote Prompt Engineer, you need expertise in natural language processing, prompt design, and a background in computer science or a related field. Familiarity with AI platforms (such as OpenAI or Anthropic APIs), programming languages like Python, and prompt engineering tools is highly valuable. Outstanding communication, collaboration, and problem-solving skills help remote team members excel in optimizing AI performance. These competencies ensure tailored, effective AI solutions and smooth, results-driven teamwork from a remote environment.
AI Evaluation / Testing (Evaluation Engineer)
Futran Tech Solutions Pvt. Ltd. • Fort Worth, TX • On-site, Remote
Full-time
Posted 7 days ago
Job description
Fort Worth, TX (Remote)
Long Term Contract
Job Summary:
We are seeking a skilled AI Evaluation Engineer to validate AI models and agent workflows built on AWS and Azure as the core AI foundation, with Microsoft Copilot as the primary user experience layer. The role is responsible for ensuring AI systems meet rigorous standards for accuracy, safety, bias, and performance through structured testing, benchmarking, and continuous evaluation pipelines across the full AI lifecycle. The candidate will work closely with AI Architects, AI Engineers, and AI Security Engineers to establish evaluation frameworks that provide confidence in AI outputs before and after production deployment, including Copilot-integrated workflows and RAG-based systems.
Key Responsibilities:
Evaluation Framework Design:
Design, build, and maintain end-to-end AI evaluation frameworks covering accuracy, relevance, groundedness, safety, fairness, and performance for LLM-powered systems on AWS and Azure.
Define evaluation strategies tailored to specific AI use cases including RAG pipelines, multi-agent workflows, and Microsoft Copilot-integrated experiences.
Establish standardised scoring rubrics, evaluation metrics, and acceptance thresholds in collaboration with AI Architects and business stakeholders.
Build reusable evaluation datasets, test suites, and golden-set benchmarks representative of real enterprise use cases and edge conditions.
Model & Agent Testing:
Execute structured testing of LLM models, RAG pipelines, and agentic workflows on AWS Bedrock and Azure AI Foundry to validate outputs against defined quality standards.
Test multi-agent orchestration logic including routing, handoff behaviour, context retention, tool use, and escalation pathways under a range of real-world scenarios.
Validate prompt engineering changes, model updates, and retrieval strategy modifications through systematic regression and A/B testing pipelines.
Conduct adversarial testing including prompt injection, jailbreak attempts, and boundary condition probing to assess model robustness and guardrail effectiveness.
Test Microsoft Copilot-integrated workflows and plugins for accuracy, response quality, and alignment with enterprise governance policies.
RAG & Retrieval Evaluation:
Evaluate RAG pipeline quality across the full retrieval chain, including chunking strategies, embedding quality, vector search relevance, re-ranking accuracy, and context utilisation.
Measure retrieval performance using precision, recall, mean reciprocal rank (MRR), and normalised discounted cumulative gain (nDCG) metrics across Azure AI Search and Amazon OpenSearch.
Assess grounding quality and citation accuracy of LLM responses to ensure outputs are faithfully anchored to retrieved enterprise data.
Identify and report retrieval gaps, knowledge staleness, and context window inefficiencies, and work with AI Engineers to drive improvements.
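For readers unfamiliar with the retrieval metrics named above, the following is a minimal Python sketch of precision@k, recall@k, MRR, and nDCG@k. It is an illustration only, not part of the posting: the function names, the document IDs, and the graded relevance values are hypothetical, and a real evaluation would run against results from a search service such as those listed above rather than hard-coded lists.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved, gains, k):
    """nDCG@k with graded relevance; `gains` maps document ID to a gain value."""
    dcg = sum(
        gains.get(doc, 0) / math.log2(i + 2)
        for i, doc in enumerate(retrieved[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical single-query example: the first relevant hit is at rank 2.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
gains = {"d1": 2, "d2": 1}

p_at_2 = precision_at_k(retrieved, relevant, 2)
r_at_2 = recall_at_k(retrieved, relevant, 2)
mrr = mean_reciprocal_rank([(retrieved, relevant), (["d1"], {"d1"})])
ndcg = ndcg_at_k(retrieved, gains, 4)
```

Precision and recall treat relevance as binary, while nDCG rewards placing higher-gain documents earlier, which is why the two kinds of metric are usually reported together.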
Safety, Bias & Responsible AI Testing:
Design and execute safety evaluation suites to detect harmful, toxic, or policy-violating AI outputs across production and pre-production environments.
Conduct bias and fairness assessments across demographic groups and use case domains to identify discriminatory patterns in AI model outputs.
Validate the effectiveness of guardrails, content filters, and refusal behaviours implemented by AI Security Engineers across AWS Bedrock and Azure AI Foundry safety layers.
Produce responsible AI evaluation reports that evidence compliance with enterprise AI governance standards, regulatory requirements, and Microsoft Responsible AI principles.
Continuous Evaluation Pipelines:
Build and maintain automated, continuous evaluation pipelines integrated into CI/CD workflows to catch quality regressions before deployment to production.
Implement production monitoring to detect model drift, output degradation, hallucination rate increases, and latency regressions in live AI systems on AWS and Azure.
Define alerting thresholds and feedback loops that trigger re-evaluation or rollback when AI system quality falls below agreed acceptance criteria.
Maintain evaluation run history, benchmark versioning, and quality trend dashboards to provide visibility of AI system health over time.
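One way an evaluation gate of this kind might look in a CI/CD job is sketched below in Python. It is an assumption-laden illustration, not the employer's pipeline: the `Threshold` class, the metric names, and the limits are all hypothetical, and the scores would in practice come from an evaluation run rather than a literal dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptance criterion for one metric (names and limits are hypothetical)."""
    metric: str
    limit: float
    higher_is_better: bool = True


def failed_checks(scores, thresholds):
    """Return a human-readable message for every threshold that is not met."""
    failures = []
    for t in thresholds:
        value = scores.get(t.metric)
        if value is None:
            # A metric that was never computed is treated as a failure.
            failures.append(f"{t.metric}: missing from evaluation run")
            continue
        ok = value >= t.limit if t.higher_is_better else value <= t.limit
        if not ok:
            failures.append(f"{t.metric}: {value} breaches limit {t.limit}")
    return failures


def gate_exit_code(failures):
    """0 lets the pipeline proceed; 1 blocks the deployment step."""
    return 1 if failures else 0


# Hypothetical thresholds and scores for one evaluation run.
thresholds = [
    Threshold("faithfulness", 0.90),
    Threshold("answer_relevance", 0.85),
    Threshold("p95_latency_s", 2.0, higher_is_better=False),
]
scores = {"faithfulness": 0.93, "p95_latency_s": 2.4}

failures = failed_checks(scores, thresholds)
exit_code = gate_exit_code(failures)
```

A CI step could pass `exit_code` to `sys.exit` so that a breached or missing metric stops promotion to production; the same check could feed an alert or rollback trigger in production monitoring.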
Benchmarking & Performance Testing:
Conduct performance benchmarking of AI models and inference endpoints on AWS and Azure, measuring latency, throughput, token efficiency, and cost-per-query under realistic load conditions.
Compare model versions, retrieval strategies, and prompt configurations against baseline benchmarks to quantify the impact of changes before production promotion.
Benchmark Microsoft Copilot-integrated workflows for end-to-end response time, accuracy, and user experience quality across enterprise use cases.
Produce clear benchmarking reports that support evidence-based decisions on model selection, infrastructure sizing, and optimisation priorities.
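As a rough illustration of the measurements listed above, the Python sketch below summarises a load-test run into latency percentiles, throughput, tokens per query, and cost per query. Everything in it is assumed for the example: the function names, the sample latencies, the token count, and the price per 1,000 tokens are hypothetical and do not reflect any provider's actual pricing.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_run(latencies_s, total_tokens, wall_clock_s, price_per_1k_tokens):
    """Summarise one benchmark run; all inputs are measured by the caller."""
    n = len(latencies_s)
    total_cost = total_tokens / 1000 * price_per_1k_tokens
    return {
        "p50_latency_s": percentile(latencies_s, 50),
        "p95_latency_s": percentile(latencies_s, 95),
        "throughput_qps": n / wall_clock_s,
        "tokens_per_query": total_tokens / n,
        "cost_per_query": total_cost / n,
    }


# Hypothetical run: five queries completed in two seconds of wall-clock time.
summary = summarize_run(
    latencies_s=[0.2, 0.4, 0.1, 0.3, 1.0],
    total_tokens=5000,
    wall_clock_s=2.0,
    price_per_1k_tokens=0.002,  # assumed price, for illustration only
)
```

Reporting p95 alongside the median matters because a single slow request, like the 1.0 s sample here, is invisible in the median but dominates the tail that users notice.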
Collaboration & Quality Advocacy:
Partner with AI Engineers and AI Architects to integrate evaluation gates into the AI development and deployment lifecycle from design through to production.
Work with AI Security Engineers to align adversarial testing and safety evaluation activities with the broader AI risk and compliance framework.
Communicate evaluation findings clearly to technical and non-technical stakeholders, providing actionable recommendations and quality sign-off for AI releases.
Champion a culture of AI quality and continuous improvement across the delivery team, contributing to shared evaluation standards and best practices.
Required Qualifications:
6-10 years of experience in software testing, data science, or AI/ML engineering with 3+ years focused on evaluation, testing, or quality assurance of LLM-powered or AI systems in production.
Hands-on experience evaluating AI workloads on AWS (Bedrock, SageMaker) and Azure (Azure AI Foundry, Azure ML) including model testing, RAG evaluation, and agent workflow validation.
Experience testing Microsoft Copilot-integrated solutions, Copilot plugins, or Microsoft 365 AI features for quality, accuracy, and governance compliance.
Strong understanding of LLM evaluation metrics including BLEU, ROUGE, BERTScore, faithfulness, relevance, coherence, and task-specific scoring methodologies.
Experience with RAG evaluation frameworks and retrieval metrics (MRR, nDCG, precision, recall) across vector search platforms such as Azure AI Search and Amazon OpenSearch.
Familiarity with responsible AI evaluation principles including bias detection, fairness assessment, safety testing, and regulatory compliance validation.
Experience building automated evaluation and CI/CD pipelines; proficiency in Python and familiarity with evaluation frameworks such as Azure AI Evaluation SDK, Ragas, or DeepEval.
Strong analytical and communication skills with the ability to translate complex evaluation findings into clear quality assessments and release recommendations.