Python LLM Jobs in Austin, TX (NOW HIRING)

  • Strong Python skills and familiarity with ML/LLM frameworks (PyTorch, Transformers, LangChain, LlamaIndex, etc.)
  • Comfort with embeddings, vector search, retrieval pipelines, and prompt tuning

... Python and SQL - Experience with Docker and containerized deployments - Skilled in AI techniques ... LLM optimization - Implementing data integration solutions using AWS, Azure, GCP - Utilizing AWS ...

  • Design and implement LLM-based applications using Python
  • Define system architecture, including data pipelines, APIs, and model integrations
  • Work with financial datasets (structured and unstructured)

Python LLM information

Austin, TX salary details (hourly pay scale markers): $13, $58, $85

How much do Python LLM jobs pay per hour?

As of Jun 18, 2026, the average hourly pay for Python LLM jobs in Austin, TX is $58.11, according to ZipRecruiter salary data. Most workers in this role earn between $47.88 and $66.01 per hour, depending on experience, location, and employer.

What is a Python LLM job?

A Python LLM job involves working with Large Language Models (LLMs) using Python to develop, fine-tune, and deploy AI models. Responsibilities may include data preprocessing, prompt engineering, model optimization, and integration with applications. Professionals in this role often work with frameworks like TensorFlow, PyTorch, or Hugging Face Transformers. They may also contribute to improving model efficiency, reducing bias, and ensuring ethical AI usage.

What are the key skills and qualifications needed to thrive in a Python LLM position, and why are they important?

To excel as a Python LLM (Large Language Model) Engineer, you need strong skills in Python programming, machine learning, and natural language processing, typically supported by a degree in computer science or a related field. Proficiency with libraries such as TensorFlow, PyTorch, and Hugging Face Transformers and experience with model deployment platforms are often essential, as are certifications in AI or data science. Effective communication, problem-solving abilities, and collaboration are important soft skills for working in interdisciplinary teams and delivering results in dynamic environments. Together, these skills support the development, fine-tuning, and deployment of advanced language models that meet both technical and business objectives.

What are some common challenges faced by Python LLM Engineers in their daily work?

Python LLM Engineers often encounter challenges related to optimizing model performance, managing large datasets, and adapting models to specific business needs. Working with large-scale language models requires balancing computational resource limitations with the need for high accuracy and efficiency. Collaboration with data scientists, product managers, and DevOps engineers is routine to ensure seamless model integration and deployment. Staying updated on the latest advancements in NLP and continuously improving models based on user feedback are also important aspects of the role.

LLM / RAG Evaluation Engineer

Prophecy Technologies

Austin, TX • On-site

Full-time

Posted 15 days ago


Job description

Job Summary
We are seeking an experienced LLM / RAG Evaluation Engineer to design, implement, and scale evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and agentic AI workflows. This role focuses on assessing quality, reliability, safety, robustness, and performance of production-grade Generative AI systems used in real-world applications.
Key Responsibilities
  • Design and execute LLM response evaluation pipelines, including automated and human-in-the-loop approaches
  • Evaluate RAG systems for retrieval accuracy, grounding, relevance, and hallucination detection
  • Build and apply evaluation metrics for agentic AI systems, including:
      • Multi-step reasoning
      • Tool usage
      • Planning and memory workflows
  • Develop Python-based evaluation frameworks, benchmarks, and testing utilities
  • Analyze model outputs, identify failure modes, and provide actionable insights to improve system performance
  • Define and track KPIs for Generative AI systems, covering quality, safety, robustness, and trustworthiness
  • Collaborate with ML engineers, researchers, and product teams to improve GenAI architectures
  • Validate and compare prompt strategies, retrieval strategies, and system designs
  • Clearly document evaluation methodologies, results, and recommendations for stakeholders
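
For illustration only, the retrieval-accuracy part of the responsibilities above could be measured with standard precision@k and recall@k scores. The sketch below is a minimal example under stated assumptions, not the employer's actual framework: the function names, the document IDs, and the idea of a hand-labeled set of relevant documents per query are all hypothetical, and retrieved IDs are assumed to be unique.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved document IDs that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant document IDs that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # nothing to find, so recall is reported as 0 by convention here
    found = set(retrieved[:k]) & relevant
    return len(found) / len(relevant)


# Hypothetical ranked retriever output and hand-labeled relevant set for one query.
retrieved_ids = ["d1", "d7", "d3", "d9"]
relevant_ids = {"d1", "d3", "d5"}

p3 = precision_at_k(retrieved_ids, relevant_ids, k=3)
r3 = recall_at_k(retrieved_ids, relevant_ids, k=3)
print(f"precision@3={p3:.3f} recall@3={r3:.3f}")
```

Averaging such per-query scores over a labeled query set gives one candidate KPI for retrieval quality; grounding and hallucination checks would need separate metrics.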

Required Skills & Experience
  • Strong proficiency in Python
  • Proven experience in LLM response evaluation (quality, coherence, accuracy, bias, hallucinations)
  • Hands-on experience with RAG systems and retrieval-based architectures
  • Understanding of agentic AI systems and multi-step reasoning workflows
  • Experience evaluating Generative AI systems in real or near-production environments
  • Knowledge of NLP fundamentals and LLM behavior
  • Experience with prompt engineering, prompt testing, and prompt evaluation

Preferred Skills
  • Experience with LLM orchestration frameworks (LangChain, LlamaIndex, etc.)
  • Familiarity with automated evaluation tools, benchmarks, and scoring frameworks
  • Experience designing or managing human evaluation workflows
  • Understanding of AI safety, reliability, bias, and trustworthiness principles
  • Prior experience evaluating production-grade GenAI systems

Nice to Have
  • Experience with vector databases and retrieval pipelines
  • Exposure to cloud-based AI platforms
  • Research or experimentation background in LLM evaluation and benchmarking