Job Summary:
Eli Lilly and Company is a global healthcare leader dedicated to improving lives through innovative medicines. The Generative AI Engineer will design and implement core AI systems that enhance drug discovery processes, focusing on retrieval-augmented generation and automated analysis workflows.
Responsibilities:
• Design, build, and optimize RAG pipelines over internal publications, study reports, electronic lab notebooks, and other scientific documents
• Build hybrid retrieval systems combining vector search with structured metadata, knowledge graphs, and ontology-aware filtering
• Build and optimize text-to-SQL systems over Lilly’s databases, enabling scientists to query gene expression, proteomics, pathway, and variant data through natural language
• Develop schema documentation, semantic annotations, and gold-standard question/SQL pairs that bridge how scientists think about data and how it is stored
• Implement multi-step reasoning approaches (chain-of-thought, self-correction, Reflexion loops) to improve accuracy on complex scientific queries
• Design agentic AI workflows that chain database queries, bioinformatics tools, literature search, and visualization into automated multi-step scientific analyses
• Evaluate and integrate emerging orchestration frameworks (LangGraph, CrewAI, custom architectures) for scientific use cases
• Build evaluation frameworks measuring accuracy, reliability, and scientific validity of AI outputs
Qualifications:
Required:
• PhD in Computer Science, Data Science, or a related technical field with 0-3+ years of experience; or MS in Computer Science, Data Science, or a related technical field with 5+ years of experience; or equivalent experience building production LLM systems
• Experience building LLM-powered applications, including at least two of: RAG systems, text-to-SQL, agentic workflows, or fine-tuning pipelines
• Strong software engineering skills in Python with experience building production-grade systems
• Deep familiarity with the modern LLM ecosystem: embedding models, vector databases, and orchestration frameworks
• Experience designing evaluation frameworks for LLM systems — systematic approaches to measuring accuracy, detecting hallucinations, and tracking regressions
• Comfort working with complex, heterogeneous data — databases with hundreds of tables, specialized schemas, or domain-specific vocabularies
• Familiarity with cloud computing environments (AWS preferred), containerization (Docker), and CI/CD practices
• Experience in pharmaceutical, biotech, or life sciences environments
• Familiarity with biomedical data types (omics, clinical, molecular) or scientific databases
• Experience with MLOps/LLMOps tooling: experiment tracking, model registries, prompt versioning, A/B testing for AI systems
• Knowledge of biomedical ontologies (Gene Ontology, MeSH, ChEBI) or experience integrating domain-specific knowledge into LLM systems
• Experience building for regulated environments where auditability, reproducibility, and explainability are requirements
Company:
We're a medicine company turning science into healing to make life better for people around the world. Founded in 1876, the company is headquartered in Indianapolis, USA, and has more than 10,000 employees. The company is currently classified as Late Stage.