Applicants must be authorized to work for ANY employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time.
Location: Indianapolis, IN (onsite 5 days a week)
Role Summary
This role is responsible for the architecture, development, and productionization of an enterprise-scale Generative AI platform designed to host, manage, and operationalize fine-tuned and open-source Large Language Models (LLMs) in highly regulated environments. The platform enables secure, performant, and compliant AI inference across internal enterprise applications, with an initial focus on pharmaceutical and life sciences use cases.
The engineer will operate at the intersection of distributed systems engineering, applied machine learning infrastructure, AI security, and MLOps, translating experimental NLP and generative AI workflows into robust, observable, and governable production services.
---
Core Responsibilities
LLM Platform Architecture & Systems Engineering
· Architect and implement a GPU-accelerated, cloud-native LLM serving platform using containerized microservices deployed on Kubernetes.
· Design systems that support low-latency, high-throughput inference while maintaining fault tolerance, horizontal scalability, and isolation across dev, test, and production clusters.
· Abstract infrastructure primitives to expose self-service model lifecycle APIs for data scientists and ML engineers.
Model Hosting, Fine-Tuning & Lifecycle Management
· Deploy and manage fine-tuned and parameter-efficient LLMs using techniques such as PEFT and LoRA.
· Implement end-to-end model versioning, promotion, rollback, and deprecation workflows.
· Support integration of multiple LLM backends (open-source and commercial) behind standardized inference interfaces.
AI Safety, Security & Runtime Guardrails
· Engineer real-time request/response inspection pipelines to analyze user prompts and model outputs for:
o Prompt injection
o Data exfiltration
o Hallucination risk
o Policy and compliance violations
· Implement multi-layer security controls embedded at ingress, orchestration, and model-serving layers.
· Ensure all model interactions are traceable, auditable, and reproducible.
Advanced Prompting, RAG & Model Evaluation
· Build and operationalize retrieval-augmented generation (RAG) pipelines integrating LLMs with enterprise document repositories and vector search backends.
· Standardize prompt engineering frameworks, contextual grounding strategies, and evaluation methodologies.
· Enable enterprise use cases including contextual Q&A, semantic search, summarization, redaction, and knowledge extraction.
Distributed Orchestration & Workflow Management
· Use workflow orchestration frameworks (e.g., Temporal.io) to manage long-running, stateful AI pipelines, including inference orchestration, evaluation, and post-processing.
· Implement asynchronous, event-driven AI workflows using gRPC-based service communication.
Infrastructure Automation & MLOps
· Standardize infrastructure provisioning using Infrastructure-as-Code (IaC) principles to ensure deterministic, repeatable deployments.
· Automate CI/CD pipelines for model artifacts, prompts, and platform services.
· Enable dynamic resource allocation, GPU scheduling, and zero/low-downtime upgrades.
Observability, Monitoring & Reliability Engineering
· Design and implement observability pipelines collecting:
o Model latency and throughput
o Token usage and cost metrics
o Security violations and guardrail triggers
o Drift, degradation, and anomalous behavior
· Establish Service Level Objectives (SLOs) and reliability targets for LLM inference services.
· Enable proactive debugging, capacity planning, and performance optimization.
Enterprise Governance & Access Control
· Integrate the platform with internal policy enforcement systems, IAM, and role-based access controls (RBAC).
· Ensure generative outputs comply with enterprise governance frameworks, regulatory requirements, and ethical guidelines.
· Maintain detailed audit logs to support compliance and validation in regulated environments.
Framework Reusability & Cross-Functional Enablement
· Develop reusable platform components enabling collaboration across data science, DevOps, and product teams.
· Provide standardized interfaces and SDKs for downstream applications to consume AI services.
· Serve as a technical bridge between AI research experimentation and enterprise-grade production systems.
---
Technical Requirements
· Strong experience designing and operating distributed cloud-native systems.
· Hands-on expertise deploying LLMs in production with performance, scalability, and security constraints.
· Deep understanding of container orchestration (Kubernetes) and GPU-enabled workloads.
· Experience implementing real-time inference services and API gateways.
· Proven ability to design systems meeting compliance, auditability, and governance requirements.
---
Preferred Experience
· Experience in pharmaceutical, healthcare, or highly regulated enterprise environments.
· Exposure to AI security, prompt-risk mitigation, and regulated AI deployment.
· Experience translating NLP research and generative modeling techniques into production platforms.
· Strong collaboration skills with data scientists, ML engineers, SREs, and product teams.
---
Technology Stack
· Languages: Python, Go
· AI / ML: LLMs (AWS Bedrock, Azure OpenAI, Google Vertex AI), Prompt Engineering, RAG, PEFT, LoRA
· Platform & Infrastructure: Kubernetes, GPU acceleration, Infrastructure-as-Code
· Distributed Systems: gRPC, Temporal.io, Argo, Flux
· Storage: S3 / S3-compatible object storage
· AI Tooling: OpenAI Agents SDK, LangGraph
· Observability: Prometheus, Grafana
· Libraries: Redis, LangChain, FastAPI
Kaleidoscope, an Infosys Company, is an equal opportunity employer, and all qualified applicants will receive consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, spouse of protected veteran, or disability.