Job Summary:
Kaleidoscope Innovation is seeking a Lead Engineer responsible for the architecture and development of an enterprise-scale Generative AI platform. The role involves managing and operationalizing Large Language Models (LLMs) in regulated environments, focusing on secure and compliant AI inference for pharmaceutical and life sciences applications.
Responsibilities:
• Architect and implement a GPU-accelerated, cloud-native LLM serving platform using containerized microservices deployed on Kubernetes.
• Design systems that support low-latency, high-throughput inference while maintaining fault tolerance, horizontal scalability, and isolation across dev, test, and production clusters.
• Abstract infrastructure primitives to expose self-service model lifecycle APIs for data scientists and ML engineers.
• Deploy and manage fine-tuned LLMs, including models adapted with parameter-efficient fine-tuning (PEFT) techniques such as LoRA.
• Implement end-to-end model versioning, promotion, rollback, and deprecation workflows.
• Support integration of multiple LLM backends (open-source and commercial) behind standardized inference interfaces.
• Engineer real-time request/response inspection pipelines to analyze user prompts and model outputs for:
    • Prompt injection
    • Data exfiltration
    • Hallucination risk
    • Policy and compliance violations
• Implement multi-layer security controls embedded at ingress, orchestration, and model-serving layers.
• Ensure all model interactions are traceable, auditable, and reproducible.
• Build and operationalize retrieval-augmented generation (RAG) pipelines integrating LLMs with enterprise document repositories and vector search backends.
• Standardize prompt engineering frameworks, contextual grounding strategies, and evaluation methodologies.
• Enable enterprise use cases including contextual Q&A, semantic search, summarization, redaction, and knowledge extraction.
• Use workflow orchestration frameworks (e.g., Temporal.io) to manage long-running, stateful AI pipelines, including inference orchestration, evaluation, and post-processing.
• Implement asynchronous, event-driven AI workflows using gRPC-based service communication.
• Standardize infrastructure provisioning using Infrastructure-as-Code (IaC) principles to ensure deterministic, repeatable deployments.
• Automate CI/CD pipelines for model artifacts, prompts, and platform services.
• Enable dynamic resource allocation, GPU scheduling, and zero- or low-downtime upgrades.
• Design and implement observability pipelines collecting:
    • Model latency and throughput
    • Token usage and cost metrics
    • Security violations and guardrail triggers
    • Drift, degradation, and anomalous behavior
• Establish Service Level Objectives (SLOs) and reliability targets for LLM inference services.
• Enable proactive debugging, capacity planning, and performance optimization.
• Integrate the platform with internal policy enforcement systems, IAM, and role-based access controls (RBAC).
• Ensure generative outputs comply with enterprise governance frameworks, regulatory requirements, and ethical guidelines.
• Maintain detailed audit logs to support compliance and validation in regulated environments.
• Develop reusable platform components enabling collaboration across data science, DevOps, and product teams.
• Provide standardized interfaces and SDKs for downstream applications to consume AI services.
• Serve as a technical bridge between AI research experimentation and enterprise-grade production systems.
Qualifications:
Required:
• Applicants must be authorized to work for ANY employer in the U.S.
• Strong experience designing and operating distributed cloud-native systems.
• Hands-on expertise deploying LLMs to production under performance, scalability, and security constraints.
• Deep understanding of container orchestration (Kubernetes) and GPU-enabled workloads.
• Experience implementing real-time inference services and API gateways.
• Proven ability to design systems meeting compliance, auditability, and governance requirements.
Preferred:
• Experience in pharmaceutical, healthcare, or highly regulated enterprise environments.
• Exposure to AI security, prompt-risk mitigation, and regulated AI deployment.
• Experience translating NLP research and generative modeling techniques into production platforms.
• Strong collaboration skills with data scientists, ML engineers, SREs, and product teams.
Company:
When clients come to us for product design and development, they get a full range of technical expertise and laboratory resources, but they also get a team that's relentless when it comes to solving problems and creating designs that are the ideal combination of function and form. Founded in 1989, the company is headquartered in Cincinnati, USA, and has 201–500 employees. The company is currently in the growth stage.