TechVirtue

LLM Inference & GPU Systems Consultant

TechVirtue LLC

Charlotte, NC

Other

Posted 3 days ago


Job description

Job Title: LLM Inference & GPU Systems Consultant
Location: Charlotte, NC (local candidates only)
Duration: Long Term

Must have: RunAI, LLM inference and GPU systems, vLLM, and TensorRT-LLM.

Required Skills & Experience
Required Qualifications
8+ years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.
Must be onsite at the client in Charlotte, NC at least 3 days per week.
Key Responsibilities
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate GPU workloads using RunAI and Kubernetes.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.