TechVirtue
Posted 3 days ago
Job description
Job Title: LLM Inference & GPU Systems Consultant
Location: Charlotte, NC (local candidates only)
Duration: Long Term
Must have: RunAI, LLM inference and GPU systems, vLLM, and TensorRT-LLM.
Required Skills & Experience
Required Qualifications
8+ years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.
Must be onsite at the client in Charlotte, NC at least 3 days/week.
Responsibilities
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage workload orchestration using RunAI and Kubernetes.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.