ML Inference Jobs in Raleigh, NC (NOW HIRING)

Principal Python Backend Engineer

Deep knowledge of AI/ML is not a prerequisite; the domain knowledge and context can be learned on ... Experience building event-driven systems and/or real-time pipelines for ingestion and inference.

Fidelity Investments

Principal Python Backend Engineer

Durham, NC · On-site

Nvidia

Senior Solutions Architect - AI Factory Deployment

Durham, NC

Solid grasp of collective communication patterns, particularly AllReduce and AllToAll, and how they are applied in contemporary ML/LLM training. * Familiarity with LLM training and/or inference ...

Instacart

Machine Learning Engineer (PhD Intern)

Raleigh, NC · On-site

Some of the core areas of focus for our team include pricing, online advertising, uplift and long term value modeling, and general causal inference. Search & Discovery ML : The Search and Discovery ...

Instacart

Machine Learning Engineer (PhD Intern)

Durham, NC · On-site

UnitedHealth Group

Senior Software Engineer

Raleigh, NC · Remote

$119K - $157K/yr

Design, develop, and deliver production-grade AI/ML and Generative AI solutions within the OQGA ... inference optimization techniques * Strong foundation in data science concepts, including ...

Fidelity Investments

Vice President, GenAI Technology

Durham, NC · On-site

$140K/yr

... inference tools * Experience taking an application from research to production and realizing ... Experience guiding business on identifying AI/ML use cases and optimally contributing to ...

Fidelity Investments

Director, GenAI Technology

Durham, NC

$126K/yr

... inference tools Experience taking an application from research to production and realizing ... AI/ML use cases and optimally contributing to brainstorming sessions Desires to create a climate ...

Nvidia

Senior Manager, Software Engineering - Robotics Manipulation

Durham, NC

... and modern ML infrastructure, and how to architect robotics software to take advantage of it (CUDA, PyTorch, GPU-accelerated simulation, edge inference). * Partner and developer interface.

Western Governors University

Staff AI Engineer

Raleigh, NC · On-site +1

$161K - $249K/yr

... inference infrastructure * Experience in EdTech, personalized learning, or student-facing AI platforms * Published research, conference presentations, or open-source contributions in AI/ML

New

Qualcomm

Staff Software Engineer -- CI, Build & Release

Raleigh, NC · On-site

Preferred : • Experience with ML framework build systems -- ONNX Runtime, ExecuTorch, or TFLite ... of DL inference pipelines and on-device performance profiling. • Ability to collaborate ...

SAS

DevOps Engineer

Cary, NC · On-site

$49.25 - $67.50/hr

Support deployment of training and inference pipelines in Cloud environments for Secure, compliance ... Knowledge of Docker or containerized ML workflows * Knowledge CUDA enabled systems and GPU ...

SAS

DevOps Engineer

Cary, NC

$49.25 - $67.50/hr

Support deployment of training and inference pipelines in Cloud environments for Secure ... Knowledge of Docker or containerized ML workflows * Knowledge CUDA enabled systems and GPU ...

Western Governors University

Decision Scientist

Raleigh, NC · On-site

$118K - $178K/yr

... ML Ops, and technology architects to build decision products that support personalized student ... Apply advanced analytics, experimentation, and causal inference techniques to identify ...

Western Governors University

Decision Scientist

Raleigh, NC · On-site +1

$118K - $178K/yr

... ML Ops, and technology architects to build decision products that support personalized student ... Apply advanced analytics, experimentation, and causal inference techniques to identify ...

ML Inference information

Raleigh, NC salary figures shown: $36.5K, $119.3K (average), $191K

How much do ML inference jobs pay per year?

As of Jul 17, 2026, the average yearly pay for ML inference jobs in Raleigh, NC is $119,312, according to ZipRecruiter salary data. Most workers in this role earn between $95,800 and $132,200 per year, depending on experience, location, and employer.

What is a $900,000 AI job?

A $900,000 AI job typically refers to high-level roles in artificial intelligence, such as senior machine learning engineers or AI research directors, often involving advanced skills in deep learning, data modeling, and programming with tools like Python and TensorFlow. These positions usually require extensive experience, specialized knowledge, and may include leadership responsibilities or strategic decision-making.

What is ML inference?

ML inference refers to the process of using a trained machine learning model to make predictions or decisions based on new data. After a model has been trained on historical data, inference is the phase where that model is deployed and used in real-world applications, such as recognizing speech, detecting objects in images, or recommending products. The focus in ML inference is on speed, efficiency, and scalability to ensure quick predictions, often in real time. This process is critical for practical applications like mobile apps, web services, and embedded systems. Optimizing inference involves reducing latency, memory usage, and computational requirements.
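
As a rough illustration of the inference phase described above, the sketch below applies an already-trained model to new data and times the call. The weights, bias, and function names are hypothetical placeholders rather than values from any real model; a production system would load trained parameters from a model file instead.

```python
import math
import time

# Hypothetical parameters standing in for a model that was already trained.
WEIGHTS = [0.8, -0.4]
BIAS = 0.1


def predict(features):
    """Return the positive-class probability for one new input."""
    score = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-score))  # logistic (sigmoid) function


def timed_predict(features):
    """Run one prediction and report its latency in milliseconds."""
    start = time.perf_counter()
    probability = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return probability, latency_ms


if __name__ == "__main__":
    probability, latency_ms = timed_predict([1.0, 2.0])
    print(f"probability={probability:.3f} latency={latency_ms:.4f} ms")
```

No training happens here: the parameters are fixed and only the prediction step runs, which is why latency is the figure that inference work tends to measure and optimize.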

What is the difference between ML Inference and Data Scientist roles?

Aspect	ML Inference	Data Scientist
Required Credentials	Knowledge of machine learning models, programming skills	Degree in data science, statistics, or related fields
Work Environment	Deploying models in production, real-time data processing	Data analysis, model development, research
Industry Usage	AI product deployment, software companies	Research institutions, tech firms, consulting

ML Inference focuses on deploying trained models to make predictions on new data, often in real-time. Data Scientists develop and analyze models, working primarily in research and development. While both roles require understanding of machine learning, ML Inference emphasizes deployment and operationalization, whereas Data Scientists focus on model creation and analysis.

What engineer makes $500,000 a year?

Senior machine learning engineers with extensive experience, advanced skills in deep learning, and expertise in deploying large-scale models can earn salaries approaching or exceeding $500,000 annually, especially in high-cost-of-living areas or top tech companies. Compensation often includes base salary, bonuses, and stock options, reflecting their specialized knowledge and impact on product development.

Which 3 jobs will survive AI?

Jobs involving ML inference, such as data scientists, machine learning engineers, and AI system architects, are likely to persist as they require specialized expertise in developing, deploying, and maintaining AI models. These roles demand critical thinking, domain knowledge, and skills in programming and data analysis that are less easily automated. Continuous learning and staying updated with AI tools and frameworks are essential for these professions to remain relevant.

What are some common challenges faced by ML Inference Engineers when deploying models to production?

ML Inference Engineers often encounter challenges such as optimizing model latency and throughput to meet production requirements, ensuring compatibility with diverse hardware environments, and managing model versioning and updates without disrupting service. Additionally, balancing resource utilization and inference accuracy while monitoring real-time performance metrics is crucial. Collaboration with data scientists, DevOps, and software engineers is typically essential to streamline deployment and maintain robust, scalable inference pipelines.

Will MLE be replaced by AI?

Machine Learning Engineers (MLEs) design, develop, and optimize AI models and systems. While AI automation tools can assist with certain tasks, MLEs are essential for building, tuning, and maintaining complex models, making complete replacement unlikely in the near term. Their expertise in data handling, model deployment, and system integration remains critical in AI development environments.

What are the key skills and qualifications needed to thrive in ML Inference, and why are they important?

To thrive in ML Inference, you need a solid background in machine learning principles, programming (Python or C++), and experience with deploying models at scale, often supported by a degree in computer science or a related field. Familiarity with frameworks and tools such as TensorFlow, PyTorch, ONNX, and cloud platforms like AWS SageMaker or Google AI Platform is typically required. Strong problem-solving skills, attention to detail, and effective communication are crucial soft skills for collaborating with multidisciplinary teams and optimizing model performance. These skills ensure efficient, scalable, and reliable deployment of machine learning solutions in real-world applications.

ML Inference jobs near you

Principal Python Backend Engineer

Fidelity Investments

Durham, NC • On-site

Full-time

Medical, Retirement, PTO

Posted 16 days ago

Fidelity Investments rating

8.7

Based on 266 frontline employees who took The Breakroom Quiz

17th of 148 rated financial services

Job description

Note: Fidelity is not providing immigration sponsorship for this position.
Principal Python Backend Engineer
Bring a builder's mindset to Fidelity's Enterprise AI/ML Platform and help us scale the next generation of high-performance, production-grade backend systems. You will work on the core platform that connects tools, agents, data, and models: designing clean service abstractions, building resilient processing pipelines, shipping developer-friendly APIs and SDKs, and turning rapid prototypes into well-engineered, maintainable Python systems.
The Team
We hire exceptional, driven Python engineers first: people who take pride in clean code, fast learning, and high ownership. Deep knowledge of AI/ML is not a prerequisite; the domain knowledge and context can be learned on the job. What cannot be taught is the engineering rigour, the drive, and the instinct for simplicity that we look for.
What you'll do

Build the core AI/ML services running in Kubernetes and locally in 'Playground' mode
Design clean abstractions over vector databases and multistep Search/Information Retrieval pipelines
Own automated real-time data ingestion for RAG: connectors, streaming pipelines, chunking/embedding strategies, parallel processing, retrieval metrics, resilience & restartability while guaranteeing ACID integrity of processed data and elimination of redundant document processing.
Ship developer-friendly APIs/SDKs, CLIs, and templates that make it trivial to develop agents, tools, and information retrieval pipelines at enterprise scale.
Instrument everything: distributed tracing for services & agentic/tool sessions, retrieval quality metrics, performance metrics, resource usage and failure forensics.
Turn rapid prototypes into resilient systems: pragmatic designs that are simple to use, scale in a hardware-efficient manner, and above all stay as simple as possible.
Read and distill open-source frameworks, keep what's valuable, and replace the bloated with lean, well-engineered Python modules.
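
For readers unfamiliar with the "elimination of redundant document processing" item in the list above, one common approach is to key each document by a hash of its content. The sketch below is a minimal, in-memory illustration under that assumption; the class and method names are hypothetical, and it does not provide the ACID guarantees the posting mentions, which would need a transactional store.

```python
import hashlib


class IngestionLog:
    """In-memory record of content hashes that were already processed."""

    def __init__(self):
        self._seen = set()

    def process_once(self, document, handler):
        """Call handler(document) unless identical content was seen before."""
        digest = hashlib.sha256(document.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False  # duplicate content: skip reprocessing
        handler(document)
        self._seen.add(digest)  # record only after the handler succeeded
        return True


if __name__ == "__main__":
    log = IngestionLog()
    processed = []
    for text in ["alpha", "beta", "alpha"]:
        log.process_once(text, processed.append)
    print(processed)
```

Recording the hash only after the handler returns means a failed document is retried on the next run, which is one simple way to get restartability.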

Team Culture:

Lead through code, productivity and knowledge sharing.
Ask sharp questions, challenge complexity, and encourage others to do the same.
We embrace a flat hierarchy where the best ideas win, regardless of seniority.

What you bring
Engineering perspective:
7+ years of professional software engineering experience, with the majority spent building and operating production-grade Python backend systems.
This is not an entry-level role and we expect a track record of owning complex systems end to end.

Strong Python service engineering: sound OOP, clear interfaces, thorough tests, and an obsession with readability and maintainability.
Real-world performance tuning across services and data stores: concurrency, async I/O, queues, caching, SQL/NoSQL indexing, pagination, and backpressure.
Experience building event-driven systems and/or real-time pipelines for ingestion and inference.
Mastery of debugging complex, distributed behavior: reproducible experiments, simulations, and evidence-driven conclusions.
Comfort reading open-source code and producing simplified alternatives to minimize legacy code and cognitive load.
Effective use of developer-assist tools to amplify output while keeping quality high and code bloat to a minimum.
Produce service metrics that show how much parallelism each service can sustain stably, ensuring efficient hardware utilization. Propose scaling approaches based on the application's hardware utilization footprint and metrics.
Familiarity with key Data Science, Machine Learning, or AI libraries is a bonus, but not mandatory, as long as the candidate can demonstrate the ability to quickly learn new concepts and paradigms.
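
The list above mentions async I/O, queues, and backpressure. As a hedged sketch of how those fit together, the example below uses a bounded `asyncio.Queue` so that a fast producer is paused whenever the consumer falls behind. The function names, the `max_pending` parameter, and the doubling step are illustrative assumptions, not part of the posting.

```python
import asyncio


async def producer(queue, items):
    """Feed items into the queue, waiting whenever it is full."""
    for item in items:
        await queue.put(item)  # suspends here while the queue is at maxsize
    await queue.put(None)  # sentinel: no more items


async def consumer(queue, results):
    """Drain the queue until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        results.append(item * 2)


async def run_pipeline(items, max_pending=2):
    """Run producer and consumer with at most max_pending queued items."""
    queue = asyncio.Queue(maxsize=max_pending)
    results = []
    await asyncio.gather(producer(queue, items), consumer(queue, results))
    return results


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([1, 2, 3, 4])))
```

The bound on the queue is what provides the backpressure: memory use stays limited because the producer cannot run ahead of the consumer by more than `max_pending` items.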

Product and Ownership perspective:

Fast learning across new domains, with a knack for spotting and reducing unnecessary complexity. The team works on new products, so understanding and implementing the latest technology quickly is paramount.
Product sensibility: start from a blank slate, ask the right questions, and design primitives that feel "Apple-like" in usability.
Produce a functional picture and design of a product, then write requirements and epics from it and derive stories that cover the entire scope. The aim is to identify most risks at the start, not during implementation.
Collaborative communication, healthy debate, and leading by example. Comfortable switching hats to do what the projects need.
Ability to work in a highly dynamic environment.
Attention to detail: being thorough in tests, questioning assumptions, and being productive without supervision. Our team takes pride in the quality of our products.

Relevant & Nice to have skills:

DevOps practices (CI/CD, Docker, Kubernetes) and infrastructure as code.
AWS skills: EC2, S3, RDS, Lambda, IAM etc.
Understanding Data, performant ETL, Analytics.

Why this role

Shape the core platform for AI/ML services and retrieval used across Fidelity.
High ownership, fast iteration, and the chance to influence design.
Work on hard, high-impact problems with a team that values simplicity, experimentation, pace and teaching.

Is this role for you?

If reading this job spec brought a smile to your face or sparked a jolt of enthusiasm, you should apply (assuming adequate skill/capability levels).
To enjoy and thrive in our team, you should be a highly motivated & productive individual who genuinely loves what they do.

Fidelity's Onsite Working Model
Fidelity is transitioning to a full-time onsite working model through a phased rollout across regions and roles. Currently, some roles and locations require 100% onsite presence, while others require less. Onsite expectations are likely to evolve as the rollout continues. This transition does not apply to fully remote roles.
The base salary range for this position is $107,000-216,000 USD per year.
Placement in the range will vary based on job responsibilities and scope, geographic location, candidate's relevant experience, and other factors.
Base salary is only part of the total compensation package. Depending on the position and eligibility requirements, the offer package may also include bonus or other variable compensation.
We offer a wide range of benefits to meet your evolving needs and help you live your best life at work and at home. These benefits include comprehensive health care coverage and emotional well-being support, market-leading retirement, generous paid time off and parental leave, charitable giving employee match program, and educational assistance including student loan repayment, tuition reimbursement, and learning resources to develop your career. Note, the application window closes when the position is filled or unposted.
Please be advised that Fidelity's business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment and retirement-related financial activities and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain Criminal Histories.
Category: Information Technology

About Fidelity

Sourced by ZipRecruiter

Company size

10,000+ Employees

Headquarters location

Boston, MA, US

Year founded

1946

Website

fidelity.com

What Fidelity Investments employees say

Pay

Most people get paid breaks

Most people get paid when they’re sick

The job rarely spills into unpaid time

Benefits

Sick days use up paid time off

Most people say they can afford the health insurance

Most people get paid time off

Hours and flexibility

Less than 4 weeks notice of work schedule

Most people don’t worry about their hours

Only some people can choose their shifts

Workplace

Most people feel treated with respect

Most people get breaks without interruption

Most people are stressed out
