ML Inference Jobs (NOW HIRING)

Software Development Engineer, AI/ML, AWS Neuron, Model Inference

The Inference Enablement and Acceleration team is at the forefront of running a wide range of models and supporting novel architecture alongside maximizing their performance for AWS's custom ML ...

Amazon

Software Development Engineer, AI/ML, AWS Neuron, Model Inference

Cupertino, CA · On-site

Advanced Micro Devices, Inc

Senior Product Manager - ROCm & AI/ML Inference Software

Santa Clara, CA · On-site

$179K/yr

... inference requirements and translates market signals into actionable product strategy. Open-Source Community Engagement * Serve as AMD's active presence in the open-source AI/ML community: monitor ...

Advanced Micro Devices, Inc

Senior Product Manager - ROCm & AI/ML Inference Software

Santa Clara, CA · On-site

$149K - $197K/yr

Amazon

Senior Software Development Engineer, AI/ML, AWS Neuron, Model Inference

Cupertino, CA · On-site

$128K - $177K/yr

Surge InfoTech LLC

AI/ML Platform Engineer

Alexandria, VA · On-site

FastAPI and microservices for ML inference * Infrastructure as Code (Terraform) * Kubernetes and Docker for scalable ML workloads * Distributed/cloud systems design with AWS * Edge-to-cloud system ...

Apple

ML Framework (MetalLM) Engineer, Graphics, Game and ML

Cupertino, CA

$150K - $277K/yr

Apple's Server ML Frameworks team in GPU, Graphics and Machine Learning works on enabling Apple Intelligence through high-performance, distributed inference of GenAI applications (such as LLMs) on ...

Apple

ML Software Engineer

Seattle, WA

$142K - $263K/yr

Our team builds ML-inference applications and services on Apple Silicon in the datacenter, specifically focusing in recent years on generative AI as part of the Private Cloud Compute component of ...

Apple

ML Software Engineer

Seattle, WA

$175K - $263K/yr

PICTOR LABS INC

Sr. Machine Learning Engineer

$89K - $123K/yr

About the Role We are seeking an experienced Senior ML Inference Engineer to join our team, focusing on optimizing and deploying our production virtual staining models at scale. The ideal candidate ...

Parafin, Inc

Senior Software Engineer, ML Platform

San Francisco, CA · On-site

$220K - $265K/yr

Decompose data scientist training/inference notebooks into reusable, tested components (libraries, pipelines, templates) with clear interfaces and documentation. * Create developer-friendly ML ...

Parafin

Senior Software Engineer, ML Platform

San Francisco, CA · Remote

$220K - $265K/yr

E-Space

AI / Embedded ML Engineer

Saratoga, CA · On-site

$145K - $190K/yr

... inference latency • Use frameworks including TensorFlow Lite Micro, Edge Impulse, ONNX Runtime, and ExecuTorch • Integrate ML inference into embedded firmware written in C, C++, or Rust • ...

Figure

Staff AI Inference and Acceleration Engineer

San Jose, CA · On-site

$180K - $275K/yr

Partner closely with the AI/ML team to define model architecture constraints that are hardware ... Deep understanding of AI/ML inference - model formats (ONNX, TFLite, etc.), inference runtimes, and ...

E-Space

AI / Embedded ML Engineer

Saratoga, CA · Hybrid

$145K - $190K/yr

... inference latency Use frameworks including TensorFlow Lite Micro, Edge Impulse, ONNX Runtime, and ExecuTorch Integrate ML inference into embedded firmware written in C, C++, or Rust Profile and ...

Showing results 1-20

ML Inference information

Salary chart values: $37.5K · $122.7K (average) · $196.5K

How much do ML inference jobs pay per year?

As of Jul 16, 2026, the average yearly pay for ML inference jobs in the United States is $122,738, according to ZipRecruiter salary data. Most workers in this role earn between $98,500 and $136,000 per year, depending on experience, location, and employer.

What is a $900,000 AI job?

A $900,000 AI job typically refers to high-level roles in artificial intelligence, such as senior machine learning engineers or AI research directors, often involving advanced skills in deep learning, data modeling, and programming with tools like Python and TensorFlow. These positions usually require extensive experience, specialized knowledge, and may include leadership responsibilities or strategic decision-making.

What is ML inference?

ML inference refers to the process of using a trained machine learning model to make predictions or decisions based on new data. After a model has been trained on historical data, inference is the phase where that model is deployed and used in real-world applications, such as recognizing speech, detecting objects in images, or recommending products. The focus in ML inference is on speed, efficiency, and scalability to ensure quick predictions, often in real time. This process is critical for practical applications like mobile apps, web services, and embedded systems. Optimizing inference involves reducing latency, memory usage, and computational requirements.
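As a rough illustration of the idea above, the sketch below applies fixed, already-trained parameters to a new input and measures how long the prediction takes. The weights, bias, and function names are hypothetical values chosen for the example, not taken from any real model or library.

```python
import math
import time

# Hypothetical parameters standing in for a model that was already trained.
WEIGHTS = [0.8, -0.4, 0.2]
BIAS = 0.1


def predict(features):
    """Inference step: apply fixed, pre-trained parameters to new input."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic score in (0, 1)


def timed_predict(features):
    """Return the prediction together with its latency in milliseconds."""
    start = time.perf_counter()
    score = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return score, latency_ms


score, latency_ms = timed_predict([1.0, 2.0, 3.0])
print(f"score={score:.3f} latency={latency_ms:.4f} ms")
```

No training happens here: the parameters never change, and the only cost that matters per request is the time and memory spent inside `predict`, which is what inference optimization tries to reduce.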

What is the difference between an ML Inference role and a Data Scientist?

Aspect	ML Inference	Data Scientist
Required Credentials	Knowledge of machine learning models, programming skills	Degree in data science, statistics, or related fields
Work Environment	Deploying models in production, real-time data processing	Data analysis, model development, research
Industry Usage	AI product deployment, software companies	Research institutions, tech firms, consulting

ML Inference focuses on deploying trained models to make predictions on new data, often in real-time. Data Scientists develop and analyze models, working primarily in research and development. While both roles require understanding of machine learning, ML Inference emphasizes deployment and operationalization, whereas Data Scientists focus on model creation and analysis.

What engineer makes $500,000 a year?

Senior machine learning engineers with extensive experience, advanced skills in deep learning, and expertise in deploying large-scale models can earn salaries approaching or exceeding $500,000 annually, especially in high-cost-of-living areas or top tech companies. Compensation often includes base salary, bonuses, and stock options, reflecting their specialized knowledge and impact on product development.

Which 3 jobs will survive AI?

Jobs involving ML inference, such as data scientists, machine learning engineers, and AI system architects, are likely to persist as they require specialized expertise in developing, deploying, and maintaining AI models. These roles demand critical thinking, domain knowledge, and skills in programming and data analysis that are less easily automated. Continuous learning and staying updated with AI tools and frameworks are essential for these professions to remain relevant.

What are some common challenges faced by ML Inference Engineers when deploying models to production?

ML Inference Engineers often encounter challenges such as optimizing model latency and throughput to meet production requirements, ensuring compatibility with diverse hardware environments, and managing model versioning and updates without disrupting service. Additionally, balancing resource utilization and inference accuracy while monitoring real-time performance metrics is crucial. Collaboration with data scientists, DevOps, and software engineers is typically essential to streamline deployment and maintain robust, scalable inference pipelines.

Will MLE be replaced by AI?

Machine Learning Engineers (MLEs) design, develop, and optimize AI models and systems. While AI automation tools can assist with certain tasks, MLEs are essential for building, tuning, and maintaining complex models, making complete replacement unlikely in the near term. Their expertise in data handling, model deployment, and system integration remains critical in AI development environments.

What are the key skills and qualifications needed to thrive in ML Inference, and why are they important?

To thrive in ML Inference, you need a solid background in machine learning principles, programming (Python or C++), and experience with deploying models at scale, often supported by a degree in computer science or a related field. Familiarity with frameworks and tools such as TensorFlow, PyTorch, ONNX, and cloud platforms like AWS SageMaker or Google AI Platform is typically required. Strong problem-solving skills, attention to detail, and effective communication are crucial soft skills for collaborating with multidisciplinary teams and optimizing model performance. These skills ensure efficient, scalable, and reliable deployment of machine learning solutions in real-world applications.

Infographic showing various ML Inference job openings in the United States as of July 2026, with employment types broken down into 81% Full Time, 18% Part Time, and 1% Contract. Highlights a 71% Physical, 2% Hybrid, and 27% Remote job distribution, with an average salary of $122,738 per year, or $59 per hour.
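The hourly figure is consistent with the annual one if a full-time year is taken to be 40 hours a week for 52 weeks. That 2,080-hour year is an assumption made here for the check; the page does not state how the hourly rate was derived.

```python
ANNUAL_SALARY = 122_738      # average yearly pay quoted on the page
HOURS_PER_YEAR = 40 * 52     # assumed full-time year: 2,080 hours

hourly = ANNUAL_SALARY / HOURS_PER_YEAR
print(f"${hourly:.2f} per hour")
```

Rounded to the nearest dollar, the result matches the $59 per hour shown.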

Software Development Engineer, AI/ML, AWS Neuron, Model Inference

Amazon

Cupertino, CA • On-site

Full-time

Posted 29 days ago

Amazon rating

7.4

Based on 6,972 frontline employees who took The Breakroom Quiz

6th of 39 rated national retailers

Job description

The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom machine learning accelerators, Inferentia and Trainium. The SDK includes an ML compiler, runtime, and application framework that integrate with popular ML frameworks like PyTorch and JAX, enabling unparalleled ML inference and training performance.
The Inference Enablement and Acceleration team is at the forefront of running a wide range of models and supporting novel architectures while maximizing their performance on AWS's custom ML accelerators. Working across the stack from PyTorch down to the hardware-software boundary, our engineers build systematic infrastructure, develop new methods, and create high-performance kernels for ML functions, ensuring every compute unit is fine-tuned for our customers' demanding workloads.

We combine deep hardware knowledge with ML expertise to push the boundaries of what's possible in AI acceleration.
As part of the broader Neuron organization, our team works across multiple technology layers, spanning frameworks and kernels, and collaborates with the compiler, runtime, and collectives teams. We not only optimize current performance but also contribute to future architecture designs, working closely with customers to enable their models and ensure optimal performance. This role offers a unique opportunity to work at the intersection of machine learning, high-performance computing, and distributed architectures, where you'll help shape the future of AI acceleration technology.
You will architect and implement business-critical features and mentor a brilliant team of experienced engineers.

We operate in spaces that are very large, yet our teams remain small and agile. There is no blueprint. We're inventing. We're experimenting. It is a unique learning culture. The team works closely with customers on their model enablement, providing direct support and optimization expertise to ensure their machine learning workloads achieve optimal performance on AWS ML accelerators.

The team collaborates with open source ecosystems to provide seamless integration and bring peak performance at scale for customers and developers.
This role is responsible for the development, enablement, and performance tuning of a wide variety of LLM families, including massive-scale large language models like the Llama family, DeepSeek, and beyond. The Inference Enablement and Acceleration team works side by side with compiler engineers and runtime engineers to create, build, and tune distributed inference solutions with Trainium and Inferentia. Experience optimizing inference performance for both latency and throughput on such large models, across the stack from system-level optimizations through to PyTorch or JAX, is a must-have.

You can learn more about Neuron here:
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-cc/index.html
https://aws.amazon.com/machine-learning/neuron/
https://github.com/aws/aws-neuron-sdk
https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success
Key job responsibilities
This role will help lead the effort to build distributed inference support for PyTorch in the Neuron SDK. You will tune these models for the highest performance and efficiency on customers' AWS Trainium and Inferentia silicon and servers. Strong software development skills in Python, system-level programming, and ML knowledge are all critical to this role.

Our engineers collaborate across compiler, runtime, framework, and hardware teams to optimize machine learning workloads for our global customer base. Working at the intersection of software, hardware, and machine learning systems, you'll bring expertise in low-level optimization, system architecture, and ML model acceleration. In this role, you will:
* Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators.
* Participate in all stages of the ML system development lifecycle, including distributed-computing-based architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment.
* Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
* Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models.
* Analyze and optimize system-level performance across multiple generations of Neuron hardware.
* Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks.
* Implement optimizations such as fusion, sharding, tiling, and scheduling.
* Conduct comprehensive testing, including unit and end-to-end model testing, with continuous deployment and releases through pipelines.
* Work directly with customers to enable and optimize their ML models on AWS accelerators.
* Collaborate across teams to develop innovative optimization techniques.
A day in the life
You will collaborate with a cross-functional team of applied scientists, system engineers, and product managers to deliver state-of-the-art inference capabilities for Generative AI applications. Your work will involve debugging performance issues, optimizing memory usage, and shaping the future of Neuron's inference stack across Amazon and the open source community.

As you design and code solutions to help our team drive efficiencies in software architecture, you'll create metrics, implement automation and other improvements, and resolve the root cause of software defects.
You will also build high-impact solutions to deliver to our large customer base and participate in design discussions, code review, and communicate with internal and external stakeholders. You will work cross-functionally to help drive business decisions with your technical input.

You will work in a startup-like development environment, where you're always working on the most important initiative.
About the team
The Inference Enablement and Acceleration team fosters a builder's culture where experimentation is encouraged and impact is measurable. We emphasize collaboration, technical ownership, and continuous learning. Our team is dedicated to supporting new members.

We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.

Join us to solve some of the most interesting and impactful infrastructure challenges in AI/ML today.

About Amazon

Sourced by ZipRecruiter

Amazon.com, Inc., commonly known as Amazon, is an American multinational technology company. It was founded by Jeff Bezos in 1994 and initially started as an online marketplace for books. Since then, Amazon has expanded its operations and become one of the largest e-commerce companies in the world. Amazon's primary business is its online retail platform, where customers can purchase a vast array of products, including electronics, clothing, books, home goods, and much more. The company offers a convenient and user-friendly shopping experience, with features such as fast shipping, customer reviews, and personalized recommendations. In addition to its e-commerce platform, Amazon has diversified its business into various other areas. One of its notable ventures is Amazon Web Services (AWS), a comprehensive cloud computing platform that provides services such as storage, compute power, and database management to individuals and businesses. AWS has become a leader in the cloud computing industry, powering many websites and applications worldwide. Amazon has also developed its own consumer electronics, including the popular Amazon Kindle e-reader, Fire tablets, Fire TV streaming devices, and the Alexa-powered Echo smart speakers. The Alexa voice assistant, integrated into these devices, allows users to interact with their devices using voice commands, perform tasks, and access information. Furthermore, Amazon has expanded into media and entertainment. It operates Prime Video, a streaming service that offers a wide range of movies, TV shows, and original content. Amazon Music provides a platform for streaming and purchasing digital music, while Audible offers audiobooks and other audio content. The company's commitment to customer satisfaction and convenience is demonstrated by its membership program, Amazon Prime. Prime members receive various benefits, including free two-day shipping, access to streaming services, exclusive deals, and more.

Industry

IT services, book publishers, retail, real estate, and computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Seattle, WA, US

Website

amazon.com

What Amazon employees say

Pay

Most people get paid breaks

Most people don’t get paid when they’re sick

The job rarely spills into unpaid time

Benefits

Sick days use up paid time off

Only some part-timers can get health insurance

Most part-timers get paid time off

Hours and flexibility

Less than 4 weeks notice of work schedule

Some people worry about their hours

Only some people can choose their shifts

Workplace

Most people feel treated with respect

Most people get breaks without interruption

Some people are stressed out
