Reinforcement Learning Engineer Jobs (NOW HIRING)

Reinforcement Learning Engineer

New York, NY · On-site

$87.50K - $118.20K/yr

Reinforcement Learning (RL) Engineer Location: New York (Office) On-site Full-time Compensation: Competitive Our client is an elite development firm and a high-growth software company responsible for ...

Senior Reinforcement Learning Engineer

Austin, TX · On-site

$103.60K - $142.20K/yr

JOB SUMMARY The Senior Reinforcement Learning Engineer is a key, hands-on role focused on achieving state-of-the-art performance on our humanoid robots. This engineer will leverage their deep ...

Applied Reinforcement Learning Engineer Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote) About the Team Centific AI Research advances foundational AI models and applications through ...

Position Overview We are looking for a Machine Learning Engineer to be responsible for designing and implementing cutting-edge reinforcement learning algorithms, conducting experiments, and ...

Reinforcement Learning Engineer information

Salary scale: $38K to $191.5K, average $115.9K

How much do reinforcement learning engineer jobs pay per year?

As of May 29, 2026, the average yearly pay for a reinforcement learning engineer in the United States is $115,864, according to ZipRecruiter salary data. Most workers in this role earn between $83,000 and $151,500 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Reinforcement Learning Engineer, and why are they important?

To thrive as a Reinforcement Learning Engineer, you need a strong background in machine learning, mathematics (especially probability and statistics), and programming languages like Python, often supported by a relevant degree in computer science or engineering. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), RL libraries (like OpenAI Gym), and cloud computing platforms is typically required. Problem-solving skills, creativity, and effective collaboration help set outstanding engineers apart in this field. These competencies enable the design and deployment of advanced RL solutions that address real-world challenges and drive innovation.

What are some common challenges faced by Reinforcement Learning Engineers when deploying models in real-world environments?

One of the main challenges Reinforcement Learning (RL) Engineers face is bridging the gap between simulation and real-world deployment. Models that perform well in controlled environments may struggle with unpredictable data, safety constraints, or limited feedback in production. Additionally, RL algorithms often require significant computational resources and careful tuning to avoid instability. Collaboration with domain experts and software engineers is essential to address these issues and ensure successful integration of RL solutions into existing systems.

What are Reinforcement Learning Engineers?

Reinforcement Learning Engineers are specialized professionals who design, develop, and implement algorithms based on reinforcement learning, a type of machine learning where agents learn to make decisions by receiving rewards or penalties. They work on building models that enable machines to learn optimal actions through trial and error in complex environments. Their responsibilities often include developing RL architectures, tuning hyperparameters, running simulations, and applying RL methods to real-world problems like robotics, gaming, or recommendation systems. RL Engineers typically have strong backgrounds in computer science, mathematics, and deep learning, along with experience in programming languages like Python and frameworks such as TensorFlow or PyTorch.

What is the difference between a Reinforcement Learning Engineer and a Machine Learning Engineer?

| Aspect | Reinforcement Learning Engineer | Machine Learning Engineer |
| --- | --- | --- |
| Credentials | Bachelor's/Master's in CS, AI, or related; experience with RL frameworks | Bachelor's/Master's in CS, Data Science, or related; experience with ML algorithms |
| Work Environment | Research labs, AI startups, tech companies focusing on RL applications | Tech companies, data-driven firms, AI departments across industries |
| Industry Usage | Specialized in RL projects like robotics, game AI, autonomous systems | Broader applications including predictive modeling, NLP, computer vision |

Reinforcement Learning Engineers focus on developing algorithms that learn through interactions with environments, often in robotics or gaming. Machine Learning Engineers work on a wider range of models and applications. While both roles require strong programming and math skills, RL Engineers specialize in sequential decision-making, whereas ML Engineers handle diverse data-driven tasks across industries.

More about Reinforcement Learning Engineer jobs
Infographic showing Reinforcement Learning Engineer job openings in the United States as of May 2026, with employment types broken down into 67% Full Time and 33% Contract. Highlights a 100% in-person job distribution, with an average salary of $115,864 per year, or $55.70 per hour.
Reinforcement Learning Engineer

Weights & Biases

Bellevue, WA • On-site

Full-time

Medical, Dental, Vision, Life, Retirement, PTO

Posted 12 days ago


Job description

CoreWeave, the AI Hyperscaler™, acquired Weights & Biases to create the most powerful end-to-end platform to develop, deploy, and iterate AI faster. Since 2017, CoreWeave has operated a growing footprint of data centers covering every region of the US and across Europe, and was ranked as one of the TIME100 most influential companies of 2024. By bringing together CoreWeave's industry-leading cloud infrastructure with the best-in-class tools AI practitioners know and love from Weights & Biases, we're setting a new standard for how AI is built, trained, and scaled.
The integration of our teams and technologies is accelerating our shared mission: to empower developers with the tools and infrastructure they need to push the boundaries of what AI can do. From experiment tracking and model optimization to high-performance training clusters, agent building, and inference at scale, we're combining forces to serve the full AI lifecycle - all in one seamless platform.
Weights & Biases has long been trusted by over 1,500 organizations - including AstraZeneca, Canva, Cohere, OpenAI, Meta, Snowflake, Square, Toyota, and Wayve - to build better models, AI agents, and applications. Now, as part of CoreWeave, that impact is amplified across a broader ecosystem of AI innovators, researchers, and enterprises.
As we unite under one vision, we're looking for bold thinkers and agile builders who are excited to shape the future of AI alongside us. If you're passionate about solving complex problems at the intersection of software, hardware, and AI, there's never been a more exciting time to join our team.
Our Team
The OpenPipe team at CoreWeave is building tools to help agents learn from experience. This is a critical step to make agents reliable enough to perform long tasks autonomously, in the same way human employees are. We're systematically identifying and solving the major bottlenecks between today's tech and those future self-improving agents. So far, we've:
  • Released ART, the easiest library for getting started with RL.
  • Developed RULER, a general-purpose reward function that works across many diverse tasks.
  • Built Serverless RL, an elegant API that gives RL practitioners full control over their data, environment and reward function while letting them outsource the headaches of managing GPU infrastructure.

These releases have a theme: we're systematically tackling each major roadblock to successfully training self-improving agents. Several serious challenges remain. Building simulated environments often requires substantial human labor, and existing training methods are not data efficient enough. We're laser-focused on solving these problems and making self-improvement a reality for agent developers.
In startup terms, this is a classic hard-tech bet. Our roadmap involves substantial technical risk; there are still major technical problems we're facing without a proven solution. However, there is very little market risk. We've worked closely with the teams building agents at many of the top AI-native startups as well as large enterprises. If we can build this, everyone will want it. A self-improving agent that learns from experience the way a human employee would could quickly capture a large fraction of the total inference market, which is worth tens of billions of dollars today and will be worth hundreds of billions in a few years.
About the Role
You have trained LLMs to be SOTA on specific tasks. You have opinions on whether sequence-level or token-level importance ratios are more effective. You probably shared the ScaleRL paper in your group chats, and kicked off a few ablations after you read it.
This is an applied research role. You will be expected to generate and investigate research ideas towards solving the remaining obstacles to continuous learning in production. You will work with the broader OpenPipe team to validate these research directions across real customer tasks. We are very GPU rich and are ready to direct an enormous amount of compute at this effort.
Beyond your role's specific qualifications, we're looking for strong engineers with great taste. The most important qualification by far is that you learn fast and can ship. This role will inevitably involve a lot of learning on the job; we're building this airplane as we fly it. Engineers on our team touch everything from CUDA kernels to high-performance LLM tracing dashboards, and you will have an opportunity to touch many parts of this stack.
Although we operate as part of a larger company, the OpenPipe team is small, has a large degree of autonomy and drives our own roadmap and priorities. This is an excellent role for someone looking to found their own company in the future.
Required Qualifications
  • Bachelor's or Master's degree in Computer Science or Machine Learning, a PhD in Robotics, or a degree in a related field
  • 5+ years of experience in machine learning with a strong focus on reinforcement learning, or a PhD plus 2 years of experience
  • Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow, or JAX)
  • Strong understanding of RL fundamentals: MDPs, policy optimization, value functions, exploration/exploitation trade-offs
  • Experience building and deploying ML models in production environments
  • Strong problem-solving skills and ability to work in ambiguous, research-driven environments
Preferred Qualifications
  • Publications in top-tier ML/AI conferences (NeurIPS, ICML, ICLR)
  • Familiarity with distributed training, GPU/TPU acceleration, and large-scale data pipelines
  • Knowledge of MLOps practices, CI/CD for ML, and model monitoring
  • Experience with cloud platforms (AWS, GCP, Azure)
  • Experience leading projects or small teams

Our Stack
We strive to use the best tool for the job when building and deploying our production services. Sometimes that means writing our own custom code, and often it means leaning on the work of others. As part of building Serverless RL, we depend on the following libraries and frameworks (among many others):
  • Kubernetes
  • Megatron
  • Unsloth
  • Temporal
  • Postgres
  • FastAPI
Why CoreWeave?
We work hard, have fun, and move fast! We're in an exciting stage of hyper-growth that you will not want to miss out on. We're not afraid of a little chaos, and we're constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experiences
  • Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for takeoff, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!
The base salary range for this role is $188,000 to $275,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
What We Offer
The range we've posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate, which can depend on a variety of factors, including qualifications, experience, interview performance, and location.
In addition to a competitive salary, we offer a variety of benefits to support your needs, including:
  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption

Our Workplace
While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.
California Consumer Privacy Act - California applicants only
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: careers@coreweave.com.
Export Control Compliance
This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.