$102K - $139.60K/yr
This role is fully remote-friendly, with team members distributed across the US and Canada ... Identify and resolve bottlenecks in data, compute, orchestration, and observability layers * Mentor ...
This role is fully remote-friendly, with team members distributed across the US and Canada ... Define scalable approaches for model deployment, inference services, monitoring, and observability ...
US or Canada Remote Responsibilities * Lead architecture and delivery for major ML platform capabilities across training, evaluation, deployment, and observability * Design scalable systems for ...
Montgomery, AL · On-site +1
$62.25 - $82/hr
Familiarity with observability tools such as Datadog, Prometheus, or CloudWatch. * Ability to work ... Paid Time Off (vacation, sick leave, parental leave, and holidays). * 100% remote work. * The ...
Experience implementing comprehensive observability (Datadog/Prometheus) and testing strategies ... Paid Time Off (vacation, sick leave, parental leave, and holidays) * 100% remote work. * The ...
Montgomery, AL · On-site +1
$133.50K - $179K/yr
Experience implementing observability practices using tools such as Datadog, Prometheus, CloudWatch ... Paid Time Off (vacation, sick leave, parental leave, and holidays). * 100% remote work. * The ...
United States (Remote) Interested applicants must reside in one of the following approved states ... Establish principles and patterns for platform scalability, reliability, security, observability ...
Huntsville, AL · On-site +1
$116.80K - $154K/yr
Posting Type Hybrid/Remote Job Overview Who We Are Relativity is a leading legal data intelligence ... Solid understanding of CI/CD, observability, and incident response. * Experience guiding and ...
New
AL · On-site +1
$120K/yr
Monitor system health and performance using observability stacks such as Prometheus, Grafana, and the ELK stack, and proactively resolve issues. * Design and implement secure remote access, user ...
Montgomery, AL · Remote
$225.10K - $264.50K/yr
Remote, United States Employment Type: FullTime Location Type: Remote Department Engineering ... Partner with cross-functional teams--including Platform, Kafka, Observability, Developer ...
Huntsville, AL · On-site +1
$140K - $220K/yr
... observability. Education/Qualifications Minimum Requirements: * Must be a U.S. citizen and be ... Remote
Huntsville, AL · Remote
$140K - $220K/yr
... and observability. Minimum Requirements: * Must be a U.S. citizen and be willing to obtain and ... Remote
Huntsville, AL · Remote
$140K - $220K/yr
... observability. Education/Qualifications Minimum Requirements: * Must be a U.S. citizen and be ... Remote Employment Type: FULL_TIME
Huntsville, AL · On-site +1
$55 - $73.50/hr
This position can be performed remote from anywhere, but may require up to 15% travel. As a skilled ... Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana, Datadog)
Huntsville, AL · On-site +1
... days remote Position Description: This position focuses on AI/ML and data execution within the ... observability platforms About PingWind PingWind is focused on delivering outstanding services to ...
Huntsville, AL · On-site +1
Posting Type Remote/Hybrid Job Overview Relativity is a private equity-backed, legal data ... Continuous delivery, strong observability, and end-to-end ownership. * Inclusive Environment
... and observability. For more information, visit www.enterprisedb.com Candidate Note: This role is 100% remote for candidates based in the US We're looking for a Product Manager who can help ...
| Aspect | Remote Observability | Remote Monitoring |
|---|---|---|
| Focus | Comprehensive system insights, including logs, metrics, and traces | Tracking specific system metrics and alerts |
| Tools | OpenTelemetry, Grafana, Jaeger | Nagios, Zabbix, Datadog |
| Work Environment | DevOps, SRE teams managing complex distributed systems | IT operations teams overseeing system health |
| Credentials | Knowledge of cloud platforms, scripting, and monitoring tools | Basic networking, system administration skills |
Remote Observability provides a holistic view of system health through logs, metrics, and traces, enabling proactive troubleshooting. Remote Monitoring focuses on tracking specific metrics and alerts to detect issues. While both roles involve system oversight, observability offers deeper insights for complex environments, whereas monitoring emphasizes real-time alerts for system stability.
$102K - $139.60K/yr
Full-time
Posted 23 days ago
9.5
Based on 5 frontline employees who took The Breakroom Quiz
5th of 183 rated software companies
Job Requisition ID #
POSITION OVERVIEW
The work we do at Autodesk touches nearly every person on the planet. By creating software tools for making buildings, machines, and even the latest movies, we influence and empower some of the most creative people in the world to solve problems that matter.
Autodesk is seeking a Senior ML Engineer, ML Systems and Infrastructure to design and scale the systems that enable machine learning across research and product development. You will help build the infrastructure behind large-scale data pipelines, distributed training systems, evaluation frameworks, and production ML workflows that support foundation models and ML-powered product features.
This role is ideal for an engineer who is deeply interested in scalable systems and production-grade ML infrastructure. You will operate independently across multiple parts of the stack and help define strong engineering practices for reliability, performance, and maintainability.
This role is fully remote-friendly, with team members distributed across the US and Canada.
Location: US or Canada Remote, East Coast
RESPONSIBILITIES
Design and build scalable systems for ML training, evaluation, deployment, and monitoring
Develop and improve data pipelines that process large-scale structured and semi-structured technical datasets
Optimize distributed workflows for performance, reliability, resource utilization, and cost efficiency
Build platform capabilities such as experiment tracking, model versioning, checkpointing, reproducibility, and observability
Contribute to model deployment, inference services, and production monitoring workflows
Improve data quality, lineage, provenance, and operational transparency across ML pipelines
Contribute to architecture and design discussions across the team
Identify and resolve bottlenecks in data, compute, orchestration, and observability layers
Mentor engineers through code reviews, design guidance, and knowledge sharing
Collaborate closely with researchers, product engineers, and platform partners to turn ML workflows into robust engineering systems
MINIMUM QUALIFICATIONS
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent industry experience
At least 3 to 4 years of industry experience building and operating production software, ML systems, distributed infrastructure, or large-scale data pipelines
Strong experience in software engineering, distributed systems, backend systems, or ML infrastructure
Strong proficiency in Python and experience delivering production-quality systems
Experience designing and operating scalable data or compute pipelines
Experience with cloud platforms such as AWS, Azure, or GCP
Familiarity with containers, CI/CD, observability, and release quality practices
Ability to independently drive technical execution on complex work with limited oversight
PREFERRED QUALIFICATIONS
Experience building data pipelines for large-scale structured and semi-structured technical datasets
Experience with data lineage, provenance, governance, and responsible data usage in ML systems
Experience with distributed data processing and orchestration systems such as Ray, Airflow, Spark, or similar platforms
Experience with model deployment, inference services, monitoring, and observability for production ML systems
Experience building ML-ready representations for geometry, graph, hierarchical, or multimodal data
Experience with distributed ML frameworks such as PyTorch, Lightning, DeepSpeed, FSDP, Megatron, or similar
Familiarity with AEC workflows, design data, BIM/CAD formats, or Autodesk products
THE IDEAL CANDIDATE
Thinks like a systems engineer and executes like a strong software developer
Can balance short-term delivery with long-term platform health
Brings strong technical judgment and ownership
Improves team effectiveness through mentoring and engineering rigor
Enjoys solving scaling, performance, and reliability challenges
At Autodesk, we're building a diverse workplace and an inclusive culture to give more people the chance to imagine, design, and make a better world. Autodesk is proud to be an equal opportunity employer and considers all qualified applicants for employment without regard to race, color, religion, age, sex, sexual orientation, gender, gender identity, national origin, disability, veteran status or any other legally protected characteristic. We also consider for employment all qualified applicants regardless of criminal histories, consistent with applicable law.
Are you an existing contractor or consultant with Autodesk? Please search for open jobs and apply internally (not on this external site). If you have any questions or require support, contact Autodesk Careers.
Sourced by ZipRecruiter
Autodesk is changing how the world is designed and made. Our technology spans architecture, engineering, construction, product design, manufacturing, media, and entertainment, empowering innovators everywhere to solve challenges big and small. From greener buildings to smarter products to more mesmerizing blockbusters, Autodesk software helps our customers to design and make a better world for all. For more information visit autodesk.com or follow @autodesk.
Software development
10,000+ Employees
San Rafael, CA, US
1982