US or Canada Remote Responsibilities * Lead architecture and delivery for major ML platform capabilities across training, evaluation, deployment, and observability * Design scalable systems for ...
Experience with monitoring and observability platforms such as Splunk, Dynatrace, CloudWatch, or ... Remote Opportunity. Note: Selected candidates will be required to complete fingerprinting at a ...
Site Reliability Engineer
Columbia, MD · On-site +1
$55.50 - $73.75/hr
Hybrid Columbia MD 3 times per week OR Remote (as applicable to role) Work Authorization ... This role is responsible for implementing observability and automation practices, supporting ...
... data management, data observability, or technical B2B SaaS company Preferred Qualifications ... Remote (Regardless of Location): $187,300 - $213,700 for Director, Marketing McLean, VA: $206,000 ...
DevSecOps Architect (Remote)
Falls Church, VA · Remote
$69.25 - $89.50/hr
This remote contract-to-hire position will be originated in Falls Church, VA. * SELECTED CANDIDATES ... Observability requirements: Deep knowledge of eBPF, Prometheus, and AI-powered logging/monitoring ...
DevSecOps Architect (Remote)
Falls Church, VA · On-site +1
$69.25 - $89.50/hr
This remote contract-to-hire position will be originated in Falls Church, VA. * SELECTED CANDIDATES ... Observability requirements: Deep knowledge of eBPF, Prometheus, and AI-powered logging/monitoring ...
Distinguished Engineer (Remote - Eligible)
Mclean, VA · On-site +1
... Observability tooling (e.g., OpenTelemetry, Prometheus, Tracing) Capital One will consider ... Remote (Regardless of Location): $244,700 - $279,200 for Distinguished Engineer Cambridge, MA: $269 ...
Senior Software Engineer (Command & Control)
Reston, VA · On-site +1
$127K - $168K/yr
Remote Sensing (the data), Space Systems (the components), and Mission Solutions (the platforms ... Strong understanding of system reliability, observability, and failure modes. * Experience with ...
Senior Software Engineer (Command & Control)
Arlington, VA · On-site +1
$141K - $186K/yr
Remote Sensing (the data), Space Systems (the components), and Mission Solutions (the platforms ... Strong understanding of system reliability, observability, and failure modes. * Experience with ...
Remote Job opening for Sr. Software Engineer with our Federal cl with Security Clearance
Washington, DC · Remote
$70 - $75/hr
Remote Duration: Full-Time Contract (6-12 Months | Possibility of Extension/Conversion) Pay range ... Develop scalable observability strategies using Datadog for applications, infrastructure, cloud ...
Platform Systems Manager
Bethesda, MD · On-site +1
This opportunity is full time and onsite/remote at the NCBI in Bethesda, MD and/or remote. NCBI is ... Develops and continuously improves DevSecOps, DataOps and Observability platform. * Develops and ...
Develop observability, tracing, and monitoring systems for AI workloads using tools such as ... Remote-first flexibility and offsite team gatherings * Strong emphasis on wellness, learning, and ...
*This position is mostly remote, however it does require occasional travel to the customer site in ... leveraging observability metrics. Experience with Data Visualization, RESTful APIs, RESTful Web ...
Full Stack Developer (Remote)
Washington, DC · Remote
$130K - $160K/yr
Remote/Hybrid Employment Type: Full-Time About USBC CEDC The US Black Chambers Community Economic ... observability and system health monitoring * Designing and maintaining messaging systems between ...
TA-0031 Intermediate Developer - 10 to 15 years LCAT 27
Chantilly, VA · On-site +1
$97.54/hr
Location: 100% Remote Work Authorization: U.S. Citizen or Green Card Holder Required Positions ... Performance tuning & observability principles * Multi-threading * Java, SQL, Python, React ...
... observability and monitoring tools * 3 or more years of experience integrating CI/CD and pipeline tools * Internet: Will prioritize and maintain access to strong, reliable internet for the remote ...
Consulting Architect | Public Sector | DC Preferred with Security Clearance
Washington, DC · Remote
$74.25 - $98.50/hr
Elastic is a free and open search company that powers enterprise search, observability, and ... Work effectively in a remote, highly distributed team environment, with periodic on-site ...
Remote Observability information
What is the difference between Remote Observability vs Remote Monitoring?
| Aspect | Remote Observability | Remote Monitoring |
|---|---|---|
| Focus | Comprehensive system insights, including logs, metrics, and traces | Tracking specific system metrics and alerts |
| Tools | OpenTelemetry, Grafana, Jaeger | Nagios, Zabbix, Datadog |
| Work Environment | DevOps, SRE teams managing complex distributed systems | IT operations teams overseeing system health |
| Credentials | Knowledge of cloud platforms, scripting, and monitoring tools | Basic networking, system administration skills |
Remote Observability provides a holistic view of system health through logs, metrics, and traces, enabling proactive troubleshooting. Remote Monitoring focuses on tracking specific metrics and alerts to detect issues. While both roles involve system oversight, observability offers deeper insights for complex environments, whereas monitoring emphasizes real-time alerts for system stability.
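The distinction above can be made concrete with a small sketch. It is illustrative only and uses nothing beyond the Python standard library: the in-memory `TELEMETRY` list and the functions `emit`, `handle_request`, `monitoring_alert`, and `observability_query` are hypothetical names invented for this example, not part of any tool named in the table. A monitoring-style check asks whether one metric crossed a threshold, while an observability-style query pulls every log, metric, and trace span that shares a request's trace ID.

```python
import uuid

# Hypothetical in-memory telemetry store; a real system would export to a backend.
TELEMETRY = []


def emit(kind, trace_id, **fields):
    """Record one telemetry event (log, metric, or span) tagged with a trace ID."""
    TELEMETRY.append({"kind": kind, "trace_id": trace_id, **fields})


def handle_request(latency_ms, fail=False):
    """Simulated request that emits its signals under a single trace ID."""
    trace_id = uuid.uuid4().hex
    emit("span", trace_id, name="handle_request", duration_ms=latency_ms)
    emit("metric", trace_id, name="latency_ms", value=latency_ms)
    if fail:
        emit("log", trace_id, level="ERROR", message="upstream timeout")
    return trace_id


def monitoring_alert(threshold_ms):
    """Monitoring-style check: did any latency metric exceed a fixed threshold?"""
    return any(
        e["kind"] == "metric" and e["value"] > threshold_ms for e in TELEMETRY
    )


def observability_query(trace_id):
    """Observability-style query: collect every signal recorded for one request."""
    return [e for e in TELEMETRY if e["trace_id"] == trace_id]
```

For example, after `slow = handle_request(900, fail=True)`, calling `monitoring_alert(500)` only reports that something was slow, whereas `observability_query(slow)` returns the span, the metric, and the error log for that specific request.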
Principal ML Engineer, Machine Learning Platform and Systems Architecture
Autodesk · Herndon, VA • Remote
Full-time
Posted 16 days ago
Autodesk rating
9.5
Based on 5 frontline employees who took The Breakroom Quiz
7th of 191 rated software companies
Job description
Job Requisition ID #
26WD97132, Principal Machine Learning Engineer, ML Platform and Systems Architecture
Position Overview
The work we do at Autodesk touches nearly every person on the planet. By creating software tools for making buildings, machines, and even the latest movies, we influence and empower some of the most creative people in the world to solve problems that matter.
Autodesk is looking for a Principal ML Engineer, ML Platform and Systems Architecture to lead the design and evolution of large-scale machine learning platforms. In this role, you will own high-impact technical initiatives that span ML infrastructure, data systems, model lifecycle tooling, and production architecture. You will work closely with researchers, product teams, and engineering leadership to build the systems that bring advanced machine learning into reliable, scalable product experiences.
This is a senior technical leadership role for an engineer who excels at system architecture, distributed computing, and end-to-end platform thinking. You will help define the technical direction for ML systems and drive execution across ambiguous, cross-functional, high-value initiatives.
This role is fully remote-friendly, with team members distributed across the US and Canada.
Location: US or Canada Remote
Responsibilities
Lead architecture and delivery for major ML platform capabilities across training, evaluation, deployment, and observability
Design scalable systems for distributed training, data processing, feature and model lifecycle management, and production inference
Own platform-level technical outcomes from design through deployment, operations, and continuous improvement
Drive the design and scaling of data pipelines for large-scale structured and semi-structured technical datasets
Lead architecture for distributed data processing and orchestration systems such as Ray, Airflow, Spark, or similar platforms
Establish strong practices for data lineage, provenance, governance, and responsible data usage in ML systems
Guide the design of model deployment, inference services, monitoring, and observability for production ML workloads
Contribute to the development of ML-ready representations for geometry, graph, hierarchical, or multimodal data
Clarify ambiguous problem spaces, define solution approaches, and lead execution across multiple engineers and teams
Establish and improve engineering standards, operational practices, and architectural patterns for ML systems
Lead incident response for critical platform issues and drive lasting improvements across system health and supportability
Mentor engineers and act as a force multiplier through design leadership, coaching, and technical reviews
Communicate technical strategy, tradeoffs, and execution plans clearly to technical and non-technical stakeholders
Minimum Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent industry experience
Typically 6 to 8 years of industry experience in software engineering, ML infrastructure, distributed systems, or platform engineering, including experience leading design and delivery of complex technical systems
Deep experience in software architecture, distributed systems, large-scale data platforms, or ML infrastructure
Strong proficiency in Python and strong command of production software engineering practices
Experience leading complex technical initiatives that span multiple engineers or cross-functional teams
Strong experience with large-scale data pipelines, distributed data processing, and cloud-native platform architectures
Experience with model deployment, inference systems, and production observability
Demonstrated ability to make architecture decisions that balance performance, scalability, reliability, and cost
Strong communication and stakeholder management skills
Preferred Qualifications
Experience building data governance, lineage, and provenance capabilities for ML platforms
Experience building ML-ready representations for geometry, graph, hierarchical, or multimodal data
Deep experience with distributed ML frameworks and large-scale training infrastructure
Experience with Kubernetes, workflow orchestration systems, and modern ML platform tooling
Experience with production incident leadership, service reviews, resiliency practices, and operational readiness
Familiarity with AEC data, computational design workflows, BIM/CAD ecosystems, or Autodesk products
The Ideal Candidate
Is a strong architect and hands-on engineer
Drives clarity and momentum in ambiguous spaces
Thinks at platform level and acts with strong product and business awareness
Raises the engineering bar for system design, quality, and operational excellence
Builds trust through technical depth, calm judgment, and execution leadership
About Autodesk
Welcome to Autodesk! Amazing things are created every day with our software - from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made.
We take great pride in our culture here at Autodesk - it's at the core of everything we do. Our culture guides the way we work and treat each other, informs how we connect with customers and partners, and defines how we show up in the world.
When you're an Autodesker, you can do meaningful work that helps build a better world designed and made for all. Ready to shape the world and your future? Join us!
Benefits
From health and financial benefits to time away and everyday wellness, we give Autodeskers the best, so they can do their best work. Learn more about our benefits in the U.S. by visiting https://benefits.autodesk.com/
Salary transparency
Equal Employment Opportunity
At Autodesk, we're building a diverse workplace and an inclusive culture to give more people the chance to imagine, design, and make a better world. Autodes...
About Autodesk
Sourced by ZipRecruiter
Autodesk is changing how the world is designed and made. Our technology spans architecture, engineering, construction, product design, manufacturing, media, and entertainment, empowering innovators everywhere to solve challenges big and small. From greener buildings to smarter products to more mesmerizing blockbusters, Autodesk software helps our customers to design and make a better world for all. For more information visit autodesk.com or follow @autodesk.
Industry
Software development
Company size
10,000+ Employees
Headquarters location
San Rafael, CA, US
Year founded
1982