Remote HPC System Engineer Jobs in Nevada (NOW HIRING)

Monitor system performance and troubleshoot infrastructure issues across compute, storage, and ... Flexibility & Remote Opportunities - Whether in-office, hybrid, or fully remote, we offer the ...

Machine Learning Systems Engineer

Las Vegas, NV · On-site +1

$144K - $192K/yr

... be fully remote. The salary range for this role is an estimate based on a wide range of ... All newly-hired employees are queried through this electronic system established by the DHS and the ...

Machine Learning Systems Engineer

Las Vegas, NV · On-site

$144K - $192K/yr

... be fully remote. The salary range for this role is an estimate based on a wide range of ... All newly-hired employees are queried through this electronic system established by the DHS and the ...

We are open to remote candidates. In this role, the Environmental Engineer is responsible for ... Oversee and provide technical expertise for environmental systems, including wastewater treatment ...

Remote HPC System Engineer information

What are the key skills and qualifications needed to thrive as a Remote HPC System Engineer, and why are they important?

To thrive as a Remote HPC System Engineer, you need expertise in Linux system administration, parallel computing, and networking, along with a degree in computer science or a related field. Familiarity with job schedulers (like Slurm), cluster management tools, and scripting languages (such as Python or Bash) is highly valuable, as are certifications like CompTIA Linux+ or Red Hat Certified Engineer. Strong problem-solving abilities, effective communication, and self-motivation are essential soft skills for remote collaboration and troubleshooting. These skills ensure the reliable operation, optimization, and scalability of HPC systems in distributed environments.

What are some common challenges faced by Remote HPC System Engineers, and how can they be managed effectively?

Remote HPC System Engineers often encounter challenges such as troubleshooting complex hardware or software issues without physical access, ensuring seamless system performance, and coordinating with geographically dispersed teams. These can be managed by leveraging strong remote monitoring tools, maintaining clear documentation, and establishing effective communication channels with on-site staff. Proactively scheduling regular system health checks and participating in virtual team meetings can also help address problems quickly and maintain high system reliability.
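As one illustration of the regular health checks mentioned above, here is a minimal Python sketch that could be run on a schedule on a node. It uses only the standard library. The function name `check_node_health` and the threshold values are assumptions chosen for illustration, not taken from any particular monitoring tool, and a production setup would more likely rely on a dedicated monitoring stack.

```python
import os
import shutil


def check_node_health(path="/", max_disk_fraction=0.90, max_load_per_cpu=1.5):
    """Return a list of warning strings for the local node (empty if healthy).

    The thresholds are illustrative defaults, not recommended values.
    """
    warnings = []

    # Disk usage for the filesystem that contains `path`.
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction > max_disk_fraction:
        warnings.append(f"disk {path} is {used_fraction:.0%} full")

    # os.getloadavg() exists only on Unix-like systems, so look it up defensively.
    getloadavg = getattr(os, "getloadavg", None)
    if getloadavg is not None:
        try:
            load_1min = getloadavg()[0]
        except OSError:
            load_1min = None
        cpus = os.cpu_count() or 1
        if load_1min is not None and load_1min / cpus > max_load_per_cpu:
            warnings.append(
                f"1-minute load {load_1min:.2f} is high for {cpus} CPUs"
            )

    return warnings


if __name__ == "__main__":
    for line in check_node_health() or ["node looks healthy"]:
        print(line)
```

A script like this could be invoked from cron or a scheduler job, with its output forwarded to whatever alerting channel the team already uses.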

What is the difference between a Remote HPC System Engineer and a Remote Cloud Infrastructure Engineer?

| Aspect | Remote HPC System Engineer | Remote Cloud Infrastructure Engineer |
| --- | --- | --- |
| Credentials | Typically requires Linux certifications, HPC-specific training | Often requires cloud platform certifications (AWS, Azure, GCP) |
| Work Environment | High-performance computing clusters, research labs | Cloud platforms, data centers, virtualized environments |
| Industry Usage | Research, scientific computing, academia | Tech, finance, enterprise IT |
| Search/Comparison Intent | Understanding HPC-specific roles vs cloud roles | Comparing on-premise HPC vs cloud infrastructure |

The Remote HPC System Engineer focuses on managing and optimizing high-performance computing clusters, often in research or scientific environments. In contrast, the Remote Cloud Infrastructure Engineer specializes in designing and maintaining cloud-based infrastructure across various industries. While both roles require technical expertise in system management, their environments and certifications differ, catering to distinct operational needs.

What are Remote HPC System Engineers?

Remote HPC (High Performance Computing) System Engineers are IT professionals who design, implement, manage, and troubleshoot HPC systems and clusters from a remote location. They work with advanced computing infrastructure that supports scientific research, complex simulations, and large-scale data processing. Their responsibilities include configuring hardware and software, monitoring system performance, ensuring security, and providing technical support to users, all while working off-site. This role requires strong expertise in HPC technologies, operating systems like Linux, networking, and scripting, as well as effective communication skills for collaborating with distributed teams.

What cities in Nevada are hiring for Remote HPC System Engineer jobs?

Cities in Nevada with the most Remote HPC System Engineer job openings:

Machine Learning Systems Engineer

Motional

Las Vegas, NV • On-site, Remote

Other

Posted 14 days ago


Job description

Mission Summary:

We are looking for a Machine Learning Systems Engineer to join our ML Acceleration team. In this role, you will be responsible for the core systems that enable our researchers to train frontier models at scale, focusing obsessively on speed, cost, reliability, and throughput. You will work at the intersection of machine learning research and high-performance systems engineering. Your work will directly impact our ability to scale distributed model training and reduce the time-to-convergence for our next generation of models.

What you'll be doing:

  • Performance Profiling & Optimization: Utilize profiling tools (e.g., Nsight, PyTorch Profiler) to identify bottlenecks in data loading, gradient computation, and communication. Implement optimizations like kernel fusion, sharding, and tiling to improve step time.
  • Distributed Training: Optimize distributed training pipelines using frameworks such as PyTorch Distributed.
  • Kernel Development: Design and maintain high-performance GPU kernels in Triton or CUDA for state-of-the-art ML workloads.
  • Data Pipeline Engineering: Optimize robust data loading pipelines that maximize training throughput.
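To make the idea of locating bottlenecks in a training step more concrete, here is a small Python sketch that splits wall-clock time across named phases such as data loading and computation. It is a standard-library stand-in for illustration only, not how Nsight or PyTorch Profiler work. The `StepTimer` class and the phase names are hypothetical. For real GPU workloads, plain wall-clock timers can mislead because kernels are launched asynchronously, which is one reason dedicated profilers are used.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulate wall-clock time per named phase of a training step."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        # Time everything executed inside the `with` block.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def fractions(self):
        """Return each phase's share of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: secs / total for name, secs in self.totals.items()}


if __name__ == "__main__":
    timer = StepTimer()
    for _ in range(3):
        with timer.phase("data_loading"):
            time.sleep(0.05)   # placeholder for fetching a batch
        with timer.phase("compute"):
            time.sleep(0.005)  # placeholder for forward/backward pass
    for name, share in timer.fractions().items():
        print(f"{name}: {share:.0%} of measured step time")
```

In this toy run the data loading phase dominates, which is the kind of signal that would point an engineer toward the input pipeline rather than the model code.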

What we're looking for:

  • Education: Bachelor's, Master's degree, or PhD in Computer Science, Computer Engineering, or a related technical discipline.
  • Software Engineering: Strong proficiency in Python.
  • ML Frameworks: Extensive hands-on experience with PyTorch.
  • ML Knowledge: Experience optimizing machine learning model execution during training and inference, alongside a strong understanding of fundamental machine learning concepts, architectures, and processes.
  • Problem Solving: Exceptional analytical and problem-solving skills, with a bias for action and a data-driven approach to technical challenges.

We encourage a hybrid schedule with in-office time at one of our locations in Boston, Pittsburgh, or Las Vegas to support collaboration, or this role can be fully remote.