Senior GenAI Research Engineer - Optimization and Kernels
San Francisco, CA · On-site
$123K - $169K/yr
... tensor, pipeline, ZeRO-based) and optimized communication patterns for gradient synchronization and collective operations • Profile, debug, and optimize end-to-end training workflows to identify ...
FSDP/ZeRO/tensor+pipeline parallel; NCCL tuning. GPU + kernel performance * Nsight profiling, Triton/CUDA kernels, fused ops. * Flash-attention-style speedups, sequence packing, KV-cache tricks.
On-prem Platform Engineer
Charlotte, NC · Hybrid
$80/hr
Tensor parallelism and large model scaling * CUDA, NCCL, GPU architecture * GPU partitioning & optimization (MIG) Kubernetes & ML Serving * Kubernetes-based ML serving platforms * KServe, OpenShift ...
Scientific Software Engineer - Emulation & Application
Boston, MA · On-site
$120/hr
Develop, test, and optimize core components of QuEra's in-house quantum hardware emulator, including state-vector, tensor-network, and pulse-level simulation backends. * Write performance-critical ...
Data Scientist
Passaic, NJ · On-site
Data Scientist with PyTorch / TensorFlow / BERT, Deep Learning/Neural Network. 7 to 11 years of experience in data science or engineering. Develop and maintain scalable data pipelines. Build ...
Research Assistant Professor Earthquake Source
Austin, TX · On-site
$85K/yr
Comparing different methods and tools for moment tensor inversion and first motion to calculate fault plane solutions. - Engaging in outside funding activities and promoting programs with stakeholders ...
Research Engineer Graduate (AI Training Systems & RL Infrastructure - Seed Infra) - 2026 Start (PhD)
San Jose, CA · On-site
$126K - $165K/yr
... data/model/tensor/pipeline/expert parallelism, computation-communication overlap, and large-scale GPU cluster scaling. • Prototype and improve end-to-end reinforcement learning (RL) training ...
AI Architect - Poway, CA
Poway, CA · On-site
Python (PyTorch/TensorFlow) deployed on Red Hat Enterprise Linux (RHEL) tactical edge servers. * Specific GA Tech: Familiarity with DDS (Data Distribution Service) middleware (RTI Connext) used in ...
Post-Training Research Engineer
San Francisco, CA · On-site
$200K - $275K/yr
Advanced experience in a tensor/array computation library like PyTorch, TensorFlow, JAX, or similar * A detailed understanding of transformer training parallelism strategies like data parallelism ...
Your work will span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization. Strong candidates will have a track record of delivering ...
Advanced Machine Learning skills in Python or GCP products like TensorFlow (Mandatory.) * Dashboard development skills (Mandatory.) * Autonomy and independence (Strongly desired.) * Communication ...
Machine Learning Performance Engineer
New York, NY · On-site
$153K/yr
Low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores and the memory hierarchy * Debugging and optimisation experience using tools like CUDA GDB, Nsight Systems, Nsight ...
LLM Pre-training & Distributed Engineer (AI Infrastructure)
Seattle, WA · On-site
$122K - $160K/yr
Deep expertise in 3D parallelism (Data, Tensor, Pipeline). * Experience managing SLURM or Kubernetes-based GPU clusters. * Strong systems engineering background (C++, CUDA, Python)
LLM Pre-training & Distributed Engineer (AI Infrastructure)
San Francisco, CA · On-site
$126K - $166K/yr
Deep expertise in 3D parallelism (Data, Tensor, Pipeline). * Experience managing SLURM or Kubernetes-based GPU clusters. * Strong systems engineering background (C++, CUDA, Python)
Tensor information
| Salary range (per year) | Share of jobs |
|---|---|
| $46K - $64K | 1% |
| $64K - $81.9K | 2% |
| $81.9K - $99.9K | 4% |
| $99.9K - $117.8K | 9% |
| $117.8K - $135.8K | 11% |
| $135.8K - $153.7K | 7% |
| $153.7K - $171.7K | 50% |
| $171.7K - $189.6K | 2% |
| $189.6K - $207.6K | 1% |
| $207.6K - $225.5K | 0% |
| $225.5K - $243.5K | 13% |
$133.1K is the 25th percentile. Wages below this are outliers.
The median wage is $159.5K / yr.
What is the difference between Tensor and Data Scientist?
| Aspect | Tensor | Data Scientist |
|---|---|---|
| Required Credentials | Knowledge of machine learning, programming skills, often a degree in computer science or related fields | Degree in statistics, computer science, or related fields; strong analytical skills |
| Work Environment | Tech companies, AI research labs, software development teams | Business, finance, healthcare, and tech industries analyzing data to inform decisions |
| Industry Usage | Primarily in AI, machine learning, and deep learning projects | Across industries for data analysis, predictive modeling, and insights |
While a Tensor is a fundamental data structure used in machine learning frameworks like TensorFlow, a Data Scientist analyzes data to extract insights and build models. Tensors are tools that Data Scientists often work with, but they are not roles themselves. Understanding tensors is essential for Data Scientists involved in AI and machine learning projects.

This job post has expired today. Applications are no longer accepted.
Job description
Location: Charlotte, NC
Key Skills
Must-Have Skills (Mandatory Keywords)
- LLM Inference & Optimization
  - vLLM, TensorRT-LLM, Triton Inference Server, SGLang
  - Inference optimization techniques:
    - Continuous batching
    - Speculative decoding
    - KV cache / Prefix caching
  - Model optimization:
    - FP8, AWQ, GPTQ
- Distributed & GPU Systems
  - Tensor parallelism and large model scaling
  - CUDA, NCCL, GPU architecture
  - GPU partitioning & optimization (MIG)
- Kubernetes & ML Serving
  - Kubernetes-based ML serving platforms
  - KServe, OpenShift AI
  - Helm charts, Operators, platform automation
- GPU Orchestration
  - Run:AI or similar GPU scheduling/orchestration platforms
  - Multi-tenant GPU workload management
- Platform Engineering
  - Experience building internal AI/ML platforms (on-prem or hybrid)
  - Strong automation and system design mindset
- Observability & Performance
  - Prometheus, Grafana
  - ML observability (model latency, throughput, drift, resource utilization)
  - Performance benchmarking and tuning
Good to Have / Preferred Skills
- Experience with LLMOps / GenAI pipelines
- Exposure to hybrid cloud (on-prem + GCP/Azure integration)
- Familiarity with Inferentia / alternative accelerators
- Knowledge of service mesh / networking in GPU clusters
Build, configure, and operate on-prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.
Design and optimize high-performance inference stacks using vLLM, TensorRT-LLM, Triton Inference Server, SGLang, and advanced techniques (continuous batching, speculative decoding, KV caching).
Manage GPU orchestration and capacity using Run:AI, MIG, CUDA/NCCL, and tensor parallelism to maximize utilization and throughput.
Deploy and operate Kubernetes ML serving frameworks (KServe, Helm, Operators) for scalable, reliable model serving.
Drive inference optimization and benchmarking, leveraging FP8, AWQ, GPTQ, and performance tools such as GuideLLM and Locust.
Implement observability and ML monitoring using Prometheus, Grafana, and Arize AI, ensuring SLA/SLO compliance for GenAI services.
Collaborate with ML and research teams to onboard new models, tune inference performance, and productionize GenAI use cases.