Proficient with Claude Code at an advanced level -- fluent with sub-agents, MCP servers, hooks ... Speculative Decoding: EAGLE-3 / MTP / DFlash / Medusa / SpecForge / ModelOpt; experience training ...
Senior ML Ops Engineer
Dallas, TX · On-site
$103K - $142K/yr
ML model optimization: quantization, pruning, speculative decoding, etc. * System-level performance ... Infrastructure as Code (IaC) * Big data technologies: Apache Spark, Hadoop * Awareness of ethical ...
Message encoding/decoding. Encryption/decryption. Building third-party libraries, frameworks ... Java and C/C++ coding skills. Unit testing. Additional Information All your information will be ...
AI Research Engineer
New York, NY · On-site
$300K - $400K/yr
... decoding, and program synthesis. What Makes You A Great Fit: * PhD in CS/AI/ML (or equivalent research experience) with publications ideally in multi‑agent RL, agentic AI, or RL for language/code.
Sr. Firmware Engineer
Katy, TX · On-site
$103K - $136K/yr
... encoding and decoding, sensor and measurement integration, power management, and system-level ... Establish firmware development standards, coding practices, version control, and design ...
Powertrain Reverse Engineer
Foster City, CA · On-site
$120K - $135K/yr
Familiarity with AI-assisted coding tools (e.g., Cursor/Claude) - we use these to accelerate decoding * Prior work decoding OEM "black box" control modules Tools & Resources Provided TIS laptop ...
Member of Technical Staff - Inference
Palo Alto, CA · On-site
$180K - $440K/yr
GPU kernels, code generation. * Algorithmic inference optimizations: quantization, speculative decoding, distillation, low-precision numerics. * Experience with testing, benchmarking, and reliability ...
Research Engineer, Frontier Speculative Decoding
San Francisco, CA · On-site
$190K - $270K/yr
You are comfortable navigating complex code and contributing to its improvement. * Strong attention-to-detail in evaluating model checkpoints to ensure they meet strict quality, performance, and ...
Powertrain Reverse Engineer
Foster City, CA · Hybrid
$120K - $135K/yr
Familiarity with AI-assisted coding tools (e.g., Cursor/Claude) -- we use these to accelerate decoding * Prior work decoding OEM "black box" control modules Tools & Resources Provided TIS laptop ...
Inference Software Engineer
Cupertino, CA · On-site
Etched is building AI chips that are hard-coded for individual model architectures. They are ... such as speculative decoding, tree search, KV cache sharing, etc. • Implement distributed ...
Senior Software Engineer, ML Infrastructure
$200K - $275K/yr
The engineer will own end-to-end systems spanning request scheduling, advanced decoding algorithms ... Contribute to technical design discussions, code reviews, and architectural decisions as a senior ...
Hadoop Developer
Maryland City, MD · On-site
Whatever we do, whether it is decoding our clients' technology requirements, configuring the most ... code, test, and debug new software or provide complex enhancements to existing software using ...
Senior Quantum AI Research Scientist, Applied Research
Santa Clara, CA · Hybrid
$115K - $147K/yr
... correction, decoding, calibration, and beyond. You will research and develop open AI models ... codes and hardware platforms, while collaborating with multi-functional teams across Product ...
Senior ML Engineer - Agentic AI
Waltham, MA · On-site
$112K - $154K/yr
Terminal-Based AI Coding & Development * Work extensively inside AI-powered coding terminals ... Explore inference optimizations such as speculative decoding, constraint decoding, structured ...
Coding Decoding information
Hourly wage distribution:
$16.83 - $18.84: 7% of jobs
$18.84 - $20.85: 12% of jobs
$20.85 - $22.86: 13% of jobs
$22.86 - $24.87: 14% of jobs
$24.87 - $26.88: 13% of jobs
$26.88 - $28.89: 9% of jobs
$28.89 - $30.90: 6% of jobs
$30.90 - $32.91: 6% of jobs
$32.91 - $34.92: 5% of jobs
$34.92 - $36.93: 8% of jobs
$36.93 - $38.94: 6% of jobs
$21.81 is the 25th percentile. Wages below this are outliers.
The median wage is $25.62 / hr.
$31.32 is the 75th percentile. Wages above this are outliers.
How much do coding decoding jobs pay per hour?
What are the key skills and qualifications needed to thrive in the Coding Decoding position, and why are they important?
To excel in a Coding Decoding role, candidates generally need strong programming skills, logical reasoning abilities, and a solid understanding of algorithms and data structures. Familiarity with languages such as Python, Java, or C++, experience with coding platforms, and credentials such as a Computer Science degree or a completed coding bootcamp are valuable assets. Attention to detail, problem-solving aptitude, and clear communication help individuals stand out in this position. These skills are crucial for effectively translating complex requirements into functional code and efficiently troubleshooting decoding challenges in a collaborative work environment.
What are the typical daily responsibilities of a Coding Decoding professional?
Coding Decoding professionals are commonly tasked with analyzing coding problems, developing algorithmic solutions, and translating requirements into efficient, readable code. On a daily basis, you may participate in code reviews, collaborate with team members to resolve technical issues, and optimize existing code for performance and scalability. You will also likely interact with project managers and other developers to ensure that your solutions align with overall project goals. This variety of responsibilities provides a dynamic work environment and frequent opportunities to grow your technical and collaborative skills.
What is a Coding Decoding job?
A Coding Decoding job typically involves analyzing patterns, encrypting or decrypting data, and solving logical reasoning problems. It is commonly found in cybersecurity, software development, and competitive exams. Professionals in this field work on algorithms, cryptography, or logical puzzles to encode and interpret information. Strong problem-solving skills and logical reasoning are key to excelling in this role.
Posted 4 days ago
Job description
About Us
GMI Cloud is a fast-growing AI infrastructure company backed by Headline VC and one of only seven cloud providers worldwide to earn NVIDIA's prestigious Reference Platform Cloud Partner designation. We operate 8 of our own GPU clusters across the U.S. and Asia, delivering a full spectrum of services from GPU compute to AI model inference API solutions. As an NVIDIA Reference Platform Cloud Partner, our infrastructure meets the highest standards for performance, security, and scalability in AI deployments. We empower AI startups and enterprises to "build AI without limits," providing everything they need to prototype, train, and deploy AI models quickly and reliably.
About this role
GMI Cloud is building the leading inference optimization solution and the most advanced token platform in the global token market — and we are hiring world-class Machine Learning Engineers to make GMI the new industry benchmark for LLM serving performance, cost efficiency, and production reliability.
This role is for engineers who want to live at the frontier of LLM inference systems. You will drive the research, validation, and productionization of the most advanced inference optimization techniques, and turn them into a real competitive advantage over top open-source baselines (vLLM, SGLang, and so on). Our charter is not just to adopt what's published; it is to define the recipes that the rest of the industry follows, ship the optimizations, and contribute back to the community.
You will focus on B200-first optimization, with support for H200 evolution, across core domains including quantization, speculative decoding, KV cache and memory management, prefill/decode disaggregation, and system-level inference optimization. You will work closely with platform and infrastructure teams to transform cutting-edge ideas into measurable gains in latency, throughput, cost efficiency, and production scalability.
Key Responsibilities
- Drive frontier research and engineering in LLM inference optimization across one of the four focus tracks (Speculative Decoding, Quantization, PD Disaggregation, KV Cache & Memory) while contributing across the full optimization stack.
- Develop next-generation optimization strategies for large-scale LLM serving across model execution, runtime systems, and production inference platforms — with B200 as the primary target and H200 as a continuing platform.
- Advance state-of-the-art techniques in quantization (NVFP4 / MXFP4 / FP8, QAT), speculative decoding (EAGLE-3, MTP, DFlash, ModelOpt, SpecForge), KV cache & memory management (LMCache / HiCache / NV KVBM, paged attention, prefix-aware routing), and PD disaggregation (NVIDIA Dynamo, KV-aware router/planner, fault recovery).
- Drive system-level optimization across scheduling, batching, routing, gateway orchestration, adapter serving, and end-to-end inference efficiency.
- Build scalable optimization frameworks, performance methodologies, and benchmark infrastructure that allow GMI to stay ahead of the industry as models, hardware, and serving patterns evolve.
- Productionize cutting-edge ideas into real customer workloads — measured by TTFT, ITL, throughput, goodput, tail latency, quality, and unit token cost.
- Engage with and contribute to the open-source community (vLLM, SGLang, TensorRT-LLM, NVIDIA Dynamo / ModelOpt, FlashInfer, LMCache, etc.) — read upstream code, file issues, send PRs, and publish tech blogs and case studies.
- Collaborate closely with platform, infrastructure, and product teams to make inference optimization a core technical advantage of GMI Cloud.
Required Skills
- Strong hands-on experience with LLM inference systems and performance optimization on modern GPUs.
- Solid understanding of inference metrics and tradeoffs, including TTFT, ITL, throughput, goodput, tail latency, GPU utilization, memory efficiency, and quality/cost tradeoffs.
- Experience with one or more modern serving stacks such as SGLang, vLLM, TensorRT-LLM, NVIDIA Dynamo, or Triton.
- Deep familiarity with GPU-based inference, model serving architecture, and production bottlenecks around compute, memory bandwidth, KV-cache behavior, and scheduling.
- Demonstrable depth in at least one of the four focus areas: speculative decoding, quantization & precision, PD disaggregation, or KV cache & memory management.
- Strong experimentation skills: able to design benchmarks, interpret results, debug regressions, and produce actionable conclusions rather than isolated microbenchmark wins.
- Proficient with Claude Code at an advanced level — fluent with sub-agents, MCP servers, hooks, custom slash commands, and skills — with practical experience leveraging them for rapid iteration, profiling, observability, and performance debugging.
- Clear communication — able to explain technical tradeoffs to engineers and cross-functional stakeholders, and willing to publish results externally.
Preferred Qualifications
- 2+ years of hands-on experience in LLM inference optimization or ML systems optimization, or a PhD in a related area.
- Track record of large-scale model serving optimization (latency reduction, throughput improvement, memory efficiency, cost-performance tuning) in production.
- Specific track depth in one or more of:
  - Speculative Decoding: EAGLE-3 / MTP / DFlash / Medusa / SpecForge / ModelOpt; experience training and shipping draft models for production.
  - Quantization & Precision: NVFP4 / MXFP4 / FP8 / INT4-AWQ / GPTQ; QAT pipelines on Blackwell or Hopper; rigorous accuracy benchmarking.
  - PD Disaggregation: NVIDIA Dynamo, KV-aware router/planner, large MoE serving (DeepSeek-V3/V4, Kimi, GLM, Minimax), fault recovery, autoscaling.
  - KV Cache & Memory: LMCache / HiCache / NV KVBM, paged attention internals, prefix-aware routing, long-context and agentic workloads.
- Familiarity with FlashInfer, Blackwell MLA, FA4, TRT-LLM MLA, or NSA is a strong plus.
- Open-source contributions to vLLM, SGLang, TensorRT-LLM, NVIDIA Dynamo / ModelOpt, FlashInfer, LMCache, or related projects.
- Experience publishing technical blogs, case studies, or papers on inference optimization.