Software Analyst Intern Rail Pass Type: Internship/Co-op(Full-time/Hybrid) Departure and Arrival ... Test Infrastructure team to perform design, analysis, programming and integration activities in the ...
Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ... We are seeking a highly motivated Software Engineering intern/co-op to join our team at AMD. In this ...
161 Bay Street (93021), Canada, Toronto, Toronto, Ontario, Intern, Full Stack Software Engineer ... You will also be involved in building infrastructure, feature development, testing (unit test ...
Software Engineer - AI-Native Product Builder Location : Toronto (Hybrid) or Remote within Canada ... in a fast-moving environment What Success Looks Like By the end of the internship, a strong intern ...
Aviya has an opening for a Software Engineer to join our engineering group in Montreal, Longueuil ... Expert-level experience in test authoring and execution using NI TestStand (National Instruments ...
Do you enjoy working on scalable services in a collaborative team environment? Do you want to see ... Design, develop and test software components that interact with fulfillment center technologies ...
Software Engineer
Scarborough, ON · Hybrid
Is this role right for you? In this role you will: Software Development * Develop clean ... Build and test APIs using Swagger/OpenAPI, Postman, and automated tests. * Work with Node.js for ...
Proficient in context engineering principles and specification-driven development, with practical ... test generation and legacy code modernization * Excellent teamwork record * Strong leadership ...
Software Engineer
Toronto, ON · On-site +1
Is this role right for you? In this role, you will: * Develop clean, maintainable, and ... Build and test APIs using Swagger/OpenAPI, Postman, and automated tests. * Work with Node.js for ...
Software Analyst Intern (Fall 2026, 8 months)
Toronto, ON · On-site
CA$23 - CA$30/hr
Proficient in context engineering principles and specification-driven development, with practical ... test generation and legacy code modernization * Excellent teamwork record * Strong leadership ...
Software Engineer
Toronto, ON · On-site +1
Software Engineer - AI-Native Product Builder Location : Toronto (Hybrid) or Remote within Canada ... in a fast-moving environment What Success Looks Like By the end of the internship, a strong intern ...
Promote reusable solutions, clear documentation, and collaboration across teams * 8+ years in DevOps, SDET, SRE, or Software Engineering * Strong understanding of enterprise SDLC and CI/CD practices
Software Engineer
Hamilton, ON · On-site
Headquartered in Hamilton, Ontario, Canada, we are a privately held company focused on building ... Develop and maintain testing strategies, including unit tests, integration tests, and test tooling ...
Software Analyst Intern Rail Pass Type: Internship/Co-op(Full-time/Hybrid) Departure and Arrival ... Experience working in a DevOps environment, including tools such as Git, Bitbucket, Jira, Jenkins ...
Bachelor's degree in electrical engineering, computer science, or a related field; advanced training in test software platforms preferred. * Minimum of 5 years of hands-on test engineering experience ...
Write unit tests and help improve test coverage and overall code quality * Assist in debugging and ... Bachelor's degree in Computer Science, Software Engineering, or a related field, OR equivalent ...
Perception Software Engineer
Toronto, ON · On-site
In this role, you will be responsible for designing, implementing, optimizing, and validating ... Develop, optimize, test, and maintain computer vision algorithms and tools related to camera ...
... engineer to analyze, design, develop, and test software using continuous integration methods available in our organization. Joining Hitachi Rail as an intern is a fantastic opportunity to kickstart ...

Director of Software Validation Engineering - ROCm
Thornhill, ON • On-site
Full-time
Posted 2 days ago
Advanced Micro Devices rating
8.4
Based on 7 frontline employees who took The Breakroom Quiz
22nd of 139 rated electronics manufacturers
Job description
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE TEAM
The ROCm software organization at AMD builds and maintains the open-source GPU software stack powering AI training, inference, and HPC workloads across AMD's data center and consumer GPU portfolio. ROCm is the foundation on which developers, researchers, and enterprises run their most demanding AI and HPC workloads. Quality and reliability are existential to our success. We operate at the intersection of cutting-edge hardware and software — and we move fast. Our team is deeply invested in open-source, community-driven development, and engineering excellence at every layer of the stack.
THE ROLE
We're looking for a hands-on Director of Test Engineering to lead and transform the quality function for ROCm. This is not a program management role; it's a deeply technical leadership position for someone who understands the hardware/software interface of GPUs, has built test engineering organizations from the ground up, and is ready to lead the next wave of AI-native, agentic quality engineering.
You will own the vision, strategy, and execution of test engineering for ROCm — from kernel-level driver validation to user-space ML framework testing. Critically, you will be the driving force behind scaling your team's impact through AI and agentic tooling, building a modern, autonomous quality organization that moves faster than any traditional QA team could.
THE IMPACT YOU WILL HAVE
- Define and own the test engineering strategy for ROCm across the full HW/SW stack, from driver interfaces to ML framework validation.
- Transform the quality organization into an AI-first, agentic team — scaling coverage, speed, and reliability without proportional headcount growth.
- Build and operate continuous testing and validation infrastructure including long-running soak, stress, failure/recovery, and staging environments for product reliability.
- Raise the bar on test engineering discipline: shift-left practices, SDET-caliber test development, and deep ownership of quality metrics.
- Partner directly with hardware, firmware, and software engineers to ensure quality is embedded at every stage of development.
- Drive adoption of AI-assisted testing workflows, intelligent test selection, automated root cause analysis, and agentic CI/CD pipelines across the organization.
The ideal candidate is a technical leader who has built and scaled test engineering teams in complex, hardware-adjacent software environments. You are hands-on when it matters — able to prototype a test framework, debug a GPU driver failure, or design a validation architecture. You also understand how customers actually use the product: the AI inference and training workloads they run, the parallelism strategies they deploy, the performance they expect, and the failure modes they hit. That customer-workload knowledge is what separates a QA team that writes blackbox sanity checks from one that designs tests targeting the exact code paths real users exercise. You see AI agents not as a novelty but as the primary lever for scaling your team's output. You are impatient with manual, reactive QA and energized by building systems that catch bugs before humans even see them.
KEY RESPONSIBILITIES
- Own the overall test engineering strategy and architecture for ROCm, spanning driver validation, runtime testing, compiler/toolchain quality, and ML framework integration, with test coverage designed around real customer workload patterns, not synthetic benchmarks.
- Lead, grow, and mentor a team of SDETs and test engineers, instilling SDET-level engineering discipline and a culture of automation-first quality.
- Architect and operate continuous testing/validation infrastructure: staging environments for soak testing, stress testing, failure injection, recovery validation, and long-duration reliability runs.
- Champion AI-first and agentic test engineering: drive adoption of LLM-assisted test generation, autonomous failure triage, intelligent test prioritization, and agentic CI/CD workflows.
- Hands-on prototyping of new test frameworks, validation tooling, and agentic testing pipelines — especially in early-stage or high-ambiguity situations.
- Define, track, and improve quality KPIs: test coverage, defect escape rate, time-to-detection, device utilization, and validation cycle time.
- Collaborate closely with hardware, firmware, and software engineering teams to ensure quality is integrated from design through release.
- Partner with DevOps and infrastructure teams to evolve the CI/CD pipeline with robust, scalable, GPU-aware test automation.
- Engage with the open-source ROCm community and external customers on quality feedback loops and reliability expectations, translating their workload patterns and failure reports into structured test coverage.
- Partner with compiler, runtime, and framework integration teams on numerical correctness validation — understanding shared scope boundaries and ensuring the test organization contributes meaningfully to catching precision regressions across floating-point formats and parallelism configurations.
- Establish and maintain HW/SW test automation for both Linux and Windows platforms across AMD's GPU product lines.
- 12+ years of experience in software engineering or test engineering, with significant experience in hardware-adjacent or systems-level software.
- 5+ years of engineering management, including building and scaling test engineering or SDET organizations.
- Deep hands-on expertise in test automation at scale — framework design, CI/CD pipeline development, and continuous validation systems.
- Demonstrated experience with hardware + software test automation, including HW bring-up, driver validation, or firmware/software co-testing.
- Strong understanding of GPU architecture or hardware/software interfaces (PCIe, memory subsystems, compute kernels, or equivalent).
- Experience designing and operating always-on test infrastructure: soak/stress environments, failure injection, and reliability/recovery validation pipelines.
- Proven track record of adopting and scaling AI or automation tooling to multiply team throughput.
- Python proficiency: able to write test automation, tooling, and scripted validation workflows independently.
- Practical understanding of how AI inference and training workloads are deployed on GPU hardware — including common parallelism strategies (tensor parallel, pipeline parallel, data parallel), serving configurations, and performance expectations — sufficient to translate customer use cases into targeted test coverage.
- Hands-on software development skills sufficient to prototype test frameworks, write automation tooling, and review SDET-level code.
- Direct experience with ROCm, CUDA, or GPU compute software stacks (runtime, compiler, ML frameworks).
- Experience integrating LLMs, AI agents, or agentic workflows into software development or test engineering processes.
- Expertise in open-source development practices and community-facing quality processes (GitHub Actions, open CI, etc.).
- Background in SDET or test engineering in a semiconductor, HPC, or AI infrastructure company.
- Experience with GPU-specific test challenges: non-determinism, thermal behavior, multi-device coordination, driver stability.
- Track record of shipping test frameworks or validation tools used across large engineering organizations.
- Familiarity with ML training/inference workload validation: throughput, latency, numerical stability across precision formats (FP32/BF16/FP8), and multi-GPU collective communication correctness.
- Experience with GPU profiling and trace analysis tooling (e.g., rocprof, omniperf, PyTorch profiler) to identify kernel-level performance and correctness anomalies.
- Familiarity with HIP, CUDA, or low-level GPU programming — sufficient to understand what is being tested at the runtime and kernel level, even if not writing kernels directly.
Note: This role is intentionally scoped as a hands-on technical leadership position. Candidates whose primary background is program management or traditional QA management without deep engineering execution experience may not be the right fit.
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.
This posting is for an existing vacancy.
Education: UNAVAILABLE
Employment Type: FULL_TIME
About Advanced Micro Devices (AMD)
Sourced by ZipRecruiter
Industry
Computer and electronic product manufacturing
Company size
5,001 - 10,000 Employees
Headquarters location
Sunnyvale, CA, US