Job Summary:
MeshyAI is a leading 3D generative AI company headquartered in Silicon Valley, focused on transforming the content creation pipeline. The AI Infrastructure Engineer role involves ensuring the reliability and scalability of the AI model serving stack, while developing core engineering infrastructure to connect models to product systems.
Responsibilities:
• Design, develop, and optimize core capabilities of the AI inference platform, including inference services, task scheduling, service orchestration, elastic scaling, and release governance.
• Participate in the development of CPU/GPU resource management systems to improve stability, resource utilization, and cost efficiency when online inference and training workloads share the same cluster.
• Drive the unified management and scheduling of GPU resources, and explore how capabilities such as MIG, MPS, time-slicing, and virtualization can be applied in production workloads.
• Continuously optimize the throughput, latency, and availability of the inference pipeline, refining engineering quality in complex inference pipelines, multi-model collaboration, and high-concurrency scenarios.
• Focus on R&D efficiency, resource and cost management, online stability, and disaster recovery architecture design, so that the company’s systems keep improving in performance, reliability, and maintainability.
• Explore AI-native infrastructure and automated operations to make infrastructure smarter and more user-friendly, supporting the company’s rapid expansion during its startup phase.
Qualifications:
Required:
• Bachelor’s degree or higher; majors in Computer Science, Software Engineering, Artificial Intelligence, Telecommunications, or related fields are preferred.
• 1 to 3 years of experience in backend development, infrastructure, cloud-native platforms, machine learning platforms, or AI platforms.
• Proficiency in at least one of Go or Python, with solid software engineering skills and a strong commitment to code quality.
• Understanding of fundamental principles in Linux, operating systems, computer networks, and distributed systems; ability to independently identify and resolve complex engineering issues.
• Practical development experience with Kubernetes, Docker, microservices, or distributed systems, with a basic understanding of production system stability.
• Real-world project experience in areas such as model inference, task orchestration, resource scheduling, and service stability, going beyond conceptual understanding.
• Self-motivated, curious, and a fast learner; willing to take on greater ownership and broader responsibilities in a startup environment, while continuously learning and quickly adopting new technologies.
Preferred:
• Experience with GPU inference platforms, Kubernetes schedulers, Device Plugins, or related platform development.
• Familiarity with frameworks such as Ray and Ray Serve, or experience in developing and optimizing model serving, distributed inference, and task orchestration frameworks.
• Familiarity with MIG, MPS, vGPU, partitioned GPUs, or other GPU sharing solutions, and experience balancing performance and stability.
• Engineering experience in observability, SRE, capacity planning, cost governance, canary deployments, and automated rollbacks.
• Open-source projects, technical blogs, side projects, or other achievements that demonstrate learning agility and growth potential.
• Ongoing interest and hands-on experience in emerging areas such as AI infrastructure, inference systems, and AI agent toolchains.
Company:
Meet the world's most popular and intuitive free AI 3D model generator. The company is headquartered in Sunnyvale, California, US, with a team of 51-200 employees. The company is currently in the growth stage.