Job Summary:
Neurophos is a pioneering company focused on redefining AI computing through innovative optical architecture. They are seeking a Staff Machine Learning Architect to lead the adaptation and optimization of machine learning models for their advanced optical inference engines, bridging the gap between cutting-edge research and practical hardware applications.
Responsibilities:
• Lead the porting of LLM applications, diffusion models, and visual ML applications to Neurophos optical inference engines
• Adapt models from diverse sources, including GitHub, Hugging Face, other open-source repositories, and customer private models
• Work with models in various formats, including PyTorch, Triton, JAX, and emerging frameworks
• Develop and implement quantization strategies to migrate models from higher precision formats (FP8, INT8, and above) to our optimized 4-bit precision (FP4/INT4) for weights and activations
• Design and execute re-quantization, retraining, and other model adaptation techniques to minimize accuracy loss during precision reduction
• Create or integrate third-party tools and workflows for efficient model porting and optimization
• Optimize GEMM operations for high-throughput execution
• Develop benchmarking methodologies to measure and validate model quality post-porting, including perplexity metrics and other quality indicators
• Collaborate with hardware and software teams to co-optimize model architectures for optical compute characteristics
• Publish research papers on novel optimization techniques and methodologies (with appropriate IP protection)
Qualifications:
Required:
• MS or PhD in Computer Science, Data Science, Machine Learning, Mathematics, or related field
• 7+ years of experience in machine learning engineering with at least 3 years focused on model optimization and deployment
• Deep expertise in neural network quantization techniques, including post-training quantization (PTQ) and quantization-aware training (QAT)
• Strong proficiency in PyTorch and familiarity with other ML frameworks (JAX, Triton, TensorFlow)
• Hands-on experience with transformer architectures, LLMs, and diffusion models
• Experience with low-precision inference optimization (INT8, FP8, or lower)
• Strong understanding of GEMM operations and linear algebra optimizations for deep learning
• Experience with model evaluation metrics, including perplexity, accuracy, and benchmark suites
• Track record of successfully deploying ML models on specialized hardware accelerators
• Excellent communication skills with the ability to collaborate across hardware and software teams
Preferred:
• Experience with sub-8-bit quantization (INT4, FP4) and mixed-precision inference
• Familiarity with Hugging Face Transformers library and model hub ecosystem
• Experience with ONNX, TensorRT, or other model optimization frameworks
• Background in analog or optical computing architectures
• Knowledge of in-memory computing paradigms and matrix-vector multiplication acceleration
• Published research in model compression, quantization, or efficient inference
• Experience with large-scale batch inference optimization
• Familiarity with prefill vs. decode optimization strategies in LLM inference
Company:
Neurophos develops photonic AI processing hardware for accelerating artificial intelligence inference. Founded in 2020, the company is headquartered in Austin, USA, with a team of 11-50 employees. The company is currently at an early stage.