Job Summary:
The Associated Press is an independent global news organization dedicated to factual reporting. It is seeking a Machine Learning Engineer to shape how it builds and scales machine learning systems, with a focus on developing and optimizing ML inference systems that process large volumes of media data.
Responsibilities:
• Design, build, and scale ML-powered inference systems that process large volumes of text, image, and video data to power news-based intelligence products.
• Productionize and optimize state-of-the-art models and inference pipelines. These models include, but are not limited to:
  + DistilBERT for Named Entity Recognition (NER) over hundreds of thousands of search queries per day
  + TransNetV2 for video shot boundary detection at scale, for both archival and real-time video
  + SBERT for embedding generation from textual descriptions
  + External multimodal APIs for image/video captioning
• Support hybrid search architectures by defining embedding/re-ranking interfaces, evaluation metrics, and inference performance requirements; partner with search/platform engineers on index configuration, sharding, and cluster tuning.
• Design and implement scalable data processing pipelines across hybrid CPU/GPU environments to handle millions of media assets.
• Partner with MLOps and platform engineering to enable reliable deployment and operation of ML systems, contributing to:
  + Distributed inference architectures
  + Cloud-based execution (e.g., AWS EC2, Batch, Lambda, SageMaker)
  + Efficient resource utilization across workloads
• Optimize inference latency and throughput across distributed workloads using cloud-based resources (AWS EC2, Batch, Lambda, SageMaker, etc.).
• Build resilient asynchronous processing systems for large-scale workloads, ensuring:
  + Reliability (retries, fault tolerance)
  + Efficiency (caching, deduplication)
  + Observability (metrics, logging, traceability)
• Work closely with data scientists and product teams to iterate on models, improve performance, and deliver measurable impact in production.
Qualifications:
Required:
• 8+ years of experience building production ML inference systems.
• Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.
• Experience with both TensorFlow (SavedModel, tf.data, XLA, TFLite) and PyTorch (TorchScript, ONNX, FastAPI/TorchServe).
• Hands-on experience optimizing inference pipelines on AWS infrastructure, ideally across different types of media assets.
• Experience with video frameworks/tools (e.g., FFmpeg) and with large-scale frame-level inference.
• Demonstrated experience monitoring and debugging model latency, memory, and pipeline throughput.
• Experience with hybrid search architectures (BM25 + vector search + cross-encoder reranking).
• Familiarity with OpenAI APIs or other foundation model providers.
• Familiarity with open-source Hugging Face LLMs.
• Experience with data pipeline and workflow orchestration tools (e.g., Airflow).
Company:
The Associated Press is a source of independent newsgathering, supplying a steady stream of news to its members and more. Founded in 1846, the company is headquartered in New York, USA, and has 1,001-5,000 employees. The company is currently late stage.