The Pinterest Labs group is dedicated to the development and research of applied machine learning. Our initiatives span a diverse range of AI/ML fields, including fundamental computer vision, multimodal large language models, multimodal representation learning, generative modeling, heterogeneous graph neural networks, and recommender systems. By building foundation ML models that utilize our extensive knowledge graph and billions of Pins, we aim to significantly enhance the core Pinterest product.
Our visual modeling team is currently seeking new members to focus on the advancement of vision-centric LLMs. We are building VLMs capable of perceiving intricate visual details and understanding user aesthetics to facilitate communication through visual assets using tools like multimodal search and text-to-image models. This role offers the opportunity to work with Pinterest's unique visual-text datasets to develop large-scale generative models for production. You will join the core visual pod, a collaborative group of approximately six engineers and a product prototyping team, to create specialized evaluation benchmarks and contribute to the broader research community.
What you'll do:
- Prototype new model architectures for Pinterest VLMs. We're looking for hands-on experience fine-tuning open-source LLMs and improving their visual perception and tool-use capabilities.
- Develop new evaluation benchmarks tailored to vision-centric capabilities such as fashion style recommendations.
- Read research papers, participate in group discussions, and help brainstorm our overall visual generative strategy at the company.
- Help collect relevant visual training data for Pinterest Canvas, particularly for RLHF and targeted fine-tuning.
- Publish and publicize your work via conferences, paper submissions, blog posts, etc.
- Mentor more junior researchers or research interns within the Pinterest Labs organization.
What we're looking for:
- Research engineers and scientists who have experience working with generative computer vision models, preferably various forms of visual encoders and LLMs.
- 2+ years of industry computer vision experience.
- M.S. or Ph.D. in Machine Learning, Computer Science, or a related area.
Nice to Have:
- Publications at top ML conferences.
- Experience using Cursor, Copilot, Codex, or similar AI coding assistants for development, debugging, testing, and refactoring.
- Familiarity with LLM-powered productivity tools for documentation search, experiment analysis, SQL/data exploration, and engineering workflow acceleration.
In-Office Requirement Statement:
- We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
- This role will need to be in the office for in-person collaboration 1-2 times/quarter and therefore can be situated anywhere in the country.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.