Job Summary:
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company is looking for a skilled Software Engineer to join its Data Acquisition team, which is responsible for all aspects of data collection in support of model training.
Responsibilities:
• Own and lead engineering projects in data acquisition, including web crawling, data ingestion, and search.
• Collaborate with other sub-teams, such as Data Processing, Architecture, and Scaling, to ensure smooth data flow and system operability.
• Work closely with the legal team on compliance and data privacy matters.
• Develop and deploy highly scalable distributed systems capable of handling petabytes of data.
• Architect and implement algorithms for data indexing and search capabilities.
• Build and maintain backend services for data storage, including work with key-value databases and synchronization.
• Deploy solutions in a Kubernetes Infrastructure-as-Code environment and perform routine system checks.
• Conduct and analyze experiments on data to provide insights into system performance.
Qualifications:
Required:
• BS/MS/PhD in Computer Science or a related field.
• 4+ years of industry experience in software development.
• Strong expertise in large stateful distributed systems and data processing.
• Proficiency in Kubernetes and Infrastructure-as-Code concepts.
• Willingness and enthusiasm to try new approaches and technologies.
• Ability to handle multiple tasks and adapt to changing priorities.
• Strong communication skills, both written and verbal.
Preferred:
• Experience with large web crawlers.
Company:
OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation. Founded in 2015, the company is headquartered in San Francisco, USA, and has 1001-5000 employees. It is currently a late-stage company.