Pinecone Vector Databases Jobs in Indiana (NOW HIRING)

Vector databases such as pgvector, Chroma, Pinecone, Weaviate, or Qdrant. * Docker and containerized deployments. * Kubernetes orchestration. * Azure AI infrastructure and GPU environments. * CI/CD ...

Pinecone Vector Databases information

What is a Pinecone Vector Database?

A Pinecone Vector Database is a cloud-based service designed to efficiently store, index, and search high-dimensional vector data, such as embeddings generated by machine learning models. It enables fast similarity search, making it ideal for use cases like semantic search, recommendation systems, and AI-powered applications. Pinecone handles the complexity of scaling and managing vector data, so developers can focus on building intelligent applications without worrying about infrastructure.
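As a rough illustration of what "similarity search" means here, the sketch below ranks a few stored vectors by cosine similarity to a query vector. It is a brute-force toy in plain Python, not Pinecone's API or its indexing method; the document IDs, the three-dimensional vectors, and the function names `cosine_similarity` and `top_k` are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (id, score) pairs most similar to the query (exact, brute force)."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real embeddings usually have
# hundreds or thousands of dimensions.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```

A managed vector database replaces the linear scan in `top_k` with an index built for large collections, which is the scaling work the service takes off the developer's hands.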

What are the key skills and qualifications needed to thrive as a Pinecone Vector Database Engineer, and why are they important?

To thrive as a Pinecone Vector Database Engineer, you need a strong background in computer science, data engineering, and experience with large-scale distributed systems, often supported by a relevant degree or equivalent experience. Proficiency in Python, REST APIs, cloud platforms (AWS, GCP), and vector search technologies, along with familiarity with Pinecone’s SDK and database management, are commonly required. Strong analytical thinking, problem-solving abilities, and effective communication skills help you collaborate with cross-functional teams and deliver scalable solutions. These skills ensure robust database performance, efficient data retrieval, and successful integration of vector search capabilities into real-world applications.

What are some common challenges faced by engineers working with Pinecone Vector Databases, and how can they be addressed?

Engineers working with Pinecone Vector Databases often encounter challenges such as optimizing vector search performance at scale, ensuring data consistency across distributed systems, and integrating the database with various machine learning pipelines. Addressing these challenges typically involves tuning indexing parameters, monitoring resource utilization, and collaborating closely with data scientists to understand retrieval requirements. Regularly reviewing documentation and participating in community forums can also help engineers stay current with best practices and new features.
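One common way to judge whether a tuning change helped is to compare the results of the fast, approximate search against an exact brute-force search on a sample of queries. The sketch below computes recall@k for a single query. The function name `recall_at_k` and the vector IDs are hypothetical, and nothing here depends on a particular vector database.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)


# Hypothetical result lists for one query: IDs ordered from best to worst match.
exact_ids = ["v1", "v2", "v3", "v4"]    # from an exhaustive scan
approx_ids = ["v1", "v3", "v9", "v2"]   # from the tuned index under test

score = recall_at_k(approx_ids, exact_ids, k=4)
print(f"recall@4 = {score:.2f}")
```

Averaging this score over many queries, alongside latency measurements, gives a simple basis for discussing retrieval requirements with data scientists.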

What is the difference between Pinecone Vector Database specialists and Data Engineers?

Aspect | Pinecone Vector Databases | Data Engineers
Primary Role | Managing and deploying vector database solutions for AI/ML applications | Designing, building, and maintaining data pipelines and infrastructure
Skills & Certifications | Knowledge of vector databases, cloud platforms, programming (Python, SQL) | Data modeling, ETL processes, cloud services, programming (Python, Java)
Work Environment | Tech companies, AI startups, cloud providers | Data-driven organizations, tech firms, finance, healthcare

Pinecone Vector Database specialists focus on deploying and managing vector database solutions for AI applications, while Data Engineers build and maintain the data infrastructure that supports these systems. Both roles require programming skills and familiarity with cloud platforms, but their core responsibilities differ: one centers on database management, the other on data pipeline development.

What are popular job titles related to Pinecone Vector Databases jobs in Indiana?

For Pinecone Vector Databases jobs in Indiana, the most frequently searched job titles are:
Advisor - Data Architect, Data Foundry

Advisor - Data Architect, Data Foundry

Eli Lilly and Company

Indianapolis, IN • On-site

Full-time

Posted 11 days ago


Eli Lilly and Company rating

Company rating: 8.8 out of 10

Based on 62 frontline employees who took The Breakroom Quiz

10th of 73 rated pharmaceutical


Job description

Job Summary:
Eli Lilly and Company is a global healthcare leader headquartered in Indianapolis, Indiana, focused on making life better for people around the world. They are seeking Data Architects to design and build the data infrastructure necessary for AI-native drug discovery, transforming raw scientific data into actionable insights for both scientists and AI agents.
Responsibilities:
• Design and implement data models, schemas, and ontologies for chemical, biological, and automation-generated data that serve discovery workflows across the portfolio.
• Define and maintain controlled vocabularies, metadata standards, and FAIR-compliant data frameworks in partnership with Preparedness4Insight.
• Implement semantic data standards (RDF, OWL, SPARQL) and ontology engineering practices to create interoperable, machine-readable scientific data.
• Design and implement data lakehouse architecture using modern platforms (Databricks, Snowflake, or equivalent), including data storage patterns, partitioning strategies, and query optimization.
• Build and optimize ETL/ELT pipelines using Spark, dbt, or similar tools to transform raw scientific data into analytical and ML-ready formats.
• Implement real-time and streaming data integration (Kafka, Kinesis, event-driven patterns) connecting LIMS, instruments, and lab automation systems to the data infrastructure.
• Design and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph) that capture molecular, target, pathway, and experimental relationships across the discovery landscape.
• Architect specialized data solutions: array databases (TileDB) for genomics/imaging, document stores (MongoDB) for experimental records, and vector databases for embedding-based retrieval supporting ML and RAG workflows.
• Build query and traversal patterns that enable scientists and AI agents to ask relational questions across the entire data landscape.
• Partner with scientific software engineers to ensure data architectures are implementable, performant, and well-documented.
• Collaborate with Methods4Insight to design data structures that support analytical model training, deployment, and evaluation.
• Work with Tech@Lilly to define scaling strategies, ensure enterprise compliance, and transition data architectures to production-grade management.
• Contribute to build-versus-buy-versus-adopt decisions by evaluating commercial and open-source data platforms against Data Foundry requirements.
Qualifications:
Required:
• M.S. or PhD in Computer Science, Data Science, Bioinformatics, Computational Biology, Information Science, or related STEM field.
• 6+ years (with an M.S.) or 2+ years (with a PhD) of data architecture, data engineering, or scientific informatics experience.
• Deep expertise in at least one of the focus areas: relational databases, data modeling and ontology engineering, data platform and lakehouse architecture (Databricks, Snowflake, Spark), or knowledge graph and specialized database systems (Neo4j, Neptune, MongoDB, TileDB).
Preferred:
• Working familiarity with multiple database paradigms — relational, graph, document, columnar, key-value — and strong SQL skills.
• Understanding of scientific data types and experimental workflows in life sciences or pharma (chemical, biological, HTE data).
• Strong communication skills with ability to translate data architecture concepts for both technical and scientific audiences.
• Familiarity with cloud platforms (AWS, Azure, or GCP) and modern data integration patterns.
• Pharmaceutical or biotech research industry experience, particularly in discovery data management or research informatics.
• Experience with semantic web technologies: RDF, OWL, SPARQL, Protégé, or equivalent ontology engineering tools.
• Hands-on experience with graph databases (Neo4j, Neptune, TigerGraph) and knowledge graph design patterns for scientific data.
• Data lakehouse architecture experience: Databricks (Delta Lake, Unity Catalog), Snowflake, or equivalent; ETL/ELT with Spark, dbt.
• Experience with streaming/real-time data platforms (Kafka, Kinesis, Flink) and event-driven architectures.
• Familiarity with LIMS, ELN systems (e.g., Benchling), and laboratory instrument data integration.
• Experience with vector databases (Pinecone, Weaviate, pgvector) and embedding-based retrieval for ML/RAG applications.
• Array database experience (TileDB, Zarr) for genomics, imaging, or high-dimensional scientific data.
• FAIR data principles implementation experience and Data Readiness Level frameworks.
• Scientific data standards and controlled vocabularies in chemistry (InChI, SMILES) or biology (Gene Ontology, UniProt).
• Experience with C, C++, or Rust for performance-critical data processing; familiarity with HPC data I/O patterns for large-scale scientific computations.
Company:
We're a medicine company turning science into healing to make life better for people around the world. Founded in 1876, the company is headquartered in Indianapolis, USA, and has more than 10,000 employees. The company is currently late-stage.

About Eli Lilly

Sourced by ZipRecruiter

Eli Lilly, based in Indianapolis, IN, US, is one of the pioneers in the pharmaceutical industry with a rich history dating back to 1876. This global pharmaceutical company focuses on discovering, developing, manufacturing and selling pharmaceutical products in approximately 120 countries. The company's product categories include endocrinology, oncology, cardiovascular, neuroscience, and immunology. Having invested over $9 billion in research and development in the past decade, Eli Lilly is also committed to creating high-quality medicines that meet real needs. As a recipient of several awards and recognitions, Eli Lilly is known for its focus on life-saving research and drug development. Their mission is to make medicines that help people live longer, healthier, and more active lives.

Industry

Pharmaceutical product wholesalers

Company size

10,000+ Employees

Headquarters location

Indianapolis, IN, US

Year founded

1876