Data Preprocessing Jobs (NOW HIRING)

C the Signs

Senior Machine Learning Engineer

Boston, MA · On-site +1

$133K - $175K/yr

Data Preprocessing: Clean, transform, and prepare large, complex healthcare datasets for machine learning model development. This includes handling missing values, outlier detection, feature ...

BTS Software Solutions

Maven Exploitation Specialist/ Data Scientist (Expert)

Springfield, VA

$210K - $230K/yr

Support data pipeline setup and maintenance for data preprocessing activities across all security domains. Desired Skills * Experience working with NGA IT enterprise solutions (e.g., NGA CORE, NGA ...

BTS Software Solutions

Maven Exploitation Specialist Data Manager Senior with Security Clearance

Springfield, VA · On-site

$175K - $195K/yr

The Maven Exploitation Specialist/ Data Manager will provide technical and managerial leadership to maintain overhead satellite imagery preprocessing operations. The Data Manager shall provide ...

Luck Stone Corporation

AI Engineer

VA · On-site

$106K - $128K/yr

Implement and maintain machine learning pipelines, from data preprocessing to model deployment. * Troubleshoot and resolve issues related to AI models, ensuring they meet the desired accuracy and ...

Tata Consultancy Services

Gen AI Developer

Seattle, WA · On-site

$57.25 - $78.75/hr

... data preprocessing, cleaning, and feature engineering using Python libraries (NumPy, Pandas). • Fine-tune and optimize transformer-based models and LLMs for specific use cases. • Evaluate model ...

Tata Consultancy Service Limited

Gen AI Developer

Seattle, WA · On-site

$120K - $150K/yr

Experience with data preprocessing and model fine-tuning. • Familiarity with evaluation metrics for RAG systems. • Knowledge of transformer architectures and training techniques. • Awareness of ...

rockITdata

Full Stack Data Scientist

Arlington, VA · Remote

Data Collection and Preprocessing: * Develop robust data pipelines for acquiring, cleaning, and preprocessing large-scale datasets from various sources. * Implement strategies for data quality ...

Spruce Infotech

IT - Technology Lead | Machine Learning | PYTHON

San Jose, CA · On-site

$164K - $202K/yr

Conduct data preprocessing, feature engineering, model evaluation, and performance tuning. Build robust, modular, and ... Minimum years of experience: 8-10 years

PDF Solutions

Applications Engineer (ML/Auto Defect Classification)

Milpitas, CA · On-site

$130K - $160K/yr

Data Proficiency: Experience handling large datasets and using tools like Pandas, NumPy and SQL for data preprocessing and feature engineering. * Problem Solving: Strong analytical mindset with the ...

Photon

Sr Data Engineer - Gen AI/ML - Tampa

Coral Springs, FL · On-site

Skills in data preprocessing and feature engineering for AI model training. Strong understanding of neural network architectures and optimization techniques. Experience in deploying AI models into ...

Spruce Infotech

IT - Technology Lead | Machine Learning | PYTHON

San Jose, CA · On-site

$164K - $201K/yr

Conduct data preprocessing, feature engineering, model evaluation, and performance tuning. Build robust, modular, and ... Minimum years of experience: 8-10 years

Essnova Solutions, Inc.

Data Engineer / Data Modeler

Supporting data preprocessing, normalization, and rigorous data quality validation processes. * Contributing to cloud migration planning, governance, and evaluation of cloud solutions for security ...

Jobs for Humanity

Artificial Intelligence Managers

Atlanta, GA

... data preprocessing to model training, evaluation, and deployment. - Stay up to date with the latest research and developments in machine learning and artificial intelligence. - Communicate complex ...

Sonatus

Staff AI Engineer, Data Analytics & Modeling - Office of the CTO

Sunnyvale, CA · Hybrid

Expertise in data preprocessing, feature engineering, and advanced model evaluation techniques (e.g., A/B testing, causal inference). Preferred: * Master's or PhD in a Science or Engineering field.

Futran Tech Solutions Pvt. Ltd.

Overseas Contractor

Tampa, FL · On-site

... data preprocessing, feature engineering, and model evaluation. Stay up to date with the latest advancements in machine learning and AI technologies. Qualifications: Proficiency in Python and SQL is a must ...

Venturesoft

Data Scientist

Pleasanton, CA · On-site

Experience with data preprocessing, feature engineering, and model evaluation techniques. * Knowledge of deep learning architectures (CNNs, RNNs, Transformers) and their applications. * Proficiency in ...

Core One

Data Scientist with Security Clearance

Tampa, FL · On-site

Lead and support data science efforts across the full ML lifecycle, including data collection, preprocessing, feature engineering, model development, validation, deployment, and monitoring. * Support ...

Skild AI

Software Engineer, AI Training and Infrastructure

San Mateo, CA · On-site

$100K - $300K/yr

Develop and maintain robust, scalable, and distributed training pipelines (data preprocessing, training orchestration, and model evaluation) and frameworks for large-scale AI models. * Optimize ...

Launch Consulting

Senior Data Scientist

Chicago, IL · On-site

$140K - $180K/yr

Architect data preprocessing pipelines that ensure clean, high-quality, and well-structured data for training and evaluation. * Apply experimental design best practices (e.g. A/B testing, cross ...

Data Preprocessing information

Salary chart (yearly pay): $46K, $165K, $243.5K

How much do data preprocessing jobs pay per year?

As of Jun 29, 2026, the average yearly pay for data preprocessing jobs in the United States is $165,018, according to ZipRecruiter salary data. Most workers in this role earn between $133,500 and $170,000 per year, depending on experience, location, and employer.

What is the highest paying job in data?

In data-related fields, roles such as Data Science Director, Machine Learning Engineer, and Chief Data Officer tend to have the highest salaries, often earning six-figure salaries. These positions typically require advanced skills in data analysis, programming, and leadership, along with extensive experience and relevant certifications.

What is data preprocessing?

Data preprocessing is the process of cleaning, transforming, and organizing raw data into a usable format for analysis or machine learning. It involves steps such as handling missing values, removing duplicates, normalizing or scaling data, and encoding categorical variables. Proper data preprocessing helps improve the quality and performance of predictive models by ensuring the data is accurate, consistent, and suitable for analysis.
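The steps named above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the records, the field names (age, income, city), and the choice of mean imputation and min-max scaling are all assumptions made for the example. In practice, libraries such as pandas are commonly used for the same steps.

```python
from statistics import mean

# Hypothetical raw records; field names and values are illustrative only.
raw = [
    {"age": 34, "income": 52000.0, "city": "Boston"},
    {"age": 34, "income": 52000.0, "city": "Boston"},   # exact duplicate
    {"age": 51, "income": None, "city": "Tampa"},       # missing income
    {"age": 27, "income": 88000.0, "city": "Seattle"},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_mean(rows, field):
    """Replace missing (None) values with the mean of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = mean(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def min_max_scale(rows, field):
    """Rescale a numeric field to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, field: (r[field] - lo) / span} for r in rows]

def one_hot(rows, field):
    """Replace a categorical field with one 0/1 column per category."""
    categories = sorted({r[field] for r in rows})
    out = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != field}
        for c in categories:
            encoded[f"{field}_{c}"] = 1 if r[field] == c else 0
        out.append(encoded)
    return out

def preprocess(rows):
    rows = drop_duplicates(rows)
    rows = impute_mean(rows, "income")
    for numeric_field in ("age", "income"):
        rows = min_max_scale(rows, numeric_field)
    return one_hot(rows, "city")

clean = preprocess(raw)
```

The order matters in this sketch: duplicates are removed before imputation so that a repeated record does not skew the mean, and scaling runs only after every missing value has been filled.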

What are the key skills and qualifications needed to thrive as a Data Preprocessing Specialist, and why are they important?

To thrive as a Data Preprocessing Specialist, you need a strong background in statistics, data cleaning, and data transformation, often supported by a degree in computer science, data science, or a related field. Proficiency with tools such as Python (pandas, NumPy), SQL, and data visualization platforms is typically essential, along with familiarity with data management systems. Attention to detail, problem-solving abilities, and effective communication are standout soft skills in this position. These skills are crucial for ensuring high-quality, reliable datasets that underpin accurate data analysis and machine learning outcomes.

Is 40 too late for data science?

No, 40 is not too late. Data preprocessing is a key step in data science, and people can enter the field at any age. Many data scientists start later in life, and acquiring skills in programming, statistics, and tools like Python or R can facilitate entry regardless of age.

What do you do in data preprocessing?

Data preprocessing involves cleaning and transforming raw data to prepare it for analysis or modeling. This includes tasks such as handling missing values, removing duplicates, normalizing data, and encoding categorical variables, often using tools like Python or R. It is a crucial step to ensure data quality and improve model performance.

What is the difference between Data Preprocessing vs Data Analysis?

Aspect	Data Preprocessing	Data Analysis
Primary Focus	Cleaning, transforming, and preparing raw data for analysis	Interpreting data to extract insights and support decision-making
Skills Required	Data cleaning, scripting, understanding of data formats	Statistical analysis, data visualization, critical thinking
Work Environment	Data engineering teams, data science projects	Business intelligence, research, data science teams
Tools Used	Python, R, SQL, ETL tools	Excel, Tableau, R, Python, statistical software

While data preprocessing involves preparing raw data for analysis by cleaning and transforming it, data analysis focuses on interpreting the prepared data to uncover trends and insights. Both roles are essential in the data pipeline but serve different purposes in the data lifecycle.

Will AI replace data analysts?

AI is transforming data analysis by automating routine tasks such as data cleaning and basic reporting, but data analysts are still essential for interpreting complex insights, making strategic decisions, and applying domain knowledge. The role is evolving to include skills in machine learning tools and programming languages like Python or R, but human expertise remains critical for nuanced analysis and contextual understanding.

What are some common challenges faced in a Data Preprocessing role, and how can they be effectively managed?

Professionals in Data Preprocessing often encounter challenges such as handling incomplete or inconsistent data, managing large datasets, and ensuring data quality before analysis. Addressing these issues typically involves using specialized tools to automate data cleaning, establishing clear data validation rules, and collaborating closely with data engineers and analysts. Staying updated with best practices and leveraging scripting languages like Python or R can also streamline the preprocessing workflow, making it easier to deliver reliable and accurate datasets for downstream analysis.
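As one possible illustration of "clear data validation rules", the sketch below expresses each rule as a small check per field and separates clean records from problem records. The field names (id, age) and the limits are hypothetical and chosen only for the example.

```python
# Hypothetical validation rules: each field maps to a check that returns True
# when the value is acceptable. Field names and limits are illustrative only.
RULES = {
    "id": lambda v: isinstance(v, str) and v != "",
    "age": lambda v: isinstance(v, (int, float)) and 0 <= v <= 120,
}

def validate(rows, rules):
    """Split rows into valid rows and a list of (row_index, field, value) problems."""
    valid, problems = [], []
    for i, row in enumerate(rows):
        failed = [
            (i, field, row.get(field))
            for field, check in rules.items()
            if not check(row.get(field))
        ]
        if failed:
            problems.extend(failed)
        else:
            valid.append(row)
    return valid, problems

rows = [
    {"id": "a1", "age": 42},
    {"id": "", "age": 37},    # empty identifier
    {"id": "c3", "age": -5},  # out-of-range value
    {"id": "d4"},             # missing field
]

valid, problems = validate(rows, RULES)
```

Keeping the rules in one table makes them easy to review with data engineers and analysts, and the problems list records which row and field failed so that incomplete or inconsistent data can be traced back to its source.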

Infographic showing Data Preprocessing job openings in the United States as of June 2026: employment types are 50% internship and 50% full-time, 100% of jobs are in-person, and the average salary is $165,018 per year, or $79.3 per hour.

Senior Machine Learning Engineer

C the Signs

Boston, MA • On-site, Remote

$133K - $175K/yr

Full-time

Posted 2 days ago

Job description

Position Summary
The Machine Learning Engineer will be responsible for the end-to-end development and deployment of large language models (LLMs) and machine learning models, with a primary focus on data preprocessing, model training, and fine-tuning using large-scale healthcare datasets. This role requires a strong understanding of LLMs, machine learning principles, and data engineering, as well as experience working with sensitive healthcare data.
Key Responsibilities

Data Preprocessing: Clean, transform, and prepare large, complex healthcare datasets for machine learning model development. This includes handling missing values, outlier detection, feature engineering, and data normalization. Identify, collect, and curate relevant, industry-specific datasets for model retraining. Format data appropriately for the chosen LLM and training pipeline.
Model Training & Fine-Tuning: Design, train, and fine-tune various LLMs on extensive healthcare data to solve specific clinical or operational problems. Set up and manage the training environment, including GPU instances and required software. Train and fine-tune pre-trained LLMs on the custom dataset to achieve specific goals. Experiment with and fine-tune hyperparameters such as learning rate, batch size, and training epochs to optimize model performance. Integrate structured and unstructured data (multi-modal/multi-input models).
Model Evaluation & Optimization: Evaluate model performance using appropriate metrics, identify areas for improvement, and implement optimization strategies.
Pipeline Development: Develop and maintain robust and scalable data and ML pipelines for model training, inference, and deployment.
Collaboration: Work closely with data scientists, clinicians, and software engineers to understand requirements, integrate models into production systems, and ensure data privacy and security compliance.
Research & Development: Stay up-to-date with the latest advancements in machine learning and healthcare AI, and explore new technologies and methodologies to enhance our solutions.
Documentation: Maintain clear and comprehensive documentation of models, data pipelines, and experimental results.

Requirements

Education: Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, or a related quantitative field.
Experience:

5+ years of experience in Machine Learning Engineering or a similar role.
Proven experience with large-scale data preprocessing, LLM/model training, and fine-tuning.
Experience with distributed training (PyTorch Distributed, DeepSpeed, Ray, Hugging Face Accelerate).
Experience with GPU/TPU optimization and memory management for large language models.
Experience working with healthcare data is highly desirable.

Technical Skills:

Proficiency in Python and relevant ML libraries (e.g., TensorFlow, PyTorch, Scikit-learn, Pandas, NumPy).
Strong understanding of various machine learning algorithms, large language models, and deep learning architectures.
Experience with cloud platforms (e.g., GCP, AWS) and distributed computing frameworks (e.g., Spark) is a plus.
Familiarity with MLOps practices and tools.

Soft Skills:

Excellent problem-solving and analytical skills.
Strong communication and collaboration abilities.
Ability to work independently and as part of a team in a fast-paced environment.

Work Authorization:

Must be a US citizen, a Green Card holder, or currently in the US with a valid H-1B visa.

Benefits
Why Join Us?
Joining C the Signs is not just about building AI; it's about shaping the future of healthcare. If you are a technical leader with an unshakable belief in the power of AI to save lives and the ability to make it happen at scale, this is your opportunity to create a tangible, global impact.
Benefits:

Competitive salary and benefits package.
Flexible working arrangements (remote or hybrid options available).
The opportunity to work on life-changing AI technology that directly impacts patient outcomes.
Join a team that combines cutting-edge innovation with a mission to save lives and improve health equity.
Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.
