Tesseract Jobs (NOW HIRING)

AI/ML Vision Engineer

Middletown, PA

$79K - $106K/yr

OCR-related experience (such as Tesseract, PaddleOCR, EasyOCR, or custom models). * Familiarity with object detection (such as YOLO, Faster R-CNN, SSD, etc.). * Knowledge of classification, feature ...

Gemini, Llama, GPT, etc. and OCR: Textract, Tesseract, etc. * Strong understanding of the use of neural networks, embeddings, transformers, etc. * Cloud platforms (AWS SageMaker, Azure ML, etc.)

.NET Developer

New York, NY · On-site

$110K - $175K/yr

Experience with OCR tools (e.g., Azure Document Intelligence, Google Document AI, Tesseract) * Skilled in prompt design (zero-shot, few-shot, chain-of-thought) for reliable outputs * Experience ...

Tesseract information

What are the key skills and qualifications needed to thrive as a Tesseract OCR Specialist, and why are they important?

To thrive as a Tesseract OCR Specialist, you need a solid understanding of optical character recognition principles, image preprocessing, and programming languages like Python or C++. Familiarity with Tesseract OCR engine, related libraries (such as OpenCV), and experience with document digitization tools or APIs is typical. Strong analytical thinking, problem-solving skills, and attention to detail help optimize OCR accuracy and troubleshoot extraction issues. These skills ensure efficient and accurate digitization of documents, enabling organizations to automate data entry and improve information accessibility.

What are some common challenges faced by professionals working with the Tesseract OCR engine, and how can they be addressed?

A common challenge when working with the Tesseract OCR engine is dealing with low-quality or complex images, which can lead to inaccurate text extraction. Addressing this often requires pre-processing steps such as image deskewing, noise reduction, and adjusting contrast. Additionally, customizing Tesseract with language data files and fine-tuning settings can significantly improve accuracy for specific use cases. Collaborating with software developers, data scientists, and QA testers is typical, as integrating OCR effectively usually involves cross-functional teamwork.
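Two of the pre-processing steps mentioned above, contrast adjustment and noise reduction, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the image is assumed to be a grayscale grid of 0-255 integers held in nested lists, the function names (`stretch_contrast`, `median_filter`, `binarize`, `preprocess`) are hypothetical, and the fixed threshold of 128 is an arbitrary choice. Deskewing is left out, and in practice an image library such as OpenCV would normally handle all of these steps before the image is passed to Tesseract.

```python
from statistics import median


def stretch_contrast(img):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Flat image: there is no contrast to stretch, so return a copy.
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def median_filter(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def preprocess(img):
    """Contrast stretch, then denoise, then binarize (illustrative order)."""
    return binarize(median_filter(stretch_contrast(img)))


if __name__ == "__main__":
    # A low-contrast 3x3 patch with a single bright speck in the middle.
    noisy = [
        [100, 100, 100],
        [100, 150, 100],
        [100, 100, 100],
    ]
    for row in preprocess(noisy):
        print(row)
```

The median filter is used here because it removes isolated specks without blurring edges as much as an averaging filter would, which tends to matter for thin character strokes.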

What are Tesseract jobs?

Tesseract jobs are typically roles that involve the Tesseract OCR (Optical Character Recognition) engine, an open-source tool that converts images containing text into editable, searchable data. These roles include software developer, machine learning engineer, and data scientist positions that focus on integrating, optimizing, or enhancing OCR capabilities in applications. Professionals in these roles often work on image preprocessing, text extraction, and improving the accuracy of OCR results across languages and document formats.

What is the difference between Tesseract and an OCR Technician?

Aspect | Tesseract | OCR Technician
Required Credentials | Basic computer skills, familiarity with OCR software | Technical training or certification in OCR or image processing
Work Environment | Software development, data processing | Data entry centers, document processing facilities
Industry Usage | Used by developers for OCR projects | Employed in document digitization and data extraction roles
Common Search/Comparison | Yes | Yes

While Tesseract is an open-source OCR engine used primarily by developers for integrating OCR into applications, OCR Technicians are professionals who operate OCR systems in data entry or document processing environments. Tesseract requires programming knowledge, whereas OCR Technicians focus on manual or semi-automated data extraction tasks.

More about Tesseract jobs
What states have the most Tesseract jobs? States with the most job openings for Tesseract jobs include:
Infographic showing Tesseract job openings in the United States as of May 2026, with employment types broken down into 95% Full Time and 5% Temporary. Highlights a job distribution of 50% Physical, 28% Hybrid, and 22% Remote.
Lead Data Scientist - Gen AI & Digital Twin

Caterpillar Inc.

Chicago, IL • On-site

Full-time

Posted 4 days ago


Caterpillar Inc. rating

Company rating: 7.5 out of 10

Based on 458 frontline employees who took The Breakroom Quiz

218th of 417 rated machine equipment manufacturers


Job description

Job Summary:
Caterpillar Inc. is the world’s leading manufacturer of construction and mining equipment, committed to building a better, more sustainable world. The Lead Data Scientist will drive the development and integration of digital twins and GenAI-assisted predictive analytics for condition monitoring of Caterpillar equipment.
Responsibilities:
• Design and implement GPU-accelerated machine learning models (e.g., XGBoost, autoencoders, and GANs using Tesseract) to identify fault patterns in time-series sensor data.
• Partner with engineering teams to develop onboard digital twins using NVIDIA architecture (e.g. PhysicsNeMo) to simulate, predict, and optimize the performance of heavy machinery
• Profile and tune deep learning algorithms for maximum efficiency on NVIDIA GPU architectures, ensuring high throughput and low latency for real-time monitoring.
• Adapt and test algorithms for onboard architecture, leveraging tools like NVIDIA Jetson for ROM generation and real-time edge processing on Cat equipment.
• Collaborate with hardware / simulation engineers to ensure algorithm compatibility with next-generation processors and specialized onboard compute modules.
• Use high-fidelity digital twins to simulate rare failure scenarios, ensuring the GenAI assistant provides accurate troubleshooting steps for edge-case mechanical issues.
• Develop Generative AI agents that synthesize telematics data to generate prioritized repairs for identified machine faults.
• Integrate multi-modal outputs from condition monitoring analytics & asset life history to create a machine-specific context for AI assistant.
Qualifications:
Required:
• Typically, a Bachelor's, Master's, or PhD degree in Applied Statistics, Data Science, Business Analytics, Predictive Analytics, Business Intelligence & Analytics, Mathematics, Computer Science, Engineering (Aerospace, Electrical, Mechanical, Computer, Industrial, Agricultural, etc.), or equivalent technical degree
• Extensive experience applying Python (NumPy, SciPy, pandas, etc.) programming to solve business challenges.
• Extensive experience (typically 8+ years) with advanced data analysis, machine learning (e.g., clustering, logistic regression, neural nets), and statistical methods such as statistical process control.
• Experience in practical applications of onboard architecture / software (mini projects using Raspberry Pi or any other architecture are a bonus)
• Working experience with heavy equipment engineering or data analysis.
• Working knowledge of cloud technologies (AWS, Azure, Google Cloud, etc.)
• Advanced experience with version control / repositories such as GitHub
• Experience operating in an Agile environment
• Must demonstrate strong initiative, interpersonal skills, and the ability to communicate effectively.
Preferred:
• Generative AI & LLMs: Proficiency in Fine-tuning and Prompt Engineering for Large Language Models, specifically using Retrieval-Augmented Generation (RAG)
• Condition Monitoring Algorithms: Deep understanding of Anomaly Detection, Time-Series Analysis, and Predictive Maintenance models.
• Telematics: Experience handling high-frequency IoT sensor data, CAN bus protocols (J1939), and integrating with unified data platforms
• Experience with high-performance computing
• Business Statistics: Extensive experience with statistical tools, processes, and practices to describe business results in measurable scales; ability to use statistical tools and processes to assist in making business decisions.
• Analytical Thinking: Extensive knowledge of techniques and tools that promote effective analysis; ability to determine the root cause of organizational problems and create alternative solutions that resolve these problems.
• Programming Languages: Extensive knowledge of basic concepts and capabilities of applying Python programming to solve business challenges; ability to use tools, techniques and platforms in order to write and modify programming languages.
• Requirements Analysis: Working knowledge of tools, methods, and techniques of requirements analysis; ability to elicit, analyze, and record required functional and non-functional business requirements to ensure the success of a system or software development project.
Company:
For 100 years, we’ve been helping customers build a better, more sustainable world. Founded in 1925, the company is headquartered in Peoria Heights, USA, has more than 10,000 employees, and is currently a late-stage company.
