Position Details:
Job Title: Big Data Engineer
Location: Tampa, FL
Duration: 12+ months, contract-to-hire
Job Responsibilities:
Principal Responsibilities
Design interfaces to data warehouses/data stores and machine learning/Big Data
applications using open source languages and tools such as Scala, Java, Python, Perl, and shell scripting.
Design and create data pipelines that maintain a stable dataflow to the machine learning models,
in both batch and near-real-time modes.
Interface with Engineering/Operations/System Admin/Data Scientist teams to ensure data
pipelines and processes fit within the production framework.
Ensure that tools and environments adhere to strict security protocols.
Deploy the machine learning model and serve its outputs as RESTful API calls.
Understand business needs in close collaboration with subject matter experts (SMEs)
and Data Scientists to perform efficient feature engineering for machine learning models.
Maintain code and libraries in the code repository.
Work with the system administration team to proactively resolve issues and install tools and libraries
on the AWS platform.
Research and propose the architectures and solutions most appropriate for the problems at hand.
Maintain and improve tools to assist Analytics in ETL, retrospective testing, efficiency,
repeatability, and R&D.
Lead by example regarding software best practices, including code style and architecture,
documentation, source control, and testing.
Support the Chief Data Scientist/Data Scientists/Big Data Engineers in creating new and novel
approaches to solve challenging problems using Machine Learning, Big Data and Cloud
technologies.
Handle ad hoc requests to create reports for end users.
Required Skills
Strong skills with Apache Spark (Spark SQL) and Scala, with 2+ years of experience.
Understanding of AWS Big Data components and tools.
Strong Java skills, with experience in web services and web development, are required.
Hands-on experience with model deployment.
Hands-on experience deploying applications with Docker, Kubernetes, or a similar technology.
Linux scripting is a plus.
Fundamental understanding of AWS cloud components.
2+ years of experience ingesting, cleansing/processing, storing, and querying large datasets.
2+ years of experience engineering large-scale data solutions with Java/Tomcat/SQL/Linux.
Experience in a data-intensive role, including extraction (DB/web/API, etc.), transformation, and loading (ETL) of data.
Exposure to structured and/or unstructured data.
Experience with data cleansing/preparation in the Hadoop/Apache Spark ecosystem (MapReduce, Hive, HBase, Spark SQL).
Experience with distributed streaming tools such as Apache Kafka.
Experience with multiple file formats (Parquet, Avro, ORC).
Knowledge of the Agile development cycle.
Ability to write efficient code that improves the performance and reduces the cost of jobs running on the AWS platform.
Experience building stable, scalable, high-speed live data streams and serving web platforms.
Enthusiastic self-starter with ability to work in a team environment.
Graduate (MS) or undergraduate degree in Computer Science, Engineering, or a relevant field.
Nice to have:
Strong software development experience
Machine Learning model deployment experience
Ability to write custom Map/Reduce programs to clean/prepare complex data
Familiarity with streaming data processing, including experience with distributed real-time computation systems such as Apache Storm or Apache Spark Streaming.