Role: DevOps Engineer - Cloudera Administration
Location: Scottsdale, AZ (100% Onsite)
Must have:
- Strong on-prem Cloudera expertise
- Experience with Ozone, Airflow, Ranger configuration, and Atlas
- Strong architectural understanding and the ability to guide application teams and implement best practices
- Nice to have: AWS and Cloudera on AWS
Selected candidates should:
- Ensure the reliability, stability, and maintainability of the infrastructure through ongoing monitoring.
- Architect solutions, support application teams, and make effective use of automation (rather than just executing routine tasks).
Role Summary
We are seeking a DevOps Engineer with strong hands-on experience in Cloudera platform deployment, configuration, and administration. The ideal candidate will manage, automate, and optimize enterprise data platforms built on Cloudera (CDP/CDH), ensuring availability, performance, security, and cost efficiency. The role requires advanced knowledge of HDFS, Hive, HBase, Solr (Cloudera Search), Ozone, Cloudera Data Services, and Ranger, along with solid DevOps and automation practices.
Required Qualifications
- 3-8 years (adjustable) of hands-on experience administering Cloudera (CDH/CDP) in production.
- Strong administration skills with HDFS, Hive, HBase, Solr (Cloudera Search), Ozone, Cloudera Data Services, and Ranger.
- Proficiency in Linux (RHEL/CentOS/Ubuntu) systems administration and shell scripting (Bash).
- Practical experience with Kerberos, TLS, LDAP/AD, and Ranger policy management.
- Experience with automation using Ansible and with Python scripting for operational tasks.
- Experience with monitoring and logging tools (Cloudera Manager, Grafana/Prometheus/Elastic or equivalents).
- Solid understanding of networking (DNS, load balancing, firewalls), storage, and JVM/GC fundamentals.
- Familiarity with Git-based CI/CD and with change management practices in regulated environments.
Key Responsibilities
- Platform Deployment & Configuration
- Install, configure, and upgrade Cloudera clusters using Cloudera Manager across on-prem and/or cloud environments.
- Provision and configure services: HDFS, Hive, HBase, Solr (Cloudera Search), Ozone, Data Services (e.g., Data Engineering, Data Warehouse, Machine Learning), and Ranger.
- Set up and manage Kerberos, TLS/SSL, AD/LDAP integration, and Ranger policies for fine-grained access control.
- Operations & Reliability
- Own day-to-day cluster administration: capacity planning, quota management, service restarts, rolling upgrades/patching, and backup/restore.
- Monitor and tune cluster and service performance (NameNode/ResourceManager health, GC tuning, YARN queues, Hive LLAP/Tez, HBase region servers, Solr cores/collections).
- Implement SLA/SLO monitoring, alerting, and dashboards; drive root-cause analysis and incident response.
- Security & Governance
- Maintain a secure environment via Ranger policies, Kerberos principals/keytabs, TLS certificates, and compliance checks.
- Support data governance and auditing requirements, and integrate with enterprise secrets and key management systems as needed.
- Automation & DevOps
- Build and maintain Infrastructure as Code (IaC) and configuration automation (e.g., Ansible, Terraform).
- Develop operational runbooks and automation in Python/Bash for provisioning, patching, and routine admin tasks.
- Integrate platform workflows with CI/CD (Git, pipelines) for repeatable, version-controlled changes.
- Data Services & Ecosystem Support
- Administer and optimize Hive metastore, ACID tables, compactions, and query engines (Tez/Spark).
- Manage HBase schemas, region splitting/balancing, and performance tuning.
- Operate Solr (Cloudera Search) for indexing, schema management, and query performance.
- Support Ozone object store operations, storage policies, and migration use cases.
- Collaborate with data engineering teams on job orchestration, resource management, and troubleshooting.
- Reliability Engineering
- Implement high availability (HA) for critical components; design DR strategies and execute failover tests.
- Optimize capacity and cost across compute and storage; recommend right-sizing and lifecycle policies.
- Documentation & Collaboration
- Create and maintain architecture diagrams, topology, SOPs/runbooks, and security documentation.
- Partner with platform, security, and data engineering teams; provide L2/L3 support and knowledge transfer.