Linux Site Reliability Engineer Jobs in Georgia (NOW HIRING)

SRE Lead

Alpharetta, GA · On-site

$55.75 - $74/hr

Role: SRE Lead Location: Alpharetta, GA Experience Level: Senior / Lead Role Overview We are ... Artifact management using Docker and Helm Linux & Systems Engineering (Must-Have) * Deep hands-on ...

Site Reliability Engineer - SRE

Atlanta, GA

$54.25 - $72/hr

Role: Site Reliability Engineer * Location: Atlanta, GA OR Dallas OR Austin, TX * Duration ... Proficient in a Linux or Unix based environment. * Proficiency in supporting a 24x7 operation.

Site Reliability Engineer - SRE

Atlanta, GA · On-site +1

$54.25 - $72/hr

Role: Site Reliability Engineer * Location: Atlanta, GA OR Dallas OR Austin, TX * Duration ... Proficient in a Linux or Unix based environment. * Proficiency in supporting a 24x7 operation.

Site Reliability Engineer

Alpharetta, GA · On-site

$55.75 - $74/hr

I have an opportunity for a "Site Reliability Engineer" - Alpharetta, GA (Onsite), and I am ... Linux, Windows servers. • Experience with Web service technologies, including REST, SOAP, JSON ...

Site Reliability Engineer

Atlanta, GA · On-site

$54.75 - $72.75/hr

Position : SRE Duration : 6 to 12 Months Location : Atlanta or St. Louis - Day Onsite Job ... Linux, Windows servers. • Experience with Web service technologies, including REST, SOAP, JSON ...

Site Reliability engineer (SRE)

Atlanta, GA · Hybrid

$54.75 - $72.75/hr

Site Reliability Engineer (SRE) Location: Atlanta, GA (Hybrid - 3 days office, 2 days WFH) Duration: C2H : Dynatrace AppDynamics ACI (Advanced Computing International) is a Global Technology ...

Linux Site Reliability Engineer information

What are the key skills and qualifications needed to thrive as a Linux Site Reliability Engineer, and why are they important?

To thrive as a Linux Site Reliability Engineer, you need deep expertise in Linux system administration, scripting (such as Bash or Python), and a solid understanding of networking concepts, usually backed by a computer science degree or equivalent experience. Familiarity with configuration management tools (like Ansible, Puppet, or Chef), containerization (Docker, Kubernetes), and cloud platforms (AWS, GCP, or Azure) is typically required, along with relevant certifications like RHCE or AWS Certified SysOps Administrator. Strong problem-solving skills, effective communication, and the ability to work under pressure are crucial soft skills for this role. These competencies ensure the reliability, scalability, and security of complex infrastructure, minimizing downtime and supporting seamless operations.

What are some common challenges faced by Linux Site Reliability Engineers when scaling infrastructure, and how can they be addressed?

Linux Site Reliability Engineers often encounter challenges related to maintaining system stability and performance as infrastructure scales. Issues such as configuration drift, automation bottlenecks, and monitoring gaps can arise when managing numerous servers or services. Addressing these challenges typically involves implementing robust configuration management tools, investing in automated deployment pipelines, and enhancing observability through comprehensive monitoring and alerting solutions. Collaboration with development and operations teams is essential to ensure that scalability solutions align with business needs and technical requirements.
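As a rough illustration of how configuration drift can be detected, the sketch below compares the files found on a host against a baseline of SHA-256 fingerprints. The function names, the baseline layout, and the sample path are hypothetical; in practice this job is usually handled by a configuration management tool such as Ansible, Puppet, or Chef.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """Return the SHA-256 hex digest of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def drifted_files(baseline: dict[str, str], host_files: dict[str, str]) -> list[str]:
    """List the paths whose contents are missing or differ from the baseline.

    baseline maps path -> expected digest (hypothetical layout).
    host_files maps path -> file contents as read from one host.
    """
    drift = []
    for path, expected_digest in baseline.items():
        content = host_files.get(path)
        if content is None or fingerprint(content) != expected_digest:
            drift.append(path)
    return sorted(drift)


# Example: one host still matches the baseline, another has been edited by hand.
baseline = {"/etc/ssh/sshd_config": fingerprint("PermitRootLogin no\n")}
clean_host = {"/etc/ssh/sshd_config": "PermitRootLogin no\n"}
edited_host = {"/etc/ssh/sshd_config": "PermitRootLogin yes\n"}
print(drifted_files(baseline, clean_host))
print(drifted_files(baseline, edited_host))
```

Comparing digests rather than full file contents keeps the baseline small, at the cost of not showing what changed.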

What is a Linux Site Reliability Engineer?

A Linux Site Reliability Engineer (SRE) is an IT professional responsible for ensuring the reliability, scalability, and performance of systems running on the Linux operating system. They bridge the gap between software development and operations by automating processes, monitoring infrastructure, and managing incidents. Linux SREs focus on system availability, building tools for deployment and monitoring, and improving system robustness through best practices and automation. Their work helps organizations deliver reliable online services and quickly recover from outages or system failures.

What is the difference between a Linux Site Reliability Engineer and a Linux DevOps Engineer?

| Aspect | Linux Site Reliability Engineer | Linux DevOps Engineer |
|---|---|---|
| Credentials | Linux certifications, SRE-specific training | Linux certifications, DevOps tools certifications |
| Work Environment | Focus on system reliability, monitoring, incident response | Focus on automation, CI/CD pipelines, deployment |
| Employer & Industry | Tech companies, cloud providers, large enterprises | Startups, tech firms, software development teams |
| Search & Comparison Intent | Understanding reliability roles, incident management | Automation, deployment, continuous integration |

While both roles involve Linux expertise, a Linux Site Reliability Engineer primarily focuses on maintaining system reliability, monitoring, and incident response. In contrast, a Linux DevOps Engineer emphasizes automation, continuous integration, and deployment processes. Both roles require Linux skills and often overlap, but their core responsibilities differ based on organizational needs.

$55.75 - $74/hr

Full-time

Posted 13 days ago


Job description

Overview:
Role: SRE Lead
Location: Alpharetta, GA
Experience Level: Senior / Lead
Role Overview
We are seeking an experienced Site Reliability Engineering (SRE) Lead to own and drive the reliability, scalability, and operational excellence of cloud-native platforms. This role combines hands-on technical depth with people leadership, responsible for managing the SRE team while setting best practices across reliability engineering, automation, observability, and incident management.
The SRE Lead will work closely with engineering, security, and platform teams to ensure systems are resilient, secure, and performant at scale.
Key Responsibilities
Leadership & Ownership
  • Lead and manage the SRE team, owning end-to-end SRE responsibilities.
  • Define SRE standards, reliability goals (SLIs/SLOs), and operational best practices.
  • Mentor engineers and drive a culture of automation, resilience, and continuous improvement.
  • Act as a key escalation point during critical incidents and outages.
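For readers unfamiliar with the reliability goals (SLIs/SLOs) mentioned above, here is a minimal sketch of an error-budget calculation for a request-based availability SLO. The function name and the sample numbers are illustrative assumptions, not part of the posting.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget left in the current window.

    slo_target is the availability objective as a fraction, e.g. 0.999.
    The result is 1.0 when nothing has failed and negative once the
    budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic yet, so no budget has been consumed
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
# About 1,000 failures are allowed, so roughly three quarters of the budget remains.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.2f}")
```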
Cloud & Platform Engineering
  • Design, implement, and manage cloud infrastructure using Google Cloud Platform (GCP) services:
    • Compute Engine, GKE, VPC, Cloud IAM, Cloud Storage, Cloud SQL.
  • Ensure high availability, fault tolerance, and scalability across environments.
Networking & Connectivity
  • Architect and manage:
    • VPC peering, Shared VPCs
    • Firewall rules, Load Balancers, DNS
    • VPN tunnels and secure hybrid connectivity
Security & Identity
  • Debug and manage IAM policies and service accounts.
  • Implement Workload Identity Federation and least-privilege access models.
  • Partner with security teams to enforce cloud security best practices.
Infrastructure as Code & Automation
  • Develop and maintain Terraform modules with strong state management and dependency handling.
  • Apply DRY principles across infrastructure code.
  • Lead infrastructure automation initiatives to reduce manual intervention.
CI/CD & Deployment Strategies
  • Design and maintain pipelines using:
    • Jenkins (Declarative & Scripted)
    • GitHub Actions (YAML workflows)
  • Implement advanced deployment strategies:
    • Canary releases
    • Blue/Green deployments
    • Artifact management using Docker and Helm
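The canary strategy listed above generally comes down to comparing the new version against the stable one before shifting more traffic. The sketch below is one simple way to express that gate; the function name, thresholds, and return values are assumptions chosen for illustration, and real pipelines usually weigh more signals than error rate alone.

```python
def canary_verdict(baseline_errors: int, baseline_total: int,
                   canary_errors: int, canary_total: int,
                   max_ratio: float = 2.0, min_requests: int = 100,
                   rate_floor: float = 0.001) -> str:
    """Decide whether to promote, roll back, or keep observing a canary.

    The canary passes when its error rate is at most max_ratio times the
    baseline error rate. rate_floor keeps a near-zero baseline from
    failing the canary on a single stray error.
    """
    if canary_total < min_requests:
        return "wait"  # not enough canary traffic to judge
    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total
    threshold = max(baseline_rate * max_ratio, rate_floor)
    return "promote" if canary_rate <= threshold else "rollback"


# Hypothetical traffic samples.
print(canary_verdict(10, 10_000, 1, 50))     # too little canary traffic
print(canary_verdict(10, 10_000, 1, 1_000))  # canary matches baseline
print(canary_verdict(10, 10_000, 50, 1_000)) # canary clearly worse
```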
Linux & Systems Engineering (Must-Have)
  • Deep hands-on expertise with RHEL, Ubuntu, and CentOS.
  • Kernel tuning, systemd, storage management (LVM).
  • OS-level performance optimization and troubleshooting.
Observability & Debugging
  • Diagnose and resolve CPU, memory, disk, and I/O bottlenecks.
  • Analyze system and application logs.
  • Troubleshoot boot issues and low-level system failures.
  • Drive root cause analysis and post-incident reviews.
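As an example of the kind of memory diagnosis described above, the following sketch parses text in the format of Linux's /proc/meminfo and estimates how much memory is in use. The helper names are hypothetical, the sample values are made up, and the calculation assumes the MemAvailable field is present, which is the case on reasonably recent kernels.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into a mapping of field -> number.

    Most fields are reported in kB; a few counters carry no unit.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(meminfo: dict[str, int]) -> float:
    """Estimate the used share of memory as 1 - MemAvailable / MemTotal."""
    return 1.0 - meminfo["MemAvailable"] / meminfo["MemTotal"]


# Made-up sample in the /proc/meminfo layout.
sample = (
    "MemTotal:       16000000 kB\n"
    "MemFree:         1000000 kB\n"
    "MemAvailable:    4000000 kB\n"
    "HugePages_Total:       0\n"
)
info = parse_meminfo(sample)
print(f"{memory_used_fraction(info):.0%} of memory in use")
```

On a Linux host, the same functions can be fed the contents of /proc/meminfo read with `open("/proc/meminfo").read()`.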
Programming & Scripting (Must-Have)
  • Strong proficiency in Python, Go (Golang), or Java for automation and tooling.
Required Skills
  • Proven experience leading SRE or Platform Engineering teams.
  • Strong expertise in GCP infrastructure and Kubernetes (GKE).
  • Advanced Linux systems knowledge.
  • Infrastructure-as-Code and CI/CD mastery.
  • Strong debugging, incident response, and reliability engineering skills.
Preferred Qualifications
  • Certifications:
    • Google Professional Cloud DevOps Engineer
    • Google Cloud Architect
    • CKA (Certified Kubernetes Administrator)
  • Experience with large-scale distributed systems and microservices.
  • Familiarity with:
    • ITIL processes
    • Change Advisory Board (CAB)
    • Incident and problem management frameworks