
Linux Site Reliability Engineer Jobs in Georgia (NOW HIRING)

Lead Site Reliability Engineer

Atlanta, GA · Remote

$54.75 - $72.75/hr

Infrastructure: 100% Linux-based cloud infrastructure (AWS, Google Cloud, MongoDB Atlas) and ... Set clear goals for the SRE team and partner with Engineering leadership to align platform ...

Senior SRE I

Atlanta, GA · On-site

$54.75 - $72.75/hr

The Senior Site Reliability Engineer (SRE) is responsible for maintaining and improving the ... • Deep understanding of Linux operating systems and networking fundamentals. • Proven ...

Senior Site Reliability Engineer II

Buford, GA · On-site +1

$104.90K - $174.70K/yr

We are hiring a hands-on Senior Site Reliability Engineer (SRE) to actively build, operate, and ... Strong Linux systems, networking, and troubleshooting skills * Experience supporting production ...

Senior Site Reliability Engineer II

Alpharetta, GA · On-site +1

$104.90K - $174.70K/yr

We are hiring a hands-on Senior Site Reliability Engineer (SRE) to actively build, operate, and ... Strong Linux systems, networking, and troubleshooting skills * Experience supporting production ...

Senior Site Reliability Engineer II

Atlanta, GA · On-site +1

$104.90K - $174.70K/yr

We are hiring a hands-on Senior Site Reliability Engineer (SRE) to actively build, operate, and ... Strong Linux systems, networking, and troubleshooting skills * Experience supporting production ...

Site Reliability Engineer

Atlanta, GA · On-site +1

$100K - $120K/yr

Overview The Site Reliability Engineer is a key force behind improving Origami's time to resolution and advancing overall site reliability and scalability. This person participates in efforts to ...

Title: Senior Site Reliability Engineer Location: Alpharetta, GA Duration: 6-12+ Months About the Role We're seeking an experienced Senior Site Reliability Engineer to join our team and play a ...

Lead Site Reliability Engineer

Alpharetta, GA · On-site

$54.25 - $72/hr

As a Lead Site Reliability Engineer, you will be responsible for ensuring the reliability and ... and Linux-based systems to ensure optimal performance and reliability. • Collaborate with ...

Job Overview: * SRE GCP certification preferred (Associate Cloud Engineer, DevOps or Architect ... Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.

SRE Lead/ Architect

Atlanta, GA · On-site

$54.75 - $72.75/hr

Job Title: SRE Lead/Architect Location: Atlanta, GA (Hybrid: Thursday to the following Wednesday, alternate weeks) Contract Role Role Summary: Mandatory skills are Observability, Resiliency, Chaos engineering, strong ...

Apply SRE principles including SLOs, error budgets, and incident response * Develop Infrastructure as Code using Terraform * Automate CI/CD pipelines and GitOps deployments * Implement secure ...


Linux Site Reliability Engineer information

What are the key skills and qualifications needed to thrive as a Linux Site Reliability Engineer, and why are they important?

To thrive as a Linux Site Reliability Engineer, you need deep expertise in Linux system administration, scripting (such as Bash or Python), and a solid understanding of networking concepts, usually backed by a computer science degree or equivalent experience. Familiarity with configuration management tools (such as Ansible, Puppet, or Chef), containerization (Docker, Kubernetes), and cloud platforms (AWS, GCP, or Azure) is usually expected, and certifications such as RHCE or AWS Certified SysOps Administrator can strengthen an application. Strong problem-solving, clear communication, and the ability to work under pressure are crucial soft skills for the role. Together, these competencies keep complex infrastructure reliable, scalable, and secure, minimizing downtime and keeping operations running smoothly.

What are some common challenges faced by Linux Site Reliability Engineers when scaling infrastructure, and how can they be addressed?

Linux Site Reliability Engineers often encounter challenges related to maintaining system stability and performance as infrastructure scales. Issues such as configuration drift, automation bottlenecks, and monitoring gaps can arise when managing numerous servers or services. Addressing these challenges typically involves implementing robust configuration management tools, investing in automated deployment pipelines, and enhancing observability through comprehensive monitoring and alerting solutions. Collaboration with development and operations teams is essential to ensure that scalability solutions align with business needs and technical requirements.

What is a Linux Site Reliability Engineer?

A Linux Site Reliability Engineer (SRE) is an IT professional responsible for ensuring the reliability, scalability, and performance of systems running on the Linux operating system. They bridge the gap between software development and operations by automating processes, monitoring infrastructure, and managing incidents. Linux SREs focus on system availability, building tools for deployment and monitoring, and improving system robustness through best practices and automation. Their work helps organizations deliver reliable online services and quickly recover from outages or system failures.

What is the difference between a Linux Site Reliability Engineer and a Linux DevOps Engineer?

| Aspect | Linux Site Reliability Engineer | Linux DevOps Engineer |
|---|---|---|
| Credentials | Linux certifications, SRE-specific training | Linux certifications, DevOps tool certifications |
| Work environment | Focus on system reliability, monitoring, incident response | Focus on automation, CI/CD pipelines, deployment |
| Employers & industries | Tech companies, cloud providers, large enterprises | Startups, tech firms, software development teams |
| Search & comparison intent | Understanding reliability roles, incident management | Automation, deployment, continuous integration |

While both roles involve Linux expertise, a Linux Site Reliability Engineer primarily focuses on maintaining system reliability, monitoring, and incident response. In contrast, a Linux DevOps Engineer emphasizes automation, continuous integration, and deployment processes. Both roles require Linux skills and often overlap, but their core responsibilities differ based on organizational needs.

Lead Site Reliability Engineer

Intellum, Inc.

Atlanta, GA • Remote

$54.75 - $72.75/hr

Full-time

Medical, Dental, Vision, Retirement, PTO

Posted 8 days ago


Job description

About us

Intellum is the leader in corporate education technology and powers the largest, most successful customer, partner, and employee learning programs in the world. Large brands and fast-moving companies like Google, Meta, Amazon, Walmart, Xero, Atlassian, Mailchimp, Airbnb, Stripe, and TikTok rely on Intellum to engage and educate the audiences they touch.

We have always been a "remote first" company and are proud to have team members located all over the world. We value Curiosity, Creativity, Perseverance, and Kindness and strive to demonstrate these core values every day. Our culture is very important to us. We invest in our people in fun and exciting ways, including personal development budgets and an annual all-company retreat that is focused less on work and more on human connections. We are in growth mode, and our "smart growth" approach ensures that we will continue to scale our company effectively.

Summary

We are seeking a Lead Site Reliability Engineer to spearhead our SRE team. You are not just an operator; you are an experienced software engineer who excels at architecture, code optimization, and deep troubleshooting. In this role, you will drive operational maturity by defining our reliability standards (SLOs), hardening our security posture (WAF/InfraSec), and scaling the Intellum platform.

Our stack

  • Core: Applications written in Ruby on Rails and Node.js, PostgreSQL, MongoDB, Redis, Memcached, Sidekiq, ActiveJob, Elasticsearch, WebSockets
  • Infrastructure: 100% Linux-based cloud infrastructure (AWS, Google Cloud, MongoDB Atlas) and services (ECS/EC2/Kubernetes, Elasticache, MemoryStore, RDS, CloudSQL, BigQuery etc.)
  • Infrastructure as Code (IaC): GitHub, Terragrunt, Terraform, Ansible
  • CI/CD: Spinnaker, Jenkins
  • Observability & Alerting: New Relic, AWS CloudWatch, Google Cloud Stackdriver, Squadcast
  • Agile/Scrum practices utilizing JIRA

Responsibilities

  • SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
  • Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
  • Core Engineering & Performance: Take ownership of critical code components (e.g., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
  • Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
  • Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
  • Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it".

Required Skills

Experience & Engineering

  • 10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
  • Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
  • Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.

SRE & Operations

  • Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals".
  • SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate Error Budgets, and use data to balance feature velocity with reliability.
  • Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
  • Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).

Leadership

  • Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
  • Documentation & Training: Skilled in documenting solutions and training operational teams on how to effectively support and maintain systems.
  • Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.

Bonus Skills

  • Automation Tools: Experience developing solutions using server automation tools such as Terraform and Ansible.
  • CI/CD Expertise: Experience writing and maintaining CI/CD pipelines and services.
  • Kubernetes: Experience building, deploying, and optimizing Kubernetes-based infrastructure.
  • Perimeter Defense: Experience configuring and managing Web Application Firewalls (WAFs) (e.g., Cloudflare, AWS WAF, Akamai) and DDoS protection mechanisms.

Education

  • Bachelor's degree in Computer Science or related technical field

BENEFITS

  • Medical - 100% of employee premiums covered for selected individual plans
  • Dental - 100% of employee premiums covered
  • Vision - 100% of employee premiums covered
  • LinkedIn Learning
  • 401(k) plus matching (US Based Only)
  • Flexible PTO
  • Calm subscription
  • Annual Company Retreat

Intellum is an equal-opportunity employer. We're committed to building an inclusive team that celebrates diversity in people, perspectives, and backgrounds regardless of race, color, national origin, gender, sexual orientation, age, religion, disability, citizenship, veteran status, or any other protected status. We encourage you to apply for an open position; if you have questions about whether your experience and skill set meet the requirements for a specific role, reach out to us directly at careers@intellum.com.


If you are an individual applying from CA, NY, CO, CT, MD, NV, or RI, please reach out to careers@intellum.com to inquire about specific pay ranges.