Linux Site Reliability Engineer Jobs in Virginia

Site Reliability Engineer

Sterling, VA · On-site

$56.50 - $75/hr

The Site Reliability Engineer (SRE) works closely with the contract leadership ... Linux/Unix Systems Administration: Strong knowledge of Linux/Unix operating systems, including ...

Site Reliability Engineer

Sterling, VA · On-site

$56.50 - $75/hr

Site Reliability Engineer Location: Sterling, VA Clearance: TS/SCI Poly **This position is ... Linux/Unix Systems Administration: Strong knowledge of Linux/Unix operating systems, including ...

Site Reliability Engineer (SRE)

Vienna, VA · On-site

$57.25 - $76/hr

The AWS Site Reliability Engineer (SRE) is responsible for the operational health, availability, and performance of the AWS and Databricks environments built by the Platform Engineering team. You ...

Senior Site Reliability Engineer

McLean, VA · On-site +1

$57.50 - $76.50/hr

As an SRE, your primary responsibility is to combine aspects of software engineering with ... Experience with Linux and Windows operating systems, along with scripting tools and techniques such ...

Senior Site Reliability Engineer

McLean, VA

$57.50 - $76.50/hr

As an SRE, your primary responsibility is to combine aspects of software engineering with ... Experience with Linux and Windows operating systems, along with scripting tools and techniques such ...

Site Reliability Engineer

Richmond, VA · On-site

$56.50 - $75/hr

Site Reliability Engineer. One and Done Virtual Interview. Needs to be onsite from day 1 in Richmond, Virginia. Only candidates that can convert in 12 months with no sponsorship. Must haves: Log Data The ...

Site Reliability Engineer (SRE)

Vienna, VA · Hybrid

$57.25 - $76/hr

Minimum of 8 years of experience as a Site Reliability Engineer with a strong understanding of SRE principles for highly scalable and reliable systems. Possess a bachelor's degree. Experience working ...

AWS GovCloud SRE

Alexandria, VA · On-site

$83.20K - $164.32K/yr

Specify and configure physical and virtual machines with RedHat Enterprise Linux with a heavy focus ... The SRE's holistic view should include, but is not limited to, capacity planning, system/platform ...

Site Reliability Engineer (SRE)

Vienna, VA · On-site

$57.25 - $76/hr

Up to 2 years in duration. MUST HAVES: • Minimum of 8 years of experience as a Site Reliability Engineer with a strong understanding of SRE principles for highly scalable and reliable systems • ...

Staff Site Reliability Engineer

Reston, VA · On-site

$59.25 - $78.75/hr

The Site Reliability Engineering team drives reliability strategy, elevates engineering standards ... Deep Linux expertise - from kernel internals and system performance tuning to hardening and ...

Staff Site Reliability Engineer

Reston, VA

$59.25 - $78.75/hr

The Site Reliability Engineering team drives reliability strategy, elevates engineering standards ... Deep Linux expertise - from kernel internals and system performance tuning to hardening and ...

Linux Site Reliability Engineer information

What are the key skills and qualifications needed to thrive as a Linux Site Reliability Engineer, and why are they important?

To thrive as a Linux Site Reliability Engineer, you need deep expertise in Linux system administration, scripting (such as Bash or Python), and a solid understanding of networking concepts, usually backed by a computer science degree or equivalent experience. Familiarity with configuration management tools (like Ansible, Puppet, or Chef), containerization (Docker, Kubernetes), and cloud platforms (AWS, GCP, or Azure) is typically required, along with relevant certifications like RHCE or AWS Certified SysOps Administrator. Strong problem-solving skills, effective communication, and the ability to work under pressure are crucial soft skills for this role. These competencies ensure the reliability, scalability, and security of complex infrastructure, minimizing downtime and supporting seamless operations.

What are some common challenges faced by Linux Site Reliability Engineers when scaling infrastructure, and how can they be addressed?

Linux Site Reliability Engineers often encounter challenges related to maintaining system stability and performance as infrastructure scales. Issues such as configuration drift, automation bottlenecks, and monitoring gaps can arise when managing numerous servers or services. Addressing these challenges typically involves implementing robust configuration management tools, investing in automated deployment pipelines, and enhancing observability through comprehensive monitoring and alerting solutions. Collaboration with development and operations teams is essential to ensure that scalability solutions align with business needs and technical requirements.
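As one illustration of how configuration drift might be caught, the sketch below compares each host's configuration files against a known-good baseline by hashing their contents. It is a minimal example under stated assumptions, not a replacement for a configuration management tool: the function names (fingerprint, find_drift), the host names, and the sample file contents are all hypothetical, and file contents are passed in as strings rather than collected from real servers.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 fingerprint of a config file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def find_drift(baseline: dict, hosts: dict) -> dict:
    """Report which baseline files are missing or different on each host.

    baseline maps file path -> expected contents.
    hosts maps host name -> {file path -> actual contents}.
    Returns host name -> sorted list of drifted paths; hosts that match
    the baseline are left out of the result.
    """
    expected = {path: fingerprint(text) for path, text in baseline.items()}
    drift = {}
    for host, files in hosts.items():
        changed = sorted(
            path
            for path, digest in expected.items()
            if path not in files or fingerprint(files[path]) != digest
        )
        if changed:
            drift[host] = changed
    return drift


if __name__ == "__main__":
    # Hypothetical sample data: one compliant host and one drifted host.
    baseline = {"/etc/ssh/sshd_config": "PermitRootLogin no\n"}
    hosts = {
        "web-01": {"/etc/ssh/sshd_config": "PermitRootLogin no\n"},
        "web-02": {"/etc/ssh/sshd_config": "PermitRootLogin yes\n"},
    }
    print(find_drift(baseline, hosts))
```

In practice the baseline would usually come from version control and the per-host contents from an agent or a configuration management run; the comparison step stays the same.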

What is a Linux Site Reliability Engineer?

A Linux Site Reliability Engineer (SRE) is an IT professional responsible for ensuring the reliability, scalability, and performance of systems running on the Linux operating system. They bridge the gap between software development and operations by automating processes, monitoring infrastructure, and managing incidents. Linux SREs focus on system availability, building tools for deployment and monitoring, and improving system robustness through best practices and automation. Their work helps organizations deliver reliable online services and quickly recover from outages or system failures.

What is the difference between a Linux Site Reliability Engineer and a Linux DevOps Engineer?

Aspect | Linux Site Reliability Engineer | Linux DevOps Engineer
Credentials | Linux certifications, SRE-specific training | Linux certifications, DevOps tools certifications
Work Environment | Focus on system reliability, monitoring, incident response | Focus on automation, CI/CD pipelines, deployment
Employer & Industry | Tech companies, cloud providers, large enterprises | Startups, tech firms, software development teams
Search & Comparison Intent | Understanding reliability roles, incident management | Automation, deployment, continuous integration

While both roles involve Linux expertise, a Linux Site Reliability Engineer primarily focuses on maintaining system reliability, monitoring, and incident response. In contrast, a Linux DevOps Engineer emphasizes automation, continuous integration, and deployment processes. Both roles require Linux skills and often overlap, but their core responsibilities differ based on organizational needs.

Site Reliability Engineer

Nwis

Sterling, VA • On-site

$56.50 - $75/hr

Full-time

Posted 7 days ago


Job description

Nightwing provides technically advanced full-spectrum cyber, data operations, systems integration and intelligence mission support services to meet our customers' most demanding challenges. Our capabilities include cyber space operations, cyber defense and resiliency, vulnerability research, ubiquitous technical surveillance, data intelligence, lifecycle mission enablement, and software modernization. Nightwing brings disruptive technologies, agility, and competitive offerings to customers in the intelligence community, defense, civil, and commercial markets.

Job Title: Site Reliability Engineer
Location: Sterling, VA
Clearance: TS/SCI Poly

**This position is CONTINGENT upon contract award**

The Site Reliability Engineer (SRE) works closely with the contract leadership, Platform teams, and Sponsor to refine the operational and technical strategy to automate key portions of IT operations and to enable the Product team (Platform) to bring new software or new features to production as quickly as possible. The SRE performs and analyzes manual IT operations/admin tasks (log analysis, performance tuning, patch management, testing, and incident response) and converts them to automated tasks. The SRE works with the Platform, Network, and Data Operations teams to assist in deployment planning and system onboarding, and assists with monitoring, system analysis, and IT operations support. Daily tasks include, but are not limited to:

  • Work with Sponsor, Mission partners, and technical personnel to deliver a robust, scalable operations architecture that meets the customer's goals for the enterprise.
  • Analyze, define, and document requirements for data, workflow, logical processes, hardware and operating system environment, network connectivity, other system interfaces, internal and external checks and controls, and outputs.
  • Monitor and track metrics, logs and traces across all services in the system/network and provide context for identifying root causes in the event of an incident, performance degradation, or availability issue.
  • Perform Network/Cloud optimization and resilience planning.
  • Develop capabilities to automate hardware/software provisioning, monitoring, patching, and troubleshooting.
  • Collaborate with and assist the Platform team and leadership on network and security health, intrusions, or inappropriate activities.
  • Optimize business processes, workflows, and service operations by building efficient on-call processes and streamlining alerting workflows.
  • Leverage operational data to automate systems administration, operations, and incident response processes, improving enterprise reliability and managing IT environment complexity.
  • Work with LSA, Lab Manager, and CM to compose technical documents including Design, Deployment, System specifications and Host Nation baselines, updates, user manuals, training materials, installation guides, proposals, and reports.
  • Work with the OM to implement ITSM best practices for ICA/Service discrepancy and reporting, issue resolution and operations support to include Tier 2/3 escalation.

Required Skills:

  • Programming: Proficiency in at least one programming language (e.g., Python, Go, Java, or JavaScript) is essential for automating tasks and developing tools.
  • Linux/Unix Systems Administration: Strong knowledge of Linux/Unix operating systems, including command-line tools and system administration tasks.
  • Networking: Understanding of network protocols, infrastructure, and troubleshooting techniques.
  • Database Management: Familiarity with database technologies and principles.
  • Automation: Experience with automation tools and techniques, such as configuration management (e.g., Ansible, Puppet, Chef) and orchestration (e.g., Kubernetes).
  • Monitoring and Logging: Experience with monitoring tools and logging systems.
  • Problem-Solving: Strong analytical and problem-solving skills to diagnose and resolve system issues.
  • Communication: Ability to communicate technical information clearly and concisely to both technical and non-technical audiences.
  • Collaboration: Ability to work effectively with cross-functional teams, including software developers and operations personnel.

Desired Skills:

  • Cloud Technologies: Experience with cloud platforms (e.g., AWS, Google Cloud, Azure).
  • Containerization: Knowledge of containerization technologies (e.g., Docker, Kubernetes).
  • DevOps Principles: Understanding of DevOps principles and practices.
  • Service Level Objectives (SLOs) and Service Level Agreements (SLAs): Experience with defining, tracking, and managing SLOs and SLAs.
  • Data Analysis: Experience with data analysis and visualization tools.

Desired Certs:

  • Global Skill Development Council (GSDC) Site Reliability Engineering (SRE) Foundation Certification (CSREF).
  • AWS Certified SysOps Administrator - Associate.
  • Google Cloud Certified Professional Cloud Architect.
  • Azure Certified Solutions Architect Expert.

At Nightwing, we value collaboration and teamwork. You'll have the opportunity to work alongside talented individuals who are passionate about what they do. Together, we'll leverage our collective expertise to drive innovation, solve complex problems, and deliver exceptional results for our clients.


Thank you for considering joining us as we embark on this new journey and shape the future of cybersecurity and intelligence together as part of the Nightwing team.

Nightwing is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status, age or any other federally protected class.