Senior Site Reliability Engineer
- Expired: over a month ago. Applications are no longer accepted.
As a Site Reliability Engineer on our team, you'll use your hardware and software skills to improve the technology supporting the VA. Above all, you'll work with onsite incident and event management, problem management, and DevOps teams to detect, investigate, and diagnose system problems and defects across enterprise-level applications and technology stacks. You'll also evaluate and modernize VA enterprise systems. This is your chance to lead a team and develop your skills in enterprise-level triage and incident resolution while gaining experience in VA system infrastructure. Grow your skills by combining system knowledge with modern system monitoring tools to improve VA enterprise reliability and the quality of services provided to veterans.
As an SRE system engineer, you will be on the ground floor, working with system and application owners to gather information on existing design and functionality, applying your understanding of workflow systems and application processes across multiple system environments, and working across technology and development teams to diagnose outages and recommend changes that increase reliability. You will work with application developers, system administrators, cybersecurity and identity access management personnel, and network administrators to troubleshoot performance issues and outages.
This position is open to remote delivery anywhere within the U.S., including the District of Columbia, and may include shift work support during weekends, holidays, or off-hours, as required.
- 10+ years of experience in one or more Technology Areas (Network, Windows, Desktop, Unix/Linux, AWS or Azure Cloud, WebSphere Middleware, Java/JS Development, Microsoft or Oracle Database)
- 5+ years of experience working with key indicators for IT system operability, reliability, application performance and code quality
- Experience deploying, maintaining, and troubleshooting complex applications at an enterprise scale while working with cross-functional teams
- 3+ years of monitoring and troubleshooting experience with one or more of the following APM tools: SolarWinds, AppDynamics, DynaTrace, Aternity, or ServiceNow Operator Workspace
- Experience monitoring and troubleshooting application logging using Splunk
- Experience in service virtualization, AWS or Azure Cloud technologies, and SaaS and PaaS implementation
- Experience leading a team to solve difficult technical challenges
- Experience using Microsoft Office, including Word, Excel, and PowerPoint
- Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
- Master’s Degree in Computer Science, Engineering, or Equivalent and 10 total years of experience; or 20 total years of experience in lieu of a degree
Required Education Level:
- A degree in computer science, electronics engineering, or another engineering or technical discipline is required
- 10 years of experience
- 10 years of additional relevant experience may be substituted for education
Nice to Have:
- Experience with test-driven development, distributed systems, microservices and cloud-native application implementation
- Experience with the following tools: ScienceLogic SL-1, Riverbed – Aternity, and ServiceNow
- Possession of excellent written and verbal communication skills
- Possession of strong critical thinking and error assessment capabilities
- Virtual team management
- Public Trust Clearance
- You need to meet the eligibility requirements of the U.S. government client