Job Summary:
Spatial Front, Inc. is seeking an SRE Engineer to support its growing team. The SRE Engineer will improve the reliability, availability, performance, and operational resilience of mission-critical systems for a federal enterprise program.
Responsibilities:
• Define, implement, and maintain site reliability engineering practices for mission-critical applications and shared services, with emphasis on uptime, resiliency, recoverability, and operational excellence.
• Establish and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for critical services and environments.
• Implement and maintain monitoring, alerting, and observability solutions for production systems.
• Support production and pre-production operations across development, test, training, staging, and production environments.
• Lead incident response activities, conducting root cause analysis and implementing permanent fixes.
• Support capacity planning, performance analysis, trend monitoring, and scalability planning for enterprise platforms and services.
• Create and maintain runbooks, standard operating procedures, incident playbooks, operational dashboards, and knowledge articles.
• Support high availability, disaster recovery, backup/restore validation, and business continuity activities.
• Develop and implement automation to reduce manual operational toil and improve system reliability.
• Contribute to post-deployment validation, smoke testing, rollback readiness, and environment health checks during releases and maintenance windows.
• Collaborate with teams supporting Oracle/PeopleSoft platforms, integration services, reporting services, and shared enterprise tooling to improve reliability end to end.
• Collaborate with development teams to improve system reliability through design reviews and reliability engineering practices.
• Perform capacity planning and performance optimization for production systems.
• Other duties as assigned.
Qualifications:
Required:
• Bachelor's degree in Computer Science, Engineering, or a related field.
• 5 years of software engineering experience; 3 years of site reliability engineering, production support engineering, or platform reliability experience for enterprise systems; and 1 year of UNIX/Solaris experience.
• Experience supporting enterprise applications in a high-availability, security-conscious, and compliance-driven environment.
• Experience creating operational documentation, runbooks, and incident response procedures.
• Strong troubleshooting skills across application, middleware, integration, and infrastructure layers.
• Strong verbal and written communication skills, including the ability to work across engineering, security, testing, and program stakeholders.
• Demonstrated expertise in site reliability engineering, monitoring, automation, incident response, and performance optimization, along with UNIX/Solaris experience.
• Must be a U.S. Citizen.
• Must possess an active Secret security clearance or be able to obtain one.
Preferred:
• DevOps Engineer certification or equivalent SRE certification.
• Experience supporting environments subject to RMF, STIG, audit, ATO, or similar compliance requirements.
• Experience with Splunk, enterprise monitoring/observability tooling, or similar operational analytics platforms.
• Experience supporting Oracle-based enterprise environments, including Oracle middleware, Oracle Database, or related platform services.
• Experience supporting PeopleSoft or similarly complex ERP / HCM / payroll platforms.
• Exposure to F5, Oracle Data Guard, Oracle GoldenGate, Kafka, or other enterprise integration / traffic / replication technologies.
• Familiarity with scripting and automation using tools such as Shell, Python, or PowerShell.
• Knowledge of DevOps, testing, and scanning tools, especially within PeopleSoft environments, such as PHIRE, PFT, Tricentis, Palo Alto, and CAST.
• Experience as an SRE supporting DoD or federal agency programs.
• Familiarity with UNIX/Solaris administration and systems programming.
• Experience with observability platforms such as Prometheus, Grafana, Datadog, or Splunk.
Company:
Spatial Front, Inc. (SFI) delivers the right Information Technology solutions and Business Support services using thoughtful analysis, strategic planning, and precise execution. Founded in 2008, the company is headquartered in McLean, USA, and has a team of 501-1000 employees. It is currently a late-stage company.