Job Title: Cloud Operations Lead
Location: Rockville, MD | Princeton, NJ | New York City, NY
Employment Type: Full-Time (Onsite/Hybrid as required)
Note: This is a 100% hands-on technical role. Architect-level profiles will not be considered.
We are seeking a highly skilled, hands-on Cloud Operations Lead to manage and optimize our multi-cloud infrastructure, with a primary focus on AWS environments. The ideal candidate will have extensive experience in cloud infrastructure management, automation, and hands-on support for critical cloud services. You will lead day-to-day operations, ensure reliability, drive automation, and enforce best practices across our cloud environments.
Primary Focus Areas
- AWS Control Tower, Organizations, and policy management
- Multi-account deployment and governance
- Detailed expertise in AWS Backup, SSM Patching, and AMI deployments
- Automation for AMI rollout and configuration across accounts
- Deep hands-on experience with AWS core services: EC2, ECS, EKS, RDS, S3, CloudFront, Lambda, SageMaker, etc.
- Managing S3, SFTP, and site externalization
- Infrastructure as Code (IaC) using Terraform, CloudFormation, and Python
- Strong knowledge of IAM, access controls, and resource-based policy enforcement
Key Responsibilities
- Lead and manage cloud infrastructure operations ensuring high availability, security, and performance
- Serve as primary escalation point for cloud operational issues
- Maintain AWS environments following best practices around cost, security, and performance
- Lead and manage incident response, perform RCA, and implement preventative measures
- Design and implement cloud automation using IaC and scripting
- Mentor cloud engineers, review code/configurations, and guide operational best practices
- Implement monitoring and alerting systems for proactive issue resolution
- Ensure regulatory compliance (e.g., GDPR, HIPAA) and enforce cloud governance standards
- Drive cloud cost optimization efforts including tagging, budgeting, and forecasting
- Develop and maintain disaster recovery and business continuity plans
- Create and maintain technical documentation, SOPs, and runbooks
- Collaborate with cross-functional teams including Security, DevOps, and App Engineering
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, Engineering, or related field
- 7+ years of total IT experience, with 3+ years in cloud operations leadership roles
- Strong hands-on experience in AWS; additional exposure to Azure and Oracle Cloud (OCI) a plus
- Experience with Terraform, CloudFormation, Python, PowerShell, and related automation tools
- Proven experience in managing multi-account cloud environments, CI/CD pipelines, and backup/recovery setups
- Deep understanding of IAM, network security, encryption, and secure cloud design
- Familiarity with Windows/Linux server administration, VMware, Active Directory, and Azure AD SSO
- Strong networking fundamentals – DNS, DHCP, LAN/WAN, and PKI
Certifications
- AWS Certified Solutions Architect – Associate or Professional (Required)
- Microsoft Certified: Azure Architect Technologies (Preferred)
- OCI Certifications (Preferred)
- ITIL Foundation or related service management experience (Preferred)
Required Technical Skills
- AWS core services: EC2, EKS, ECS, Lambda, S3
- IaC tools: Terraform, CloudFormation
- Scripting/Automation: Python (Required), PowerShell, Bash
- Experience with DevOps tools: Git, Jenkins, Ansible, CI/CD (Preferred)
Soft Skills & Traits
- Strong communication and stakeholder engagement skills
- Proven ability to lead, mentor, and manage technical teams
- Analytical mindset with strong troubleshooting and root cause analysis abilities
- Ability to handle high-pressure incidents with calm, structured responses
- Driven by continuous improvement and automation
Skill Matrix Template
Full Name:
Degree Major with University and Completion Year:
Total Experience in Cloud Infrastructure / IT Operations:
Total Experience in AWS Cloud Operations (Hands-on):
Which AWS Services have you extensively worked with? (e.g., EC2, EKS, ECS, RDS, Lambda, S3, CloudFront, SageMaker, etc.):
Experience with AWS Organizations / Control Tower / SCP Policies:
Experience with Multi-Account Management and Deployment (e.g., Config pushing, AMIs):
Experience with AWS S3, SFTP & Site Externalization Methods:
AWS Backup and SSM Patching Process Experience (Detail your involvement):
Experience with AMI Creation, Deployment, and Configuration across Accounts:
Infrastructure as Code (IaC) Proficiency (Terraform, CloudFormation – Please describe experience & tools used):
Python Scripting Experience:
Experience in Incident and Problem Management (RCA, Incident Communication):
Cloud Monitoring and Reporting Tools Used (e.g., CloudWatch, Dynatrace, PowerBI):
Experience in Leading Cloud Teams or Managing Technical Engineers:
Experience with Security in Cloud (IAM policies, Access Management, Encryption, Compliance):
Experience with Disaster Recovery / Business Continuity Planning in Cloud Environments:
Have you worked in a multi-cloud environment (AWS, Azure, OCI)?
Experience with CI/CD and DevOps Toolchains (e.g., Git, Jenkins, Ansible, etc.):
Experience with VMware, Active Directory, Azure AD SSO Integration:
Networking Experience (DNS, DHCP, LAN/WAN, PKI, etc.):
Cloud Certifications (e.g., AWS Certified Solutions Architect, Azure, OCI):
Motivation/Reason for Interest in This Role:
Motivation/Reason for Relocation (if not local to job location):
Contact Number:
Email ID:
LinkedIn Profile URL:
Full Address (Street, City, State, ZIP Code):
Notice Period (in weeks):
Current Work Authorization Status (e.g., US Citizen, Green Card, H1B, etc.):
Expected Salary:
Are you willing to relocate at your own expense and work hybrid at one of the specified locations (Rockville, MD / Princeton, NJ / NYC, NY)? (Yes/No):