Job Summary
The Senior DevOps / Kubernetes Platform Engineer is responsible for designing, building, and operating cloud-native platforms on AWS and Alibaba Cloud. This role focuses on Kubernetes-based platforms, GitOps-driven deployments, observability, networking, security, and large-scale production troubleshooting to ensure highly available and scalable systems.
Key Responsibilities
- Design, build, and operate AWS and Kubernetes-based platforms for enterprise workloads.
- Define and implement GitOps-first deployment strategies using ArgoCD, with Spinnaker for advanced delivery workflows.
- Provision, configure, and optimize Alibaba Cloud (AliCloud) resources including ECS, VPC, SLB, OSS, RDS, and Function Compute.
- Build and support Kubernetes platforms, including the core Kubernetes stack and the Rubix platform.
- Implement and manage observability solutions using Prometheus, Grafana, and CI/CD tools.
- Design and support cloud networking including VPCs, Ingress controllers, load balancers, SSL/mTLS, certificate management, gRPC, REST, HTTP/JSON.
- Provide hands-on Linux/UNIX administration and scripting using Python and Shell.
- Support and troubleshoot Java-based applications running in cloud-native environments.
- Work with database, caching, search, and messaging technologies including Couchbase, Cassandra, Oracle, PostgreSQL, ElasticSearch, Solr, and SNS/SQS.
- Partner with cross-functional teams to resolve production issues and ensure platform reliability.
- Participate in incident response, root cause analysis, and continuous improvement initiatives.
Required Skills & Experience
- Strong experience with Kubernetes and cloud-native architectures.
- Hands-on expertise with Alibaba Cloud (AliCloud) core services:
  - ECS
  - OSS
  - RDS
  - VPC
  - SLB
  - Function Compute
- Experience with AWS cloud platforms.
- Strong knowledge of GitOps, ArgoCD, and Spinnaker.
- Experience with observability tools such as Prometheus and Grafana.
- Solid understanding of networking and security protocols (Ingress, Load Balancers, SSL/mTLS).
- Hands-on experience with Linux/UNIX, Python, and Shell scripting.
- Experience with NoSQL and relational databases (Couchbase, Cassandra, Oracle, PostgreSQL).
- Strong troubleshooting skills in distributed and Java-based systems.
- Excellent communication and coordination skills.
Competencies
- DevOps & Platform Engineering
- Kubernetes & Cloud-Native Architecture
- GitOps & CI/CD Automation
- Observability & Reliability Engineering
- Cloud Networking & Security
- Production Support & Troubleshooting
Preferred Skills
- Experience with the Rubix platform
- SRE practices and incident management
- Multi-cloud experience (AWS + Alibaba Cloud)
- Exposure to enterprise-scale migration programs