Principal Site Reliability Engineer (SRE) at Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Site Reliability Engineer (SRE) in the United States.

We are seeking a highly skilled Principal Site Reliability Engineer to drive the reliability, scalability, and security of large-scale cloud platforms. In this role, you will shape the technical direction of infrastructure systems, automate complex operational tasks, and embed reliability best practices across engineering teams. You will act as the go-to expert for AWS architecture, champion observability and performance standards, and build the automation that empowers developers to move quickly with confidence. This is a high-impact opportunity to influence platform strategy, mentor engineers, and ensure mission-critical systems remain secure, resilient, and cost-efficient.

Accountabilities

- Design and operate multi-region, fault-tolerant systems to ensure high availability and resilience.
- Architect and optimize enterprise-scale AWS environments for scalability, reliability, and cost efficiency.
- Implement Infrastructure as Code libraries and automated CI/CD pipelines to accelerate delivery and reduce operational toil.
- Establish and enforce security guardrails, policy-as-code, and compliance best practices to protect sensitive data.
- Define SLIs/SLOs, manage error budgets, and drive observability maturity using Prometheus, Grafana, and AWS-native tools.
- Deploy and manage service mesh solutions to secure and monitor service-to-service communication across Kubernetes workloads.
- Provide technical leadership, conduct design reviews, and mentor engineers in SRE, DevSecOps, and cloud-native best practices.
- Partner with engineering and security stakeholders to shape platform strategy and drive operational excellence.

- 10+ years of experience in SRE, DevOps, or Platform Engineering roles managing production AWS workloads.
- Deep expertise in AWS services, EKS, Kubernetes networking, Helm, autoscaling frameworks, API Gateways, and serverless architectures.
- Proficiency in Infrastructure as Code (AWS CDK, Terraform, CloudFormation) and automation scripting in Go, Python, or TypeScript.
- Hands-on experience implementing service mesh (Istio, Linkerd, or AWS App Mesh) and policy-as-code (OPA/Rego) solutions.
- Strong knowledge of cloud security best practices including IAM, encryption, OS hardening, and compliance enforcement.
- Skilled in setting SLIs/SLOs, managing error budgets, and building monitoring/alerting stacks (Prometheus, Grafana, CloudWatch).
- Experience with CI/CD pipelines (Harness, GitHub Actions) and Internal Developer Portals (Backstage, Port, Cortex) is a strong plus.
- Excellent communication skills, with the ability to influence technical strategy and guide post-incident reviews.
- Proven ability to mentor engineers and elevate technical standards across teams.

Company Location: United States.