Senior Site Reliability Engineer (Remote) at Jobgether

Source: https://jobs.workable.com/view/g5XxL8zFDToP8pvQws39eM/senior-site-reliability-engineer-(remote)-in-united-states-at-jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

We are seeking a Senior Site Reliability Engineer to help shape and maintain the technical foundations of a high-growth platform. In this role, you will design and operate large-scale, secure, and highly available infrastructure while ensuring teams can deliver product features safely and efficiently. You will champion observability, CI/CD, and automation practices, enabling rapid innovation across engineering. This position offers the opportunity to work closely with senior leaders, partner with cross-functional teams, and have a measurable impact on platform reliability, scalability, and performance in a fast-paced, collaborative environment. The role requires both strategic thinking and hands-on technical expertise.

Accountabilities:

- Design, implement, and maintain scalable and secure cloud infrastructure on AWS.
- Build and manage automation tools and self-service methods for infrastructure management (e.g., Terraform, CI/CD pipelines).
- Partner with engineering, QA, and FinOps teams to enable fast, safe, and cost-effective deployments.
- Own observability systems, establishing best practices for performance monitoring and service reliability across the organization.
- Contribute to disaster recovery, incident management, and risk mitigation strategies.
- Collaborate with teams to optimize infrastructure costs and improve operational efficiency.
- Provide mentorship and guidance to engineering teams, fostering a culture of reliability and engineering rigor.

Requirements:

- BS or MS in Computer Science or related technical field, or equivalent experience.
- 5+ years of experience in a dedicated Site Reliability Engineering role.
- Strong experience with distributed systems and cloud infrastructure (AWS: EC2, RDS, EKS, CloudFront, ECR, S3, IAM, Lambda, Route53).
- In-depth knowledge of Kubernetes, including deployment, scaling, orchestration, and automation for teams.
- Experience building automation tools and using at least one programming language (e.g., Python, Golang, Rust) for infrastructure solutions.
- Proven ability to analyze systems, identify performance bottlenecks, and implement improvements.
- Strong communication and collaboration skills with cross-functional teams.

Bonus Points:

- Familiarity with AWS Well-Architected Framework.
- Experience with event-driven systems and messaging technologies (Kafka, NATS, Aeron).
- Knowledge of security frameworks and infrastructure hardening.
- Experience with diverse database architectures or Elasticsearch management.
- Understanding of Ruby on Rails ecosystem and its operational considerations.

Company Location: United States.