Senior Site Reliability Engineer (Remote - US) at Jobgether

Source: https://jobs.workable.com/view/eqLp7Qx8nmDZeuwee7a2KR/senior-site-reliability-engineer-(remote---us)-in-united-states-at-jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

As a Senior Site Reliability Engineer, you will play a critical role in ensuring the stability, performance, and scalability of our platform. You will design, build, and maintain automated, resilient systems while partnering closely with development teams to improve observability, deployment pipelines, and overall developer productivity. Your work will directly impact reliability, security, and operational excellence across the company’s systems. This role combines hands-on engineering with strategic planning, empowering you to proactively prevent issues and continuously improve infrastructure. You will thrive in a collaborative, growth-oriented environment where learning and mentorship are valued, and reliability is treated as a core principle.

Accountabilities

In this role, you will be responsible for:

- Ensuring high availability and resilience of production systems, anticipating and preventing potential issues.
- Building, improving, and maintaining automation for infrastructure provisioning, deployments, and system operations.
- Leading incident management, participating in on-call rotations, troubleshooting production incidents, and driving post-incident reviews.
- Identifying bottlenecks and improving system performance while defining operational standards.
- Implementing best practices for cloud infrastructure, identity, and access management to support security and compliance.
- Leading projects and collaborating with developers to enhance observability, deployment pipelines, and overall operational efficiency.
- Continuously evaluating systems and processes, recommending improvements to enhance reliability and developer experience.

The ideal candidate will have:

- 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
- Deep experience with cloud platforms, preferably AWS.
- Expertise in infrastructure-as-code tools such as Terraform.
- Strong knowledge of containers, orchestration, and CI/CD pipelines.
- Experience with monitoring and observability tools.
- Proficiency in at least one coding or scripting language.
- Solid understanding of networking, distributed systems, and systems-level troubleshooting.
- A growth mindset with a willingness to learn, mentor, and share knowledge with peers.

Preferred qualifications:

- Experience with chaos engineering and resilience testing.
- Familiarity with compliance frameworks such as SOC 2, ISO 27001, or PCI DSS.
- Experience in the events or related industries.

Company Location: United States.