Senior Site Reliability Engineer (Remote - US) at Jobgether

Source: https://jobs.workable.com/view/mv8LvFEM1gwBYxGmovqnLd/senior-site-reliability-engineer-(remote---us)-in-united-states-at-jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

This role is designed for an experienced Site Reliability Engineer (SRE) responsible for maintaining high levels of availability, performance, and reliability for mission-critical services. You will work closely with engineering, DevOps, and support teams to ensure systems are secure, scalable, and efficient. The role emphasizes proactive monitoring, incident response, and continuous improvement initiatives to reduce Mean Time to Detect (MTTD) and Mean Time to Restore (MTTR). You will be involved throughout the service lifecycle, from deployment planning and SDLC participation to root cause analysis and automation of repetitive tasks. This position provides the opportunity to work in a dynamic, fast-paced environment, mentor team members, and influence operational excellence across multiple services.

Accountabilities

· Maintain service availability, performance, reliability, and security for mission-critical systems.
· Proactively monitor production environments and respond quickly to incidents, minimizing downtime.
· Troubleshoot, debug, and escalate technical issues to ensure maximum customer satisfaction.
· Participate in incident management, root cause analysis, and post-incident reviews to prevent recurrence.
· Implement automation initiatives to reduce Mean Time to Restore (MTTR) and Mean Time to Detect (MTTD).
· Collaborate with engineering and DevOps teams to ensure SLAs, service reliability, and operational efficiency.
· Plan and deploy patches, product enhancements, and changes in alignment with IT Service Operations standards.
· Maintain and improve operational documentation, SOPs, and processes to support service operations.
· Mentor and coach other SRE team members, promoting best practices and technical excellence.

Requirements

· 4–5+ years of software development, technical operations, or SRE/DevOps experience.
· Experience maintaining and operating large-scale production systems with >99.95% SLA on cloud platforms.
· Strong expertise in monitoring, logging, and application performance tools (Grafana, CloudWatch, APMs).
· Hands-on experience with CI/CD pipelines (Git, Jenkins, Harness) and container technologies (Kubernetes, Docker).
· Proficiency with both Windows and Linux operating systems.
· Strong knowledge of AWS cloud services, including serverless and containerized workloads.
· Experience defining, monitoring, and improving application resilience and reliability.
· Excellent troubleshooting, root cause analysis, and problem-solving skills.
· Exceptional communication skills across teams and geographic boundaries.
· ITIL, HDI, or AWS cloud certifications are a plus.
· Comfortable working some non-standard hours to support a global team.

Company Location: United States.