Sr. Site Reliability Engineer at hims & hers

About the Role:
We are seeking a Site Reliability Engineer to help build a reliable web experience for our users. We believe that moving fast is our competitive advantage and enables us to better serve our users. We also know that the faster we move, the more likely we are to break things.

You Will:
Design and implement SRE practices that ensure the availability, scalability, and observability of production systems, with a strong focus on excellent customer experience.
Actively seek and identify opportunities to improve the availability and performance of the system by applying what is learned from monitoring and observation.
Use automation extensively to design, configure, manage, and monitor systems in support of our product development teams.
Manage infrastructure through automation (Infrastructure as Code).
Manage incidents and emergency response, track outages, ensure data integrity, and engineer releases to promote safe, efficient, and rapid deployments.
Handle emergency response, either by being on call or by reacting to symptoms surfaced by monitoring, and escalate when needed.
Improve the codebase by resolving logic issues, deprecating unused code, etc.
Implement monitoring, logging, alerting, and SLO reporting.
Identify Service Level Indicators (SLIs) that will align the team to meet the availability and performance objectives.
Run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent incident recurrence.

You Have:
8+ years as a software engineer, shipping production code.
5+ years of experience as a Site Reliability Engineer.
Experience with service-oriented architectures and microservices at scale.
Strong proficiency with relational databases (PostgreSQL, MySQL, SQL Server, etc.).
Strong proficiency in SQL scripting.
Proficiency developing in one or more languages such as Java, Kotlin, Python, and/or others.
Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.).
Proficiency in Git or other VCS.
Experience with configuring, customizing, and extending monitoring tools (Datadog, Prometheus, New Relic, etc.).
Excellent debugging and troubleshooting skills.
Strong technical competency, with a data-driven analytical approach to solving complex challenges.
A systematic problem-solving approach, coupled with strong and effective communication skills and a sense of drive.

Nice-to-have:
Experience with Terraform or other IaC tools such as Chef, Puppet, or Ansible.

Our Benefits (there are more but here are some highlights):
Competitive salary & equity compensation for full-time roles.
Unlimited PTO, company holidays, and quarterly mental health days.
Comprehensive health benefits including medical, dental & vision, and parental leave.
Employee Stock Purchase Program (ESPP).
Employee discounts on hims & hers & Apostrophe online products.
401k benefits with employer matching contribution.
Offsite team retreats.

#LI-Remote