Senior Site Reliability Engineer @ Blameless

Blameless is an end-to-end Site Reliability Engineering (SRE) platform that enables and accelerates proactive and reactive incident management so engineering teams can balance reliability and innovation. Our software helps you set and monitor SLOs and SLAs, coordinate and automate incident response, identify root causes, and create a culture of learning and improvement across your organization. Our platform includes a bot that automates incident response steps for faster resolution, an API to track and manage SLAs, and a web app to stay on top of key metrics, manage problems, track action items, and assess the reliability of your business. Blameless, based in San Mateo, is backed by Accel and Lightspeed Ventures.

We’re looking for an experienced Site Reliability Engineer to join our SRE team and contribute to a software platform that helps organizations and engineers streamline their reliability efforts. Our SREs play an integral role in architecting, building and iterating on our resilient, scalable systems.

You’ll join our passionate team and help us launch reliable features and services. Want to work with the latest and greatest tools and frameworks to support our growing backend and infrastructure needs?  Are you self-motivated and comfortable working in a fast-paced environment?  Then SRE at Blameless is the place for you!

In this role, you will:

  • Work across our cloud infrastructure (AWS, GCP and Azure), scripting and coding (Bash, Python and Go), infrastructure as code (Terraform) and the cloud-native ecosystem (Kubernetes, Prometheus, Helm, etc.)
  • Develop infrastructure to deploy onto, ensuring scalability and resiliency
  • Improve processes to help optimize our ability to deploy quickly while maintaining high quality systems
  • Build tools and automation to fill the gaps in our current systems
  • Assist with incidents and support our Engineering team in dogfooding our Blameless incident orchestration
  • Conduct postmortems to ensure constant improvement

About you:

  • 3+ years of experience as a Site Reliability Engineer
  • 2+ years of experience with Kubernetes and managing cloud resources
  • 2+ years of experience managing cloud infrastructure (AWS, GCP, or Azure)
  • 2+ years of experience working with Bash, Python, or Go
  • Experience working with Terraform or similar tools
  • Experience collecting and processing metrics from tools such as Prometheus, Datadog, or New Relic
  • A strong understanding of SLOs and SLIs
  • Experience building and working on deployment systems

The Impact of this Role:

  • Help us design and build our early-stage company
  • Accelerate our efforts towards product-market fit
  • Influence an open, productive and effective culture
  • Learn about operating a startup, fund-raising and scaling teams
  • Grow your career exponentially by joining a rocket ship

Blameless is a rapid-growth startup headquartered in San Mateo. As an equal opportunity employer, we are committed to a team defined and empowered by diversity. We consider qualified applicants without regard to race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status or disability status. https://www.blameless.com/about

 

Last Modified: 2020-11-3 4:41:20
Contributors of this content: jobs