Site Reliability Engineer at GitLab

An overview of this role

The GitLab DevSecOps platform empowers 100,000+ organizations to deliver software faster and more efficiently. We are one of the world’s largest all-remote companies with 2,000+ team members and values that foster a culture where people embrace the belief that everyone can contribute. Learn more about Life at GitLab.

SREs with Gitaly work alongside Backend Engineers, focusing primarily on improving the availability and reliability of the Gitaly fleet on GitLab.com. While the Backend Engineers approach their responsibilities from a software developer's point of view, the SREs approach the same problems from the operational perspective and collaborate closely on finding an optimal solution, in addition to ensuring that new Gitaly features can run at scale and be deployed to production safely.

Gitaly is the Git data storage tier of GitLab, providing a reliable, secure and fast distributed Git data store over gRPC. For more information about Gitaly, see the team’s Direction page.

Gitaly’s high-availability storage requires developers who understand distributed storage systems, their management, observability and availability. The Cluster team contributes features, fixes bugs and improves the performance of this software stack.

Currently, we're building a new distributed cluster solution and improving our Disaster Recovery readiness.

What you’ll do

- Work with peer SREs to maintain Gitaly’s environments within GitLab’s SaaS offerings, including cost and performance optimization, capacity planning, migrations and debugging production issues.
- Participate in architectural discussions and decisions surrounding Gitaly, within the greater GitLab ecosystem.
- Design RPC interfaces for the Gitaly service.
- Scope, estimate and describe tasks to reach the team’s goals.
- Develop production automation and tooling for Gitaly, for use both in SaaS and self-managed installations.
- Help ensure that Gitaly development tooling, releases and other processes serve the team and the product’s goals.
- Develop Gitaly in accordance with the product’s goals and with a focus on reliability and maintainability.
- Instrument, monitor and profile Gitaly in the production environment.
- Build dashboards and alerts to monitor the health of your services.
- Conduct acceptance testing of the features you’ve built.
- Educate all team members on best practices relating to high availability.
- Write performant, maintainable, and elegant code and peer review others’ code.
- Be positive and solution-oriented.
- Constantly improve the quality and security of the product.
- Take initiative in improving the software in small or large ways to address pain points in your own experience as a developer.
- Qualify developers for hiring.
- Respond to user emergencies, platform alerts and support requests, including regular on-call duties.

What you’ll bring

- Mandatory: experience running highly available systems in production environments at scale.
- Mandatory: hands-on experience with cloud technologies, including Kubernetes.
- Mandatory: proven professional experience building, debugging and optimizing software in large-scale, high-volume environments.
- Mandatory: proven professional experience writing and testing high-quality code.
- Mandatory: a good understanding of building instrumented, observable software systems.
- Highly desirable: experience with Terraform infrastructure as code.
- Highly desirable: proven professional experience writing and testing quality code in Go.
- Highly desirable: a good understanding of Git’s internal data structures or experience running Git servers.
- Highly desirable: experience with gRPC.
- Highly desirable: willingness to learn Ruby.

About the team

The Gitaly team owns and runs services that handle all Git operations on GitLab.com, one of the largest open source SaaS sites on the Internet.
This means we constantly face unique performance, scalability, and cost challenges that impact our users every day. Our future is about shipping improvements that scale GitLab.com from an infrastructure perspective, as well as deploying new features that scale with the growing size of repositories across the industry.