Site Reliability Engineer - II at Coursera

Source: https://www.workingnomads.com/jobs/site-reliability-engineer-ii-coursera

Location Information: Canada

Job Overview:

Our SRE team is part of the Coursera Infrastructure group, which builds the foundation that keeps Coursera reliable, scalable, and efficient. We partner with product and platform teams to deliver resilient systems through automation, observability, and operational excellence. From incident response to infrastructure as code, we enable fast, safe, and cost-aware delivery of global learning experiences.

We are hiring an IC3 Site Reliability Engineer (SRE) based in Canada to join our SRE team. This role will support reliability, observability, infrastructure automation, and cost optimization efforts across multiple services. The engineer will work closely with senior SREs to build scalable and efficient systems using our AWS-based tech stack, and gain hands-on experience with real-world SRE projects. Joining this team means working on high-impact projects that keep Coursera running smoothly for millions of learners and partners.

Responsibilities:

Contribute to building and maintaining observability systems (e.g., metrics, logs, dashboards)
Assist in automating infrastructure provisioning and system configuration, and in reducing toil
Participate in on-call rotations and support incident response processes
Collaborate with senior engineers on improving the reliability and scalability of services
Implement cost monitoring tools and assist in cloud resource optimization
Support disaster recovery planning, compliance tasks, and documentation

Basic Qualifications:

2+ years of experience in Site Reliability, DevOps, or Backend Engineering roles
Hands-on experience with at least one cloud platform (e.g., AWS, GCP, Azure)
Experience with monitoring and logging tools (e.g., Datadog, CloudWatch, SumoLogic, Grafana)
Familiarity with Infrastructure as Code tools (e.g., Terraform, Ansible)
Experience writing automation scripts and backend systems in Java, Python, Bash, or similar languages
Preferred Qualifications:

Exposure to incident management processes and tools (e.g., PagerDuty)
Familiarity with containerized infrastructure (e.g., Docker, Kubernetes)
Experience working on cost visibility or optimization in cloud environments
Knowledge of version control systems and CI/CD practices
Experience contributing to disaster recovery or multi-region infrastructure
Knowledge of security/compliance practices (e.g., audit logging, access controls)

If this opportunity interests you, you might like these courses on Coursera:

Site Reliability Engineering: Measuring and Managing Reliability - Learn SRE fundamentals including SLIs, SLOs, and error budgets.
Introduction to Cloud Computing - Understand core cloud concepts, including AWS services and architecture.
Getting Started with Terraform for Cloud Infrastructure Automation - Learn infrastructure-as-code using Terraform with hands-on AWS examples.

Compensation:

Coursera offers competitive pay and equitable compensation practices. Our job titles may span more than one career level. The targeted hiring base salary range for this role is between CAD $113,600 and $170,400 for all Canada candidates. The actual base pay is dependent upon many factors, including but not limited to prior work experiences, training/education, transferable skills, business needs, and geographical location. The base pay range is subject to change and may be modified in the future. This role may also be eligible for variable pay, equity, and benefits.