Site Reliability Engineer at Sur

Source: https://jobs.workable.com/view/q32HzWwy9x1yCkvvor8ZUz/remote-site-reliability-engineer-in-s%C3%A3o-paulo-at-sur

Our US-based client is looking for a mission-driven Site Reliability Engineer to support and scale the infrastructure powering their secure, mission-critical SaaS platform.

You must be confident operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker), with the ability to respond to incidents quickly, support ongoing automation, and scale systems reliably.

Responsibilities

- Be part of the team that owns the uptime and performance of our core backend infrastructure (Windows + Linux).
- Maintain and enhance observability across systems using Kibana, CloudWatch, and custom telemetry.
- Manage CI/CD pipelines, infrastructure as code (Terraform, Ansible), and deployment automation.
- Support and maintain production Windows environments:
  - .NET Framework/Core apps running in IIS.
  - SQL Server with AlwaysOn replication and Service Broker-based messaging.
- Support and operate cloud-native services:
  - AWS Lambda, DynamoDB, Postgres/Aurora, Redshift, Redis, and containerized workloads in Docker.
- Participate in the on-call rotation and incident response.
- Collaborate closely with engineering teams to improve system reliability and deployment workflows.

Requirements

- 5+ years of SRE, DevOps, or WebOps experience supporting production SaaS systems.
- Strong experience with Windows Server, IIS, and .NET applications in production.
- Hands-on experience with SQL Server administration, including AlwaysOn and Service Broker.
- Proficiency in AWS operations, including Lambda, DynamoDB, CloudWatch, and IAM.
- Familiarity with Postgres, Redis, Kibana/Elasticsearch, and centralized logging.
- Experience with Docker, Terraform, and Ansible for infrastructure management.
- Strong scripting skills (PowerShell, Python).
- Experience running and debugging containerized and distributed systems in production.
- Excellent incident response and debugging skills.

Company Location: Brazil.