Site Reliability Engineer (Remote - On Contract) at Jobgether

Source: https://jobs.workable.com/view/hwt6Tdyd9hgjJSGLqYcN3g/site-reliability-engineer-(remote---on-contract)-in-india-at-jobgether


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in India.

We are seeking a highly skilled Site Reliability Engineer to ensure the resilience, observability, and continuous improvement of disaster recovery (DR) environments. In this role, you will collaborate with cross-functional teams, including DR architects, security, infrastructure, and engineering, to define and maintain SLIs/SLOs, reduce operational toil, and drive platform reliability initiatives. You will lead chaos engineering exercises, implement automation for failover and recovery, and participate in failover/failback simulations to validate system robustness.

This is an opportunity to work in a fast-paced, innovative environment, optimizing critical cloud infrastructure across Azure, AWS, and private cloud platforms while contributing directly to operational excellence. The role emphasizes proactive problem-solving, collaboration, and a strong focus on system performance and reliability.

Accountabilities:
- Design, build, and maintain observability dashboards and proactive alerting systems for DR environments across multiple cloud platforms.
- Define and monitor Service Level Indicators (SLIs) and error budgets aligned with RPO/RTO targets.
- Collaborate on runbook automation, synthetic testing, and validation pipelines to ensure DR readiness.
- Lead chaos engineering exercises and game-day simulations to proactively identify system weaknesses.
- Conduct post-incident reviews, implement feedback loops, and manage the automation backlog.
- Drive infrastructure as code (IaC) adoption and reliability improvements across platforms.
- Contribute to compliance reporting and performance monitoring for protected applications.

Requirements:
- 5+ years of experience in SRE, DevOps, or Platform Engineering roles.
- Hands-on expertise with observability tools such as Grafana, Prometheus, Datadog, or Splunk.
- Experience defining and tracking SLIs/SLOs, error budgets, and availability dashboards.
- Proficiency in at least one scripting or programming language (Python, Bash, Go).
- Knowledge of disaster recovery principles, failover practices, and RPO/RTO objectives.
- Familiarity with IaC tools like Terraform, Ansible, or CloudFormation.
- Experience with CI/CD pipelines, automated testing, and cloud-native deployments (Azure or AWS).
- Strong problem-solving skills, collaboration, and cross-functional teamwork ability.
- Fluent in written and spoken English.

Nice to have:
- Experience with Zerto, Veeam, chaos engineering tools, Kubernetes, TISAX/ISO 27001 compliance, or platform reliability for mission-critical systems.

Company Location: India.