DevOps / Site Reliability Engineer at Two95 International Inc.

Source: https://jobs.workable.com/view/smuEGumf5rqSxZWtpi5Ksc/remote-devops-%2F-site-reliability-engineer-in-united-states-at-two95-international-inc.

Job Title: Lead SRE (Site Reliability Engineer)
Location: Remote Work
Type: 6+ month contract-to-hire
Rate: $Open/hr

Please forward your updated resume to deivy.malli@two95intl.com and include your rate requirement along with your contact details and a suitable time when we can reach you.

Responsibilities
· Own uptime, SLAs, and overall reliability of the cloud infrastructure and kiosks platform.
· Lead incident response and root-cause analysis, and drive actionable postmortems.
· Automate infrastructure, deployments, and operational tasks using modern IaC and scripting, in collaboration with the Platform Engineering team.
· Maintain and improve monitoring, alerting, and observability (Grafana, Prometheus, New Relic, etc.).
· Manage, operate and recommend improvement of mo
· Execute and continuously improve disaster recovery and business continuity plans.
· Partner with platform engineering, QA, and development teams to ensure operational readiness.
· Establish and maintain runbooks, operational standards, and reliability best practices.
· Provide leadership, mentorship, and clear communication during both normal operations and incidents.
· Optimize cloud and Kubernetes environments for reliability, performance, and scalability.

Qualifications
· 8+ years in SRE, DevOps, or Platform Engineering roles; 2+ years in a senior or lead capacity.
· Strong experience supporting production environments with strict SLAs and high uptime requirements.
· Deep knowledge of Kubernetes, containers, and cloud-native infrastructure.
· Proficiency in automation and scripting using Bash, Python, or Go.
· Hands-on experience with CI/CD pipelines and release engineering in modern environments.
· Expert-level familiarity with IaC tools (Terraform preferred).
· Strong understanding of monitoring, alerting, logging, and observability tooling.
· Experience implementing and managing GitOps workflows (ArgoCD or similar).
· Demonstrated ability to lead incidents and communicate effectively with technical and non-technical stakeholders.
· Solid understanding of disaster recovery planning, resilience practices, and system hardening.

Company Location: United States