FBS Site Reliability Engineer at Capgemini

Our Client is one of the United States' largest insurers, providing a wide range of insurance and financial services products with gross written premiums well over US$25 billion (P&C). They proudly serve more than 10 million U.S. households with more than 19 million individual policies across all 50 states through the efforts of over 48,000 exclusive and independent agents and nearly 18,500 employees. Finally, our Client is part of one of the largest insurance groups in the world.

Job Summary

This position will focus on infrastructure and code reviews to ensure that the solutions built and delivered are highly available and to minimize unplanned downtime.

Key Responsibilities

• Act as an expert troubleshooter within IT, with broad technical experience across multiple IT disciplines and a willingness to help our Incident and Problem Management teams.
• Understand the root cause of an incident and the tasks needed to ensure it does not recur.
• Validate the root cause of incidents in nonproduction regions, then work with teams to determine the best approach to resolve them.
• Participate in chaos testing, where we leverage a third-party tool to disable functions on a server, verify that we can alert teams to the failure, and then assemble a technical troubleshooting call to identify the problem and restore the service.
• Leverage the observability tool set to define key transactions and observe their performance within systems.
• Create golden signal reporting and error budgets for development teams. Must know the framework.
• Perform failure analysis, leveraging chaos testing practices to break nonproduction systems to find weak points, and work with infrastructure and development teams to improve the applications' resilience.

Requirements

• At least 6 years of experience in a similar role as a Reliability Engineer or Resilience Engineer.
• Full English fluency.
• BS in Computer Science or similar.
• Very strong coding experience (writing and testing code, leveraging observability processes), ideally in Java or C++.
• Hands-on approach to troubleshooting and a very technical background.

Technical & Business Skills

• Site Reliability Engineer - Advanced
• Trend & Pattern Analysis - Advanced; Optimization
• Resilience Engineering - Advanced
• Dynatrace - Intermediate (4-6 Years). Desirable, not a must: any other observability tool.
• Gremlin - Entry Level (1-3 Years): chaos testing, failure modeling experience, or similar (MUST)
• Cloud Infrastructure experience: AWS / Azure / GCP - Intermediate (4-6 Years)
• Strong coding experience

Company Location: Mexico.