Site Reliability Engineer (Remote - EMEA) at Jobgether

Source: https://jobs.workable.com/view/vR4LA1UyYdt19odXGxRvqj/site-reliability-engineer-(remote---emea)-in-spain-at-jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in EMEA.

As a Site Reliability Engineer, you will play a key role in ensuring the stability, performance, and scalability of complex cloud systems that power high-traffic digital platforms. You’ll collaborate with cross-functional engineering teams to design and operate resilient infrastructure, enhance observability, and streamline automation across environments. This role is ideal for a technically driven problem-solver who thrives in a remote-first, innovation-focused environment. You’ll help strengthen cloud governance, optimize operations, and contribute to major architectural initiatives that enable global scalability and reliability.

Accountabilities:
- Design, operate, and optimize AWS-based infrastructure using Terraform, Helm, and Kubernetes to ensure scalability and high availability.
- Strengthen observability through effective monitoring, logging, and alerting systems to improve incident detection and resolution times.
- Automate key workflows and reduce manual tasks to enhance engineering productivity and operational consistency.
- Partner with software and cloud engineering teams to improve the resilience and performance of services under heavy workloads.
- Participate in building the next-generation architecture supporting regional expansion and data residency requirements.
- Contribute to on-call rotations, manage incidents calmly, and document learnings to prevent recurrence.
- Experiment with and adopt AI tools to streamline workflows and increase efficiency in reliability operations.

Requirements:
- Minimum of 4 years of experience in cloud engineering, systems administration, or site reliability engineering.
- Strong proficiency with AWS (or other major cloud platforms such as GCP or Azure) and infrastructure-as-code tools.
- Hands-on experience with Kubernetes, serverless technologies, and automation frameworks.
- Proficiency in a programming language such as Python, Go, or TypeScript.
- Solid understanding of observability practices and tools like Grafana, Datadog, Prometheus, and Sentry.
- Excellent analytical and problem-solving skills with a passion for improving performance and reliability.
- Strong communication and documentation abilities to share knowledge across teams.
- Eagerness to explore and apply AI technologies to improve operational processes.
- Ability to work independently in a remote-first environment and collaborate effectively across regions.

Company Location: Spain.