Senior Site Reliability Engineer, Managed Kubernetes - Europe at Jobgether

Source: https://jobs.workable.com/view/1MPoR11bXKmeDVYwebTYJT/remote-senior-site-reliability-engineer%2C-managed-kubernetes---europe-in-spain-at-jobgether


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer, Managed Kubernetes, in Europe.

Join a dynamic engineering team responsible for building and scaling large Kubernetes platforms that power cutting-edge AI and machine learning workloads. As a Senior Site Reliability Engineer, you will ensure the reliability, performance, and scalability of cloud infrastructure while contributing to automation, monitoring, and platform improvements. You will work closely with engineering, HPC operations, and data center teams to solve complex technical challenges, provide operational support, and improve service quality. This is a high-impact role for those passionate about distributed systems, automation, and delivering reliable services at scale.

Accountabilities:
Operate and maintain production Kubernetes clusters at scale, handling incidents, recovery, and cluster lifecycle management.
Build and maintain control plane services, custom controllers, and operators to enhance cluster reliability.
Automate deployment, upgrades, patching, and validation of Kubernetes workloads and platform components.
Collaborate with HPC Ops, Datacenter Ops, and engineering teams on cross-functional issues and incident resolution.
Define, implement, and monitor SLOs and SLIs to maintain high platform reliability and performance.
Assist customers with workload integration, authentication, and storage-related questions.
Contribute to tooling, observability, and platform quality improvements using Python, Go, and CI/CD pipelines.

Requirements:
6+ years of experience in SRE, operations engineering, or similar roles managing Linux clusters and systems.
Strong programming skills in Go and Python, with experience in GitOps, Helm, and Kubernetes operators.
Proven experience running Kubernetes clusters in production, including EKS, GKE, on-prem, or hybrid environments.
Familiarity with observability and monitoring tools such as Prometheus, Grafana, and FluentBit.
Experience provisioning Kubernetes using kubeadm, Cluster API, or similar tools.
Ability to work independently and collaboratively, managing customer interactions during incidents.

Nice-to-have: Deep Kubernetes expertise (CRDs, CSI, CNI), experience with HPC or GPU clusters, multi-cloud environments, and contributions to CNCF or Kubernetes SIGs.

Company Location: Spain.