Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Remote) at Alex Staff Agency

About the project

The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data: terabytes of storage, trillions of records, and continuously growing load.

Infrastructure:
- ~100 servers (bare metal + VPS)
- active use of IaC
- Kubernetes clusters in production
- focus on stability, observability, and automation

The project is long-term: not a hype startup, but a mature product with real users.

What the work looks like

This is a hands-on role with a clear time allocation:
- 60%: operations and incidents (including helping teams)
- 20%: infrastructure automation
- 20%: prototyping, improvements, technical initiatives

There is on-call responsibility, but after-hours incidents normally happen 2–3 times a year, not every week.

Responsibilities

- Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
- Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
- Monitoring, alerting, backups, and regular recovery checks
- Development of service and infrastructure automation
- Development of CI/CD and release procedures
- Incident diagnosis and resolution, support for product teams
- Traffic analytics, bot and attack protection tools
- Responsibility for 24/7 platform stability

What’s important

- 4+ years of experience operating Linux/Ubuntu infrastructure and production services
- Strong understanding of networking and troubleshooting
- Kubernetes (cluster operations), Rancher, Docker / containerd
- Hands-on experience with Ansible and Terraform
- Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
- CI/CD: Jenkins
- Automation: Bash, Python
- Experience working with LVM

Nice to have

- Experience working with blockchain nodes
- Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
- Providers: Hetzner / OVHcloud
- Cloudflare (edge, DDoS), experience with AWS
- Handling abuse tickets with hosting providers

Technology stack

- VPN: WireGuard, OpenVPN
- Databases: ClickHouse, MongoDB, Redis, PostgreSQL
- Applications: Node.js (pm2), php-fpm, Lua, Tarantool
- Supporting services: Go (operatorSDK), Ruby, Node.js, PHP

Company Location: Italy.