Senior Site Reliability Engineer, Observability at Chainlink Labs. Location: United States. Join us in building the verifiable web.

Build and orchestrate a modern OTEL-based observability platform. Support multiple telemetry types, such as metrics, logs, and traces. Define and support modern governance in observability and address problems at scale. Ensure reliability, security, and performance exceed our defined SLAs. Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load. Lead the design and deployment of monitoring and observability services that detect issues and alert the team when action is needed. Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline. Oversee the availability, performance, and supportability of our observability infrastructure. Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data. Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release. Champion reliability and security by taking the time to do your work right the first time.

7+ years of relevant professional experience in DevOps, infrastructure, SRE, and/or platform teams. Ability to develop software outside the scope of typical infrastructure requirements and configurations. Experience programming in C, C++, Java, Python, Go, Perl, or Ruby. Expert knowledge in all aspects of designing, developing, and managing large real-time systems. Experience with monitoring and logging: exporting metrics using Prometheus, building Grafana dashboards, and working with centralized logging solutions like ELK Stack, Splunk, or Grafana Stack. Experience with distributed systems and container orchestration, including maintaining or building Kubernetes clusters. Strong communication skills.

Competitive salary. Remote work.