Site Reliability Engineer (Remote - US) at Jobgether


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in the United States.

This role offers the opportunity to ensure the reliability, stability, and performance of critical streaming and broadcast systems in a high-traffic, 24x7 environment. You will be responsible for monitoring live channel distribution, troubleshooting complex system issues, and implementing automation to prevent downtime. Collaborating with cross-functional teams and third-party vendors, you will identify root causes, deploy solutions, and document processes to optimize operational efficiency. The position requires technical expertise, a proactive mindset, and a dedication to delivering seamless media experiences at scale. Your work will have a direct impact on delivering high-quality streaming content to millions of viewers.

Accountabilities

· Investigate and resolve issues within broadcast and streaming systems, identifying root causes and implementing solutions.
· Serve as a Level 2 escalation resource for live channel distribution and report findings to leadership and operations teams.
· Collaborate with internal teams and external vendors to troubleshoot unresolved issues and ensure timely resolutions.
· Develop and maintain comprehensive documentation detailing system issues, root causes, and effective solutions.
· Assist with deployment and testing of patches and fixes in development and production environments.
· Support on-air system integrations, rollouts, and special broadcast events, providing 24x7 operational coverage as needed.
· Participate in daily operations review meetings, providing updates on system status, new issues, and planned fixes.
· Contribute to the design, analysis, and evaluation of projects using sound engineering principles and best practices.

Requirements

· Bachelor’s degree in Engineering, Computer Science, or a related field.
· 5+ years of DevOps/Site Reliability Engineering experience in high-traffic, cloud-hosted environments (AWS preferred).
· 3–5 years of experience in support, analysis, or operations roles within technology systems.
· Strong experience with deployment automation tools such as CloudFormation, Terraform, and Ansible.
· Familiarity with containerization and orchestration technologies (Kubernetes, Docker).
· Proficiency in CI/CD orchestration and deployment tools (e.g., GitHub Actions, Jenkins).
· 3–5 years of Linux system administration experience.
· Programming experience in Go, Python, Ruby, Java, or shell scripting.
· Skilled at designing automation tools and processes for large-scale systems.
· Experience with log and metric aggregation software (e.g., CloudWatch, Elasticsearch + Kibana, Splunk, Grafana).
· Strong problem-solving, analytical, and troubleshooting skills.
· Knowledge of networking concepts and the OSI model; able to troubleshoot network issues.
· Preferred: 3+ years in Media & Entertainment or 24x7 production environments, supporting IT/broadcast systems, and familiarity with live TV broadcasting, OTT streaming, codecs, and ARQ technologies.

Company Location: United States.