
Senior Infrastructure Engineer - AI/ML at Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Infrastructure Engineer - AI/ML in the United States.

This fully remote role offers the chance to design, implement, and optimize cutting-edge AI/ML infrastructure that lets organizations keep full control over their data and compute resources. You will work on modular, cloud-native, reusable infrastructure components that support model training, inference serving, experiment tracking, and data pipelines. This high-impact position combines hands-on engineering with strategic influence. You will shape scalable, secure, and observable systems while collaborating with a globally distributed team. The ideal candidate has strong experience with Kubernetes, cloud platforms, and Infrastructure-as-Code, and thrives in a culture that values autonomy, open source, and innovative thinking.

Accountabilities:
· Design, implement, and maintain modular, composable infrastructure components for AI/ML workflows, including training, inference, and experiment tracking.
· Contribute to open-source MLOps tooling and Kubernetes ecosystem projects that enable data sovereignty and client-controlled AI platforms.
· Optimize large-scale AI/ML workloads for performance, cost efficiency, reliability, and observability on client-owned cloud and hybrid infrastructure.
· Collaborate with ML engineers, cross-functional teams, and clients to deploy, configure, and maintain sovereign AI infrastructure.
· Mentor junior engineers, contribute to technical initiatives, and provide feedback to uphold engineering excellence.
· Participate in designing CI/CD pipelines, GitOps workflows, and automation processes for scalable AI/ML systems.

Requirements:
· 4+ years of hands-on infrastructure, platform, or DevOps experience with production systems.
· Strong experience with Kubernetes, including troubleshooting, optimization, and production deployment.
· Proficiency with Infrastructure-as-Code tools such as Terraform, Helm, Pulumi, or Ansible.
· Experience with at least one major cloud platform (AWS, Azure, GCP), including networking, compute, and security.
· Strong programming skills in Python and/or Go for writing maintainable infrastructure code.
· Understanding of CI/CD practices, GitOps workflows, and automation principles.
· Ability to work independently in distributed teams and communicate effectively across time zones.
· Experience contributing to technical initiatives or mentoring junior engineers.
· Bonus experience: MLOps pipelines, model training and serving, monitoring tools (Prometheus, Grafana), GPU infrastructure, ML workflow orchestration (Kubeflow, MLflow, Airflow), service meshes, cost optimization, and secure deployment environments.

Company Location: United States.