
Machine Learning Operations Manager at Weekday AI

This role is for one of Weekday's clients.

Min Experience: 6 years
Location: Remote (India)
Job Type: Full-time

As the Machine Learning Operations Manager, you will oversee the end-to-end ML lifecycle, from model training and deployment to monitoring and optimization. You will lead a small, high-performing team of engineers while remaining hands-on in building scalable, reliable, and efficient ML infrastructure. This role combines strategic leadership with deep technical expertise to ensure smooth collaboration between research, engineering, and operations teams.

Key Responsibilities:

End-to-End ML Lifecycle: Manage training infrastructure, experiment tracking, deployment, and continuous optimization.
Collaboration with Researchers: Partner with research teams to streamline training, evaluation, and fine-tuning workflows.
Team Leadership: Mentor and guide a small team of ML engineers (3–4) while contributing as an individual contributor.
Performance Optimization: Improve latency, throughput, and cost efficiency; ensure robust packaging and runtime reliability.
Automation & Reliability: Develop systems for CI/CD, versioning, rollback, A/B testing, monitoring, and alerting.
Infrastructure Management: Maintain scalable, secure, and compliant AI environments across training and inference stages.
Cloud & AI Integration: Collaborate with cloud providers (AWS, GCP, Azure) and AI platforms to enhance tooling and optimize costs.
Cross-Functional Collaboration: Support GenAI and AI-driven projects across teams beyond core MLOps responsibilities.
Architecture & Roadmap: Contribute to architectural planning, documentation, and the continuous evolution of the ML stack.
Best Practices: Promote automation, MLOps standards, and operational excellence throughout the ML lifecycle.

Requirements:

5+ years of hands-on experience in MLOps or ML/AI Engineering.
Strong understanding of ML/DL concepts and applied experience in model training and deployment infrastructure.
Proficiency with cloud-native ML tools (e.g., GCP Vertex AI, AWS SageMaker, Kubernetes).
Experience working across both model training and inference systems.
Familiarity with model optimization methods and tools such as quantization, distillation, TensorRT, or FasterTransformer.
Demonstrated ability to lead complex technical projects independently.
Excellent communication and collaboration skills with a cross-functional mindset.
Ownership-oriented approach with comfort in driving clarity in ambiguous situations.

Skills: MLOps, ML Engineering, Machine Learning Infrastructure, Model Deployment, Model Monitoring, CI/CD, Vertex AI, AWS SageMaker, GCP AI Platform, Kubernetes, Docker, MLflow, Kubeflow.

Company Location: India