Machine Learning Systems Engineer (Remote - EU) at Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Machine Learning Systems Engineer in the European Union.

We are seeking a talented Machine Learning Systems Engineer to join a remote-first, globally distributed team working on cutting-edge AI infrastructure. In this role, you will contribute to the development of large-scale language model systems, focusing on high-performance training, inference, and self-improving AI agents. You will work at the intersection of machine learning research, distributed systems, and high-performance computing, building tools and frameworks that enable researchers and organizations worldwide to deploy advanced AI solutions. This role offers the chance to work on technically demanding, open-source projects while collaborating with a passionate international team. Your work will have a direct impact on the future of scalable AI systems.

Accountabilities:
- Contribute to the development and optimization of large-scale language model frameworks.
- Implement high-performance distributed training algorithms using frameworks such as Megatron-LM, DeepSpeed, and vLLM.
- Develop and optimize inference engines and tools for model deployment, fine-tuning, and AI agent self-improvement.
- Integrate diverse machine learning ecosystems, including HuggingFace and other LLM tools.
- Optimize performance across multi-GPU, multi-node architectures, leveraging HPC and CUDA/ROCm programming.
- Collaborate with the open-source community to enhance the codebase, implement features, and resolve issues.
- Research and implement advanced techniques for self-improving AI agents and high-efficiency ML pipelines.

Requirements:
- 3+ years of experience in machine learning engineering or research.
- Proficiency in Python and C/C++, with strong systems programming skills.
- Deep understanding of high-performance computing concepts, including MPI, BSP, and distributed multi-GPU training.
- Solid experience with transformer architectures, gradient descent, backpropagation, and deep learning training.
- Familiarity with distributed training strategies: data parallelism, model parallelism, pipeline parallelism.
- Experience with containerization (Docker, Kubernetes) and cluster orchestration.
- Demonstrated experience with ML frameworks like vLLM, Megatron-LM, HuggingFace, or similar.
- Commitment to open-source development and community collaboration.
- Excellent problem-solving, debugging, and performance optimization skills.

Bonus: Advanced degrees (MS/PhD), experience with SLURM, mixed-precision training, MLOps, or prior contributions to major open-source ML projects.

Company Location: Spain.