
ML Engineer (Remote - US) at Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Machine Learning Engineer in the United States.

This role offers the opportunity to build and optimize training pipelines for cutting-edge AI models, focusing on practical, production-ready implementations rather than academic research. You will automate fine-tuning processes and enable vendors to deploy adaptable AI models efficiently across diverse hardware. Collaborating closely with backend engineers, DevOps, and cross-functional teams, you will ensure reproducible, high-performance training workflows. This position is ideal for engineers passionate about ML engineering, system optimization, and building tools with direct real-world impact. You will operate in a dynamic, fast-paced environment, contributing to the next generation of AI infrastructure while maintaining a strong emphasis on quality, scalability, and reproducibility.

Accountabilities:

In this role, you will be responsible for:

- Implementing and maintaining LoRA/QLoRA fine-tuning pipelines using PyTorch and Hugging Face Transformers.
- Developing logic for incremental training and adapter stacking, producing clean, versioned “delta packs.”
- Automating data preprocessing workflows, including tokenization, formatting, and filtering for user datasets.
- Building training scripts and workflows that integrate with orchestration backends via REST/gRPC or job queues.
- Implementing monitoring hooks (loss curves, checkpoints, evaluation metrics) that feed dashboards for real-time tracking.
- Collaborating with DevOps and backend engineers to ensure reproducible, portable, and efficient training environments.
- Writing tests to guarantee reproducibility, correctness, and reliability of adapter outputs.
- Occasionally participating in on-site meetings for discussions and collaborative problem-solving.

The ideal candidate will have:
- Strong programming skills in Python, with hands-on experience in PyTorch.
- Practical experience with the Hugging Face ecosystem (Transformers, Datasets, PEFT).
- Familiarity with LoRA/QLoRA or other parameter-efficient fine-tuning techniques.
- Understanding of mixed-precision training (FP16/BF16) and memory optimization techniques.
- Experience building production-ready training scripts with reproducibility, logging, and error handling.
- Comfort working in Linux GPU environments (CUDA, ROCm).
- Ability to collaborate with engineers across disciplines, including non-ML specialists.

Preferred Qualifications:

- Experience with bitsandbytes, xformers, or flash-attention.
- Familiarity with distributed training frameworks (multi-GPU, NCCL, DeepSpeed, or Accelerate).
- Prior work in MLOps or packaging ML pipelines for deployment.
- Contributions to open-source ML libraries.

Company Location: United States.
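For context on the LoRA/adapter work described above: parameter-efficient fine-tuning reduces to a low-rank update rule, W' = W + (alpha / r) * B @ A, where only the small A and B matrices are trained and merging them into the base weight yields the “delta pack.” A minimal, dependency-free sketch of that rule (function and variable names here are illustrative, not the Hugging Face PEFT API):

```python
# Illustrative sketch of the LoRA update rule underlying libraries like
# Hugging Face PEFT. Names and dimensions are toy assumptions, not a real API.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A -- the merged weight ('delta pack')."""
    delta = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Frozen base weight (d_out x d_in) and a rank-1 adapter.
d_out, d_in, r, alpha = 3, 4, 1, 2.0
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]
A = [[0.5] * d_in]                 # r x d_in; randomly initialised in practice
B = [[0.0] for _ in range(d_out)]  # d_out x r; zero-initialised

merged = lora_merge(W, A, B, alpha, r)
assert merged == W  # B starts at zero, so a fresh adapter leaves W unchanged

# Only the adapter is trained, not the frozen base matrix.
base_params = d_out * d_in             # 12
adapter_params = r * d_in + d_out * r  # 7
assert adapter_params < base_params
```

Because B is zero-initialised, a freshly attached adapter does not change the base model's behaviour, and the trainable parameter count scales with the rank r rather than with the full weight matrix, which is what makes the technique parameter-efficient and the resulting delta packs small enough to version and stack.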