
Senior AI Inference Engineer - Large Language Models (f/m/d) at Aleph Alpha

Location: Heidelberg

Overview:
You will join our product team in a position that sits at the intersection of artificial intelligence research and real-world solutions. We foster a highly collaborative work culture where you can expect to work closely with your teammates, with a high level of communication between teams through methodologies such as pair or mob programming.

Your responsibilities:
- Model inference: Focus on inference optimization to ensure rapid response times and efficient resource utilization during real-time model interactions.
- Hardware optimization: Run models on various hardware platforms, from high-performance GPUs to edge devices, ensuring optimal compatibility and performance.
- Experimentation and testing: Regularly run experiments, analyze outcomes, and refine strategies to achieve peak performance in varying deployment scenarios.
- Staying up to date with the current literature on ML systems (MLSys).

Your profile:
- You care about making something people want. You want to ship something that will bring value to our users. You want to deliver AI solutions end-to-end, not stop at building a prototype.
- Bachelor's degree or higher in computer science or a related field.
- You understand how multimodal transformers work.
- You understand the characteristics of LLM inference (KV caching, flash attention, and model parallelization).
- You have hands-on experience with large language models or other complex AI architectures.
- You have experience in system design and optimization, particularly within AI or deep learning contexts.
- You are proficient in Python and have a deep understanding of deep learning frameworks such as PyTorch.
- A deep understanding of the challenges associated with scaling AI models for large user bases.

Nice if you have:
- Previous experience in a high-growth tech environment or a role focused on scaling AI solutions.
- Expertise with CUDA and Triton programming and GPU optimization for neural network inference.
- Experience with Rust.
- Experience in adapting AI models to suit a range of hardware, including different accelerators.
- Experience in model quantization, pruning, and other neural network optimization methodologies.
- A track record of contributions to open-source projects (please provide links).
- Some Twitter presence discussing ML systems topics.

What you can expect from us:
- Become part of an AI revolution!
- 30 days of paid vacation
- Access to a variety of fitness & wellness offerings via Wellhub
- Mental health support through nilo.health
- Substantially subsidized company pension plan for your future security
- Subsidized Germany-wide transportation ticket
- Budget for additional technical equipment
- Flexible working hours for better work-life balance and a hybrid working model
- Virtual Stock Option Plan
- JobRad® bike lease