Senior Data Scientist - LLMs, RAG & Multimodal AI (Remote | Immediate joiner) at Proximity Works


Join Proximity Works, one of the world's most ambitious AI technology companies, shaping the future of Sports, Media, and Entertainment. Since 2019, Proximity Works has created and scaled AI-driven products used by 697 million daily users, generating $73.5 billion in enterprise value for our partners. With headquarters in San Francisco and offices in Los Angeles, Dubai, Mumbai, and Bangalore, we partner with some of the biggest global brands to solve complex problems with cutting-edge AI.

We are looking for a Senior Data Scientist with deep expertise in large language models (LLMs), retrieval-augmented generation (RAG), and multimodal learning to shape the next generation of intelligent, scalable, and reliable search systems.

Role Summary

This is a hands-on applied science role at the frontier of AI. You will design, fine-tune, and optimize large-scale language and multimodal models, productionize retrieval-augmented pipelines, and define robust evaluation frameworks. You will work closely with engineering and product teams to build systems that combine language, vision, and retrieval modalities, delivering real-world, user-facing AI applications at scale.

What You'll Do

- Design, fine-tune, and optimize LLMs for applied multimodal generation use cases.
- Build and productionize RAG pipelines that combine embedding-based search, metadata filtering, and LLM-driven re-ranking/summarization.
- Apply prompt engineering, RAG techniques, and model distillation to improve grounding, reduce hallucinations, and ensure output reliability.
- Define and implement evaluation metrics across semantic search (nDCG, Recall@K, MRR) and generation quality (grounding accuracy, hallucination rate).
- Optimize inference pipelines for latency-sensitive use cases with strategies like token budgeting, prompt compression, and sub-100ms response targets.
- Train and adapt models via transfer learning, LoRA/QLoRA, and checkpoint reloading, ensuring robust deployment in production environments.
- Collaborate with product and research teams to explore innovative multimodal integrations for user-facing applications.

What Success Looks Like

- Deployment of production-ready LLM + RAG pipelines powering global-scale search and discovery applications.
- Demonstrable improvements in grounding accuracy and hallucination reduction across deployed systems.
- Consistent delivery of sub-100ms inference latency for generation workloads.
- Adoption of rigorous evaluation metrics that drive continuous model improvement.
- Effective cross-functional collaboration with engineering, product, and research teams.

What You'll Need

- Strong background in NLP, machine learning, and multimodal AI.
- Proven hands-on experience in LLM fine-tuning, RAG, distillation, and evaluation of foundation models.
- Expertise in semantic search and retrieval pipelines (e.g., FAISS, Weaviate, Vespa, Pinecone).
- Demonstrated ability to deploy models at scale, including distributed inference setups.
- Solid understanding of evaluation frameworks for ranking, retrieval, and generation.
- Proficiency in Python, PyTorch/TensorFlow, and modern ML toolkits.
- Experience in multimodal AI (bridging text, vision, or speech with LLMs).
- Track record of shipping latency-sensitive AI products.
- Strong communication skills and the ability to collaborate with cross-functional global teams.

Success Traits

Builder's mindset · High ownership · Analytical clarity · Collaborative spirit · Global mindset · Growth orientation

Company Location: India.