Senior AI Engineer at ScultureAI

Source: https://jobs.workable.com/view/rSVvmx3Y7kgmSWmph1F2NY/remote-senior-ai-engineer-in-united-kingdom-at-scultureai

About ScultureAI

ScultureAI is a B2B SaaS startup developing groundbreaking coaching solutions for shaping organisational culture using large language models and other cutting-edge AI technologies. An organisation’s culture is a key driver of employee wellbeing and company success, and we are driven by the mission to improve the lives and performance of employees and companies all over the world.

ScultureAI has been named one of Europe’s hottest startups at The Europas and one of the leading UK AI startups by Generative Group. Having raised over £1.5m to date, we are currently onboarding our first major enterprise clients, and this is just the beginning!

Our team operates in a dynamic, supportive atmosphere where everyone's voice is heard and every good idea is valued. We want to build a workplace community where passionate individuals can thrive, grow, and contribute to groundbreaking AI coaching solutions that transform organisational culture and have a positive impact on the world. Join us on this exciting journey and shape the future of coaching and workplace culture.

About the role

We are looking for a talented Senior Prompt Engineer to help us push the boundaries of what’s possible with LLMs. This role has two core aspects. First, you’ll design, build and optimise complex multi-agent prompt pipelines that directly power our product and customer outcomes. Second, you'll build and scale rigorous evaluation systems, both automated and human-in-the-loop, for these constantly changing pipelines, to continuously assess and improve the quality, performance, reliability, and cost-efficiency of our prompt architectures. You’ll need to be deeply immersed in the latest techniques across prompting, LLM behaviour, multi-agent orchestration and agent design, and be excited to apply that knowledge in a fast-paced, high-impact startup environment.

- Architect, design, and implement robust multi-agent pipelines leveraging a diverse range of LLMs.
- Systematically decompose complex problems into structured, scalable prompt-driven solutions.
- Use advanced prompt engineering techniques to drive desired results.
- Build and maintain a library of atomic evaluation prompts to measure coaching output quality across several dimensions.
- Develop and automate evaluation systems that can benchmark, identify regressions, and measure the stability of new prompts, models, and model versions.
- Use statistical techniques and domain knowledge to define quality thresholds, analyse variance, and surface outliers or black swan failures.
- Participate in designing and running human-in-the-loop processes to validate and improve evaluator prompts.
- Contribute to internal tooling for prompt observability, output sampling, evaluator scoring, and user feedback.
- Optimise the trade-offs between latency, quality and operational cost.
- Implement safety and security best practices.
- Build and manage modular, version-controlled prompt libraries with support for templating and reuse.
- Collaborate with full-stack engineers to design and implement complex automated systems to evaluate the quality and consistency of outputs across pipeline stages and agents.
- Collaborate with other technical colleagues to develop tools and systems for prompt observability, such as usage tracking, output variance monitoring, and feedback loop integration.

Necessary to have

- 18+ months of hands-on experience in LLM prompt engineering, ideally with multi-prompt or agentic architectures.
- Demonstrated experience designing or contributing to LLM evaluation systems, whether for QA, R&D, or production monitoring.
- Strong understanding of LLM behaviour, capabilities, and failure modes.
- Comfortable working with prompt evaluation tools or libraries (e.g., OpenAI Evals, DeepEval, Promptfoo, TruLens, LangChain eval).
- Familiarity with advanced evaluation metrics and an ability to interpret results.
- A Master's degree with distinction in a relevant subject.
- Appreciation of coaching, behaviour change and organisational culture principles.
- Experience designing few-shot evaluator prompts and LLM-as-a-judge pipelines.

Personal Characteristics

- Great communicator who builds strong relationships with colleagues.
- Self-starter and fast learner, able to operate in a fast-paced environment.
- Creative problem solver with a can-do attitude.
- Accountable and reliable, with high attention to detail.
- Passionate about our vision to reimagine coaching and corporate culture.

Company Location: United Kingdom.