AI Quality Assurance Engineer - LLM at Enroute

We love technology, and we enjoy what we do. We are always looking for innovation. We have social awareness and try to improve it daily. We make things happen. You can trust us. Our Enrouters are always up for a challenge. We ask questions, and we love to learn.

We pride ourselves on great benefits and compensation, a fantastic work environment, flexible schedules, and policies that support a healthy balance between work and life outside of it. We care about who you are in the office and as an individual. We get involved, we like to know our people, and we want every Enrouter to become part of a great community of highly driven, responsible, respectful, and above all, happy people. We want you to enjoy working with us.

We're looking for a QA Engineer with experience testing Large Language Model (LLM) applications.

Requirements:
- 3+ years in QA, including 1+ year testing AI/LLM applications.
- Experience with RAG frameworks (Response Accuracy, Grounding, Faithfulness).
- Experience using AI evaluation tools such as Weights & Biases (W&B) or MLflow.
- Skilled in hallucination detection, multilingual validation, and prompt evaluation.
- UI testing for AI-driven interfaces using Cypress, Playwright, or Detox.
- API testing using Postman, REST-assured, or custom scripts.
- Strong knowledge of edge case testing, fallback validation, and response analysis.
- Collaborative mindset; ability to work closely with AI/ML engineers and product teams.
- Hands-on with TestRail, Jira, Zephyr, and other QA tools.
- Strong documentation and defect-reporting skills.

Key Responsibilities:
- Design and run test strategies for LLM responses using the RAG triad framework.
- Evaluate conversational AI outputs to flag hallucinations or inconsistencies.
- Validate chatbot/voice UI elements across mobile and web.
- Perform agentic decision tree validation and simulate edge case scenarios (e.g., API rate limits).
- Conduct regression, exploratory, and accessibility testing.
- Maintain test cases and scripts for AI features using automation tools like Cypress or Detox.
- Track and report AI-specific quality metrics (e.g., hallucination rate, response latency).
- Clearly document bugs with reproducible steps and AI model response samples.

Illustrative sketches of several of these tasks appear at the end of this posting.

Company Location: Mexico.
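
As a first sketch of hallucination detection and response grounding: the toy check below flags response sentences that share too little vocabulary with the retrieved context. It is a minimal heuristic under stated assumptions, not the RAG triad framework itself; the `is_grounded` helper, the 0.4 overlap threshold, and the sample strings are all illustrative.

```python
# Toy grounding heuristic: flag response sentences with too little lexical
# overlap with the retrieved context. Real faithfulness evaluation (LLM-as-
# judge, dedicated eval libraries) is more nuanced; this only sketches the
# shape of an automated check.
import re


def is_grounded(response: str, context: str, min_overlap: float = 0.4) -> bool:
    """Return True if every response sentence overlaps the context enough."""
    context_words = set(re.findall(r"\w+", context.lower()))
    for sentence in re.split(r"[.!?]+", response):
        words = set(re.findall(r"\w+", sentence.lower()))
        if not words:
            continue
        if len(words & context_words) / len(words) < min_overlap:
            return False  # likely ungrounded / hallucinated content
    return True


def test_answer_is_grounded_in_retrieved_context():
    context = "Our support line is open Monday to Friday, 9am to 5pm CST."
    answer = "Support is available Monday to Friday from 9am to 5pm CST."
    assert is_grounded(answer, context)
```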
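
For chatbot UI validation, a Playwright sketch (Python API) might look like the following; the staging URL and the CSS selectors (#chat-input, .bot-message) are hypothetical placeholders for whatever the application under test actually exposes.

```python
# Minimal Playwright sketch for exercising a chat widget in a browser.
# In a real suite this would run under pytest-playwright fixtures.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://staging.example.com/chat")  # hypothetical environment

    # Send a prompt through the chat widget.
    page.fill("#chat-input", "What are your support hours?")
    page.keyboard.press("Enter")

    # Wait for the assistant's reply and run basic sanity assertions.
    reply = page.wait_for_selector(".bot-message")
    text = reply.inner_text()
    assert text.strip(), "Bot returned an empty response"
    assert "sorry" not in text.lower(), "Bot fell back to an apology/refusal"

    browser.close()
```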
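
For edge case scenarios such as API rate limits, a custom script can confirm the service throttles gracefully rather than erroring out. The endpoint, payload, request count, and quota behaviour below are assumptions for illustration only.

```python
# Sketch of an API-level edge case check: send requests until the rate limit
# engages and confirm the service returns 429s rather than 5xx errors.
import requests

ENDPOINT = "https://staging.example.com/api/chat"  # hypothetical endpoint


def test_rate_limit_returns_429_not_5xx():
    statuses = []
    for _ in range(50):  # assumed to exceed the per-minute quota
        resp = requests.post(ENDPOINT, json={"message": "ping"}, timeout=10)
        statuses.append(resp.status_code)
        assert resp.status_code < 500, f"Server error under load: {resp.status_code}"
    # At least one request should have been throttled.
    assert 429 in statuses, "Rate limiting never engaged"
```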
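
Finally, for tracking AI-specific quality metrics, results of a test run could be reported to MLflow roughly as below; the run name, metric names, and values are illustrative, while start_run, log_metric, and log_param are standard MLflow calls.

```python
# Sketch of reporting quality metrics from a nightly LLM regression run.
import mlflow

with mlflow.start_run(run_name="nightly-llm-regression"):  # hypothetical run name
    mlflow.log_param("model_version", "assistant-v2")       # hypothetical identifier
    mlflow.log_metric("hallucination_rate", 0.03)            # fraction of flagged answers
    mlflow.log_metric("p95_response_latency_ms", 820)
```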