Research Engineer, Evaluations at AssemblyAI

Source: https://justremote.co/remote-manager-exec-jobs/research-engineer-evaluations-assemblyai

Location Information: United States.

AssemblyAI is an applied artificial intelligence company. We use the latest deep learning technology to build practical products that bring futuristic ideas to life. Our team includes researchers, engineers, and designers who have worked at some of the largest technology companies all over the world. Our main office is located in downtown San Francisco.

At AssemblyAI, we believe that cutting-edge artificial intelligence technology should not be limited to only those with the funding or resources to invest in it. Our goal is to help make creative, new ideas possible by making AI technology accessible to everyone through easy-to-use products, whether you are an independent developer, startup, or global company.

Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics. Build and maintain competitive benchmarking pipelines. Design and run systematic experiments to measure the impact of model changes. Onboard, curate, and maintain evaluation datasets. Create evaluation subsets to stress-test specific capabilities and edge cases. Define evaluation metrics for real-world performance. Translate qualitative customer feedback into quantifiable evaluation criteria. Work with customer-facing teams to understand pain points and convert them into research priorities. Maintain clean evaluation pipelines and clear documentation. Identify evaluation gaps proactively and propose solutions.

ML fundamentals: Interpret results and debug issues without training from scratch. Strong Python skills: Write clean evaluation scripts, work with data pipelines, and be comfortable with SQL and cloud infrastructure. Metric intuition: Understand what makes a good evaluation metric and ensure statistical rigor. Voice agent stack familiarity: Understand how VAD, ASR, turn detection, LLM, and TTS systems interact. Tinkerer mentality: Preference for shipping and iterating quickly. Communication skills: Explain technical results, summarize findings, and translate customer feedback. Ownership mindset: Proactively fill evaluation gaps.

Work at least 3-4 hours overlapping with the Eastern US time zone.

Pay range: $210K - $260K