Principal Agentic Engineer at ZeroFox
ZeroFox protects organizations from external threats across the public attack surface, and we're building agentic AI into the core of how that work gets done. You'll define and build the production agentic systems that power ZeroFox's product: the architectures, tooling, and practices that let agents manage state across steps, handle failures without losing context, and know when to escalate to a human. This is deep agent systems work. You'll be solving the hardest problems in multi-step orchestration, reliability, and evaluation for non-deterministic systems operating on adversarial data.

What you'll do

- Design and implement production agent architectures: state management, error handling, retry logic, graceful degradation, human-in-the-loop escalation.
- Build evaluation and testing frameworks for non-deterministic agent workflows: offline tests, synthetic data generation, regression checks, and post-deploy monitoring.
- Implement orchestration patterns: multi-agent coordination, tool-calling chains, memory management, context window optimization.
- Define deployment and governance practices: agent versioning, rollback, behavioral telemetry, anomaly detection.
- Instrument agents with tracing, logging, and observability that make production behavior debuggable.
- Establish architectural standards and best practices for agentic development through design reviews, pairing, and mentorship.

Required qualifications

- 7+ years of software engineering experience building production systems, with hands-on experience designing and deploying LLM-based agents.
- Deep knowledge of agent reliability patterns: state management, error boundaries, escalation logic, context management, tool-calling failure modes.
- Experience with agent orchestration frameworks and an understanding of the tradeoffs between adopting an existing framework and building custom.
- Strong backend engineering fundamentals: testing, monitoring, deployment, debugging, performance optimization.
- Hands-on experience with retrieval-augmented systems and an understanding of how retrieval quality affects agent behavior.
- Experience building systems that handle adversarial or noisy input (cybersecurity, fraud detection, content moderation, or similar domains).
- Familiarity with cloud-based AI deployment, including observability, reliability, and cost considerations.

What success looks like

- Agents ship to production with defined reliability contracts and clear operational ownership.
- Agent failures are detected and handled systematically through telemetry and evaluation pipelines, not customer reports.
- The team has reusable patterns for state management, error handling, and escalation that new agent projects build on rather than reinvent.

Preferred qualifications

- Experience defining agent governance practices (versioning, behavioral telemetry, rollback) in a production environment.
- Background in cybersecurity or an adjacent domain with high-noise, adversarial data.
- Experience with tool-use architectures and integration patterns for connecting agents to external systems.
- Track record of raising engineering capability across a team through mentoring, pairing, or design review.

Company Location: United States.