Senior Data Engineer at COGNNA

As a Senior Data Engineer, you will be the architect of our security data ecosystem. Your primary mission is to design and build high-performance data lake architectures and real-time streaming pipelines that serve as the foundation for COGNNA's Agentic AI initiatives. You will ensure that our AI models have access to fresh, high-quality security telemetry through sophisticated ingestion patterns.

Key Responsibilities

1. Data Lake & Storage Architecture
Architectural Design: Design and implement multi-tier Data Lakehouse architectures to support both structured security logs and unstructured AI training data.
Storage Optimization: Define lifecycle management, partitioning, and clustering strategies to ensure high-performance querying while optimizing for cloud storage costs.
Schema Evolution: Manage complex schema evolution for security telemetry, ensuring compatibility with downstream AI/ML feature engineering.

2. Real-Time & Streaming Processing
Streaming Ingestion: Build and manage low-latency, high-throughput ingestion pipelines capable of processing millions of security events per second in real time.
Unified Processing: Design unified batch and stream processing architectures to ensure consistency across historical analysis and real-time threat detection.
Event-Driven Workflows: Implement event-driven patterns to trigger AI agent reasoning based on incoming live data streams.

3. AI/ML Enablement & Feature Engineering
Vector Data Foundations: Architect the data infrastructure required to support semantic search applications and variants of RAG architectures for our generative AI models.
Feature Management: Design and maintain a centralized repository for ML features, ensuring consistent data is used for both model training and real-time inference.
AI Pipeline Orchestration: Build automated workflows to handle data preparation, model evaluation, and deployment within our cloud AI ecosystem.

4. DataOps & Systems Design
Infrastructure as Code: Utilize declarative tools (e.g., Terraform) to manage the entire lifecycle of our cloud data resources and AI endpoints.
Quality & Observability: Implement automated data quality frameworks and real-time monitoring to detect "data drift" or pipeline failures before they impact AI model performance.

Experience & Education: 5+ years in Data Engineering or Backend Engineering, focused on large-scale distributed systems. B.S. or M.S. in Computer Science or a related technical field.
Cloud Architecture: Deep architectural mastery of the Google Cloud Platform ecosystem, specifically regarding managed analytical warehouses, serverless compute, and identity/access management. Proven track record of deploying enterprise-scale Data Lakehouses from scratch.
Real-Time Mastery: Expertise in building production-grade distributed messaging and stream processing engines (e.g., managed Apache Beam/Flink environments) capable of handling high-velocity telemetry.
AI Enablement: Strong understanding of how data architecture impacts AI performance. Experience building embedding pipelines, feature stores, and automated workflows for model training and evaluation.
Software Fundamentals: Expert-level Python and advanced SQL. Proficiency in high-performance languages like Go or Scala is highly desirable.
Operational Excellence: Advanced knowledge of CI/CD, containerization on Kubernetes, and managing cloud infrastructure through code to ensure reproducible environments.

Preferred Qualifications
Experience with dbt for modern analytics engineering.
Understanding of cybersecurity data standards (OCSF/ECS).
Previous experience in an AI-first startup or a high-growth security tech company.

Company Location: India.