SENIOR DATA ENGINEER AT GROWTH PROTOCOL
Location: Remote (United States)

ABOUT THE COMPANY

Growth Protocol is an Enterprise Reasoning Platform headquartered in New York. We help enterprises turn strategy into action by combining a neuro-symbolic reasoning core, seamless data activation, and agentic workflows. Our platform reduces execution drag, accelerates growth, and delivers measurable business outcomes for some of the world's most complex organizations.

THE ROLE

Growth Protocol is hiring a Senior Data Engineer to play a foundational role in building the systems that power our AI platform. You will be at the heart of Growth Protocol's data infrastructure, directly influencing product features, client outcomes, and strategic business decisions.

You will collaborate with Data Scientists, Backend Engineers, Client IT, and business stakeholders to build and maintain scalable pipelines that serve billions of rows of structured and unstructured data weekly, enabling high-impact insights across multiple industries.

Ideal candidates are ambitious go-getters who welcome the challenge of meeting the needs of a hyper-growth startup and bring deep technical expertise across modern data infrastructure and ML operationalization.

OBJECTIVES OF THE ROLE

Collaboration
- Work closely with Data Scientists to translate business and ML requirements into robust data workflows.
- Ensure timely delivery of clean, reliable data to support model development and production features.

Technical Development
- Engineer and manage scalable ETL architecture using Airflow, Snowpark, Cloud Run, and Apache Beam.
- Design and implement high-performance data infrastructure for seamless processing and integration.
- Extract data from diverse online platforms.
- Operationalize machine learning models, focusing on deployment, reliability, and performance.

Data Connectivity
- Partner with client IT teams to identify the most efficient and secure methods for data ingestion, including Snowflake Sharing, Databricks Delta Sharing, Private Link, and VPN.
- Work alongside the Platform Engineering team to define requirements for secure networking paths that support high-performance data transfers.
- Perform end-to-end testing of client connections to ensure data integrity and connectivity.
- Integrate customer databases with our platform.

Monitoring and Reliability
- Create and manage real-time monitoring systems for data ingestion and transformation pipelines.
- Proactively identify and resolve issues to maintain high levels of system reliability and data integrity.

REQUIRED SKILLS AND QUALIFICATIONS

- 5+ years of experience in Data Engineering.
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Experience building data pipelines with robust unit and integration testing.
- Proficiency in distributed computing frameworks, including Apache Beam and Spark.
- Functional understanding of enterprise networking, including VPC peering, Private Link, and VPNs, with the ability to troubleshoot connectivity in a cloud environment.
- Hands-on experience operationalizing ML models in production.
- Familiarity with ML/AI, NLP, and Data Science workflows, including MLflow.
- Deep understanding of ETL workflows, data modeling, and data architecture.
- Strong debugging and problem-solving skills.
- Excellent communication skills and experience collaborating across teams.

PREFERRED QUALIFICATIONS

- Experience working on enterprise products serving Fortune 500 clients across Financial Services, Industrials, and Consumer Products.
- Prior startup experience.
- Interest in current events, market dynamics, and emerging technologies.
- Experience creating Agent Skills.
- Familiarity with APIs and web scraping for data collection.
- Familiarity with graph databases.

TECH STACK

Languages: Python, TypeScript
Frameworks: Apache Beam, Spark, FastAPI, Airflow
Cloud: Google Cloud Platform
Data: Elasticsearch, Snowflake, Databricks, Neo4j, PostgreSQL, MongoDB, GCS
Infrastructure and DevOps: Docker, Terraform, GitHub Actions, Cloud Run
Frontend: Next.js

PERKS

- Competitive compensation and equity in a rapidly growing company.
- 100% company-paid health, dental, and vision insurance plus 401(k).
- Pet-friendly office.