Data Engineer, Sr. (Remote - US) at Jobgether


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Data Engineer, Sr. in the United States.

This senior-level data engineering role focuses on building and optimizing data pipelines that power AI-driven applications and intelligent assistants. You will work with large-scale data ingestion, transformation, and storage processes while ensuring compliance with data governance and privacy standards. The position emphasizes innovation in data engineering, including embedding generation for text, image, and voice data, and optimization of GPU/CPU workloads. You will collaborate closely with cross-functional teams to improve operational efficiency and enhance AI/ML capabilities. The role is remote but requires alignment with Eastern Time Zone operations. Ideal candidates are technically proficient, experienced in streaming and batch data processing, and passionate about scalable, high-quality data solutions.

Accountabilities:

- Design, develop, and maintain streaming and batch ETL/ELT pipelines for ingesting structured and unstructured data into analytics, vector, and graph stores.
- Automate embedding generation and refresh pipelines using change-data capture and cost-aware routing strategies.
- Define and enforce data contracts, schema evolution strategies, and tiered storage policies for secure and efficient data management.
- Monitor pipelines with OpenTelemetry metrics, tracking lineage, data freshness, and performance dashboards.
- Collaborate with enterprise architects and technical leads to improve data retrieval, storage optimization, and governance standards.
- Ensure compliance with data privacy regulations and maintain audit-ready, high-quality pipelines.
- Provide guidance on scalable architecture and best practices for AI/ML data solutions.

Requirements:

- 7+ years of experience in data engineering, with a proven record of designing and managing large-scale data pipelines.
- 2+ years of experience working with embeddings or vector databases.
- Hands-on expertise with Spark, Flink, Kafka, Kinesis, and ELT tools such as dbt, Airflow, or Dagster.
- Strong understanding of CDC, schema evolution, and compliance with GDPR/CCPA data retention policies.
- Proficiency in Python and TypeScript for production-grade development.
- Experience with Infrastructure as Code (IaC) tools such as Terraform or CDK.
- Ability to optimize storage costs and GPU inference workloads effectively.
- Excellent problem-solving, communication, and collaboration skills across technical teams.

Company Location: United States.