Data Engineer at CV Library

Source: https://jobs.workable.com/view/upwQJL7ygv6K9x2sUz6Enz/remote-data-engineer-in-cape-town-at-cv-library

About Us

At CV-Library, we have a simple vision: to help the world to work, and we are looking for exceptional and talented people to help us realise this vision in both UK and overseas markets.

We now have an exciting opportunity for a Data Engineer to join our Data Platform team.

Hours: Monday–Friday, 9:30–18:30 or 10:30–19:30 (depending on daylight savings or business requirements)

Location: Remote working with 2 days per quarter in our Cape Town office as required.

What You’ll Be Working On

Data Pipelines & ETL/ELT
- Build and maintain scalable batch and streaming pipelines using Python, Lambda, ECS/Fargate, and Airflow (MWAA).
- Develop ingestion flows for diverse datasets including click events, operational systems, and third-party APIs.
- Support Iceberg-based ingestion flows and metadata pipelines for high-volume event datasets.

Data Modelling & Transformation
- Develop and optimise data models in dbt, including incremental models, snapshots, testing, documentation, and best practices.
- Work with Redshift and Athena + Iceberg to design performant analytical datasets.
- Apply best practices for schema design, partitioning, clustering, and compute efficiency.

Streaming & Event Data
- Support ingestion from Kafka and other streaming sources.
- Work with event schemas, JSON path extraction, and schema evolution strategies.
- Build pipelines to standardise, enrich, and land event data into Iceberg and Redshift.

Data Science & GenAI Enablement
- Collaborate with Data Scientists to operationalise models and workflows in Databricks and AWS SageMaker.
- Help convert notebooks into production-ready pipelines and automate model scoring, monitoring, and retraining.
- Support Generative AI integration projects using Amazon Bedrock, including prompt orchestration, retrieval workflows, and embedding pipelines.

Cloud Infrastructure
- Build and maintain infrastructure using Terraform across multiple AWS environments (dev, staging, prod).
- Implement IAM roles, S3 structures, Glue catalog objects, VPC connectivity, and CI/CD workflows.
- Monitor cost efficiency, cluster performance, and resource utilisation.

Data Quality & Governance
- Implement data quality checks, freshness monitors, and SLAs using dbt tests, S3 audits, and pipeline guardrails.
- Build observability tools, metadata logging, and lineage improvements across the platform.
- Ensure Analytics and Data Science teams have access to accurate and trustworthy data.

What You’ll Bring

Technical Skills
- Strong experience with AWS (S3, Glue, Lambda, ECS, Redshift, Athena, IAM, CloudWatch).
- Solid SQL skills, especially with Redshift or other MPP warehouses.
- Hands-on experience with dbt (materialisations, macros, testing, incremental pipelines).
- Experience building and maintaining Airflow DAGs.
- Proficiency in Python for ETL, automation, and orchestration.
- Experience handling semi-structured data (JSON, Parquet).
- Understanding of streaming platforms (Kafka or similar).
- Familiarity with Data Science tooling (Databricks, SageMaker) is a strong plus.

Nice to Have
- Experience with Iceberg.
- Understanding of MLOps and feature store patterns.
- CI/CD experience (GitHub Actions, etc.).
- Experience with Amazon Bedrock or LLM-driven data workflows.

How You Work
- You enjoy solving data problems end-to-end: ingestion → modelling → optimisation → monitoring.
- You collaborate well with engineering teams, analysts, product teams, and data scientists.
- You care deeply about data quality, reliability, and production readiness.
- You take ownership: when something breaks, you investigate, fix it, and prevent it from recurring.

Why Join the Data Platform Team?
- Work in a modern, cloud-native environment that powers business-critical analytics and intelligence.
- Opportunity to shape architecture, best practices, and future platform direction.
- Collaborate with a high-performing team driving significant improvements across the CV-Library data landscape.
- Your work will directly influence marketing, product, sales, reporting, and emerging AI initiatives.

Company Location: South Africa.