Principal Data Engineer (PySpark) (Remote - US) at Jobgether

Source: https://jobs.workable.com/view/6YJ9xdrCWtZpWYjn5aTvhy/principal-data-engineer-(pyspark)-(remote---us)-in-united-states-at-jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Data Engineer (PySpark) in the United States.

We are seeking a highly skilled Principal Data Engineer to lead the design, development, and optimization of scalable data systems that power advanced analytics and AI initiatives. In this hands-on role, you will architect both batch and real-time data pipelines, collaborate closely with product and AI teams, and influence the overall data strategy. You will mentor other engineers, enforce best practices, and ensure high-quality, reliable, and accessible data for the organization. The ideal candidate thrives in a fast-paced, high-impact environment, enjoys solving complex problems, and is passionate about building robust, efficient, and scalable data infrastructure.

Accountabilities:
- Design, implement, and evolve distributed, cloud-based data infrastructure for batch and real-time workloads.
- Build and maintain scalable data pipelines supporting analytics and AI/ML applications.
- Integrate with third-party e-commerce platforms to expand and enrich the data ecosystem.
- Ensure data reliability, availability, and quality through automated monitoring and auditing.
- Collaborate with engineering, AI, and product teams to deliver data solutions that meet business-critical needs.
- Mentor and support data engineers, promoting coding standards, best practices, and professional growth.
- Drive innovation by identifying opportunities for improved data processing and architecture.

Requirements:
- 10+ years of experience in software development and data engineering with ownership of production-grade systems.
- Deep expertise in PySpark and scaling Spark in production environments (Databricks, EMR, etc.).
- Strong knowledge of distributed computing and modern data modeling for scalable systems.
- Proficiency in Python and implementation of software engineering best practices.
- Hands-on experience with both relational (PostgreSQL, MySQL) and NoSQL (MongoDB, DynamoDB, Cassandra) databases.
- Excellent problem-solving skills and ability to communicate effectively across teams.
- Bachelor’s degree in Computer Science or related field, or equivalent practical experience.
- Experience mentoring and influencing cross-functional teams.
- Familiarity with MLOps pipelines and integrating ML models into data workflows is a plus.
- Previous experience in early-stage, high-growth environments is advantageous.

Company Location: United States.