Data Engineer at Sunset


Location: New York

About Sunset
Sunset is building the data layer for real-world AI training. We work with frontier labs to turn messy, multi-modal enterprise data into the highest-quality training data on the market, sourced from the hundreds of venture-backed startups we've helped wind down. We're a fast-growing team working in person in Dumbo, Brooklyn, backed by Floodgate, Afore Capital, Hustle Fund, and incredible entrepreneurs.

The Role
As a Data Engineer at Sunset, you'll own the pipeline that turns raw, chaotic enterprise data into the highest-quality training data on the market. One of our core technical challenges is entity resolution and de-identification across different sources and modalities. An even deeper challenge is understanding the node structures and linkages well enough to reconstruct the business world this data comes from. All of this happens on sensitive data, which means security and privacy aren't a separate workstream; they are built into every pipeline, system, and decision we make.

What You'll Work On
You'll own problems end-to-end. Some examples of what you might tackle in your first 90 days:
- Designing the de-identification layer that replaces PII with stable pseudonyms while preserving every relationship across every source.
- Building coreference resolution across Slack threads, email chains, and Linear comments so that "me," "him," and first-name mentions all resolve to the right canonical entity.
- Hardening how we ingest, store, and process sensitive client data, from encryption and access controls to audit trails and isolation boundaries.
- Extending our entity resolution pipeline to handle new modalities such as audio, video, design files, or references embedded inside documents.

You Might Be a Fit If
- You are a product-minded engineer who has shipped data pipelines at scale.
- You have strong Python skills and are comfortable with named entity recognition (NER), record linkage, and coreference resolution.
- You take security and privacy seriously and have built systems where getting it wrong wasn't an option.
- You want to own a hard, ambiguous problem end-to-end rather than wait for a PRD.
- AI is deeply integrated into your workflow and life.

This Role Might Not Be a Fit If
- You want to work remotely or hybrid; we're in person 5 days a week in Dumbo.
- You want to do novel ML research; this role is applied, not research.
- You prefer long planning cycles or narrow ownership.

Our Stack
Python, Postgres, Redis, AWS. We pick tools based on the problem, not the other way around.

Compensation & Benefits
- $180K–$280K base + meaningful equity
- 100% covered medical, dental, and vision
- Unlimited PTO
- $500 in-office setup allowance

How We Hire
- Intro Chat (20 min): mutual fit and interests
- Technical Session (1 hr): collaborative problem-solving
- Onsite (2–3 hrs): product deep dive, system design, meet the team
- Quick reference checks → Offer