
Data Engineer for AI
Offshorly Ltd.
Location Information: Philippines (Remote)

This is a remote position.

What the engineer will actually do:

P1 | Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze → Silver tables.
P1 | Develop/maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
P2 | Add basic text-embedding or LLM-based entity extraction (LangChain or open-source transformers) to enrich the document feed.
P3 | Write unit tests and lightweight data-quality checks (Great Expectations) so parsing errors do not break the pipeline.
P3 | Produce concise handover docs for our future data architect.

Skill Set:

Must-have (core):
- 2-4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow or similar).
- Solid Python (pandas, PySpark) and experience parsing Office files with libraries such as python-pptx, openpyxl, pdfplumber, or PyPDF.
- Basic SQL tuning and ability to work with structured schemas.
- Git and CI/CD familiarity.

Nice-to-have (bonus):
- Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
- Experience adding embeddings to tables for downstream ML or search.
- Great Expectations or similar data-quality tooling.
- Familiarity with Unity Catalog or Snowflake RBAC concepts.