Datamatics: Data Engineer / Data Analyst (Databricks, Apache Spark, Delta Lake, GenAI or AI/ML integrations)

Negotiable

Remote, 3-5 years of experience, Diploma, Contract

Remote details

Open country: Philippines

Language requirement: English

This remote job is open to candidates in specific countries.

Job description

Job Role: Data Analyst (Databricks, Apache Spark, and Delta Lake, GenAI or AI/ML integrations).

Location: Manila.

Duration: 6+ Months Contract.

Job Description:

Scope of Work/Responsibilities

1. Data Pipeline Development:

Design, implement, and optimize end-to-end data pipelines using Databricks and related technologies.
Build workflows to handle large-scale data ingestion, transformation, and storage.

2. Data Preparation for LLMs:

Preprocess, clean, and structure diverse datasets (text, structured, and unstructured) for LLM training and fine-tuning.
Implement feature engineering, tokenization, and vectorization techniques to support NLP models.

3. Performance Optimization:

Use Databricks features, including Delta Lake and MLflow, to streamline data workflows.
Optimize data infrastructure for high availability, scalability, and cost-efficiency.

4. Collaboration with Teams:

Work closely with data scientists, ML engineers, and other stakeholders to understand the data requirements of LLM technologies.
Ensure alignment between engineering pipelines and machine learning goals.

5. Data Quality & Governance:

Implement processes to ensure data quality, consistency, and compliance with governance policies.
Monitor and maintain data integrity throughout the pipeline lifecycle.

6. Emerging Technology Adoption:

Stay updated on advancements in Databricks, generative AI, and LLM technologies.
Contribute to the adoption of innovative tools and practices to improve workflows.

Requirements and Qualifications (Education & Work Experience)

Experience:

7+ years of experience in data engineering roles, including at least 2 years in a leadership role and on projects involving Databricks.
Proven expertise in data pipelines, feature engineering, and dataset preparation for machine learning, specifically LLMs.
Experience building enterprise-grade applications with GenAI or AI/ML integrations.

Technical Skills:

Expertise in Databricks, Apache Spark, and Delta Lake.
Strong programming skills in Python and SQL; knowledge of libraries like pandas, NumPy, or PyTorch is a plus.
Understanding of state management libraries like Redux, Recoil, or Zustand.
Familiarity with Cypress and version control (Git).
Understanding of web security principles and compliance requirements for enterprise applications.

Soft Skills:

Exceptional problem-solving and decision-making abilities.
Excellent communication and leadership skills, with the ability to guide technical discussions and mentor team members.
Strong focus on delivering quality