Senior Data Engineer specializing in Azure Databricks, Apache Spark, and modern data platform architecture. Proven experience in designing enterprise-scale ETL/ELT solutions, implementing Lakehouse architectures using Delta Lake, and optimizing large-scale data pipelines on Azure. Skilled in PySpark, SQL, ADF, ADLS, Synapse Analytics, CI/CD automation, and data governance frameworks. Adept at collaborating with business stakeholders, data scientists, and analytics teams to deliver robust, scalable, and cost-efficient data solutions that drive business insights and operational excellence.
Qualifications
3-5 years of experience in Data Engineering.
Strong hands-on experience with Databricks.
Expertise in Apache Spark (PySpark or Scala Spark).
Proficiency in Python, SQL, and PostgreSQL.
Experience designing and implementing ETL/ELT pipelines.
Strong understanding of data warehousing concepts and data modeling.
Experience working with large-scale distributed data processing systems.
Knowledge of CI/CD processes and version control tools such as Git.
Strong analytical, troubleshooting, and problem-solving skills.
Excellent communication and collaboration skills.
Experience with Microsoft Azure cloud services, including:
    Azure Databricks
    Azure Data Factory (ADF)
    Azure Data Lake Storage (ADLS)
    Azure Synapse Analytics
    Azure SQL Database
    Azure Event Hubs
    Azure Functions
    Azure DevOps
Experience with Delta Lake and Lakehouse architecture.
Knowledge of real-time data processing and streaming technologies such as Kafka.
Experience with Infrastructure as Code (Terraform, ARM Templates, or Bicep).
Familiarity with data governance and security best practices.
Responsibilities
Design, develop, and maintain scalable data pipelines using Databricks and Apache Spark.
Build and optimize ETL/ELT workflows for processing large volumes of structured and unstructured data.
Develop data ingestion frameworks for a variety of sources, including databases, APIs, files, and streaming platforms.
Implement data transformation, cleansing, and enrichment processes.
Optimize Spark jobs for performance, scalability, and cost efficiency.
Collaborate with business stakeholders, data analysts, and data scientists to understand data requirements.
Ensure data quality, reliability, and governance across data platforms.
Monitor and troubleshoot production data pipelines and resolve performance bottlenecks.
Participate in code reviews and architecture discussions, and help implement best practices.
Create and maintain technical documentation for data solutions and workflows.