Experience: 5 to 10 years
Location: Chennai/ Coimbatore/ Remote
Job Type: Full Time – Permanent
Required Qualifications
- 5+ years of hands-on data engineering experience, with at least 2 years working extensively in Databricks on enterprise-scale workloads
- Expert-level PySpark proficiency — must be able to write, review, and optimize complex transformations, understand the Catalyst optimizer, and diagnose runtime behavior
- Deep experience with Databricks Declarative Pipelines (Delta Live Tables) — including expectations, pipeline modes, table types (streaming vs. materialized), and event-driven triggering
- Proven understanding of Auto Loader: configuration, schema inference and evolution, checkpointing, and operational best practices
- Ability to troubleshoot long-running MERGE operations — including understanding write amplification, file compaction, and transaction log behavior in Delta Lake
- Demonstrated ability to diagnose pipeline performance issues and distinguish between cluster sizing problems, code inefficiencies, and data volume/skew issues
- Strong knowledge of common PySpark and pipeline performance bottlenecks: data skew, excessive shuffles, poor partitioning strategy, broadcast join misuse, and UDF overhead
- Strong SQL proficiency — complex transformations, window functions, query optimization
- Experience building within a medallion architecture (Bronze / Silver / Gold) in a production environment
- Familiarity with Azure cloud services relevant to the Databricks ecosystem (Azure Data Lake Storage, Azure Key Vault, Azure Monitor, etc.)
Preferred Qualifications
- Databricks Certified Data Engineer certification (Professional preferred; Associate at minimum)
- Experience with Databricks serverless compute and an understanding of its trade-offs vs. classic clusters
- Familiarity with Databricks streaming workloads using Structured Streaming
- SQL Server experience — understanding of source system structures common in enterprise retail environments
- Familiarity with Power BI data models and downstream semantic layer considerations
- Background in retail, fuel/convenience, or CPG data engineering