Exploring the Power of Databricks

Author: Wavicle Data Solutions


This fall, Wavicle conducted a two-part workshop at Kumaraguru College of Technology in Coimbatore, India.

 

The first part introduced AWS and Databricks through presentations and discussion. The second part was a virtual workshop that walked through a cancer detection use case.

 

The workshop was delivered by:

  • Venkatesa Prasannaa Selvaraj, Director – Data Analytics
  • Sivasubramanian Thirugnanasambandam, Data Architect
  • Mohankumar Balasubramaniyam, Senior Data Scientist
  • Selva Prabhu Subburathinam, Lead Data Engineer
  • Mohanraj Balakrishnan, Senior Data Scientist
  • Sharan Antony, Data Scientist

 

Introduction to AWS and Databricks

In this workshop, we showed the simple steps needed to set up AWS environments using a Databricks notebook on the Databricks cloud platform. The workshop also covered the foundational concepts students need to write code in Python and Spark, with a focus on data analysis. No prior programming knowledge was required.

 

Data analysis with pandas

The second part of the workshop covered pandas, a powerful open-source Python package for data analysis and manipulation. Students learned to read data, compute summary statistics, check data distributions, perform basic data cleaning and transformation, and plot simple visualizations. Although no prep work was required, we did recommend basic Python knowledge.
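To give a feel for these pandas steps, here is a minimal sketch. It is not the workshop's actual notebook: the column names and the five sample rows are hypothetical, invented purely for illustration, and plotting is left out to keep the example free of extra dependencies.

```python
import io

import pandas as pd

# Hypothetical sample data; the workshop's actual dataset is not described here.
CSV_TEXT = """patient_id,age,tumor_size_mm,diagnosis
1,45,12.5,benign
2,61,30.1,malignant
3,52,,benign
4,38,8.2,benign
5,70,41.7,malignant
"""


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Read CSV text, fill missing values, and add a numeric label column."""
    # Read the data (pd.read_csv accepts a file path or a file-like object).
    df = pd.read_csv(io.StringIO(csv_text))
    # Basic cleaning: fill the missing measurement with the column median.
    median_size = df["tumor_size_mm"].median()
    df["tumor_size_mm"] = df["tumor_size_mm"].fillna(median_size)
    # Simple transformation: encode the text label as 0/1.
    df["is_malignant"] = (df["diagnosis"] == "malignant").astype(int)
    return df


df = load_and_clean(CSV_TEXT)

# Summary statistics for the numeric columns.
print(df.describe())
# Distribution of the categorical label.
print(df["diagnosis"].value_counts())
```

In a Databricks notebook, the same calls would typically run against a file path instead of an in-memory string.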

 

Introduction to ML: scikit-learn

The workshop’s second part also covered machine learning, the different types of machine learning, and how to build a simple machine learning model using scikit-learn, a widely used open-source machine learning library relied on by data scientists across the globe. The workshop further focused on techniques for applying and evaluating machine learning methods and on some of the statistical concepts behind them.
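A simple scikit-learn model of this kind might look like the sketch below. The choices here are assumptions, not a record of what was taught: the workshop's dataset and model are not specified, so this example uses the breast cancer dataset bundled with scikit-learn and a logistic regression classifier, and the helper name train_and_evaluate is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(random_state: int = 0) -> float:
    """Train a classifier and return its accuracy on held-out data."""
    # Assumption: scikit-learn's bundled breast cancer dataset stands in
    # for the workshop's (unspecified) cancer detection data.
    X, y = load_breast_cancer(return_X_y=True)

    # Hold out 25% of the rows so the model is scored on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Scale the features, then fit a logistic regression classifier.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Evaluate: fraction of held-out samples classified correctly.
    return accuracy_score(y_test, model.predict(X_test))


accuracy = train_and_evaluate()
print(f"Held-out accuracy: {accuracy:.3f}")
```

Splitting the data before fitting is the key evaluation idea: accuracy measured on the training rows alone would overstate how well the model generalizes.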

 

More than 150 students attended overall.

 

The workshop was conducted as part of the Databricks India User Group community, which Wavicle helped create.