Real Time Data Ingestion – Kinesis Overview
Analytics, BI & data integration are together changing the way decisions are made. The science of data is evolving rapidly: we are not only generating heaps of data every second but also building systems and applications to integrate and analyze that data. BI & predictive analytics are important and provide actionable insights, but bringing data together from various sources is the bedrock on which business intelligence and data mining rest.
Broadly classified, the most common data sources available today are:
- Organizational data: stable, historic data that changes at a slow pace.
- Real-time sensor data: data generated by mobile apps, video or audio feeds, machine sensors, or GPS devices, all of which change rapidly.
- Consumer data: data generated by people on social media platforms.
When data from all of these sources is combined, it can lead to better analysis, better predictive models & higher precision. To reach this stage, though, data ingestion is an essential step. Today, several database options are available, depending on the type of data. One platform that ingests & processes real-time data generated by machines, apps, sensors, etc. is Amazon Kinesis.
Amazon Kinesis offers several applications for different needs. The following chart highlights the offerings & high-level use cases of the Amazon Kinesis applications.
One of the simplest Amazon Kinesis applications is Amazon Kinesis Data Analytics. It is used to process & analyze streaming data using standard SQL. Since this application is designed to continuously read and process streaming data, it is best suited for time-series analytics and real-time dashboards.
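There is some similarity between how we work with relational databases and how we work with Amazon Kinesis Data Analytics: in both cases we query data with SQL, but here the queries run continuously over data as it arrives.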
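To make the idea of time-series analytics over a stream more concrete, here is a minimal sketch in plain Python of a tumbling-window count, the kind of aggregation a streaming SQL query typically expresses. It is not Kinesis Data Analytics code: the function name `tumbling_window_counts`, the `(timestamp, key)` event shape, and the sample sensor names are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    This is an illustrative stand-in, not a Kinesis API.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each event to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sensor events: (seconds since start, sensor id).
events = [
    (0, "sensor-a"),
    (15, "sensor-a"),
    (59, "sensor-b"),
    (60, "sensor-a"),
    (125, "sensor-b"),
]

result = tumbling_window_counts(events, window_seconds=60)
for (window_start, key), count in sorted(result.items()):
    print(window_start, key, count)
```

Each event lands in exactly one window, so a dashboard could refresh once per window with the counts for that interval.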
There are a few interesting things to note about Kinesis.
- The data collection process is separate from data processing. Systems that input data through a web service are called “producers”, and these producers push data into “streams”.
- Several Kinesis applications can consume data from a single stream without interfering with one another. This separation allows easy processing & caters to varying reporting needs.
Kinesis, as of now, provides four methods to process rapidly collected data:
- Kinesis API application
- Kinesis Client Library Application
- Elastic MapReduce (EMR) connector
- Apache Storm Spout
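The separation between producers, streams, and independent consumers described above can be sketched with a small in-memory model. This is a toy illustration only, not the Kinesis API: the class names `ToyStream` and `ToyConsumer`, the record fields, and the handlers are all hypothetical.

```python
class ToyStream:
    """In-memory stand-in for a stream: an append-only list of records."""

    def __init__(self):
        self._records = []

    def put(self, record):
        # Producers only append; they know nothing about consumers.
        self._records.append(record)

    def read_from(self, position, limit=10):
        # Return up to `limit` records starting at `position`.
        return self._records[position:position + limit]


class ToyConsumer:
    """Each consumer keeps its own read position, so consumers never interfere."""

    def __init__(self, stream, handler):
        self.stream = stream
        self.handler = handler
        self.position = 0

    def poll(self, limit=10):
        batch = self.stream.read_from(self.position, limit)
        for record in batch:
            self.handler(record)
        self.position += len(batch)
        return len(batch)


# A producer pushes sensor readings into the stream.
stream = ToyStream()
for reading in [21.5, 22.0, 23.5]:
    stream.put({"sensor": "s1", "temp": reading})

# Two independent applications consume the same stream at their own pace.
temps = []
alerts = []
dashboard = ToyConsumer(stream, lambda r: temps.append(r["temp"]))
alerting = ToyConsumer(stream, lambda r: alerts.append(r) if r["temp"] > 23 else None)

dashboard.poll()        # reads all three records
alerting.poll(limit=2)  # reads only the first two; dashboard is unaffected
```

Because each consumer tracks its own position, the alerting application can lag behind or catch up later without changing what the dashboard application sees.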