This course provides a well-grounded introduction to Hadoop and its building blocks: HDFS, Map/Reduce, Pig, Hive, HBase, Sqoop and Flume. Anyone with a basic understanding of programming and databases (SQL) can benefit from this course. Prior knowledge of Java is useful but not essential. As part of the lab work, students build their own development cluster and implement a wide range of technical use cases.
Spark is fast emerging as an alternative to Hadoop Map/Reduce because of its speed. Spark programming is often necessary to address complex processing loads involving huge data volumes, which Hadoop can't process in a timely manner. Its in-memory computing engine makes Spark the platform of choice for real-time analytics, which requires high-speed data ingestion and processing within seconds. A whole new generation of analytics applications is now emerging to process geo-location data, streaming web events and sensor data, as well as data received from mobile and wearable devices.
Cassandra is emerging as one of the most powerful Big Data stores. It is often seen as a replacement for an RDBMS, but nothing could be further from the truth. Cassandra is a NoSQL database designed to store terabytes of data and answer SQL-like queries in under 50 milliseconds. This is possible only if we know where to use Cassandra and how to deploy it. Cassandra is also an essential companion to many real-time analytics applications. It can be used from Java or .NET applications through a variety of readily available drivers.
Data ingestion is both an art and a science. Ingesting data effectively into a Hadoop cluster, or any other data store, requires a good understanding of the source and the sink, along with the ability to configure data pipelines. Ingestion becomes complex when we source events from multiple sources in parallel and need to deliver them to various destinations in real time. High-speed data ingestion is especially critical when implementing real-time analytics.
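To make the source-and-sink idea concrete, here is a minimal sketch in plain Python, not tied to Flume, Sqoop or any other tool named above. It assumes in-memory lists as stand-ins for real sources and simple callables as stand-ins for sinks; the names run_pipeline, web_events, sensor_events, archive and alerts are hypothetical and exist only for illustration.

```python
import queue
import threading


def run_pipeline(sources, sinks):
    """Fan events in from several sources, then fan each one out to every sink."""
    channel = queue.Queue()  # buffer that decouples sources from sinks
    done = object()          # sentinel: one per source, marks its end

    def produce(events):
        for event in events:
            channel.put(event)
        channel.put(done)

    # Each source is read on its own thread, so sources are drained in parallel.
    threads = [threading.Thread(target=produce, args=(s,)) for s in sources]
    for t in threads:
        t.start()

    finished = 0
    while finished < len(sources):
        item = channel.get()
        if item is done:
            finished += 1
            continue
        for sink in sinks:   # deliver every event to every destination
            sink(item)

    for t in threads:
        t.join()


# Hypothetical sample data: two sources and two destinations.
web_events = ["click:/home", "click:/pricing"]
sensor_events = ["temp:21.5", "temp:21.7"]
archive, alerts = [], []


def alert_sink(event):
    # This sink keeps only sensor readings and ignores everything else.
    if event.startswith("temp:"):
        alerts.append(event)


run_pipeline(sources=[web_events, sensor_events], sinks=[archive.append, alert_sink])
```

The queue plays the role of the channel between source and sink: sources can produce at their own pace, and each sink decides for itself which events it keeps. A production pipeline would also need durability, back-pressure and failure handling, which this sketch leaves out.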
Big Data Strategy Definition
Architecting Data Platform
Building End-to-End Analytics Applications
Front End Re-Engineering
Back End Re-Engineering