Hadoop is the rapidly expanding technology stack that is the go-to solution for big data analysis. HDInsight is the framework for the Microsoft Azure cloud implementation of Hadoop.
Hadoop clusters in HDInsight use versions of the Hortonworks Data Platform (HDP) distribution and the set of Hadoop components within that distribution.
Apache Hadoop is a software framework for big data management and analysis. Apache Hadoop core provides reliable data storage with the Hadoop Distributed File System (HDFS), and a simple MapReduce programming model to process and analyze, in parallel, the data stored in this distributed system. HDFS uses data replication to address hardware failure issues that arise when deploying such highly distributed systems.
Here are the Hadoop technologies in HDInsight:
- Cluster provisioning, management, and monitoring
- Avro (Microsoft .NET Library for Avro): Data serialization for the Microsoft .NET environment
- HBase Non-relational database for very large tables
- HDFS Hadoop Distributed File System
- Hive: SQL-like querying
- Mahout: Machine learning
- MapReduce and YARN: Distributed processing and resource management
- Oozie: Workflow management
- Pig: Simpler scripting for MapReduce transformations
- Sqoop: Data import and export
- Storm: Real-time processing of fast, large data streams
- Zookeeper: Coordinates processes in distributed systems