Hadoop is an open-source software framework for distributed storage and distributed processing of large datasets (Big Data).
Hadoop clusters are built on commodity hardware.
Hadoop is used for processing structured and unstructured datasets.
There was a time when processing large datasets required a supercomputer or specialized hardware. Hadoop changed that: it provides a mechanism to process large amounts of data on commodity hardware, using industry-standard computers rather than any specialized machines.
Hadoop is designed to scale from a single computer to thousands of machines.
Hadoop can handle any file format and can ingest both structured and unstructured data. Earlier, a company would get data from its transaction systems, which was well defined and arrived in a structured format. Today, businesses rely not only on internal data but also on external data that comes in the form of weather data, emails, tweets, Facebook posts, queries, and search logs.
Hadoop is an ideal place to process such data and prepare a meaningful dataset from it.
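To make this concrete, here is a minimal sketch of the map/reduce word-count pattern that Hadoop applies to unstructured text at scale. This is plain Python rather than Hadoop's own API, and all function and variable names are illustrative assumptions, not part of Hadoop itself.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in one raw line of text,
    # the way a Hadoop map task processes its split of the input.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Sum all counts observed for a single word.
    return word, sum(counts)

def run_wordcount(lines):
    # Hadoop shuffles mapper output by key before reducing;
    # this dict stands in for that shuffle-and-sort phase.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(w, c) for w, c in grouped.items())

# Example: counting words across a small batch of tweets.
tweets = ["big data needs big clusters", "hadoop processes big data"]
print(run_wordcount(tweets))
# → {'big': 3, 'data': 2, 'needs': 1, 'clusters': 1, 'hadoop': 1, 'processes': 1}
```

In a real cluster, the mapper and reducer run in parallel on many machines, and Hadoop handles the data distribution and shuffling between them.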
You can deploy a Hadoop cluster on-premises, or on a cloud platform such as AWS or Azure. When you use the cloud, you do not have to worry about setup cost. One main benefit of the cloud is that most customers do not run Hadoop jobs continuously: if you only need Hadoop for a one-hour job every week, the cloud is the best platform. You can schedule machines to be added to the cluster when the job needs to run and release them after the job completes.