WHAT IS BIG DATA & APACHE HADOOP
Big data and Hadoop are emerging technologies in the current IT sector, and many professionals are looking to build their careers in data science. Apache Hadoop is a powerful framework for dealing with big data.
Big data is a problem! Now the question is: how can data be a problem? It is, because we now have data measured in petabytes (PB). The Square Kilometre Array (SKA) telescope alone is projected to generate around 20,000 PB of data per day, and Facebook, Google, Yahoo, Twitter and other top organizations are creating big data: huge amounts of data.
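To put that figure in perspective, here is a quick back-of-the-envelope calculation of the sustained rate that 20,000 PB per day implies. It is a sketch that assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and an even flow over 24 hours; the function name is hypothetical.

```python
# Convert a daily data volume into an average per-second rate.
# Assumption: decimal (SI) units, 1 PB = 10**15 bytes and 1 TB = 10**12 bytes.
PB = 10**15
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def daily_pb_to_tb_per_second(pb_per_day: float) -> float:
    """Hypothetical helper: average TB/s needed to ingest pb_per_day petabytes per day."""
    bytes_per_second = pb_per_day * PB / SECONDS_PER_DAY
    return bytes_per_second / TB

rate = daily_pb_to_tb_per_second(20_000)
print(f"{rate:.1f} TB/s")  # prints "231.5 TB/s"
```

Roughly 231 TB every second is far more than a single machine's disks or network can absorb, which is why storage and processing at this scale have to be spread across many machines.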
We can define big data by its volume, velocity and variety. We are no longer dealing with structured data only; we now also have unstructured and semi-structured data such as audio, video, text and JSON.
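The three kinds of data differ mainly in how much schema they carry. The Python sketch below uses small made-up sample records (all values are hypothetical) to show how each one is read: CSV rows share fixed columns, JSON records describe themselves but can vary from record to record, and free text has to be processed before any structure appears.

```python
import csv
import io
import json

# Hypothetical sample data illustrating the three kinds of "variety".
structured = "id,name,age\n1,Asha,34\n2,Ravi,29\n"  # fixed schema (CSV)
semi_structured = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "geo": {"lat": 12.9}}]'  # JSON
unstructured = "Loved the video! Will share it with friends."  # free text

# Structured: every row has the same columns, so it maps directly onto a table.
rows = list(csv.DictReader(io.StringIO(structured)))
print(rows[0]["name"])  # Asha

# Semi-structured: self-describing, but the fields differ between records.
records = json.loads(semi_structured)
print([sorted(r.keys()) for r in records])  # [['id', 'tags'], ['geo', 'id']]

# Unstructured: no schema at all; structure must be extracted by processing.
words = unstructured.lower().replace("!", "").replace(".", "").split()
print(len(words))  # 8
```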
Apache Hadoop is an open-source solution to the big data problem. Speed matters here: users expect results as quickly as a web page responds. Just imagine if YouTube took two days to show you your recommended videos.
Hadoop is not a single product; it is a framework made up of several tools, such as HDFS, MapReduce, Hive, Pig, Sqoop and Flume, which we will discuss in the next blog. Hadoop services run on a cluster with a master/slave architecture. Many organizations use Apache Hadoop and Apache Spark together for data analysis.
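MapReduce splits a job into a map step, a shuffle step that groups intermediate results by key, and a reduce step. The classic example is counting words. The sketch below simulates the three phases in a single Python process. The function names are hypothetical; in a real Hadoop cluster the framework runs many map and reduce tasks in parallel on worker nodes and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every value emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all the counts for one word."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines, keys in sorted order."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))

print(word_count(["big data is big", "hadoop handles big data"]))
# {'big': 3, 'data': 2, 'hadoop': 1, 'handles': 1, 'is': 1}
```

Because each mapper sees only its own lines and each reducer sees only the values for its own keys, the work can be divided among many machines without them sharing state.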
The Apache Spark framework includes analytical modules such as Spark SQL for structured queries and Spark Streaming for real-time data processing.