
What is Big Data & Apache Hadoop?




Big Data and Hadoop are emerging technologies in the current IT sector, and many professionals are pursuing careers in data science. Apache Hadoop is a powerful framework for dealing with Big Data.

Big Data is a problem! How can data be a problem? Because we now have data at petabyte scale: the Square Kilometre Array (SKA) is expected to generate around 20,000 PB of data per day, and Facebook, Google, Yahoo, Twitter, and other top organizations are creating Big Data in huge amounts.

We can define Big Data by three Vs: volume, velocity, and variety. We are no longer dealing with structured data only; we now have unstructured and semi-structured data as well, such as audio, video, free text, and JSON.
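As a small illustration of the variety point, here is a minimal Python sketch contrasting a structured CSV row, where every record follows a fixed schema, with a semi-structured JSON document, where fields can vary from record to record (the record contents here are made up for the example):

```python
import csv
import io
import json

# Structured data: every row conforms to the same schema (name, age).
csv_text = "name,age\nalice,30\nbob,25\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["name"])  # alice

# Semi-structured data: a JSON document carries its own structure,
# and different documents may have different fields.
json_text = '{"name": "alice", "likes": ["hadoop", "spark"]}'
doc = json.loads(json_text)
print(doc["likes"])  # ['hadoop', 'spark']
```

Tools in the Hadoop ecosystem differ in which of these shapes they handle best, which is why variety is treated as a defining property of Big Data.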

Apache Hadoop is an open-source product that offers a solution to the Big Data problem. You expect results as quickly as a web response; just imagine if YouTube showed you its recommended videos only after two days.

Hadoop is not a single product; it is a framework comprising several tools such as HDFS, MapReduce, Hive, Pig, Sqoop, and Flume, which we will discuss in the next blog post. Hadoop services run on a cluster with a master-slave architecture, and industries use Apache Hadoop and Apache Spark together for data analysis.
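To give a flavour of the MapReduce model mentioned above, here is a minimal pure-Python sketch of a word count, the classic MapReduce example. It simulates the map, shuffle, and reduce phases that Hadoop actually distributes across the nodes of a cluster (the input lines are made up for the illustration):

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

lines = ["big data is a problem", "hadoop is a solution"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["is"])  # 2
```

In a real Hadoop job the input would live on HDFS and the map and reduce functions would run in parallel on the slave nodes, but the logical flow is the same as in this sketch.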

The Apache Spark framework provides analytical modules such as Spark SQL for structured queries and Spark Streaming for real-time data processing.
