
Posts

Showing posts from September, 2018

KAFKA CLUSTER SETUP GUIDE

KAFKA CLUSTER SETUP

Step-1: Set up passwordless communication between the master and slave machines.
1- Check connectivity from both machines: ping master.node.com / ping slave.node.com
2- Set the fully qualified domain name in /etc/hosts.
3- su root [on both master and slave machines].
4- Change the hostname in the /etc/hostname file, then verify with hostname -f (e.g. master.node.com).
5- Set up passwordless SSH between master and slave; see the previous blog: http://hadoop-edu.blogspot.com/2018/09/installation-of-apache-hadoop.html

Step-2: Extract Kafka and Zookeeper and update the bashrc file (a sketch follows this excerpt).
1- Extract the Kafka tarball under /usr/lib/kafka/... [on both machines]
2- Extract the Zookeeper tarball under /usr/lib/zoo/... [on both machines]: tar -xzvf zookeeper-3.4.10.tar.gz [master & slave]
Then nano ~/.bashrc and add:
export ZOOKEEPER_HOME=/usr/lib/zoo/zookeeper-3.4.10
export KAFKA_HOME=/usr/lib/kafka/kafka_2.11-1.1.0
P...
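Following on from Step-2, here is a minimal sketch of the ~/.bashrc additions and the daemon start commands, assuming the extract paths above. The scripts and config files named here are the stock ones shipped with Kafka 1.1.0, but broker.id and zookeeper.connect in server.properties must still be set per node.

    # Append to ~/.bashrc on both machines (paths assume the extract locations above)
    export ZOOKEEPER_HOME=/usr/lib/zoo/zookeeper-3.4.10
    export KAFKA_HOME=/usr/lib/kafka/kafka_2.11-1.1.0
    export PATH=$PATH:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin
    source ~/.bashrc

    # Start Zookeeper, then a Kafka broker, on each node
    $KAFKA_HOME/bin/zookeeper-server-start.sh -daemon $KAFKA_HOME/config/zookeeper.properties
    $KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties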

Apache Spark & Apache KAFKA

APACHE SPARK AND KAFKA

Apache Spark is a framework that does not have its own file system, so Spark takes advantage of Apache Hadoop and YARN, the cluster resource management system that is part of the Hadoop ecosystem. Do you think Kafka and Spark are competitors? In my view Spark is different from Apache Kafka, so let us discuss the differences between Apache Spark and Apache Kafka.
1- Apache Kafka is a distributed messaging framework that can handle a big volume of messages.
2- Spark is a framework with a couple of components that you use for big data analysis.
3- Kafka's messaging is based on producers and consumers: a producer sends messages to a broker, and the broker broadcasts those messages to multiple consumers (see the console sketch after this excerpt).
4- Internally, Kafka uses socket programming.
5- Apache Spark has a Spark Streaming module where you can deal with real-time data. You can create ...
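To make point 3 concrete, here is a minimal console sketch against a single broker; the topic name demo and the localhost:2181 / localhost:9092 addresses are assumptions for illustration.

    # Create a topic (Kafka 1.1.0 still registers topics through Zookeeper)
    kafka-topics.sh --create --zookeeper localhost:2181 \
        --replication-factor 1 --partitions 1 --topic demo

    # Terminal 1: a producer sends messages to the broker
    kafka-console-producer.sh --broker-list localhost:9092 --topic demo

    # Terminal 2 (and 3, ...): each consumer receives the broadcast messages
    kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo --from-beginning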

Apache Spark and Apache Zeppelin Visualization Tool

Apache Spark and Apache Zeppelin

Step-1: Installation and configuration of Apache Zeppelin: https://zeppelin.apache.org/download.html
Step-2: Extract Apache Zeppelin and move it to the /usr/lib directory.
sudo tar xvf zeppelin-*-bin-all.tgz
Move zeppelin to the /usr/lib directory.
Step-3: Install the Java Development Kit on Ubuntu and set the JAVA_HOME variable (check with echo $JAVA_HOME). Create zeppelin-env.sh and zeppelin-site.xml from the template files, then open zeppelin-env.sh and set:
JAVA_HOME=/path/
SPARK_HOME=/path/ ...
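A minimal sketch of Step-3, assuming Zeppelin was moved to /usr/lib/zeppelin; the JDK and Spark paths are example values to be replaced with your own installs.

    cd /usr/lib/zeppelin/conf
    cp zeppelin-env.sh.template zeppelin-env.sh
    cp zeppelin-site.xml.template zeppelin-site.xml

    # In zeppelin-env.sh, point Zeppelin at your JDK and Spark (example paths)
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export SPARK_HOME=/usr/lib/spark

    # Start the Zeppelin daemon, then browse to http://localhost:8080
    /usr/lib/zeppelin/bin/zeppelin-daemon.sh start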

Hadoop distributed file System-HDFS

Hadoop Distributed File System

HDFS is the file system management layer of Hadoop. Every operating system has a file system (in Windows we have NTFS and FAT32) that manages metadata about your directories and files; the same is true in HDFS, where the master node [namenode] manages metadata about all files and directories present in the whole cluster. Hadoop is not a single tool: it has a distributed file system, and when a user uploads data to HDFS, Hadoop distributes the data across multiple nodes (see the command sketch after this excerpt). HDFS and MapReduce are the two base components of the Hadoop eco system. Let us understand the concept of the Hadoop distributed file system.
1- Hadoop works on the cluster computing concept, which has a master-slave architecture.
2- Master and slaves are machines serving a couple of services.
Master:
1- NameNode
2- Job Tracker
3- Secondary NameNode
Slave:
1- Data Node
2- Task Tracker
3- Child JVM
1- Job Tracker ...
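As a small illustration of uploading data and seeing it distributed, a sketch using the standard HDFS shell; the path /user/hadoop/input and the file sample.txt are hypothetical names.

    # Upload a local file; HDFS splits it into blocks spread across datanodes
    hdfs dfs -mkdir -p /user/hadoop/input
    hdfs dfs -put sample.txt /user/hadoop/input/

    # Ask the namenode (the metadata master) where each block actually lives
    hdfs fsck /user/hadoop/input/sample.txt -files -blocks -locations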

Apache Hadoop cluster setup guide

APACHE HADOOP CLUSTER SETUP UBUNTU 16 64-bit

Step-1: Install the Ubuntu OS for the master and slave nodes.
1- Install VMware Workstation 14: https://www.vmware.com/in/products/workstation-pro/workstation-pro-evaluation.html
2- Install Ubuntu 16 OS 64-bit for the master node using VMware.
3- Install Ubuntu 16 OS 64-bit for the slave node using VMware.
Step-2: Update the root password so that you can perform all admin-level operations. sudo passwd root is the command to set a new root password.
Step-3: Create a user from the root user for the Hadoop eco system (a sketch follows this excerpt). It is recommended to create a separate user for Hadoop to isolate the Hadoop file system from ...
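A minimal sketch of Steps 2 and 3 on each node; the username hduser is an example, not a name from the original post.

    # Step-2: set a new root password so admin-level operations are possible
    sudo passwd root

    # Step-3: as root, create a dedicated user for the Hadoop eco system
    su root
    adduser hduser
    usermod -aG sudo hduser   # allow the Hadoop user to run admin commands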