
APACHE SPARK AND APACHE KAFKA

Apache Spark is a processing framework that does not have its own file system, so Spark takes advantage of Apache Hadoop (HDFS for storage) and YARN, the cluster resource management system that is part of the Hadoop ecosystem.

Do you think Kafka and Spark are competitors? In my view, Spark is quite different from Apache Kafka, so let us discuss the differences between Apache Spark and Apache Kafka.

1- Apache Kafka is a distributed messaging framework that can handle large volumes of messages.

2- Spark is a framework with several components (such as Spark Core, Spark SQL, Spark Streaming, and MLlib) that you can use for big data analysis.

3- Kafka's messaging system is based on producers and consumers: a producer sends messages to a broker, and the broker delivers the messages to multiple consumers (a minimal producer sketch appears after this list).

4- Internally, Kafka uses socket programming: clients communicate with brokers over TCP connections.

5- Apache Spark has a Spark Streaming module where you can deal with real-time data. You can create a program to fetch a real-time stream from any server and then pass this stream to Kafka producers (see the socket-stream sketch after this list).

6- Apache Spark has a Spark SQL module where you can create DataFrames (formerly called schema RDDs) and run SQL queries on top of them (see the Spark SQL sketch after this list).

7- To implement a Kafka cluster, ZooKeeper must be installed on multiple nodes (a sample ZooKeeper configuration appears after this list).

8- You can write Spark applications in the Java, Scala, and Python programming languages.

9- You can write Kafka producers and consumers using Node.js, Python, Java, or Scala.
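
To make point 3 concrete, here is a minimal Kafka producer sketch in Scala (a consumer follows the same pattern with KafkaConsumer). The broker address localhost:9092 and the topic name demo-topic are assumptions for illustration; substitute your own cluster details.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SimpleProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // the producer sends records to the broker; the broker delivers them to consumers
    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("demo-topic", "key-1", "hello kafka"))
    producer.close()
  }
}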
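For point 5, this is a small Spark Streaming sketch that fetches a real-time stream of text lines from a server over a socket and counts words in 5-second batches. The host localhost and port 9999 are assumptions; for a quick test you can emit lines with nc -lk 9999.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread to receive the stream, one to process it
    val conf = new SparkConf().setAppName("SocketWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // assumed host/port of the server emitting the real-time stream
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}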
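For point 6, a minimal Spark SQL sketch: build a DataFrame from an in-memory sequence, register it as a temporary view, and run a SQL query on top of it. The column names and data are made up for illustration; in practice you would load files from HDFS.

import org.apache.spark.sql.SparkSession

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkSqlDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // a tiny in-memory dataset for illustration
    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
    df.createOrReplaceTempView("people")

    // run a SQL query on top of the DataFrame
    spark.sql("SELECT name FROM people WHERE age > 26").show()
    spark.stop()
  }
}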
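For point 7, a sketch of the ZooKeeper ensemble configuration (conf/zoo.cfg) that would be replicated on each node; the hostnames are placeholders for your own machines:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zk1.node.com:2888:3888
server.2=zk2.node.com:2888:3888
server.3=zk3.node.com:2888:3888

Each node also needs a myid file inside dataDir containing its own server number (1, 2, or 3) so it can identify itself in the ensemble.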
