
APACHE SPARK AND APACHE KAFKA

Apache Spark is a processing framework that does not have its own file system, so Spark takes advantage of Apache Hadoop (HDFS for storage) and YARN, the cluster resource management system that is part of the Hadoop ecosystem.

Do you think Kafka and Spark are competitors? In my view, Spark is quite different from Apache Kafka, so let us discuss the differences between Apache Spark and Apache Kafka.

1- Apache Kafka is a distributed messaging framework that can handle large volumes of messages.

2- Spark is a framework with several components (such as Spark Core, Spark SQL, Spark Streaming, and MLlib) that you can use for big data analysis.

3- Kafka's messaging system is based on producers and consumers: a producer sends messages to a broker, and the broker delivers the messages to multiple consumers (a minimal producer sketch appears after this list).

4- Internally, Kafka uses socket programming: clients communicate with brokers over TCP connections.

5- Apache Spark has a Spark Streaming module where you can deal with real-time data. You can create a program to fetch a real-time stream from any server and then pass this stream to Kafka producers (see the socket-stream sketch after this list).

6- Apache Spark has a Spark SQL module where you can create DataFrames (formerly called schema RDDs) and run SQL queries on top of them (see the Spark SQL sketch after this list).

7- To implement a Kafka cluster, ZooKeeper must be installed on multiple nodes (a sample ZooKeeper configuration appears after this list).

8- You can write Spark applications in the Java, Scala, and Python programming languages.

9- You can write Kafka producers and consumers using Node.js, Python, Java, or Scala.
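
To make point 3 concrete, here is a minimal Kafka producer sketch in Scala (a consumer follows the same pattern with KafkaConsumer). The broker address localhost:9092 and the topic name demo-topic are assumptions for illustration; substitute your own cluster details.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SimpleProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // the producer sends records to the broker; the broker delivers them to consumers
    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("demo-topic", "key-1", "hello kafka"))
    producer.close()
  }
}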
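For point 5, this is a small Spark Streaming sketch that fetches a real-time stream of text lines from a server over a socket and counts words in 5-second batches. The host localhost and port 9999 are assumptions; for a quick test you can emit lines with nc -lk 9999.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread to receive the stream, one to process it
    val conf = new SparkConf().setAppName("SocketWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // assumed host/port of the server emitting the real-time stream
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}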
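For point 6, a minimal Spark SQL sketch: build a DataFrame from an in-memory sequence, register it as a temporary view, and run a SQL query on top of it. The column names and data are made up for illustration; in practice you would load files from HDFS.

import org.apache.spark.sql.SparkSession

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkSqlDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // a tiny in-memory dataset for illustration
    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
    df.createOrReplaceTempView("people")

    // run a SQL query on top of the DataFrame
    spark.sql("SELECT name FROM people WHERE age > 26").show()
    spark.stop()
  }
}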
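For point 7, a sketch of the ZooKeeper ensemble configuration (conf/zoo.cfg) that would be replicated on each node; the hostnames are placeholders for your own machines:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zk1.node.com:2888:3888
server.2=zk2.node.com:2888:3888
server.3=zk3.node.com:2888:3888

Each node also needs a myid file inside dataDir containing its own server number (1, 2, or 3) so it can identify itself in the ensemble.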
