
Apache Spark and Apache Zeppelin Visualization Tool

Apache Spark and Apache Zeppelin


Step-1: Download Apache Zeppelin from the official download page.
        https://zeppelin.apache.org/download.html

Step-2: Extract Apache Zeppelin and move it to the /usr/lib directory.
        sudo tar xvf zeppelin-*-bin-all.tgz
        sudo mv zeppelin-*-bin-all /usr/lib/
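The extract-and-move step can also be wrapped in a small shell helper. This is only a sketch: the function name install_zeppelin and its two arguments are hypothetical, and it assumes the archive contains a single top-level directory (as the zeppelin-*-bin-all archive does). It unpacks directly into the destination instead of moving the directory afterwards.

```shell
# Hypothetical helper: install_zeppelin ARCHIVE DEST_DIR
# Extracts ARCHIVE into DEST_DIR and prints the resulting install path.
install_zeppelin() {
  archive="$1"
  dest="$2"
  # Name of the top-level directory inside the archive
  top="$(tar tf "$archive" | head -n 1 | cut -d/ -f1)"
  # Unpack straight into the destination directory
  tar xf "$archive" -C "$dest"
  echo "$dest/$top"
}
```

Writing into /usr/lib needs root privileges, so run the helper from a root shell or choose a destination you own.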

Step-3: Install the Java Development Kit on Ubuntu and set the JAVA_HOME variable.
        echo $JAVA_HOME
        Create zeppelin-env.sh and zeppelin-site.xml from the template files in the conf directory.
        Open zeppelin-env.sh and set:
        export JAVA_HOME=/path/
        export SPARK_HOME=/path/
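As a rough sketch, the configuration step could be scripted as follows. The function name init_zeppelin_conf and its arguments are hypothetical. It assumes the templates are named zeppelin-env.sh.template and zeppelin-site.xml.template inside the conf directory, and the paths passed in are placeholders you must supply yourself.

```shell
# Hypothetical helper: init_zeppelin_conf ZEPPELIN_DIR JAVA_HOME_PATH SPARK_HOME_PATH
init_zeppelin_conf() {
  conf="$1/conf"
  # Copy each template only if the real config file does not exist yet
  [ -f "$conf/zeppelin-env.sh" ] || cp "$conf/zeppelin-env.sh.template" "$conf/zeppelin-env.sh"
  [ -f "$conf/zeppelin-site.xml" ] || cp "$conf/zeppelin-site.xml.template" "$conf/zeppelin-site.xml"
  # Append the two exports to zeppelin-env.sh
  printf 'export JAVA_HOME="%s"\n' "$2" >> "$conf/zeppelin-env.sh"
  printf 'export SPARK_HOME="%s"\n' "$3" >> "$conf/zeppelin-env.sh"
}
```

Running it twice appends the exports twice. That is harmless, since the last assignment wins, but it leaves duplicate lines in the file.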
  
Step-4: Update the port number for the Apache Zeppelin server.
        Open zeppelin-site.xml.
        Check that the server port (zeppelin.server.port) is set to 8082.
        The Zeppelin server will then start at localhost:8082.
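A minimal sketch for reading and changing the port from the command line is shown below. The helper names get_zeppelin_port and set_zeppelin_port are hypothetical, and both assume the <value> line directly follows the <name>zeppelin.server.port</name> line, as it does in the template layout.

```shell
# Hypothetical helper: print the port currently configured in SITE_XML
get_zeppelin_port() {
  grep -A 1 '<name>zeppelin.server.port</name>' "$1" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Hypothetical helper: set_zeppelin_port SITE_XML PORT
set_zeppelin_port() {
  site="$1"
  port="$2"
  # On the line after the property name, replace the <value> contents
  sed '/<name>zeppelin.server.port<\/name>/{
n
s:<value>.*</value>:<value>'"$port"'</value>:
}' "$site" > "$site.tmp" && mv "$site.tmp" "$site"
}
```

For example, set_zeppelin_port conf/zeppelin-site.xml 8082 followed by get_zeppelin_port conf/zeppelin-site.xml should print the new port. Restart the server for the change to take effect.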
 

Step-5: Start the Zeppelin server.

[root@quickstart zeppelin-0.7.3-bin-all]# bin/zeppelin-daemon.sh  start
Log dir doesn't exist, create /usr/lib/zeppelin-0.7.3-bin-all/logs
Pid dir doesn't exist, create /usr/lib/zeppelin-0.7.3-bin-all/run
Zeppelin start                                             [  OK  ]
 

Step-6: Set Spark as the default interpreter.

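Step-7: Run Spark SQL code and explore the available chart types.
val file = sc.textFile("file:///home/cloudera/emp.txt")
case class Employee(eno: Int, ename: String, location: String, sal: Int)
// Parse each comma-separated line into an Employee and convert to a DataFrame
val sal = file.map(_.split(",")).map(e => Employee(e(0).trim.toInt, e(1), e(2), e(3).trim.toInt)).toDF()
sal.printSchema()
// Register the DataFrame as a temporary table so it can be queried with SQL
sal.registerTempTable("emp")
sqlContext.sql("select location,sum(sal) from emp group by location")
// To chart the result, run the same select statement in a %sql paragraph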
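The code above expects each line of emp.txt to hold four comma-separated fields: eno, ename, location, sal. To try it without real data, the sketch below writes a small input file and computes the per-location totals with awk, which gives something to compare the Spark SQL result against. The rows are made up for illustration, and the helper names make_sample_emp and sum_by_location are hypothetical.

```shell
# Hypothetical helper: write made-up sample rows (eno,ename,location,sal) to FILE
make_sample_emp() {
  cat > "$1" <<'EOF'
1,Asha,Pune,1000
2,Ravi,Delhi,2000
3,Meena,Pune,1500
EOF
}

# Hypothetical helper: sum of sal per location, the same grouping as the SQL query
sum_by_location() {
  awk -F',' '{ total[$3] += $4 }
             END { for (loc in total) print loc "," total[loc] }' "$1" | sort
}
```

Point sc.textFile at the generated file, then compare the totals with the output of the group by query.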
Thank You!
