Apache Spark and Apache Zeppelin
Step-1: Download Apache Zeppelin from the official download page.
https://zeppelin.apache.org/download.html
Step-2: Extract Apache Zeppelin and move it to the /usr/lib directory.
sudo tar xvf zeppelin-*-bin-all.tgz
sudo mv zeppelin-*-bin-all /usr/lib/

Step-3: Install the Java Development Kit on Ubuntu and set the JAVA_HOME variable.
Check that JAVA_HOME is set:
echo $JAVA_HOME
Create zeppelin-env.sh and zeppelin-site.xml from the template files in the conf directory.
Open zeppelin-env.sh and set:
export JAVA_HOME=/path/
export SPARK_HOME=/path/
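The Step-3 edits can also be scripted. The following is only a minimal sketch: init_zeppelin_conf is a hypothetical helper name, and it assumes the conf directory of the extracted package contains zeppelin-env.sh.template and zeppelin-site.xml.template. The paths passed to it are placeholders for your own JDK and Spark locations.

```shell
# Hypothetical helper: create the Zeppelin config files from their templates
# and append JAVA_HOME / SPARK_HOME exports to zeppelin-env.sh.
# Usage: init_zeppelin_conf <zeppelin_dir> <java_home> <spark_home>
init_zeppelin_conf() {
  zeppelin_dir="$1"
  java_home="$2"
  spark_home="$3"
  conf_dir="$zeppelin_dir/conf"

  # Copy each template only if the target does not exist yet,
  # so an existing configuration is never overwritten.
  for name in zeppelin-env.sh zeppelin-site.xml; do
    if [ ! -f "$conf_dir/$name" ]; then
      cp "$conf_dir/$name.template" "$conf_dir/$name" || return 1
    fi
  done

  # Append the two exports that Step-3 sets by hand.
  {
    echo "export JAVA_HOME=$java_home"
    echo "export SPARK_HOME=$spark_home"
  } >> "$conf_dir/zeppelin-env.sh"
}
```

For example: init_zeppelin_conf /usr/lib/zeppelin-0.7.3-bin-all "$JAVA_HOME" /path/ (run it as a user who can write under /usr/lib).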
Step-4: Update the port number for the Apache Zeppelin server.
Open zeppelin-site.xml.
Set the port number for the Zeppelin server (the zeppelin.server.port property) to 8082.
The Zeppelin server will then start at localhost:8082.
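If you prefer not to edit the XML by hand, the port change can be sketched with sed. This is an illustration under assumptions, not an official Zeppelin tool: set_zeppelin_port is a hypothetical helper name, it relies on GNU sed (as shipped with Ubuntu), and it assumes the value line directly follows the name line of the zeppelin.server.port property in your zeppelin-site.xml.

```shell
# Hypothetical helper: set zeppelin.server.port in a zeppelin-site.xml file.
# Assumes GNU sed and that the <value> line directly follows the <name> line.
# Usage: set_zeppelin_port <path/to/zeppelin-site.xml> <port>
set_zeppelin_port() {
  site_xml="$1"
  port="$2"
  # On the line after the matching <name>, replace the numeric <value>.
  sed -i "/<name>zeppelin\.server\.port<\/name>/{n;s#<value>[0-9]*</value>#<value>${port}</value>#;}" "$site_xml"
}
```

For example: set_zeppelin_port /usr/lib/zeppelin-0.7.3-bin-all/conf/zeppelin-site.xml 8082. Open the file afterwards to confirm the value, since a differently laid out file would be left unchanged.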
Step-5: Start the Zeppelin server.
[root@quickstart zeppelin-0.7.3-bin-all]# bin/zeppelin-daemon.sh start
Log dir doesn't exist, create /usr/lib/zeppelin-0.7.3-bin-all/logs
Pid dir doesn't exist, create /usr/lib/zeppelin-0.7.3-bin-all/run
Zeppelin start [ OK ]
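The daemon reports OK as soon as the process is launched, and the web UI may need a few more seconds before it answers. A rough way to wait for it is to poll the port. The sketch below assumes bash (it uses bash's /dev/tcp redirection), and wait_for_zeppelin is a hypothetical helper name.

```shell
# Hypothetical helper (bash): wait until a TCP port accepts connections.
# Usage: wait_for_zeppelin <host> <port> [max_tries]
wait_for_zeppelin() {
  host="$1"
  port="$2"
  max_tries="${3:-30}"
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>;
    # the subshell closes it again immediately.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

For example: wait_for_zeppelin localhost 8082 && echo "Zeppelin is up". The daemon script also accepts a status argument (bin/zeppelin-daemon.sh status) to check whether the process is running.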
Step-6: Set Spark as the default interpreter.
Step-7: Run Spark SQL code and enjoy all the chart types.
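// Load the comma-separated employee file as an RDD of lines
val file = sc.textFile("file:///home/cloudera/emp.txt")
// One record: employee number, name, location, salary
case class Employee(eno: Int, ename: String, location: String, sal: Int)
// Split each line on commas and convert the RDD to a DataFrame
val sal = file.map(_.split(",")).map(e => Employee(e(0).trim.toInt, e(1), e(2), e(3).trim.toInt)).toDF()
sal.printSchema()
// Register the DataFrame as temp table "emp" (on Spark 2.x, createOrReplaceTempView replaces this deprecated call)
sal.registerTempTable("emp")
// Total salary per location; show() runs the query and prints the result
sqlContext.sql("select location, sum(sal) from emp group by location").show()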
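The code above expects each line of emp.txt to hold four comma-separated fields: eno, ename, location, sal. If you do not have such a file yet, the sketch below writes one and computes the same per-location totals with awk, so you can cross-check the Spark SQL result. Both helper names and all rows are made up for illustration.

```shell
# Hypothetical helper: write a small sample file in the eno,ename,location,sal
# layout that the Employee case class expects. The rows are made-up test data.
# Usage: make_sample_emp <output_file>
make_sample_emp() {
  cat > "$1" <<'EOF'
1,Alice,Delhi,1000
2,Bob,Mumbai,2000
3,Carol,Delhi,1500
4,Dave,Pune,1200
EOF
}

# Hypothetical helper: sum the 4th field (sal) grouped by the 3rd field
# (location), i.e. the same aggregation as the SQL query.
# Usage: sum_sal_by_location <input_file>
sum_sal_by_location() {
  awk -F, '{ total[$3] += $4 } END { for (loc in total) print loc "," total[loc] }' "$1" | sort
}
```

For example, make_sample_emp /home/cloudera/emp.txt creates the input file, and sum_sal_by_location /home/cloudera/emp.txt should then list the same totals as the query (Delhi,2500 for the sample rows).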
Thank You!