I have been using Spark for a long time. It is an excellent distributed computation framework. I use it regularly at work, and I also have it installed on my local desktop and laptop. This document shows the steps for installing Spark 3+ on Windows 10 in pseudo-distributed mode.
Steps:
- Install WSL2
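- On recent Windows 10 builds, an elevated PowerShell can do this in one command (older builds need the manual steps from Microsoft's WSL documentation); the second command makes WSL2 the default if WSL was already installed
- wsl --install
- wsl --set-default-version 2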
- Install Ubuntu 20.04 LTS from the Microsoft Store.
- Install Windows Terminal from the Microsoft Store. This step is optional; you can use PowerShell or MobaXterm instead.
- Fire up Ubuntu from WSL
- Once logged in, go to the home directory
- cd ~
- For Spark, we need
- Python 3
- Java
- The latest Scala
- Spark prebuilt with Hadoop (tgz file)
- Let's download and install all the prerequisites
- Install Python
- sudo apt-get install software-properties-common
- sudo apt-get install python3 (the old python-software-properties package is no longer available on Ubuntu 20.04)
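- A quick check that Spark will find a usable interpreter (Ubuntu 20.04 ships with Python 3.8, which Spark 3 supports)
- python3 --version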
- Install Java (OpenJDK)
- sudo apt-get install openjdk-8-jdk
- Check the java and javac versions
- java -version
- javac -version
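- If Spark later complains that JAVA_HOME is not set, you can export it in ~/.bashrc as well; the path below is where openjdk-8-jdk usually lands on 64-bit Ubuntu, and you can confirm yours with readlink -f $(which java)
- export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64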
- Install Scala
- Get the Scala binary for Unix
- wget https://downloads.lightbend.com/scala/2.13.3/scala-2.13.3.tgz
- tar xvf scala-2.13.3.tgz
- Edit the .bashrc file to add Scala to the PATH
- vi ~/.bashrc
- Add these lines at the end
- export SCALA_HOME=/path/to/scala-2.13.3 # e.g., /root/scala-2.13.3
- export PATH=$PATH:$SCALA_HOME/bin
- source ~/.bashrc
- scala -version
- Get Spark
- I downloaded Spark from the Apache archive
- wget "https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz"
- tar xvf spark-3.1.1-bin-hadoop3.2.tgz
- vi ~/.bashrc
- export SPARK_HOME="/home/sandipan/spark-3.1.1-bin-hadoop3.2"
- export PATH=$PATH:$SPARK_HOME/bin
- source ~/.bashrc
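- As a quick sanity check that the new PATH took effect, this should report version 3.1.1
- spark-submit --version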
- Start Spark Services
- cd $SPARK_HOME
- Start the master server
- ./sbin/start-master.sh
- Once you start the master server, you will get a message saying it has started.
- You can see the Spark status in the master's web console at http://localhost:8080
- There you will see the master URL
- Mine looks like: "spark://LAPTOP-7DUT93OF.localdomain:7077"
- We can start workers using the command below
- SPARK_WORKER_INSTANCES=3 SPARK_WORKER_CORES=2 SPARK_WORKER_MEMORY=7G ./sbin/start-worker.sh spark://LAPTOP-7DUT93OF.localdomain:7077
- SPARK_WORKER_INSTANCES = how many worker instances you want to start
- SPARK_WORKER_CORES = how many cores per instance you want to give. Generally, I give 1 core.
- SPARK_WORKER_MEMORY = memory per worker. Be very careful with this parameter. My laptop has 32 GB of memory, so I keep 3-4 GB for Windows, 2 GB for the driver program, and the rest for the worker nodes.
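- One WSL2-specific caveat: by default WSL2 does not expose all of the host's RAM to Ubuntu, so the workers may get less memory than you planned. If needed, create %UserProfile%\.wslconfig on the Windows side with the two lines below (24GB is just an illustration), then run wsl --shutdown from PowerShell and reopen Ubuntu
- [wsl2]
- memory=24GB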
- Open the PySpark or Scala shell
- $SPARK_HOME/bin/pyspark --master spark://LAPTOP-7DUT93OF.localdomain:7077 --executor-memory 6500mb
- $SPARK_HOME/bin/spark-shell --master spark://LAPTOP-7DUT93OF.localdomain:7077 --executor-memory 6500mb
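- To verify that the cluster actually runs jobs end to end, you can submit the Pi example bundled with the Spark distribution (substitute your own master URL); it should print a line like "Pi is roughly 3.14"
- $SPARK_HOME/bin/spark-submit --master spark://LAPTOP-7DUT93OF.localdomain:7077 $SPARK_HOME/examples/src/main/python/pi.py 10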
- To stop all the workers (SPARK_WORKER_INSTANCES should match the number you started; the stop script does not need the master URL or core count)
- SPARK_WORKER_INSTANCES=3 ./sbin/stop-worker.sh
- Or, to kill every Spark JVM (workers and master) in one go
- kill -9 $(jps -l | grep spark | awk '{print $1}')
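- Finally, the master has its own stop script
- ./sbin/stop-master.sh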