Before getting into any Spark implementation or testing, you need a working Spark environment. In this post, I am going to show you how to set up Spark in your Windows environment.
The steps are very simple. As the title says, our objective is to set up PySpark on Windows, and no specific prerequisite is required. To avoid any confusion, just follow the steps below to get the setup ready. I have assumed you already have Java installed.
Step 1-
- Go to the official Spark site and download a Spark distribution from http://spark.apache.org/downloads.html
- You can download any recent stable release; here I use "spark-2.4.5-bin-hadoop2.7.tgz". Since it is pre-built for Hadoop 2.7, you also get Spark built with Scala 2.11.
- Extract this .tgz file into your C:\ directory. In my case I have WinRAR installed, which can extract .tgz files easily; a command-line alternative is shown below.
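If you do not have an archiver such as WinRAR, recent Windows 10 builds include a tar command that can extract the archive from Command Prompt. This is only a sketch, assuming the file sits in your Downloads folder and you downloaded the release named above:
tar -xzf %USERPROFILE%\Downloads\spark-2.4.5-bin-hadoop2.7.tgz -C C:\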
Step 2-
- Download winutils.exe from the Hadoop binaries repository: https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe
- Save the downloaded file into your Spark bin directory (e.g., C:\spark-2.4.5-bin-hadoop2.7\bin).
Step 3-
- Now set up the environment variables for Spark.
- Go to "Advanced System Settings" → "Environment Variables" and set the paths below
- JAVA_HOME="C:\Program Files\Java\jdk1.8.0_181"
- HADOOP_HOME="C:\spark-2.4.5-bin-hadoop2.7"
- SPARK_HOME="C:\spark-2.4.5-bin-hadoop2.7"
- Also add %JAVA_HOME%\bin and %SPARK_HOME%\bin to the PATH system variable (an equivalent command-line setup is shown below)
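If you prefer the command line to the GUI, the same variables can be set from Command Prompt with setx. This is only a sketch assuming the paths above; setx changes take effect in newly opened Command Prompt windows, and PATH itself is safer to edit through the Environment Variables dialog, since setx would overwrite its current value:
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_181"
setx HADOOP_HOME "C:\spark-2.4.5-bin-hadoop2.7"
setx SPARK_HOME "C:\spark-2.4.5-bin-hadoop2.7"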
Step 4-
At this point we are done with the setup, but I found one more important tweak which is optional yet helps avoid some errors when you work with Spark and Hive.
Optional: Some tweaks to avoid future errors -
- Create folder C:\tmp\hive
- Open Command Prompt (CMD) as an administrator
- Grant full permissions on this temp hive directory using the command below
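A command along the following lines does it with winutils (run it from the Spark bin directory where winutils.exe was saved, or use its full path):
winutils.exe chmod -R 777 C:\tmp\hive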
- Check the given permission
winutils.exe ls -F C:\tmp\hive
Step 5- Check the installation
- Open your cmd and run the command "spark-shell"
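If the environment variables are set correctly, the Scala shell starts and leaves you at a scala> prompt. Since our goal is PySpark, it is worth confirming the Python shell as well; this assumes Python is installed and the Spark bin directory is on your PATH:
pyspark
If both shells start without errors, the setup is ready to use.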