Set up a local development environment for PySpark on Windows

  1. Install Java (8+)
    https://www.java.com/en/download/help/download_options.xml#windows
    https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

    Add to Environment Variables:
    JAVA_HOME C:\Program Files\Java\jdk1.8.0_191
    Add to Path C:\Program Files\Java\jdk1.8.0_191\bin
    and C:\Program Files\Java\jre1.8.0_191\bin
    (adjust the folder names to match the JDK and JRE versions you actually installed)
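A quick way to confirm the Java variables took effect is to inspect them from Python in a newly opened Command Prompt. This is a minimal sketch, not part of the original steps; the function name check_java is hypothetical, and it only checks that JAVA_HOME points to an existing folder and that a java executable can be found on Path.

```python
import os
import shutil


def check_java(env=None):
    """Return a list of problems found with the Java-related variables."""
    # Hypothetical helper for illustration; defaults to the real environment.
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append("JAVA_HOME is not an existing folder: " + java_home)

    # shutil.which searches the given Path string for the executable.
    if shutil.which("java", path=env.get("PATH", "")) is None:
        problems.append("java was not found on Path")

    return problems
```

Calling check_java() with no arguments reads the current environment; an empty list means both checks passed.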

    1. Set up Hadoop winutils:
      Download from this GitHub repo: https://github.com/steveloughran/winutils
      Copy the bin folder under Hadoop 2.7.1 to a location of your choice, for example C:\ProgramData\winutils, so that winutils.exe ends up in C:\ProgramData\winutils\bin
      Add to Environment Variables:
      HADOOP_HOME C:\ProgramData\winutils
      Add to Path %HADOOP_HOME%\bin
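The layout described above can be checked with a few lines of Python. This is a sketch under the assumption that winutils.exe sits directly in the bin folder below HADOOP_HOME; find_winutils is a hypothetical helper name, not something shipped with Hadoop or Spark.

```python
import os


def find_winutils(env=None):
    """Return the path to winutils.exe under HADOOP_HOME, or None if missing."""
    # Hypothetical helper for illustration; defaults to the real environment.
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    # Assumed layout: <HADOOP_HOME>/bin/winutils.exe
    candidate = os.path.join(hadoop_home, "bin", "winutils.exe")
    return candidate if os.path.isfile(candidate) else None
```

If find_winutils() returns None, either HADOOP_HOME is not set in the current session or the bin folder was copied to a different level than expected.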

    2. Install Anaconda (5.3)
      https://www.anaconda.com/download/
      Downgrade Python to 3.6.5, since Python 3.7 may not be compatible with some packages
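To confirm which interpreter is active after the downgrade, a small check like the following can help. It is only a sketch: python_matches_guide is a hypothetical name, and it simply tests for the 3.6 series that this guide targets.

```python
import sys


def python_matches_guide(version_info=None):
    """Return True if the interpreter is Python 3.6.x, as this guide assumes."""
    # Hypothetical helper; falls back to the running interpreter's version.
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) == (3, 6)
```

Run python_matches_guide() inside the Conda environment you plan to use with Spark; False means a different Python series is active.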

    3. Install Spark (2.3.2 recommended) with Hadoop 2.7
      Download the .tgz from https://www.apache.org/dyn/closer.lua/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz

    You might need to install 7-Zip to extract the .tgz. Move the extracted folder to a separate location, for example C:\

    Add to Environment Variables:
    SPARK_HOME C:\spark-2.3.2-bin-hadoop2.7
    Add to Path %SPARK_HOME%\bin
    To verify the installation, open Command Prompt, cd to the bin folder of SPARK_HOME, and type pyspark
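Besides starting the pyspark shell by hand, the installation can be exercised from a script. The sketch below makes two assumptions that are not in the original steps: that the Spark bin folder contains the pyspark.cmd launcher, and that the pyspark package is importable from your interpreter. The names check_spark_home and smoke_test are hypothetical.

```python
import os


def check_spark_home(env=None):
    """Return a problem description, or None if SPARK_HOME looks usable."""
    # Hypothetical helper for illustration; defaults to the real environment.
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return "SPARK_HOME is not set"
    # Assumed layout: <SPARK_HOME>/bin/pyspark.cmd (the Windows launcher).
    launcher = os.path.join(spark_home, "bin", "pyspark.cmd")
    if not os.path.isfile(launcher):
        return "pyspark.cmd not found under " + os.path.join(spark_home, "bin")
    return None


def smoke_test():
    """Start a local SparkSession, count a tiny range, and stop the session."""
    # Imported here so check_spark_home works even without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("smoke-test")
        .getOrCreate()
    )
    try:
        return spark.range(5).count()
    finally:
        spark.stop()
```

Call check_spark_home() first; if it returns None, smoke_test() should start a single-threaded local session and return the row count of a five-element range.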

    1. Install PyCharm
      https://www.jetbrains.com/pycharm/download/#section=windows
      Set up the interpreter in PyCharm using a Conda Environment, and add packages by clicking "+"

    2. To install dependencies from the command line, run pip install -r requirements.txt
      Run pip freeze > requirements.txt when updating dependencies
      Or, in PyCharm, open the Settings/Preferences dialog (Ctrl+Alt+S) and select Tools | Python Integrated Tools.
      In the Package requirements file field, type the name of the requirements file or click the browse button and locate the desired file.

    Click OK to save the changes.
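
For reference, pip freeze writes one name==version line per package. The sketch below reads such lines into a dictionary so the pinned versions can be inspected or compared; parse_pins is a hypothetical helper, and it deliberately ignores anything that is not a simple name==version pin.

```python
def parse_pins(lines):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines, options, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins
```

A typical use would be opening requirements.txt and passing the file object to parse_pins, since iterating over a file yields its lines.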
