Install Java
Java is required to run PySpark.
Docker
The Docker container is based on Debian jessie, so the jessie-backports repository needs to be added in order to install OpenJDK 8:
echo "deb http://cdn-fastly.deb.debian.org/debian jessie-backports main" >> /etc/apt/sources.list
apt-get update -y
Note the -t jessie-backports option:
apt-get install -t jessie-backports openjdk-8-jdk -y
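After the install, it is worth confirming that the backported JDK is the one on the PATH; it should report an OpenJDK 1.8 build (the exact version string will vary):

java -version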
Download & Install Spark
Download Spark 2.2.0 pre-built for Hadoop 2.7 [https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz] and untar the distribution under /opt/spark.
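Roughly, something like the following (the mirror URL is an assumption: the closer.lua link above redirects to a mirror, and archive.apache.org also hosts the release):

wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
mkdir -p /opt/spark
# strip the top-level spark-2.2.0-bin-hadoop2.7 directory so /opt/spark is the Spark home
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz -C /opt/spark --strip-components=1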
Configure Spark
Configure the following environment variables in /opt/spark/conf/spark-env.sh, for example:
SPARK_LOCAL_IP=192.168.1.11
SPARK_WORKER_MEMORY=4g
SPARK_WORKER_CORES=2
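spark-env.sh is not present in a fresh distribution; it can be created from the template that ships with Spark and then edited to add the variables above (adjust the IP address, memory, and core count to your host):

cp /opt/spark/conf/spark-env.sh.template /opt/spark/conf/spark-env.sh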
Make the default Spark RDD directory group writable:
mkdir -p /var/lib/spark/rdd
chown -R spark:spark /var/lib/spark/rdd
chmod -R g+w /var/lib/spark/rdd
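If /var/lib/spark/rdd is not already configured as Spark's scratch directory in your setup, it can be pointed at explicitly via SPARK_LOCAL_DIRS in spark-env.sh (this is an assumption about how the directory is meant to be used); launching pyspark afterwards is a quick sanity check that Java, Spark, and the configuration all line up:

# assumed addition to /opt/spark/conf/spark-env.sh
SPARK_LOCAL_DIRS=/var/lib/spark/rdd

/opt/spark/bin/pyspark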