Working With Apache Spark, Python and PySpark
 
1. Environment

   - Hadoop version: 3.1.0
   - Apache Kafka version: 1.1.1
   - Operating system: Ubuntu 16.04
   - Java version: Java 8

2. Prerequisites

Apache Spark requires Java. To ensure that Java is installed, first update the operating system, then install it:

sudo apt-get update
sudo apt-get -y upgrade
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get install oracle-java8-installer

3. Installing Apache Spark

3.1. Download and install Spark

First, we need to create a directory for Apache Spark.

sudo mkdir /opt/spark

Then, we need to download the Apache Spark binaries package.

wget "http://www-eu.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz"

Next,...
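Before moving on, it can help to confirm that the Java prerequisite is actually satisfied. The following is a minimal sketch (assuming a POSIX shell; it is not part of the original tutorial's steps) that checks whether a Java runtime is on the PATH:

```shell
# Minimal sketch: confirm a Java runtime is available before installing Spark.
# If java is on the PATH, print its version; otherwise report that it is missing.
if command -v java >/dev/null 2>&1; then
    java -version            # note: java prints its version banner to stderr
else
    echo "Java not found; run the installer steps above first."
fi
```

If the check reports that Java is missing, re-run the `apt-get` steps above before continuing with the Spark installation.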