Posts

Showing posts from 2018

Working With Apache Spark, Python and PySpark

1. Environment
· Hadoop Version: 3.1.0
· Apache Kafka Version: 1.1.1
· Operating System: Ubuntu 16.04
· Java Version: Java 8
2. Prerequisites
Apache Spark requires Java. To make sure Java is installed, first update the operating system, then install it:
sudo apt-get update
sudo apt-get -y upgrade
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get install oracle-java8-installer
3. Installing Apache Spark
3.1. Download and install Spark
First, we need to create a directory for Apache Spark:
sudo mkdir /opt/spark
Then, we need to download the Apache Spark binary package:
wget "http://www-eu.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz"
Next,...
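The directory and download steps above can also be scripted. Here is a minimal Python sketch that uses only the standard library and mirrors `mkdir` plus `wget`. It then unpacks the archive into /opt/spark, but that unpack step is an assumption: the excerpt stops at "Next,...", so it is not necessarily the author's procedure. The helper names `download` and `extract` are hypothetical. Writing to /opt/spark needs root, as the `sudo` in the article implies. Apache mirrors generally keep only current releases, so for 2.3.1 the URL may need to point at archive.apache.org instead.

```python
import os
import tarfile
import urllib.request
from typing import List

# URL copied from the article; older releases may only be on archive.apache.org.
SPARK_URL = "http://www-eu.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz"
# Directory created in the article with `sudo mkdir /opt/spark`.
SPARK_DIR = "/opt/spark"


def download(url: str, dest_dir: str) -> str:
    """Hypothetical helper: fetch `url` into `dest_dir` (like wget) and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)  # like `mkdir`, but no error if it already exists
    path = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, path)
    return path


def extract(archive: str, dest_dir: str) -> List[str]:
    """Hypothetical helper: unpack a .tgz into `dest_dir` (like `tar -xzf`).

    Returns the sorted top-level entries found in the archive.
    """
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)
        names = tar.getnames()
    return sorted({name.split("/")[0] for name in names})


# Usage (run as root, since /opt/spark is owned by root):
# archive = download(SPARK_URL, SPARK_DIR)
# print(extract(archive, SPARK_DIR))
```

`os.makedirs(..., exist_ok=True)` makes the script safe to re-run if /opt/spark was already created by hand.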

Apache Kafka and Flume installation guide (import data from Kafka to HDFS)

This article contains a complete guide on how to install Apache Kafka, create Kafka topics, and publish and subscribe to topic messages. In addition, it contains an Apache Flume installation guide and shows how to import Kafka topic messages into HDFS using Apache Flume.
1. Environment
· Hadoop Version: 3.1.0
· Apache Kafka Version: 1.1.1
· Apache Flume Version: 1.8.0
· Operating System: Ubuntu 16.04
· Java Version: Java 8
2. Prerequisites
2.1. Install Java
Apache Kafka requires Java. To make sure Java is installed, first update the operating system, then install it:
sudo apt-get update
sudo apt-get upgrade
sudo add-apt-repository -y ppa:webupd8team/java...
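Both guides list Java 8 as a prerequisite. Before moving on to Kafka (or Spark), you can check which Java version the `java` command on the PATH reports. The following is a small standard-library sketch; `java_major_version` and `installed_java_major` are hypothetical helper names. It relies on two facts about `java -version`. First, the command prints to stderr. Second, Java 8 and earlier report versions like `1.8.0_171`, while Java 9 and later report versions like `11.0.2`.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches the quoted version string, e.g.  java version "1.8.0_171"
_VERSION_RE = re.compile(r'version "([^"]+)"')


def java_major_version(version_output: str) -> Optional[int]:
    """Hypothetical helper: parse the major Java version from `java -version` output."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        return None
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        # Old scheme: "1.8.0_171" means Java 8.
        return int(parts[1])
    # New scheme: "11.0.2", "17", or "11-ea" means Java 11, 17, ...
    digits = re.match(r"\d+", parts[0])
    return int(digits.group()) if digits else None


def installed_java_major() -> Optional[int]:
    """Hypothetical helper: run `java -version` and return its major version, or None if absent."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` writes to stderr; include stdout too, just in case.
    return java_major_version(result.stderr + result.stdout)
```

On the environment described above, `installed_java_major()` would be expected to return 8 once the Java 8 installer has finished.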