Don’t install Hadoop on Windows!

- May 10, 2020

A few years ago, my colleagues used to tell me, “Don’t ever think about installing Hadoop on Windows!” I was not convinced, because I am a big fan of Microsoft products, especially Windows.

Over the past three years, I worked on three projects where I was asked to build a Hadoop cluster on Ubuntu. The first was a single-node Hadoop installation with a single Apache Spark worker. The other projects involved building a Big Data ecosystem for radiation data engineering, for which we deployed a multi-node Hadoop cluster. Besides Hadoop, we installed and configured Apache Kafka, Spark, Hive, Pig, and Flume (I have previously published some installation guides; see the links at the end of this article).

It was hard to get familiar with these technologies at first, since there is not much documentation for them online. Each time I had to install Hadoop, I wondered why it is always installed on Linux-based operating systems.

One month ago, I was finally asked to build a single-node Hadoop cluster on Windows, together with Apache Hive and Pig. When I started working, I decided to write a simple installation guide for each technology.

A few days ago, I published the installation guides for Hadoop, Hive, and Pig on Windows 10, and yesterday I finished installing and configuring the ecosystem. My only conclusion is: “Think 1000 times before installing Hadoop and related technologies on Windows!”

First of all, you cannot install these technologies on Windows without some installation hacks and workarounds, since they are mainly built for Linux. Besides, you will face many issues for which there are no online resources, not even on Stack Overflow or the other Stack Exchange communities. In brief, the previous three projects were easier than this last one.

For this reason: don’t install Hadoop on Windows!

External Links

Previously published guides for Linux

Previously published guides for Windows

Radiation data engineering project and publications

ORADIEX: A Big Data-driven smart framework for real-time surveillance and analysis of individual exposure to radioactive pollution
