Posts

Showing posts from May, 2020

Hadoop components for SSIS

This month I published a series on SQL Shack about the Hadoop components added to SSIS in the SQL Server 2016 release. The series consists of the following three articles. SSIS Hadoop Connection Manager and related tasks: In this article, I gave a brief introduction to Hadoop and how it is integrated with SQL Server. Then, I illustrated how to connect to an on-premises Hadoop cluster using the SSIS Hadoop connection manager and the related tasks. Importing and Exporting data using SSIS Hadoop components: In this article, I briefly explained the Avro and ORC Big Data file formats. Then, I described the Hadoop data flow task components and how to use them to import data into and export data from the Hadoop cluster. I then compared those components with the Hadoop File System Task and closed with a conclusion. Connecting to Apache Hive and Apache Pig using SSIS Hadoop components: In this article, I talked about the Hadoop Hive and Hadoop Pig Tasks. I first gave a brief...

Don’t install Hadoop on Windows!

A few years ago, I kept hearing from my colleagues, “Don’t ever think about installing Hadoop on the Windows operating system!” I was not convinced by this advice because I am a big fan of Microsoft products, especially Windows. In the past three years, I worked on three projects where I was asked to build a Hadoop cluster on Ubuntu. The first was a single-node Hadoop installation with a single Apache Spark worker. The other projects were about building a Big Data ecosystem for radiation data engineering, where a multi-node Hadoop cluster was deployed. Besides Hadoop, we installed and configured Apache Kafka, Spark, Hive, Pig, and Flume (I have published some installation guides previously; you can check the links at the end of this article). It was hard to become familiar with those technologies at first since they don’t have much documentation online. Each time I had to install Hadoop, I wondered why Hadoop is always installed on Linu...