Posts

Sphinx search engine and related technologies

Last month, I finished publishing my article series about the Sphinx search engine and related technologies on SQL Shack:

Getting started with Sphinx search engine
In this article, we talked about the Sphinx search engine and how to install it on the Windows operating system.

Building full-text indexes using the Sphinx search engine
In this article, we talked about building full-text indexes using Sphinx. We covered the following topics:
- The Sphinx configuration file
- Defining data sources
- Defining full-text indexes
- Indexer and Searchd service options
- Database preparation
- Indexing
- Using linked servers to access indexes (a quick sketch follows below)

Manticore search: a continuation of the Sphinx search engine
In this article, we talked about Manticore Search, an open-source search engine first released in 2017 as a fork of the Sphinx search engine.
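To give a flavor of the linked-server topic listed above, here is a minimal T-SQL sketch. It assumes a linked server named SPHINX_SEARCH has already been created over Sphinx's MySQL-protocol (SphinxQL) listener and that a full-text index named articles_index exists; both names are hypothetical.

  -- Query a Sphinx full-text index from SQL Server through a linked server.
  -- SPHINX_SEARCH (linked server) and articles_index (index) are hypothetical.
  SELECT *
  FROM OPENQUERY(
      SPHINX_SEARCH,
      'SELECT id FROM articles_index WHERE MATCH(''sql server'') LIMIT 20'
  );

The inner statement is SphinxQL and is executed by the Sphinx daemon itself; OPENQUERY simply passes it through and returns the matching document IDs to SQL Server.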

Hadoop components for SSIS

This month, I published a series on SQL Shack about the Hadoop components added to SSIS in the SQL Server 2016 release. The series is composed of three articles, as follows:

SSIS Hadoop Connection Manager and related tasks
In this article, I gave a brief introduction to Hadoop and how it is integrated with SQL Server. Then, I illustrated how to connect to an on-premises Hadoop cluster using the SSIS Hadoop connection manager and the related tasks.

Importing and Exporting data using SSIS Hadoop components
In this article, I briefly explained the Avro and ORC Big Data file formats. Then, I talked about the Hadoop data flow task components and how to use them to import and export data into the Hadoop cluster. Then, I compared those Hadoop components with the Hadoop File System Task. Finally, I concluded my work.

Connecting to Apache Hive and Apache Pig using SSIS Hadoop components
In this article, I talked about the Hadoop Hive and Hadoop Pig Tasks. I first gave a brief …
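As a small illustration of the Hive side of the series, here is a hedged HiveQL sketch of the kind of statement an SSIS Hadoop Hive Task could submit: creating an ORC-backed table (the format discussed in the second article) and loading it from a staging table. All table and column names are hypothetical.

  -- Create an ORC-backed Hive table (names hypothetical).
  CREATE TABLE IF NOT EXISTS sales_orc (
      order_id INT,
      customer STRING,
      amount   DOUBLE
  )
  STORED AS ORC;

  -- Populate it from an existing (hypothetical) staging table.
  INSERT INTO TABLE sales_orc
  SELECT order_id, customer, amount
  FROM sales_staging;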

Don’t install Hadoop on Windows!

A few years ago, I kept hearing from my colleagues: "Don't ever think about installing Hadoop on the Windows operating system!" I was not convinced by this advice because I am a big fan of Microsoft products, especially Windows. In the past three years, I have worked on three projects where I was asked to build a Hadoop cluster on Ubuntu. The first was a single-node Hadoop installation with a single Apache Spark worker. The other projects were about building a Big Data ecosystem for radiation data engineering, where a multi-node Hadoop cluster was deployed. Besides Hadoop, we installed and configured Apache Kafka, Spark, Hive, Pig, and Flume (I have published some installation guides previously; you can check the links at the end of this article). It was hard to become familiar with those technologies for the first time, since they don't have much documentation online. Each time I had to install Hadoop, I wondered why Hadoop is always installed on Linux …

Learn Biml for SSIS

A few months ago, I started writing a new series of articles about the Business Intelligence Markup Language (Biml) for the SQL Shack website. Having finished writing about using Biml in SQL Server Integration Services, I would like to share these articles with you:

Getting Started With Biml
In this article, we will give a brief overview of Biml by showing how it can be installed and how we can use it to generate SSIS packages using the BimlExpress tool. Finally, we will provide some useful resources for learning this technology.

Converting SSIS packages to Biml scripts
In this article, we will explain how to generate Biml scripts from existing SSIS packages by showing all related options. Then, we will analyze the generated script and identify how each object is represented in it.

Using Biml scripts to generate SSIS packages
In this article, we will explain how to create SSIS packages using the Business Intelligence Markup Language by providing a step-by-step guide where we will illustrate …

SSIS features face-to-face

SQL Server Integration Services provides a wide variety of features that help developers build robust Extract, Transform, and Load (ETL) processes. After many years of contributing to the SSIS-related tags on Stackoverflow.com, I can say that many developers misunderstand some SSIS features (SSIS OLE DB Source, SSIS Expressions, SQL Server destination …), especially those that are very similar and overlap in how they are used. Last year, I started writing SQL-related articles for the SQL Shack blog. In this post, I would like to share with you a series I published a few months ago called "SSIS features face-to-face", which is composed of 11 articles in which I compare similar SSIS features.

SSIS OLE DB Source: SQL Command vs Table or View
In this article, we compare the SSIS OLE DB Source data access modes: "Table or View" with specific columns selected vs. using "SQL Command" to select specific columns (a quick sketch follows below).

SSIS Expression …
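To make that first comparison concrete, here is a hedged T-SQL sketch (the table and columns are hypothetical). In the "SQL Command" access mode, the OLE DB Source sends exactly the statement you write, so only the needed columns are read; in the "Table or View" access mode, the source effectively retrieves every column, and the unchecked ones are discarded afterwards by the pipeline.

  -- "SQL Command" access mode: only the listed columns cross the wire.
  SELECT BusinessEntityID, FirstName, LastName
  FROM Person.Person;

  -- "Table or View" access mode: effectively equivalent to reading the
  -- whole table, with unselected columns thrown away later.
  SELECT *
  FROM Person.Person;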

SSIS Truncate Microsoft Access table using Execute SQL Task

You can use an Execute SQL Task to query a Microsoft Access database, but you cannot use a TRUNCATE TABLE command, since it is not supported; you have to use a DELETE FROM … command instead. The original post linked below includes screenshots of the OLE DB connection manager and the Execute SQL Task configuration. Original post: https://stackoverflow.com/questions/58306930/ssis-access-database-need-to-empty-table-on-my-destination-access-database-tab/58348096#58348096
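A minimal sketch of the statement the Execute SQL Task can run against the Access connection; the table name is hypothetical:

  -- TRUNCATE TABLE is not supported by the Access OLE DB provider,
  -- so empty the destination table with a DELETE instead.
  DELETE FROM [tblDestination];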

SSIS - automatically setting Flat File columns data types

Problem: I have a large number of TSV files that need to be imported periodically, via an SSIS package, into existing MSSQL DB tables. I am getting many data type issues from the OLE DB Destination tasks, e.g.:

[Flat File Source [2]] Error: Data conversion failed. The data conversion for column "PRC_ID" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".

The type suggestions produced by the Flat File Source connection managers for each table are not accurate enough to prevent errors when running the import package (and the DB types are the correct ones, so I don't want to just make them all (wrongly) strings for the sake of loading the TSVs). Is there a way to load the type data for the columns from a single file, rather than one by one in the connection manager window for the Flat File Source tasks (this would be hugely inconvenient, as each table may have many fields)? I have the cre…
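One hedged workaround, not necessarily the answer given on the original question: bypass the Flat File connection manager's type guessing entirely and let the destination table's declared column types drive the conversion, for example by running a T-SQL BULK INSERT from an Execute SQL Task. The file path and table name below are hypothetical.

  -- Load a TSV directly so SQL Server converts each field using the
  -- destination table's declared column types (path/table hypothetical).
  BULK INSERT dbo.TargetTable
  FROM 'C:\Imports\data.tsv'
  WITH (
      FIELDTERMINATOR = '\t',   -- tab-separated columns
      ROWTERMINATOR   = '\n',
      FIRSTROW        = 2,      -- skip the header row
      CODEPAGE        = '65001' -- assuming UTF-8 files
  );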