I am excited to finally build on my earlier blog post Unified Data Platform - SQL 2019.
In this series, I dig deep into how to achieve data virtualization through PolyBase®. This post covers the introduction and demo prerequisites, and Part Two gets to the demo itself.
To refresh your memory, recall that the previous post covered the following items:
With data being omnipresent, we continuously face challenges to move or copy it to another location to process it further. With a small data set, this is easy enough, but it can be a problem with ever-increasing data sizes. Also, with the progressive augmentation of data mining by organizations, data leaders do not advocate keeping data in one place. Likewise, it can be tedious to fetch or use data from different data stores of structured and unstructured data and Big Data.
Data virtualization is the solution to this problem.
Data virtualization is an approach to data management that enables an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at the source or physically located. It can provide a single customer view of the overall data.
There are a lot of Data Virtualization tools present in the market, such as the following tools:
For this series, I focus on PolyBase, which Microsoft introduced in SQL 2016 and has improved with each subsequent SQL version.
Polybase enables SQL Server to run Transact-SQL queries on external data sources like Azure® Blob, Hadoop®, Oracle®, MongoDB®, and so on. The same Transact-SQL used to process external data can also run on relational databases. This ability helps to integrate data from external sources with relational data in your database. The following image shows a simple illustration of SQL Polybase:
Now that you know the basics of Polybase, I want to share a demo that fetches data from an Azure blob external source by using SQL PolyBase. This post covers the prerequisites for the demo.
Before you can run the demo, you need to perform the following prerequisite tasks:
You can install Polybase with only one SQL instance on a machine.
Currently, I have one default SQL 2019 instance running on my local machine. However, I did not select Polybase during installation. The following image shows the SQL Server configuration manager:
I had to re-run the SQL setup and select the following elements during the feature selection window to install Polybase:
Go ahead and run the SQL Setup and perform the following steps to install the PolyBase feature. Keep clicking Next till you reach the very last screen. Then, click the Finish Installation tab, selecting the tabs highlighted in dark red.
At this point, you can see in the SQL configuration manager that we have two more features installed. However, you might still get the error message Polybase not installed in SSMS while trying to enable Polybase. To fix this, restart the server after installing Polybase.
To enable PolyBase, run the following steps:
Connect to SQL Server in SSMS and run the following query to confirm that Polybase installed successfully.
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsSuccessfullyInstalled;
The following image shows the output for a successful installation:
Enable Polybase by running the following queries:
EXEC sp_configure 'polybase enabled', 1; Go
Run the following query:
This query is important. Without this step, errors can appear during external file format creation in the steps covered in Part Three of this series.
Perform the following steps to create an Azure Storage account:
Log in to the Azure Portal by using your credentials.
Search for the Azure Storage account service and perform the following steps to create the storage account. Keep clicking Next till you reach the very last screen. Then, click the Review and Create option. Select the tabs highlighted in dark red.
On the Azure Portal search bar, select Azure Storage account and click + Add to create a new storage account.
Keep the default settings for the Networking, Data Protection, Advanced, and Tags screens.
Next, click Review+Create and, after the validation succeeds, click Create Tab to create the storage account, as shown in the following image:
To create an Azure container, go to the created Azure storage account, click on Containers in the left pane, and then click on +Container.
At this stage, create a text data file and upload it to the container.
Note: You can also use CSV, Excel®, or other external data sources. However, depending on the external data source, you need to take a few additional steps. For example, for CSV or Excel data sources, you should install proper drivers on the SQL Server and add the connection properties to an ODBC data source name (DSN). You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.
You have successfully completed the prerequisites for the PolyBase demo. Part Two presents the demo.
Use the Feedback tab to make any comments or ask questions. You can also start a conversation with us.