A Glimpse into U-SQL
By Stephen Dillon | Schneider Electric | Global Solutions | Engineering Fellow | Data Architect
I recently had the opportunity to explore U-SQL, Microsoft’s latest query language, which is bundled with Azure Data Lake (ADL). It is part of the ADL Analytics service, which is in Public Preview at the time of this writing, so you must request access. If you are unfamiliar with Azure Data Lake, it is Microsoft’s single repository for capturing, storing, and processing Big Data in HDFS and exposing it to analytics applications. It comprises four components: the ADL Store, the Analytics Service, HDInsight (Hadoop), and the ADL Tools for Visual Studio.
The “U” in U-SQL stands for “Unified”, an apt name because the language is designed to execute parallel queries across distributed relational or unstructured data sources using SQL syntax. If you know ANSI SQL or T-SQL, the syntax will look very familiar because Microsoft, as they describe it, built U-SQL “…from the ground up as an evolution of the declarative SQL language…”. This gives SQL developers a very small learning curve. There are only a few nuances to writing U-SQL: all keywords must be capitalized, NULLs follow C# semantics, and subqueries are not supported. Otherwise, all of the typical SQL capabilities you are familiar with, such as inner and outer joins, are available to you. Let’s look at a quick example.
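// Read the raw CSV into a rowset variable, declaring each column's name and type.
@t = EXTRACT sensorID string
           , date string
           , time string
           , value string
     FROM "/input/SensorData.csv"
     USING Extractors.Csv();

// Count the measurements recorded for each sensor.
@res = SELECT sensorID, COUNT(*) AS measuresCount
       FROM @t
       GROUP BY sensorID;

// Write the counts to a new CSV file, highest count first.
OUTPUT @res TO "/output/SensorDataAnalysis.csv"
ORDER BY measuresCount DESC
USING Outputters.Csv();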
In brief, the code sample above extracts some fields from a CSV file using the built-in extractor and assigns the results to a variable. We then run a traditional SELECT statement against that variable, here counting the measurements per sensor, and finally apply an ORDER BY to those results and output them to another file. We are not limited to working with files. We can also create databases and tables within Azure Data Lake and execute U-SQL against them. But U-SQL is not just about the ability to execute SQL. It also natively supports integrated user code in the form of C#. There is even talk of expanding this to other languages, but for now only C# is supported. This makes your code all the more powerful, as you are able to blend the two together.
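To make the blend of SQL and C# a little more concrete, here is a minimal sketch that reuses the sensor file from the example above. The SELECT and WHERE clauses contain ordinary C# expressions, including the C#-style null check mentioned earlier. The output path and the @readings name are hypothetical, and the sketch assumes every non-empty value in the file is numeric text that Double.Parse can read; it has not been run against the preview service.

```usql
// Same input file and columns as the earlier example.
@t = EXTRACT sensorID string,
             date string,
             time string,
             value string
     FROM "/input/SensorData.csv"
     USING Extractors.Csv();

// U-SQL keywords are upper case; the expressions between them are C#.
@readings =
    SELECT sensorID.ToUpperInvariant() AS sensorID,  // C# string method
           Double.Parse(value) AS reading            // C# conversion (assumes numeric text)
    FROM @t
    WHERE sensorID != null                           // C#-style null comparison
          AND !string.IsNullOrEmpty(value);          // C# static method call

// Hypothetical output path, chosen only for illustration.
OUTPUT @readings TO "/output/SensorReadings.csv"
USING Outputters.Csv();
```

If the file could contain non-numeric values, a safer variant would keep value as a string here and handle the conversion in a separate step.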
There is only so much one can say about U-SQL. What you really need to do is download the ADL Tools for Visual Studio and get your hands dirty working with your data. To do this, request access to the ADL Analytics preview in your Azure portal and then check out the easy-to-follow examples provided in the portal.
In conclusion, U-SQL appears to hold great promise, especially for those who have invested many years in mastering SQL. It is one example of how SQL is being integrated into the Big Data and Analytics landscape, and it should be a quick win for many engineers.