Quantcast
Viewing all articles
Browse latest Browse all 3015

MSDN Blogs: Troubleshooting Azure Stream Analytics jobs with SELECT INTO

Azure Stream Analytics is a fully managed service to do real time processing of data with a flexible SQL-like language. You can easily construct queries to complete complex analysis. However, it also means sometimes the system can be hard to troubleshoot when it does not run as expected. Here is one trick that can help troubleshooting Azure Stream Analytics jobs.

The SELECT INTO statement

Sometimes knowing what the data looks like in the middle of the query can be very helpful. Since inputs or steps of an Azure Stream Analytics job can be read multiple times, we can write extra SELECT INTO statements to output intermediate data into storage and inspect the correctness of the data, just like “watch variables” when debugging a program. Let’s look at an example.

Example

Here we have a simple Azure Stream Analytics job. It has one stream input, two reference data inputs and an output to Azure Table Storage.

Image may be NSFW.
Clik here to view.
1

This query joins data from the event hub and two reference blobs to get the name and category information:

Image may be NSFW.
Clik here to view.
2

This job is running fine. However, no events are being produced in the output. From the Monitoring tile we can see that input is producing data, but we don’t know which step of the JOIN caused all the events to be dropped.

Image may be NSFW.
Clik here to view.
3

In this situation, we can add a few extra SELECT INTO statements to “log” the intermediate JOIN results as well as the data read from the input.

Let’s first add two new “temporary outputs”. They can be any sink you like. Here we use Azure Storage as an example:

Image may be NSFW.
Clik here to view.
4

Then let’s rewrite the query like this:

Image may be NSFW.
Clik here to view.
5

 

Now start the job again and let it run for a few minutes. Then we can query temp1 and temp2 with Visual Studio Cloud Explorer:

Image may be NSFW.
Clik here to view.
6

Image may be NSFW.
Clik here to view.
7

As we can see, temp1 and temp2 both have data, and the name column is populated correctly in temp2. However, there is still no data in output:

Image may be NSFW.
Clik here to view.
8

Now we are almost certain that the issue is with the 2nd JOIN. Let’s download the reference data from blob and take a look:

Image may be NSFW.
Clik here to view.
9

Aha! The format of GUID in this reference data is different from the format of the [from] column in temp2. That’s why our data wasn’t arriving in output1 as expected. Let’s fix the data format, upload to reference blob and try again:

Image may be NSFW.
Clik here to view.
10

And then we got the data in output with name and category nicely populated!

Image may be NSFW.
Clik here to view.
11

 

Conclusion

As you can see, this trick is very helpful in troubleshooting Azure Stream Analytics jobs. Beyond the scenario we showed in the example, it can also be used to troubleshoot issues when wrong data are produced by a step. Just use the “SELECT INTO” statement and you can troubleshoot your jobs like debugging a program!

 


Viewing all articles
Browse latest Browse all 3015

Trending Articles