
MSDN Blogs: PDW Best Practices: Linked Server: OpenQuery VS EXEC


Many users rely on Linked Server functionality to issue queries to PDW from SSMS.  This is fully supported and a practical way of working.  However, it is important to pay attention to how the queries are issued to PDW.  There are two popular methods: OPENQUERY and EXEC.  The latter is the preferred method and results in much more efficient processing, lower resource utilization on the APS appliance, and simpler output in the DMVs used for monitoring query execution.  OPENQUERY is supported, but is mainly present for backwards compatibility.  The behavior described below also occurs when issuing queries to a SQL Server instance.  For this reason, use of this method is also discouraged for SQL Server, in favor of the EXEC method.

 

See below for a detailed write-up of what happens when a query is issued through OPENQUERY versus EXEC, and the impact it can have on an APS appliance.

 

Issue

A large number of queries and sessions were issued to the appliance during a short time period.  Some of these include sp_prepare calls, which the customer is not issuing themselves.

 

Observations

  • During the peak times, CPU utilization on the CTL (control) node was very high.
  • A large number of queries were running, many beyond the concurrency limit of 32.
  • There were a large number of prepared statement/parameterized query executions.
  • There were also explicit transactions and rollbacks.
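
One way to check the prepared-statement observation is to search the request DMV for sp_prepare calls and see which sessions and clients they come from. This is a minimal sketch, not a definitive diagnostic: it uses only the DMV columns shown in this post, and it assumes the requests of interest are still retained in the DMVs, which keep only a limited history, so it is best run close to the peak period.

```sql
-- List sessions that issued sp_prepare calls, with the originating client.
-- In LIKE, "_" matches any single character, so this pattern matches
-- "sp_prepare" but not "sp_unprepare".
SELECT  s.session_id,
        s.client_id,
        s.app_name,
        COUNT(*) AS prepare_calls
FROM    sys.dm_pdw_exec_requests AS r
JOIN    sys.dm_pdw_exec_sessions AS s
        ON s.session_id = r.session_id
WHERE   r.command LIKE '%sp_prepare%'
GROUP BY s.session_id, s.client_id, s.app_name
ORDER BY prepare_calls DESC;
```

If the app_name is Microsoft SQL Server and the client_id points at the instance hosting the linked server, the calls are likely coming from the linked server provider rather than from the user's own code.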

 

Findings

  • The queries are being executed using SQL Server linked server OPENQUERY syntax, such as:

select * from openquery(cssc8a, 'select top 1 * from dbo.col_test')

When running this syntax internally, I had the following findings:

  • Two distinct sessions are being created to execute this query.

select * from sys.dm_pdw_exec_sessions

where session_id in ('SID148612', 'SID148613')

order by login_time desc

 

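| session_id | status | request_id | security_id | login_name | login_time | query_count | is_transactional | client_id | app_name | sql_spid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SID148613 | Closed | NULL | NULL | sa | 1/21/16 7:29 | 6 | 0 | 172.18.177.109:1090 | Microsoft SQL Server | 183 |
| SID148612 | Closed | NULL | NULL | sa | 1/21/16 7:29 | 9 | 0 | 172.18.177.109:1089 | Microsoft SQL Server | 183 |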

 

 

These two sessions are executing a total of 15 queries, even though only one was submitted.

 

I can see these two sessions are running the prepared statements as well as an explicit transaction.

 

select * from sys.dm_pdw_exec_requests

where session_id = 'SID148612'

order by submit_time desc

 

select * from sys.dm_pdw_exec_requests

where session_id = 'SID148613'

order by submit_time desc

 

| request_id | session_id | status | submit_time | start_time | end_compile_time | end_time | total_elapsed_time | label | error_id | database_id | command | resource_class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| QID1912364 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 16 | NULL | NULL | 31 | SET NO_BROWSETABLE OFF | NULL |
| QID1912362 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | exec [sp_unprepare] @P1 | NULL |
| QID1912363 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | exec [sp_unprepare] @P1 | NULL |
| QID1912359 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 31 | NULL | NULL | 31 | exec [sp_prepare] @P1 OUT, @P2, @P3, @P4 | NULL |
| QID1912360 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 31 | NULL | NULL | 31 | exec [sp_prepare] @P1 OUT, @P2, @P3, @P4 | NULL |
| QID1912361 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 31 | NULL | NULL | 31 | ;select top 1 * from dbo.col_test | NULL |
| QID1912358 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | SET NO_BROWSETABLE ON | NULL |
| QID1912357 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | SELECT @@SPID | NULL |
| QID1912356 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | USE [tim_sandbox] | NULL |

| request_id | session_id | status | submit_time | start_time | end_compile_time | end_time | total_elapsed_time | label | error_id | database_id | command | resource_class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| QID1912370 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 125 | NULL | NULL | 31 | rollback | NULL |
| QID1912369 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 78 | NULL | NULL | 31 | ;select top 1 * from dbo.col_test | smallrc |
| QID1912368 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | begin tran | NULL |
| QID1912367 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | SET XACT_ABORT OFF | NULL |
| QID1912366 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 16 | NULL | NULL | 31 | SELECT @@SPID | NULL |
| QID1912365 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | USE [tim_sandbox] | NULL |

 

We believe this is causing unnecessary overhead.  We know (in AU4 or earlier releases) that during times of high concurrency, when a large number of sessions are connecting, CPU utilization on the control node increases and at some point new connections begin to time out.  This is currently a limitation of the PDW Engine process.  The point at which this occurs varies with workload.  With the above symptoms, we are effectively doubling the number of new session requests, which causes this limit to be hit with fewer intentional sessions than anticipated.  We are also doing a large amount of unnecessary work.  The total duration for both sessions is 328ms.

 

As a comparison, I executed the same query using the following syntax:

 

exec ('select top 1 * from dbo.col_test') at cssc8a
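
Note that SQL Server only accepts the EXEC ... AT form when the linked server has the RPC Out option enabled. A minimal sketch of the setup, assuming the linked server name cssc8a used in this post and that you have permission to change server options on the SQL Server instance that hosts the linked server:

```sql
-- Run on the SQL Server instance that hosts the linked server, not on PDW.
-- Enables outbound remote procedure calls, which EXEC ... AT requires.
EXEC sp_serveroption
     @server   = N'cssc8a',
     @optname  = N'rpc out',
     @optvalue = N'true';

-- The pass-through query can then be issued with EXEC ... AT.
EXEC ('select top 1 * from dbo.col_test') AT cssc8a;
```

Without this option, the EXEC ... AT call fails with an error stating that the server is not configured for RPC.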

 

This appears to create a single session which issues 3 queries.

 

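| session_id | status | request_id | security_id | login_name | login_time | query_count | is_transactional | client_id | app_name | sql_spid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SID148614 | Closed | NULL | NULL | sa | 1/21/2016 7:36:38.38 | 3 | 0 | 172.18.177.109:1380 | Microsoft SQL Server | 114 |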

 

These three queries seem much more reasonable:

 

| request_id | session_id | status | submit_time | start_time | end_compile_time | end_time | total_elapsed_time | label | error_id | database_id | command | resource_class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| QID1912401 | SID148614 | Completed | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 31 | NULL | NULL | 31 | select top 1 * from dbo.col_test | smallrc |
| QID1912400 | SID148614 | Completed | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 16 | NULL | NULL | 31 | SELECT @@SPID | NULL |
| QID1912399 | SID148614 | Completed | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 0 | NULL | NULL | 31 | USE [tim_sandbox] | NULL |

 

 

 

Total execution time using openquery: 328ms

Total execution time using exec: 47ms
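
These totals can be reproduced from the request DMV instead of being added up by hand. This is a minimal sketch, assuming the three session IDs from this test are still present in sys.dm_pdw_exec_requests and that total_elapsed_time is reported in milliseconds; the method label is added purely for illustration.

```sql
-- Compare the two execution methods by sessions, requests and elapsed time.
-- SID148612 and SID148613 came from the openquery call, SID148614 from exec.
SELECT  CASE WHEN session_id = 'SID148614' THEN 'exec'
             ELSE 'openquery'
        END                         AS method,
        COUNT(DISTINCT session_id)  AS sessions,
        COUNT(*)                    AS requests,
        SUM(total_elapsed_time)     AS total_elapsed_ms
FROM    sys.dm_pdw_exec_requests
WHERE   session_id IN ('SID148612', 'SID148613', 'SID148614')
GROUP BY CASE WHEN session_id = 'SID148614' THEN 'exec'
              ELSE 'openquery'
         END;
```

With the rows captured in this post, that works out to 2 sessions, 15 requests and 328ms for openquery, versus 1 session, 3 requests and 47ms for exec.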

 

 

Conclusion

 

Using exec in lieu of openquery is far more efficient.  With exec we see fewer sessions created (1 instead of 2), fewer queries issued to PDW (3 instead of 15), and a lower total elapsed time (47ms instead of 328ms), which is roughly a 7x improvement for a single query.

 

