Long-Running Query Alerts With sp_WhoIsActive

Who doesn’t love sp_WhoIsActive? Nearly every single DBA talk I have attended in the last two years mentions as a “must have!” I do the same during my DBA Newbie talk at SQL Saturdays.

About 18 months ago I came across this post by Tara Kizer and started logging activity every 30 seconds on most of our servers using sp_WhoIsActive. That simple job has helped us diagnose the root cause of server performance issues numerous times and I often query the results just to get a quick idea of queries that are consistent resource hogs. Our application is heavy on ad hoc reporting and this gives us a window into reports being run in ways we didn’t expect so that we can evaluate and tune the base queries better.

Our SaaS environment has hundreds of databases and thousands of users. Inevitably, one of two things usually causes us to scramble when server performance tanks.  One of those ad hoc reports runs amuck and consumes more resources then we have, or we have a partner use an API in a way we never intended, suddenly bringing the server to a halt. Each of these occurrences provides a learning opportunity to improve our application and processes and prevent repeat failures. But we’re not perfect… and I hate when a customer informs us of a problem before we know about it.

I’ve tried a number of suggestions for monitoring long-running queries, from the actual server performance alert to our commercial monitoring software. In many cases I just didn’t have the control over the alerts I wanted (or the alert fired inconsistently) and I kept thinking I could come up with a way to do this with sp_WhoIsActive that was scriptable across all of my servers and fairly low maintenance.

On my first attempt, I assumed I was missing the “right way” to use sp_WhoIsActive from a scheduled SPROC.  When run from within a SPROC, the context of the SPROC and sp_WhoIsActive are different. Therefore, you can’t use a local temp table or table variable to pass information back and forth. Instead, the only solution I found was to use a global temp table and that just “felt wrong”. In my search for solutions I found this post by Kendra Little on outputting the results of sp_WhoIsActive to a global temp table for further processing and that set my mind at ease a bit to choose a similar route.

Armed with all of the above, I’ve created the first version of my long-running query email alert. I’m sure there will be some updates and improvements in the coming months, but I wanted to share this in case someone else finds it useful as a starting example.

First Things First – Initial Setup

This solution expects a standard database on all of our servers to hold monitoring and utility information called DBA_stats, an idea taken from that post by Tara Kizer above.  This table holds both my regular sp_WhoIsActive output (what I run every 30 seconds) and a second table that holds a log of long-running queries.

That second table has a few additional columns that help me manage simple alerting, sending an email at a low threshold (5 minutes) and again at a high threshold (30 minutes) if the query is still running.

***NOTE: The following ASSUMES that you already have sp_WhoIsActive installed. If not, go do that first!  🙂

[sql]

USE master
GO

SET NOCOUNT ON;
DECLARE @retention int = 7,
@destination_table varchar(500) = ‘WhoIsActiveLogging’,
@longrunning_table VARCHAR(500) = ‘LongRunningQueries’,
@destination_database sysname = ‘DBA_stats’,
@schema varchar(max),
@SQL nvarchar(4000),
@createSQL NVARCHAR(500),
@alterSQL NVARCHAR(2000),
@parameters nvarchar(500),
@exists bit;

SET @destination_table = @destination_database + ‘.dbo.’ + @destination_table;
SET @longrunning_table = @destination_database + ‘.dbo.’ + @longrunning_table;

If(db_id(@destination_database) IS NULL)
BEGIN;
PRINT ‘Creating stats database: ‘ + @destination_database;
SET @createSQL = N’CREATE DATABASE ‘ + @destination_database + ‘;

ALTER DATABASE ‘ + @destination_database + ‘ SET RECOVERY SIMPLE;’;

EXEC(@createSQL);
END;

–create the logging table
IF OBJECT_ID(@destination_table) IS NULL
BEGIN;

PRINT ‘Creating periodic logging table: ‘ + @destination_table;

EXEC sp_WhoIsActive @get_transaction_info = 1, @get_outer_command = 1, @get_plans = 1, @format_output=0, @return_schema = 1, @schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, ‘<table_name>’, @destination_table);
EXEC(@schema);
END;

–create the long-running query table
IF OBJECT_ID(@longrunning_table) IS NULL
BEGIN;

PRINT ‘Creating long-running queries table: ‘ + @longrunning_table;

EXEC sp_WhoIsActive @get_transaction_info = 1, @get_outer_command = 1, @get_plans = 1, @format_output=0, @return_schema = 1, @schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, ‘<table_name>’, @longrunning_table);
EXEC(@schema);

SET @alterSQL = N’
ALTER TABLE ‘+ @longrunning_table + ‘ ADD
id INT IDENTITY CONSTRAINT PKC_ID PRIMARY KEY CLUSTERED,
email_sent BIT CONSTRAINT DF_email_sent DEFAULT 0,
email_time DATETIME NULL,
email2_sent BIT CONSTRAINT DF_email2_sent DEFAULT 0,
email2_time DATETIME NULL;

CREATE NONCLUSTERED INDEX IX_SessionID_LoginName_DatabaseName_StartTime ON ‘+ @longrunning_table +’ (session_id, login_name, database_name,start_time);
‘;

EXEC(@alterSQL);
END;

–create index on collection_time
SET @SQL = ‘USE ‘ + QUOTENAME(@destination_database) + ‘; IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(@destination_table) AND name = N”cx_collection_time”) SET @exists = 0′;
SET @parameters = N’@destination_table varchar(500), @exists bit OUTPUT’;
EXEC sp_executesql @SQL, @parameters, @destination_table = @destination_table, @exists = @exists OUTPUT;

IF @exists = 0
BEGIN;
SET @SQL = ‘CREATE CLUSTERED INDEX cx_collection_time ON ‘ + @destination_table + ‘(collection_time ASC)’;
EXEC (@SQL);
END;
GO

[/sql]

 Readability Matters – Pretty HTML Emails

Once we start getting data into the long-running table I want to send email alerts. Raw query output doesn’t look good in email, so we need something to turn this into an HTML table. I looked at number of simple solutions for this task and landed on this example from StackOverflow.com.  The code below has a few tweaks and some CSS styles to help it look more readable in email.

[sql]

USE DBA_Stats
GO

PRINT ‘Creating Stored Procedure QueryToHtmlTable’;
GO

IF OBJECT_ID(‘QueryToHtmlTable’,’P’) IS NOT NULL
DROP PROCEDURE QueryToHtmlTable
GO
— Description: Turns a query into a formatted HTML table. Useful for emails.
— Any ORDER BY clause needs to be passed in the separate ORDER BY parameter.
— =============================================
CREATE PROC QueryToHtmlTable
(
@query nvarchar(MAX), –A query to turn into HTML format. It should not include an ORDER BY clause.
@orderBy nvarchar(MAX) = NULL, –An optional ORDER BY clause. It should contain the words ‘ORDER BY’.
@html nvarchar(MAX) = NULL OUTPUT –The HTML output of the procedure.
)
AS
BEGIN
SET NOCOUNT ON;

IF @orderBy IS NULL BEGIN
SET @orderBy = ”
END

SET @orderBy = REPLACE(@orderBy, ””, ”””);

DECLARE @realQuery nvarchar(MAX) = ‘
DECLARE @headerRow nvarchar(MAX);
DECLARE @cols nvarchar(MAX);

SELECT * INTO #tableSQL FROM (‘ + @query + ‘) sub;

SELECT @cols = COALESCE(@cols + ”, ””””, ”, ””) + ”[” + name + ”] AS ””td”””
FROM tempdb.sys.columns
WHERE object_id = object_id(”tempdb..#tableSQL”)
ORDER BY column_id;

SET @cols = ”SET @html = CAST(( SELECT ” + @cols + ” FROM #tableSQL ‘ + @orderBy + ‘ FOR XML PATH(””tr””), ELEMENTS XSINIL) AS nvarchar(max))”

EXEC sys.sp_executesql @cols, N”@html nvarchar(MAX) OUTPUT”, @[email protected] OUTPUT

SELECT @headerRow = COALESCE(@headerRow + ””, ””) + ”” + name + ””
FROM tempdb.sys.columns
WHERE object_id = object_id(”tempdb..#tableSQL”)
ORDER BY column_id;

SET @headerRow = ”” + @headerRow + ””;

SET @html = ”<html>
<head>
<style type=”text/css”>
table {
color: #333;
font-family: Helvetica, Arial, sans-serif;
width: 2000px;
border-collapse:
collapse; border-spacing: 0;
}

td, th {
border: 1px solid transparent; /* No more visible border */
height: 30px;
}

th {
background: #DFDFDF; /* Darken header a bit */
font-weight: bold;
}

td {
background: #FAFAFA;
text-align: center;
}

/* Cells in even rows (2,4,6…) are one color */
tr:nth-child(even) td { background: #F1F1F1; }

/* Cells in odd rows (1,3,5…) are another (excludes header cells) */
tr:nth-child(odd) td { background: #FEFEFE; }
</style>
</head>
<body>
<table width=”2000″ border=”1″>” + @headerRow + @html + ”</table>
</body>
</html>”;
‘;

EXEC sys.sp_executesql @realQuery, N’@html nvarchar(MAX) OUTPUT’, @[email protected] OUTPUT;
END;
GO

[/sql]

Doing the Work – Finding Long-Running Queries

This solution runs sp_WhoIsActive every minute and saves the output into a global temp table. From there, I look for any processes that have been running for more than the low threshold setting. Any of the processes that have not been identified and stored previously get logged, output to an HTML table, and an email alert sent.

Next, I take a second look at the table for anything that’s been running longer than the high threshold.  If a second email alert has not been sent for these processes, we output the same data and send the email. If two alerts have already been sent for these processes, I don’t do anything else at the moment. One of the next updates to this script will send an alert to our DevOps notification system for anything running longer than some final threshold (or maybe just the high threshold).

[sql]

USE DBA_stats;
GO

IF OBJECT_ID(‘LongRunningQueriesAlert’,’P’) IS NOT NULL
DROP PROCEDURE LongRunningQueriesAlert
GO

PRINT ‘Creating Stored Procedure: LongRunningQueriesAlert’;
GO

CREATE PROCEDURE [dbo].[LongRunningQueriesAlert]
@email_Subject VARCHAR(255) = ‘Long-Running Queries on ‘,
@low_threshold_min VARCHAR(2) = ‘5’,
@high_threshold_min VARCHAR(2) = ’30’,
@dbmail_profile VARCHAR(128) = ‘DB Alerts’,
@email_recipients VARCHAR(500) = ‘[email protected]
AS
BEGIN

SET NOCOUNT ON;

DECLARE @server_name VARCHAR(255),
@schema NVARCHAR(4000),
@SQL NVARCHAR(4000),
@lrq_table VARCHAR(255),
@html NVARCHAR(MAX),
@low_threshold_subject VARCHAR(255),
@high_threshold_subject VARCHAR(255);

SET @server_name = @@SERVERNAME;
SET @email_Subject = @email_Subject + @server_name;

 

/*
Setting this to a global temp table so that it’s available to the select after it is
created and we insert data
*/
SET @lrq_table = QUOTENAME(‘##LongRunningQueries_’ + CAST(NEWID() as varchar(255)));

–create the logging temp table
IF OBJECT_ID(@lrq_table) IS NULL
BEGIN;
EXEC sp_WhoIsActive @get_transaction_info = 1,
@get_outer_command = 1,
@get_plans = 1,
@format_output=0, — Don’t format output so that it works in an email
@return_schema = 1,
@schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, ‘<table_name>’, @lrq_table);

EXECUTE sp_executesql @schema;
END;

— Run WhoIsActive again and put results into the table
EXEC sp_WhoIsActive @get_transaction_info = 1,
@get_outer_command = 1,
@get_plans = 1,
@format_output=0,
@destination_table = @lrq_table,
@not_filter = ‘PITTPROCWIN01’, @not_filter_type = ‘host’;

/*
Insert any new long-running queries that haven’t existed before

The WHERE clause below is very specific at the moment and not very flexible.
Improvements to what we ignore and how we specify it are needed.
*/
SET @SQL = N’
INSERT INTO LongRunningQueries ([session_id], [sql_text], [sql_command], [login_name], [wait_info], [tran_log_writes], [CPU], [tempdb_allocations], [tempdb_current], [blocking_session_id], [reads], [writes], [physical_reads], [query_plan], [used_memory], [status], [tran_start_time], [open_tran_count], [percent_complete], [host_name], [database_name], [program_name], [start_time], [login_time], [request_id], [collection_time])
SELECT tempLRQ.*
from ‘ + @lrq_table + N’ tempLRQ
LEFT JOIN LongRunningQueries LRQ ON
LRQ.session_id = tempLRQ.session_id
AND LRQ.login_name = tempLRQ.login_name
AND LRQ.database_name = tempLRQ.database_name
AND LRQ.start_time = tempLRQ.start_time
WHERE LRQ.session_id IS NULL
AND tempLRQ.start_time < DATEADD(MINUTE,-‘ + @low_threshold_min + N’,GETDATE())
AND tempLRQ.database_name NOT in (”master”,”msdb”,”tempdb”,”DBA_Stats”)
AND tempLRQ.program_name NOT LIKE ”%Service Broker%”
AND tempLRQ.program_name <> ”SQBCoreService.exe”’;

EXEC sp_executesql @SQL;

/*
Now send the emails for any new long-running queries
*/
—- Using the new SPROC, format the output as HTML for the email,
EXEC QueryToHtmlTable @query = N’SELECT id, LRQ.session_id, LRQ.sql_text,LRQ.blocking_session_id, LRQ.reads, LRQ.writes, LRQ.status, LRQ.host_name, LRQ.database_name, LRQ.program_name,
LRQ.start_time FROM dbo.LongRunningQueries LRQ
WHERE LRQ.email_sent = 0′ ,
@orderBy = N” ,
@html = @html OUTPUT

IF(LEN(@html) > 1)
BEGIN
SET @low_threshold_subject = @email_Subject + ‘ – >’ + @low_threshold_min + ‘ minute(s)’;
—- Now send the email
EXEC msdb.dbo.sp_send_dbmail
@profile_name = @dbmail_profile,
@recipients = @email_recipients,
@subject = @low_threshold_subject,
@attach_query_result_as_file = 0,
@importance = ‘Normal’,
@body = @html,
@body_format=’html’

/*
Update the table to specify that all new queries have had a notification sent
*/
UPDATE dbo.LongRunningQueries SET email_sent = 1, email_time = GETDATE() WHERE email_sent = 0;
END

 

/*
Now get a list of queries that are still running after the second threshold time has elapsed.
Someone REALLY needs to get on these.
*/
CREATE TABLE #HighThresholdQueries (
id INT
);

SET @SQL = N’INSERT INTO #HighThresholdQueries ( id )
SELECT id FROM dbo.LongRunningQueries LRQ
INNER JOIN ‘ + @lrq_table + N’ tempLRQ ON
LRQ.session_id = tempLRQ.session_id
AND LRQ.login_name = tempLRQ.login_name
AND LRQ.database_name = tempLRQ.database_name
AND LRQ.start_time = tempLRQ.start_time
WHERE tempLRQ.start_time < DATEADD(MINUTE,-‘+ @high_threshold_min + N’,GETDATE())
AND lrq.email2_sent = 0′;

EXEC sp_executesql @SQL;

/*
Now send the emails for any long-running queries that have persisted
past the high threshold
*/
— Reset the variable for reuse
SET @html = ”;

—- Using the new SPROC, format the output as HTML for the email,
EXEC QueryToHtmlTable @query = N’SELECT id, LRQ.session_id, LRQ.sql_text,LRQ.blocking_session_id, LRQ.reads, LRQ.writes, LRQ.status, LRQ.host_name, LRQ.database_name, LRQ.program_name,
LRQ.start_time FROM dbo.LongRunningQueries LRQ
WHERE LRQ.id in (select id from #HighThresholdQueries)’ ,
@orderBy = N” ,
@html = @html OUTPUT

IF(LEN(@html) > 1)
BEGIN
SET @high_threshold_subject = @email_Subject + ‘ – >’ + @high_threshold_min + ‘ minute(s)’;
—- Now send the email second email
EXEC msdb.dbo.sp_send_dbmail
@profile_name = @dbmail_profile,
@recipients = @email_recipients,
@subject = @high_threshold_subject,
@attach_query_result_as_file = 0,
@importance = ‘High’,
@body = @html,
@body_format=’html’

/*
Update the table to track that a second email has been sent for a query that has
been running for an extended period of time
*/
UPDATE LongRunningQueries SET email2_sent = 1, email2_time = GETDATE() WHERE id in (select id from #HighThresholdQueries) AND email_sent = 1 AND email2_sent = 0;

END

/*
Drop Temporary Tables
*/
DROP TABLE #HighThresholdQueries;

SET @SQL = N’DROP TABLE ‘ + @lrq_table;
EXEC sp_executesql @SQL;

END;
GO

[/sql]

Ready, Set – Schedule The Job

With all three pieces in place, setup a scheduled job to call the LongRunningQueriesAlert SPROC on a regular basis. I do it every minute and haven’t seen an impact at this point. This allows me to be alerted to a problem within a minute after it crosses any of the thresholds. Adjust as you see fit.

One of the biggest initial benefits has actually been on our development boxes. I’ve been notified a number of times when various SQL is run in development that leaves open transactions unknowingly. That might seem mundane but it’s been very helpful as the team has been growing.

Rinse and Repeat

Having run this for a few weeks, I plan to make the following changes to my scripts given what I know about our environment.

  1. Adjust the threshold times to be a little longer, maybe as high as 10 minutes for the first alert. We have a few jobs that could run that long in big databases. As expected, these queries tend to cause an alert email almost every day.
  2. Modify the schedule to not include our maintenance window in most cases. The script will catch anything that persists from the maintenance period as soon as I start it up again in the early morning.
  3. Make the filtering more easily injectable. sp_WhoIsActive provides some of the filtering that I’m not taking advantage of, and some it will come from improvements in my queries.