Exchanging an OAuth Token in Power BI Custom Data Connectors

Custom Data Connectors can support multiple kinds of authentication. For most modern web apps, OAuth is the obvious choice, and the Power Query SDK, coupled with the authentication examples (here and here), makes it easy to set up.

Unfortunately, two apps that I have recently created Custom Data Connectors for were partway through an OAuth implementation. While the circumstances were unique to each application, I had to authenticate the user with the OAuth Identity Provider (IdP) and then exchange the access_token for one that the application could use. This isn’t an unusual OAuth “flow”, particularly when the application uses a custom payload in the access_token JWT for application functions.

Whether you agree with an exchange flow or not, I had to figure out how to make the Custom Data Connector work within these constraints.

To accomplish this, I had to interrupt the normal OAuth response flow and exchange the IdP-provided access_token for one that the application would accept as Bearer authorization. I also needed to ensure that the refresh_token flow would continue to work when Power BI encountered an HTTP 401 (Unauthorized) response from an API, causing it to perform a refresh grant to retrieve a new access_token and keep data flowing without interrupting the user.

It just wasn’t clear to me exactly where to do all of this work.

In other OAuth frameworks or SDKs, it’s usually pretty easy to see where you’d drop into the flow and make the exchange before moving ahead. Working from the GitHub authentication example, my mind wasn’t thinking functionally at the time (it’s still a challenge for me!), and it wasn’t clear that the response from the FinishLogin function was being consumed and stored by Power BI directly. Once I made that connection, it was easy to accomplish what I needed.

The Basic OAuth Setup

I started with the examples from the articles I’ve linked to above, enabling OAuth and setting up each of the functions.

...    
Authentication = [
        OAuth = [
            StartLogin=StartLogin,
            FinishLogin=FinishLogin,
            Refresh=Refresh,
            Logout=Logout
        ]
...

Inside of the FinishLogin function there is a call to TokenMethod, which takes the code grant and exchanges it with the IdP for an access_token.

TokenMethod = (code) =>
    let
        Response = Web.Contents("https://auth.example.com/oauth2/token", [
            Content = Text.ToBinary(Uri.BuildQueryString([
                client_id = client_id,
                client_secret = client_secret,
                code = code,
                redirect_uri = redirect_uri])),
            Headers=[#"Content-type" = "application/x-www-form-urlencoded",
                                 #"Accept" = "application/json"]]),
        LoginResult = Json.Document(Response)
    in
        LoginResult ;

It took me a while to realize that the Json.Document(Response) was simply parsing the OAuth payload (id_token, access_token, and refresh_token) into a Power Query record that the connector/runtime was storing and using later. Therefore, I had to do my exchange somewhere in this function before returning the Record that Power BI needed.
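
To make that concrete, here’s a rough sketch of the kind of record that gets handed back and stored. The field values are obviously made up, and the exact set of fields depends on what your IdP returns.

// Hypothetical shape of the record Power BI stores after FinishLogin.
// Values are illustrative only.
[
    access_token = "eyJhbGciOiJSUzI1NiIs...",
    refresh_token = "8xLOxBtZp8...",
    id_token = "eyJ0eXAiOiJKV1QiLCJh...",
    expires_in = 3600
]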

Exchanging the Access Token

In order to exchange the trusted access_token, another function was needed to call the actual Exchange endpoint of each application. YMMV, but here’s what a basic function might look like.

ExchangeToken = (id_token as text) =>
    let
        NewToken = Web.Contents("https://auth.example.com/api/exchangeToken", [
            Content = Text.ToBinary("{""id_token"": """ & id_token & """}"),
            Headers = [#"Content-Type" = "application/json"]]),
        LoginResult = Json.Document(NewToken),
        AccessToken = LoginResult[token]
    in
        AccessToken;

With that in place, I could modify TokenMethod to do the exchange mid-stream for me, like so.

TokenMethod = (code) =>
    let
        Response = Web.Contents("https://Github.com/login/oauth/access_token", [
            Content = Text.ToBinary(Uri.BuildQueryString([
                client_id = client_id,
                client_secret = client_secret,
                code = code,
                redirect_uri = redirect_uri])),
            Headers=[#"Content-type" = "application/x-www-form-urlencoded",
                                 #"Accept" = "application/json"]]),
        body = Json.Document(Response),
        result = if (Record.HasFields(body, {"error", "error_description"})) then
                    error Error.Record(body[error], body[error_description], body)
                 // This is a code exchange; we expect a refresh_token
                 else if (Record.HasFields(body, {"refresh_token"})) then
                    [
                        refresh_token = body[refresh_token],
                        access_token = ExchangeToken(body[id_token])
                    ]
                 // This token was obtained by a refresh request; no refresh_token
                 else
                    [
                        access_token = ExchangeToken(body[id_token])
                    ]
    in
        result

The key was understanding that the result of TokenMethod, which is what the FinishLogin function ultimately returns, needs to be a Record with all of the fields necessary to continue the OAuth flow. In this case, simply replacing the access_token and refresh_token allowed Power BI to store them for later use.

Whenever the token expires, producing an HTTP 401 (Unauthorized) response, the Refresh logic kicks in, goes back through this same exchange, and data keeps flowing.
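
For completeness, here’s a minimal sketch of what a matching Refresh handler could look like. It assumes the IdP’s token endpoint (the same auth.example.com placeholder used above) supports a standard refresh_token grant; adjust the fields to whatever your IdP actually returns.

Refresh = (resourceUrl, refresh_token) =>
    let
        Response = Web.Contents("https://auth.example.com/oauth2/token", [
            Content = Text.ToBinary(Uri.BuildQueryString([
                client_id = client_id,
                client_secret = client_secret,
                grant_type = "refresh_token",
                refresh_token = refresh_token])),
            Headers = [#"Content-type" = "application/x-www-form-urlencoded",
                       #"Accept" = "application/json"]]),
        body = Json.Document(Response),
        // Exchange the refreshed IdP token, just like the initial login
        result = [
            refresh_token = Record.FieldOrDefault(body, "refresh_token", refresh_token),
            access_token = ExchangeToken(body[id_token])
        ]
    in
        result;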

Conclusion

Learning how to use Power Query to get data from REST APIs has been a fun (if not frustrating at times) learning experience. The documentation continues to improve and the Power Query team and community are very helpful. Each new challenge helps me better understand the building blocks they have provided and how to get to the next milestone.

Now, if I could just have something like a “finally” query that gets executed after an import to create the relationships I need and additional linking tables if needed. Wow, I’d be in heaven! 😉

Organizing Functions in a Power BI Custom Data Connector

The more complex a Power BI Custom Data Connector gets, the more cumbersome the main Power Query file becomes. Without the ability to make a true “project” structure and include or directly reference functions from other files at this point, I wanted to find a way to keep things a little more organized.

In the series of tutorials that the Custom Data Connector team put together, there’s an example of how to include external functions at the end of Tutorial 7. Although it’s not mentioned in this short explanation, I quickly realized that a Power Query project references all files by inclusion in the final build, not the physical path structure of the project or through namespaces. As long as the file is set to be compiled into the final solution, you can reference it by name regardless of where it is actually placed in the file structure of the project itself.

Therefore, to keep my Custom Data Connector projects a little more visually organized (and to help my future self remember how functions relate), I’ve gotten into the habit of organizing my projects using folders that group common functions together.

I’m still tweaking what I call my “standard” set of folders, but between projects the concepts are the same.

The Root

I try to only keep my Resource and Power Query files in the root of the project, making it easy to find as the number of functions grows.

Icons

Next, I always move the icons into a separate folder. In the generic Power Query project template, you end up with 8 icon files taking up space and named after your main PQ file. Clutter.

Old Snippets

This folder isn’t always named consistently, but I keep a folder for functions and snippets of code I don’t need anymore, usually because I found a better way to do something. I’ve found that it can be helpful for me to refer to old examples of how the project evolved in a way that looking at the commit history can’t solve. When snippets are in a file, it’s easy to include it as a function and see it work again.

Helper/Core

I’m still trying to figure out what I really want this folder to be called, but it contains the meat of my application function logic. Inside of this folder I make sub-folders that group common functions together.

  • Table – These are usually the base helper functions provided by the team at Microsoft with some custom modifications.
  • Data Transformations – The heart of a data connector is about transforming data into something usable for… the user. These are separate functions that generally translate to each Query in Power BI.
  • Authentication – Most data sources require some sort of Authentication. Most of these functions are based on authentication examples from the Microsoft team. I’ll tweak them as needed for the specific project.

Making it work

Whatever your specific folder structure becomes, how does this work out in practice?

Simple! Any file that you want to reference in connector code has to have its Build Action set to Compile. Once that is set, the file can be referenced by name alone in a PQ function. If you ever forget to set a file to be included in the compiled Data Connector, you’ll quickly get a warning.

For files like the Icons, you don’t need to do anything special after setting the build action. As long as the name stays the same, the references in your Connector.Icon method will just keep working.

However, if you want to load one of the other functions, you’ll need to use the temporary workaround as described in Tutorial 7.

// 
// Load common library functions
// 
// TEMPORARY WORKAROUND until we're able to reference other M modules
Extension.LoadFunction = (name as text) =>
    let
        binary = Extension.Contents(name),
        asText = Text.FromBinary(binary)
    in
        Expression.Evaluate(asText, #shared);

With that function available, you can now load other functions with one or two lines of code. If the function doesn’t depend on anything outside its own file, just assign the loaded function to a variable.

Table.ChangeType = Extension.LoadFunction("Table.ChangeType.pqm");

In some cases, however, your function might need access to another external function or parameter which has to be passed by reference because the scope has changed. In that case, you need to load the function first and then make it available through a second function variable that accepts the external references and any additional parameters.

TokenMethodFunction = Extension.LoadFunction("Auth.TokenMethod.pqm");
TokenMethod = (grantType, tokenField, code) => TokenMethodFunction(grantType,tokenField,code,OAuthConfig,ExchangeToken);
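
To illustrate the pattern, here’s a hedged, hypothetical sketch of what a file like Auth.TokenMethod.pqm might contain: the whole file evaluates to a single function, and the external references (OAuthConfig, ExchangeToken) simply arrive as ordinary parameters. The OAuthConfig field names are made up for this example.

// Hypothetical contents of Auth.TokenMethod.pqm. The file is one function
// value; OAuthConfig and ExchangeToken are passed in by the caller above.
(grantType as text, tokenField as text, code as text, OAuthConfig as record, ExchangeToken as function) =>
    let
        TokenResponse = Web.Contents(OAuthConfig[TokenUri], [
            Content = Text.ToBinary(Uri.BuildQueryString([
                grant_type = grantType,
                client_id = OAuthConfig[ClientId]
            ] & Record.FromList({code}, {tokenField}))),
            Headers = [#"Content-type" = "application/x-www-form-urlencoded",
                       #"Accept" = "application/json"]]),
        body = Json.Document(TokenResponse),
        result = [ access_token = ExchangeToken(body[id_token]) ]
    in
        result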

With this workflow, my overall Power Query file is much less cluttered and it helps bring more meaning to my source control commits when functions are broken down into smaller chunks.

The One Downside

I’d be remiss if I didn’t mention the one, hopefully obvious, issue with moving functions out of the main Power Query file into .PQM files.

The lack of syntax highlighting and intellisense.

It is annoying. I’ve learned to deal with it for small changes. If I need to work through a problem, I’ll copy the function over until it’s been modified and is working, and then wire it back up through the “LoadFunction”. In reality, once a function is working, it’s rare that I’ll actively modify it.

Programming SQL Server Triggers and Functions: My First Pluralsight Course

I’m thrilled to share that I’ve completed my first Pluralsight course, Programming SQL Server Triggers and Functions, and it has finally been published for subscribers to enjoy! I began the journey of authoring my first course with Pluralsight in September 2018 and officially started production of this course in February of this year. It was certainly a challenge to create a 3+ hour course while changing jobs, having our sixth child, and speaking at a few conferences on other topics, but the overall experience was more rewarding than I expected and has only increased my desire to become a better trainer.

Why a Course on Triggers and Functions?

Seriously, right? Seems like a bit of an “oldie but a goodie” place to start my Pluralsight training career… eh?

I’ve grown up teaching and training in one capacity or another most of my life. My undergrad degree is in education and after teaching for a few years, I was blessed to follow my other passion and develop a great career in software and data. In many regards I often feel like I’m at the bottom of the class on most topics, struggling to keep up. But I also realized that no matter how much I have to learn about a subject, there’s someone else that needs to understand this same information for something they’re working on and I might be the one that can help them get to the next step.

After I auditioned for Pluralsight and was accepted as an author, I came to the first meeting with a handful of topics focused on the Microsoft Data Platform that I wanted to share with the world: SQL Server Database Projects, Power BI Custom Data Connectors, Service Broker.

But I was offered a bit of a “fast-track” to my first course if I chose a course from the list dealing with specific Learning Paths, one of which was SQL Server. Of the 10-ish SQL Server topics on the list, only a few were still available to claim and… well… I happen to be a fan of both Triggers and Functions when they’re right for the job. Pluralsight had already created a basic outline of what they wanted the course to cover and how long they thought it might be. I took 6 weeks to fill in the blanks (remember, I was growing my family and switching jobs) and finally began production in February.

What Does the Course Cover?

I’m glad you asked! I ended up covering a lot more ground than I thought I would. Honestly, if the outline hadn’t been provided for me as a starting point, I might have even recommended the course be broken into two different courses. There are just so many topics to cover when dealing with these two features that, in all honesty, it could have been twice as long. (I’m guessing that most authors feel the same way after recording a course like this)

1. DML Triggers

Of the 3+ hours, the first module on DML Triggers is the longest at 54 minutes. I’ve found over the years that there are a lot of misconceptions about the basics of how Triggers work, including some confusion between SQL Server and other databases like Postgres, so the detail and coverage of these specific topics felt necessary. I discuss and demo all of the major parts of using DML Triggers including AFTER and INSTEAD OF, the INSERTED and DELETED virtual tables, and Execution Order. While the concepts of how a DML Trigger works are easily transferable regardless of the event that fired it (INSERT, UPDATE, or DELETE), I felt it was worth trying to provide at least one example of how and why someone might use each of them. I even talk about using INSTEAD OF Triggers on VIEWS, something many SQL developers just don’t understand.

2. DDL and LOGON Triggers

In this second module I focus on the anatomy of a DDL/LOGON Trigger, what makes them different from DML triggers, and when they tend to be useful. Most of the discussion and examples deal with protecting data and schema using these tools, both through logging and restricting access. Some technology might be slowly diminishing the need for DDL/LOGON triggers, but I still think there are valid use cases to consider.

3. Working Smarter with Triggers

Let’s be honest, it’s the year 2019 and many SQL Server professionals would caution you against using Triggers at all. Even Joe Celko, one of the SQL standard authors, thinks you should generally try to avoid them. I’m very aware of the strong opinions against using triggers and have often seen them overused and unmaintained… leading to countless headaches.

But as a data professional, they can be a great tool to solve specific problems when used well! In this module I cover Execution Context and security, how MERGE interacts with Triggers, logging information outside of the transaction, avoiding work when possible, and even a small shout-out to Service Broker and the value of using asynchronous triggers.

4. Reusing Code with Functions

This module is simply an intro to Functions within SQL Server and the differences between Multi-statement and Single-statement functions, preparing for the next two modules. Again, differences in other database technologies, and even misconceptions within the SQL Server community, can make a detailed discussion on Functions difficult without some initial groundwork.

5. Multi-Statement Scalar and Table-Valued Functions

Functions are a good, necessary tool for producing consistent and reusable code. For nearly two decades in SQL Server, most developers have written and used these types of functions, and yet many don’t fully understand the impact they could be having on their overall application performance.

Therefore, in this module I talk about how to write both Scalar and Table-Valued functions and how they can be useful. However, I also show why they can be problematic, what to look for and some recent enhancements, starting in SQL Server 2017, that can improve the performance of your current (and future) Multi-statement functions.

6. Single-Statement (Inline) Table-Valued Functions

The course ends with a module that focuses on Single-Statement Table-Valued functions, often referred to as Inline Table-Valued Functions. I discuss how they differ from Multi-Statement functions, how to create them, and then I discuss and demonstrate how to convert a Multi-Statement function into an Inline Table-Valued Function and what the advantages can be.

Am I a Trigger and Function Expert?

No! I’m not going to lie to you. There are a ton of other people in the SQL Server community that are more qualified than I am to teach a course like this. I happen to have dealt with both Triggers and Functions a lot over the last 15 years and I think I have a good foundation to teach from. I was surprised more than once at how much I didn’t know at the beginning of creating this course. Without a doubt, this certainly proved to be a reminder that “the best way to learn something is to teach it”.

I do think, however, that this course will provide a solid foundation and understanding about Triggers and Functions within SQL Server for most beginner/intermediate SQL Server developers… allowing you to do better work.

Will I Be Creating More Courses?

I hope so. I don’t have the best setup in our current house for producing courses, which sometimes required my family to take day-trips so that I could record with guaranteed quiet, and even then I often had to post-process the audio side of the course. I’ve also just taken on a leadership role in my new company which has shifted my focus a bit for now.

But as I said at the beginning of the article, I love teaching and training, and the staff at Pluralsight are top-notch and extremely helpful. I’d be silly not to try and do it again if I can find the right topic.

Because of the change in my current role, the next courses will likely shift away from SQL Server in particular to PostgreSQL, Power BI and maybe even some tooling topics like Azure Data Studio. I’ve committed to my wife that I won’t do a course unless it’s around a topic I already speak about or would be preparing to speak on. Having that material ready at the beginning significantly cuts down the time needed to create demos and slides.

I hope you have an opportunity to take the course or pass it on to others that are starting their SQL Server journey.

It’s a Wonderful Developer’s Life

I had the pleasure of presenting two sessions this past January at CodeMash 2019. This was my fifth year attending the conference and I can’t say enough good things about the team and presenters that give so generously each year. I’ve learned a lot just by watching them!

Four years ago, as I was considering my “mid-career” life, I attended the Lightning Talks that are presented midway through the conference. This is an “open mic” event where any attendee (and even some attendee family members in the past) can present on a topic of their choice in 6 minutes or less.

The talks that year were funny, heartfelt, inspiring, and encouraging. Between those talks and some other encounters at the conference, I started to pursue opportunities to teach and present. Although I’m not able to do it as often as I’d like (family and my day job come first!), it is always a blessing to take something I’ve learned and encourage others to dig deeper.

Fast forward a couple of years, having just struggled through the choice of leaving my long-time employer, and having our sixth child… and I was in the right place to be challenged and inspired in my professional life by some of our favorite Christmas movies.

So much so, that I thought it would make a perfect Lightning Talk. On a personal level, this is one of the most satisfying talks I’ve presented in a while, even at less than 6 minutes.

Slides can be found in my GitHub repo.

I hope you enjoy it.

Learning to give back, because someone in your sphere needs your help!

PostgreSQL for the SQL Server DBA: Tooling, Part 2 – It Gets Better

This post is part of an impromptu series about PostgreSQL and things I am learning coming from a SQL Server background. A complete list of posts can be found on this post.

WOW! I honestly didn’t expect to get quite so much notice and feedback on the first iteration of this post. The power of social media at work. 😉

But I’m so thankful that I did! It allowed me to quickly see that the support for PostgreSQL is stronger than ever, there are many opinions on tooling, and I have a ton to learn. That’s a challenge I’m up for!

I particularly appreciated the post that Rob Conery created in response, gently reminding me that when something starts out in the Open Source and Linux world, there’s often a plethora of command line tools ready to help. Much like Git (which is always used as the seminal “the command line is more powerful” example), something like psql is really good at helping me get to the things I often need to know quickly. I need to remember to consider those roots as I try to grow my understanding deeper.

To that end, I’m all for using command line tools when I can and it makes sense, particularly in database administration. But when I’m knee deep in a new feature and trying to help a young team, I’m still on the lookout for a tool we can use together to better develop and manage the PostgreSQL database day-to-day.

And so, I thought a quick update would be valuable based on the feedback I received here and through Twitter. As always, I’m learning alongside everyone else, trying to be as productive as I can for my employer, my teammates, and the larger community. Hopefully this is a helpful part of that larger conversation.

Tooling – Part 2

As you’ll see in the previous post and comments, there are tooling options aplenty – something most of us have come to expect from an open source environment. Here’s where things are at two weeks later for me.

Azure Data Studio

What?? Azure… Microsoft… PostgreSQL?

YES!!

To be honest, most of this post was already written and focused on DBeaver before I got wind of the pending update to Azure Data Studio. I’ll have a second blog post in the next few days about my experience so far. That said…

As you might have noticed, at the end of the previous post I mentioned that Shay Rojansky was recently hired by Microsoft. I didn’t expect that he’d start working on tooling specifically, but it did signal to me that they were putting more investment and energy towards PostgreSQL.

Microsoft announced today, as part of the MVP Summit, the beta release of a PostgreSQL extension for Azure Data Studio. I was able to get a sneak peek a few weeks ago and I’m excited about the potential. It’s clear that the ultimate goal is to provide access to the same great features they are building into this cross-platform data management application. I’ll have a separate post soon about some of my first impressions, but two highlights stand out that I think hold the most unique opportunity for ADS to become a great tooling option for PostgreSQL.

  1. Community Extensions: Much like Visual Studio Code, Azure Data Studio is open source and is building out a platform for extensions built by the community. Some initial quick wins could be extensions for better monitoring and tuning… and a graphical query plan visualizer!! (hint, hint…)
  2. SQL Notebooks: I got to see the first examples of these at PASS Summit last year when Vicky Harp and the team presented, and I think it’s an AWESOME idea. Having dabbled in and around data science for a few years now, the power of notebooks to store thought process while also displaying results in-line is so much fun. For the first time, you’ll be able to do this with SQL queries in SQL Server or PostgreSQL. YAY!

DBeaver

https://twitter.com/ryanbooz/status/1100612407934832640?ref_src=twsrc%5Etfw
Yes. I had the exact same thought!

After I got past the name, it turns out that DBeaver Community Edition is now my daily driver. I have no idea what the history is (time has been at a premium lately), but the overall layout and functionality comes close to my SSMS “expectations”. I’m not saying SSMS is the standard to reach for, simply that it helps me be more productive quickly because it’s more intuitive… for me.

The navigation layout follows a predictable pattern of feature separation, query editing just feels more intuitive (which is subjective for sure), and most important to me, DBeaver produces accurate DDL when I inspect database objects. As a developer and administrator ramping up on a new project, this kind of detail is especially important for me. Otherwise, I’m left wondering if my latest modifications were applied correctly.

As with most other PostgreSQL tools, DBeaver is cross-platform and able to connect to multiple SQL and NoSQL databases. I can’t currently speak to its performance on other databases, although I will be trying it with some of these platforms in the near future.

DBeaver is built on the Eclipse ecosystem, whatever your feelings on that are. One advantage is that the plugin architecture allows for some great free plugins, like export to Excel. Scoff as you might, it makes sending data to Product Owners, managers, and users just a little bit easier when non-standard data is involved.

And finally, DBeaver is one of the few options aside from PgAdmin that has a free, open source Community Edition that is very capable, while providing a reasonably priced Enterprise Edition for additional features and support. Oh, I almost forgot! It can be installed with Chocolatey too!

A Few Highlights

Most of the other tools I’ve looked at have variations on these features, so I’m not trying to claim that they are unique to DBeaver at all. But again, for whatever reason, finding and using them is just more intuitive for me (at this point).

Intuitive Server/DB Navigation

Easy Navigation

It might be a small thing, but being able to navigate objects in the database and refer to them consistently with others on the team is immensely helpful. When I talk about Views, Indexes or Functions, there’s no guessing. Again, remember that I’m often thinking about how I can best help others learn to use the DB effectively and having an easy, common way to get about makes this easy.

Also notice a really small thing within each database in DBeaver: access to two simple queries for Sessions and Locks. Again, coming from the SQL Server world, there are so many community contributors teaching and preaching the benefits of knowing these essential pieces of day-to-day work. Other tools certainly have something like this, I’m sure, but it gave me a quick way to start to explore performance problems on development databases and elsewhere.
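
For context, here’s a minimal sketch of the kind of “Sessions” query a tool like this runs under the hood, using the standard pg_stat_activity view (the column list is just a reasonable starting point; pg_locks plays the same role for the Locks view):

-- Sessions that are actively doing something, oldest first
SELECT pid, usename, datname, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;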

DDL/Data Access

I know it seems like I harp on DDL a ton, but one of my main objectives in this new job is to help bring the database into order. If I can’t easily see the data, the schema definition, and the state of various objects, that is generally a non-starter for me.

Easy access to all schema objects

Double-clicking on any object (table, view, triggers, etc.) will open up the edit window. As you can see, when it’s a table, properties and data preview are close at hand, and all parts of the table are visible through the tabs on the left.

Honestly, this seems like a good option for something like SSMS to have, rather than dealing with each of these objects separately… but I digress.

My one gripe at this point is adding columns to a table through the UI. I haven’t found a quick method to do it graphically. Generally I resort to straight SQL for these operations, but when I was creating a prototype this week, I was jumping back and forth wishing I could just quickly tab through some options. It’s minor.

Result Table Filtering

Inline results grid filtering

To be honest the first time I saw this was with DataGrip. I didn’t think it would be useful at all and mostly just assumed it was added as a checkbox to increase the feature count. But, I’m surprised how often I’m in a database and a co-worker asks if something exists and my first reaction is to double-click the table (display data) and then start typing a column and value to filter. It’s not my main method (I can still crank out “SELECT x FROM table WHERE…” as fast as the next guy…), but it is a useful little time saver now and again.

SQL Formatting and Intellisense

Another small detail that doesn’t work as well in other tools I’ve tried. And it even bests the default SSMS capabilities (although Red Gate’s SQL Prompt is a great tool that easily bests what’s built into DBeaver!).

When I’m typing, speed is generally more important to me, which means that I don’t use all CAPS for keywords in the SQL language. Yes, it doesn’t matter. But it does look better when things get complicated and I’m trying to show or teach someone about SQL. DBeaver is the first tool out of the box that allows me to set CAPS for auto-complete on intellisense and it generally just works. In other IDE’s, it will only go CAPS if I start the word with a capital letter. Arg.

The Others

A few other suggestions made it into the comments of the previous post (and hopefully this one too!). The short list includes:

Feel free to check them out and blog about them if you feel strongly about it. I’m sure someone in the community will be thankful you did!

At this point I don’t plan to look for others. Between DBeaver and Azure Data Studio, I can work more effectively with my team and provide better, faster service to my internal customers with PostgreSQL. I’m sure I’ll take a few minutes from time to time, seeing if any new contenders are making waves.

Stay tuned for more information and feedback about Azure Data Studio.

PostgreSQL for a SQL Server DBA: The Tooling Stinks

This post is part of an impromptu series about PostgreSQL and things I am learning coming from a SQL Server background. A complete list of posts can be found on this post.

UPDATE: In response to this article and much of the feedback, I’ve created a second post on two other tools. Check that out for more context and options… and keep watching for more information on PostgreSQL tools.

So here’s the deal.

Just. Stop. Looking.

No, really, stop looking for more user-friendly tools that provide even 25% of the features of SQL Server Management Studio, Azure Data Studio, or even Visual Studio (for database projects). They don’t exist.

Obviously this is my current opinion. (YMMV)

Why Bad Tooling is a Problem

As I’ve mentioned at various times, I come from a teaching background, and so I tend to be hyper-aware of things that unnecessarily interfere with learning something. When it comes to technology, that can be a lack of documentation, bad tooling, or good tooling that’s simply too hard to figure out. And yes, SSMS could fall into the latter category at times.

In the case of PostgreSQL, I’ve quickly come to the conclusion that bad tooling is one of the main reasons the uptake is so much more difficult and convoluted for folks coming from the SQL Server community. Even the devs I’m currently working with, who have no specific affinity for databases at all, recognize that PostgreSQL just feels like more of a black box than the limited experience they had previously with SQL Server.

/* Begin Brief Soapbox */
Honestly, this is by far one of my biggest gripes about Open Source software now that I’m older, busier, and don’t want to spin my wheels trying to make something simple work. When the tools make it hard to dig in and work effectively with the database, most developers and shops will default to code-first/ORM-only development. In nearly 20 years of software development and leading multiple teams, I’m still surprised how little most developers really care about effectively using a database of any kind. During most interviews, only about 30% of applicants can answer a few basic SQL questions. And now I think I’m starting to understand why. Most of them have been relegated to an Open Source world with Open Source tooling when it comes to SQL. Yes, it’s cheap and allows projects to spin up quickly, but once those students get past their little pizza ordering app from CompSci 402, they’ll be lost in the real world.
/* End Brief Soapbox */

Quick Aside: Lest anyone misunderstand me, my gripe is not with Open Source software at all. I LOVE OPEN SOURCE SOFTWARE! Really! I’ve been using Linux since 1998, built my first kernel in 1999 on a bus headed to North Dakota (long story), and I literally wouldn’t be where I am without OSS. This conversation is specifically about tooling for SQL platforms in the OSS world.

The current contenders

All of the griping and soap-boxing aside, I really do want to provide a short list for others to consider that might be in a similar position. Below is a short list of tools I’ve tried or (eventually) purchased to make using and managing PostgreSQL easier and more straightforward. As with most OSS projects, there are certainly many more than this. But based on feedback from others and doing my own testing, this is where things currently sit for me.

PgAdmin

PgAdmin is the web-based GUI for PostgreSQL and certainly the most often used tool. To “run” PgAdmin, the application starts a local web server process and then opens a browser. From an enterprise perspective, it’s honestly pretty laughable to ask developers to do real day-to-day work managing a database through a web application. Sorry, I’m not going to sugar coat it. Too often I feel like it’s one refresh away from losing my work, and I’m reminded every time I use PgAdmin of an incident 15 years ago when I dropped tables in a client database using phpMyAdmin because of a refresh bug which reset the context in the database. And all of these years later, I lost work three days into the new job because something caused things to refresh and everything was reset.

To be fair, PgAdmin is a pretty remarkable undertaking and has some useful features. Three quick highlights of basic things PgAdmin provides which I’d expect from any tool include:

  • Easy hierarchical navigation that actually shows you all of the major database object types in context (see all triggers/indexes/columns in context of the table).
  • Quickly displaying the DDL of objects. For instance, it’s the only tool that can easily show me the DDL of a Trigger after it is created, including the predicate.
  • A visual query planner of some sort. Yes, I know about EXPLAIN ANALYZE, but honestly, I want to focus on making queries better, not remembering how to decipher the text plan each time.

So yes, there is a surprisingly good set of features given the delivery method, but it’s just not enterprise ready by any stretch. One potentially bright spot is that a co-worker recently mentioned that there was rumor of some investment capital in the PgAdmin project in some way. Maybe there’s hope of something better? Maybe Microsoft will figure out how to allow Azure Data Studio to become part of the mix?

DataGrip

At this point, the best option I’ve found for Windows is JetBrains DataGrip. They happened to be at PASS Summit last year and a technical sales lead stated that their goal is to replace SSMS as the database management tool of choice. (I’m not even paraphrasing!). While that’s a noble goal, they have a long way to go… but it’s still much better than PgAdmin for most tasks.

A couple of “quick win” useful features include:

  • There are some handy features in DataGrip for viewing and filtering data with an easy double-click of the table.
  • Multiple-tab support is fairly good; however, seeing the settings for the connection on the current tab isn’t as obvious (again making me fear I’ll run something in the wrong database).
  • Managing backups is easier, overall, than PgAdmin or the command-line

One thing that I think is holding me back from using DataGrip more effectively is the lack of good training or documentation. I remember sitting at the JetBrains booth at PASS Summit thinking “those are some interesting features…”, but I can only remember one of them at this point. The only training video I found is a few years old and is essentially a 10-minute technical sales demo. If there were more of those, maybe I’d have a different opinion.

DataGrip is cross-platform, although I haven’t tried the Mac version yet. If you’re on a Mac, Brent Ozar recommended Postico when I asked him what tools his team uses to manage and work with their Aurora instance.

Navicat for PostgreSQL

I also gave Navicat a try for their 14-day trial. In the end, there wasn’t enough value to justify spending another $300 after I had already purchased DataGrip.

One thing I will say is that it is much faster and (seemingly) lightweight in a good way. DataGrip seems to run up against a JVM memory issue whenever I try to do larger tasks (like run a backup). I didn’t experience any of that slowness when testing Navicat. Unfortunately, there weren’t enough overwhelming wins to make that one issue a deal-breaker.

Why Hide the Schema?

One particularly annoying thing about both DataGrip and Navicat is that neither of them show all of the major schema objects within the navigation tree… and each shows a different set of objects.

DataGrip shows most of the major schema objects within the tree under each table, as I’m used to. Unfortunately, interacting with the objects for editing or inspection never works out quite how you’d expect, and it doesn’t consistently show the true DDL of the objects.

Navicat? Every object has its own section. What indexes do I have on table ABC? Yeah, you have to go to the “index” module and hope you named your indexes well, because they aren’t shown in the context of the tree.

In total, this just strengthens my opening monologue about the usability and accessibility a tool provides.

A brighter future?

My one hope for better tooling is that Microsoft just hired Shay Rojansky, the maintainer of the .NET PostgreSQL driver. While I know that doesn’t translate directly, I can only hope it means they’re thinking about how to improve the tooling to go along with the programming support.

What about you? Any tooling choices that have been a net positive in your transition to PostgreSQL?

PostgreSQL for a SQL Server DBA: A Series

I recently switched jobs, leaving a nearly 15-year career working with SQL Server. Although I have been using it that long, it has only been in the last 4 years that I finally realized how much I didn’t know about my favorite data technology stack, taking every opportunity to dig deeper and get more connected to the #SqlFamily.

Unfortunately, two months before I joined KCF Technologies, they transitioned to PostgreSQL because of the Aurora database offering from Amazon Web Services. The sales pitch behind Aurora is great and it has, in fact, saved them a ton of money. And at this point the database backing the application isn’t the complicated part of the application, but rather the millions of real-time data points that stream into the service every minute (using a separate time-series database).

That said, it has been quite a transition trying to re-learn (or unlearn, as the case may be) the PostgreSQL way of doing things and actually missing the huge economy of tooling that has grown up around the SQL Server community… tooling that’s largely missing in the open source DB world. Don’t get me wrong, I had some similar feelings 15 years ago when I started using SQL Server after spending a few years with PostgreSQL (of all things)… but this has certainly felt more challenging.

As I’m doing the learning, spending hours on Stackoverflow and a handful of other sites trying to get my bearings, I thought it would be good to write some of it down for my future-self and others that have the need to cross over at some level. PostgreSQL isn’t going away as an alternative, and both AWS and Microsoft have provided “one-click” startup options to use it.

There’s no specific agenda, but the first several posts will certainly share many of my frustrations trying to make the leap back(wards). But as I get my mind wrapped around each piece, I promise to share some positive things too. 😉


  1. PostgreSQL for the SQL Server DBA: Triggers
  2. PostgreSQL for the SQL Server DBA: The Tooling
  3. PostgreSQL for the SQL Server DBA: Tooling, Part 2 – It Gets Better

PostgreSQL for a SQL Server DBA: Triggers

This post is part of an impromptu series about PostgreSQL and things I am learning coming from a SQL Server background. A complete list of posts can be found on this post.

While Triggers have gotten a bad rap in many circles and are often discouraged (great read by Joe Celko), they are still a useful tool in a handful of situations. Trying to implement my first Triggers in PostgreSQL has been one of many learning adventures lately. Here are a few key points I want to remember for next time and share with others.

They are implemented as Functions (Stored Procedures)

This caught me off guard at first. I’ve been working with and dealing with triggers in SQL Server since day 1. They are first-class citizens… objects that have their own code blocks and rules.

PostgreSQL approaches it differently. Any reusable code block, regardless of its true purpose, is a Function of one type or another. Triggers are no different. Therefore, you write the logic of your trigger in a Function and then call it by adding a trigger to the DML event of a table.

CREATE OR REPLACE FUNCTION user_log()
RETURNS TRIGGER AS
  $BODY$
  BEGIN
    INSERT INTO UserLog (....);
    RETURN NEW;
  END;
  $BODY$
LANGUAGE plpgsql;

Once the function is available, you “attach” it to the DML event of the table by creating the trigger.

CREATE TRIGGER ti_users
  AFTER INSERT
  ON Users
  FOR EACH ROW
  EXECUTE PROCEDURE user_log();

Triggers can be run per-row

Again, this threw me a little coming from SQL Server, which executes a trigger once per statement against the whole batch of rows, even if it’s only one row. This also means that when a trigger is the best place to check or modify data in an application, additional work often has to be done to iterate over the data inside of the trigger. In PostgreSQL these batch-level triggers are called statement triggers, and they don’t provide the same access to the batch of inserted/modified rows that SQL Server does with the INSERTED.* and DELETED.* virtual tables.
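
For illustration, here’s a hedged sketch of a statement-level trigger (the function, table, and column names are hypothetical); it fires once per statement and never sees the individual rows:

-- Hypothetical statement-level audit trigger; runs once per INSERT statement.
CREATE OR REPLACE FUNCTION users_bulk_audit()
RETURNS TRIGGER AS
  $BODY$
  BEGIN
    INSERT INTO UserLog (Message) VALUES ('Users table received an insert');
    RETURN NULL;  -- the return value is ignored for AFTER statement triggers
  END;
  $BODY$
LANGUAGE plpgsql;

CREATE TRIGGER ts_users_audit
  AFTER INSERT
  ON Users
  FOR EACH STATEMENT
  EXECUTE PROCEDURE users_bulk_audit();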

With row-level triggers, however, the function you name is called for every row that is modified. Obviously this comes with its own cautions, but in the right circumstances it means the data is easily accessible because the function is working on one row at a time. Row-level triggers also give you access to the row data you’re used to, but in PostgreSQL you refer to the NEW.* and OLD.* records.

CREATE OR REPLACE FUNCTION user_log()
RETURNS TRIGGER AS
  $BODY$
  BEGIN
    INSERT INTO UserLog (Username, Message) VALUES (NEW.Username, NEW.Message);
    RETURN NEW;
  END;
  $BODY$
LANGUAGE plpgsql;

Triggers can have predicates!

This is actually one of the few things I like about triggers in PostgreSQL that SQL Server doesn’t have. While it might add confusion, it also makes it very easy to limit when the trigger function is actually called.

In SQL Server, the appropriate trigger is always called when the DML event is performed. We often have to jump through a bunch of hoops (SELECT… EXCEPT) to see if the data we care about has changed before we do some kind of work.
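
As a point of comparison, here’s a hedged T-SQL sketch of those hoops inside a SQL Server trigger, borrowing the table and column names from the PostgreSQL examples here, so treat it as illustrative only:

-- Only log when FirstName values actually changed, by comparing the virtual tables.
IF EXISTS (SELECT FirstName FROM inserted
           EXCEPT
           SELECT FirstName FROM deleted)
BEGIN
    INSERT INTO UserLog (Username, Message)
    SELECT i.Username, 'FirstName changed'
    FROM inserted AS i;
END;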

In PostgreSQL it is so much easier. When you create the trigger that attaches the function to the table’s DML event, you add the appropriate predicate(s).

CREATE TRIGGER tu_users
AFTER UPDATE
ON Users
FOR EACH ROW
WHEN (OLD.FirstName IS DISTINCT FROM NEW.FirstName)
EXECUTE PROCEDURE user_log();

It’s that simple, and you can add more predicates to the WHEN clause. They just all have to be Boolean in nature.
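
For example, here’s a hedged sketch with more than one condition in the WHEN clause (the Email and Phone columns are hypothetical):

CREATE TRIGGER tu_users_contact
AFTER UPDATE
ON Users
FOR EACH ROW
WHEN (OLD.Email IS DISTINCT FROM NEW.Email OR OLD.Phone IS DISTINCT FROM NEW.Phone)
EXECUTE PROCEDURE user_log();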

It’s not all bad

In the end, I’m still missing most of what SQL Server offers in power, tooling, and evolving technology. That said, Triggers in PostgreSQL have been one of the bright spots when they’re the right tool for the job.

DROP/CREATE vs. CREATE/ALTER in SQL Server

Over the last two years, as I’ve been working more closely on our application database (a database with nearly 20 years of history), one simple question keeps coming up as I work on old scripts and prepare new changes.

Should we use a DROP/CREATE or CREATE/ALTER pattern for Stored Procedures, Functions, Views, and Triggers?

Because of the tooling we had used previously, every script I had worked with consistently used the DROP/CREATE pattern when doing updates. For example:

IF OBJECT_ID('dbo.Proc123', 'P') IS NOT NULL
   DROP PROC dbo.Proc123
GO
CREATE PROC dbo.Proc123
...
GO

GRANT EXECUTE ---
GO

And I never questioned it. When you manage hundreds of client databases that all share the same schema and send updates multiple times a year to on-premise clients, it was just easier to make sure things were cleaned up and then create the PROC/Trigger/etc. again.

But then I started seeing a new pattern from a number of DBAs and teachers. I think I first noticed it with Adam Machanic’s sp_WhoIsActive and soon after with Brent Ozar’s sp_Blitz family of scripts. They used a different approach similar to this:

IF OBJECT_ID('dbo.Proc123', 'P') IS NULL
  EXEC ('CREATE PROC dbo.Proc123 AS SELECT ''stub version, to be replaced''') 
GO

ALTER PROC dbo.Proc123
  ...
GO
 
GRANT EXECUTE --- 
GO

Knowing these folks to be particularly smart in many things SQL, I figured there was a reason, although my Google-Foo only turned up a few opinions. There was, as best as I could tell, no hard and fast rule for what was ‘right’.

In most cases, the reason people tended towards the CREATE/ALTER approach was to retain any previously set GRANTs on the object. That makes a lot of sense and is compelling enough for me to start switching.

But then, during Erin Stellato‘s PASS 2017 Pre-Con on Query Store, she mentioned the most compelling reason I’ve heard thus far. It’s also something that I’m sure translates to other DMV-like tuning data in versions prior to SQL 2016 that don’t have Query Store.

Query Store works by tracking the identity of an object (object_id) and the performance metrics related to that object. If you are working to tune something like a PROC and want to track its performance in Query Store, you’ll be shooting yourself in the foot if you DROP/CREATE the PROC each time, because it will get a new object_id and the results of your changes won’t be tied to the metrics Query Store already captured for the old object. It turns out that this is also mentioned as a best practice in the Query Store documentation.
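
If you want to see that link for yourself, here’s a minimal sketch of the kind of check I mean (dbo.Proc123 is just the placeholder name from the scripts above):

-- Query Store keys its runtime stats to the procedure's object_id, so a
-- DROP/CREATE starts a new history while ALTER keeps the existing one.
SELECT q.query_id, q.object_id, OBJECT_NAME(q.object_id) AS proc_name
FROM sys.query_store_query AS q
WHERE q.object_id = OBJECT_ID('dbo.Proc123');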

While this should seem obvious once you hear it, this alone was a compelling reason to switch our processes and begin using CREATE/ALTER instead.

One final note

There is a newer feature, available starting with SQL Server 2016 SP1, called ‘CREATE OR ALTER’. It allows you to ask the SQL engine to do the CREATE/ALTER logic for you so that you don’t have to write the existence check first when rolling out changes. Unfortunately we support versions prior to SQL 2016 so we cannot yet move to make this our default pattern. Someday!
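
For anyone already on a new enough version, the pattern collapses to something like this (again using the placeholder procedure from above):

CREATE OR ALTER PROC dbo.Proc123
AS
  ...
GO

GRANT EXECUTE ---
GO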

Long-Running Query Alerts With sp_WhoIsActive

Who doesn’t love sp_WhoIsActive? Nearly every single DBA talk I have attended in the last two years mentions it as a “must have!” I do the same during my DBA Newbie talk at SQL Saturdays.

About 18 months ago I came across this post by Tara Kizer and started logging activity every 30 seconds on most of our servers using sp_WhoIsActive. That simple job has helped us diagnose the root cause of server performance issues numerous times and I often query the results just to get a quick idea of queries that are consistent resource hogs. Our application is heavy on ad hoc reporting and this gives us a window into reports being run in ways we didn’t expect so that we can evaluate and tune the base queries better.
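
If you haven’t seen that approach, here’s a minimal sketch of what the logging job step looks like; the destination table name matches the setup script later in this post, and the 30-second cadence comes from the Agent job schedule.

EXEC sp_WhoIsActive
    @get_transaction_info = 1,
    @get_outer_command = 1,
    @get_plans = 1,
    @format_output = 0,   -- raw output is easier to query later
    @destination_table = 'DBA_stats.dbo.WhoIsActiveLogging';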

Our SaaS environment has hundreds of databases and thousands of users. Inevitably, one of two things usually causes us to scramble when server performance tanks. One of those ad hoc reports runs amok and consumes more resources than we have, or a partner uses an API in a way we never intended, suddenly bringing the server to a halt. Each of these occurrences provides a learning opportunity to improve our application and processes and prevent repeat failures. But we’re not perfect… and I hate when a customer informs us of a problem before we know about it.

I’ve tried a number of suggestions for monitoring long-running queries, from the actual server performance alert to our commercial monitoring software. In many cases I just didn’t have the control over the alerts I wanted (or the alert fired inconsistently) and I kept thinking I could come up with a way to do this with sp_WhoIsActive that was scriptable across all of my servers and fairly low maintenance.

On my first attempt, I assumed I was missing the “right way” to use sp_WhoIsActive from a scheduled SPROC. When run from within a SPROC, the contexts of the SPROC and sp_WhoIsActive are different. Therefore, you can’t use a local temp table or table variable to pass information back and forth. Instead, the only solution I found was to use a global temp table, and that just “felt wrong”. In my search for solutions I found this post by Kendra Little on outputting the results of sp_WhoIsActive to a global temp table for further processing, and that set my mind at ease a bit about choosing a similar route.

Armed with all of the above, I’ve created the first version of my long-running query email alert. I’m sure there will be some updates and improvements in the coming months, but I wanted to share this in case someone else finds it useful as a starting example.

First Things First – Initial Setup

This solution expects a standard database on all of our servers to hold monitoring and utility information, called DBA_stats, an idea taken from that post by Tara Kizer above. This database holds both my regular sp_WhoIsActive output table (what I populate every 30 seconds) and a second table that holds a log of long-running queries.

That second table has a few additional columns that help me manage simple alerting, sending an email at a low threshold (5 minutes) and again at a high threshold (30 minutes) if the query is still running.

***NOTE: The following ASSUMES that you already have sp_WhoIsActive installed. If not, go do that first!  🙂

USE master
GO

SET NOCOUNT ON;
DECLARE @retention int = 7,
    @destination_table varchar(500) = 'WhoIsActiveLogging',
    @longrunning_table VARCHAR(500) = 'LongRunningQueries',
    @destination_database sysname = 'DBA_stats',
    @schema varchar(max),
    @SQL nvarchar(4000),
    @createSQL NVARCHAR(500),
    @alterSQL NVARCHAR(2000),
    @parameters nvarchar(500),
    @exists bit;

SET @destination_table = @destination_database + '.dbo.' + @destination_table;
SET @longrunning_table = @destination_database + '.dbo.' + @longrunning_table;

IF (db_id(@destination_database) IS NULL)
BEGIN;
    PRINT 'Creating stats database: ' + @destination_database;
    SET @createSQL = N'CREATE DATABASE ' + @destination_database + ';

    ALTER DATABASE ' + @destination_database + ' SET RECOVERY SIMPLE;';

    EXEC(@createSQL);
END;

--create the logging table
IF OBJECT_ID(@destination_table) IS NULL
BEGIN;

    PRINT 'Creating periodic logging table: ' + @destination_table;

    EXEC sp_WhoIsActive @get_transaction_info = 1, @get_outer_command = 1, @get_plans = 1, @format_output = 0, @return_schema = 1, @schema = @schema OUTPUT;
    SET @schema = REPLACE(@schema, '<table_name>', @destination_table);
    EXEC(@schema);
END;

--create the long-running query table
IF OBJECT_ID(@longrunning_table) IS NULL
BEGIN;

    PRINT 'Creating long-running queries table: ' + @longrunning_table;

    EXEC sp_WhoIsActive @get_transaction_info = 1, @get_outer_command = 1, @get_plans = 1, @format_output = 0, @return_schema = 1, @schema = @schema OUTPUT;
    SET @schema = REPLACE(@schema, '<table_name>', @longrunning_table);
    EXEC(@schema);

    SET @alterSQL = N'
    ALTER TABLE ' + @longrunning_table + ' ADD
        id INT IDENTITY CONSTRAINT PKC_ID PRIMARY KEY CLUSTERED,
        email_sent BIT CONSTRAINT DF_email_sent DEFAULT 0,
        email_time DATETIME NULL,
        email2_sent BIT CONSTRAINT DF_email2_sent DEFAULT 0,
        email2_time DATETIME NULL;

    CREATE NONCLUSTERED INDEX IX_SessionID_LoginName_DatabaseName_StartTime ON ' + @longrunning_table + ' (session_id, login_name, database_name, start_time);
    ';

    EXEC(@alterSQL);
END;

--create index on collection_time
SET @SQL = 'USE ' + QUOTENAME(@destination_database) + '; IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(@destination_table) AND name = N''cx_collection_time'') SET @exists = 0';
SET @parameters = N'@destination_table varchar(500), @exists bit OUTPUT';
EXEC sp_executesql @SQL, @parameters, @destination_table = @destination_table, @exists = @exists OUTPUT;

IF @exists = 0
BEGIN;
    SET @SQL = 'CREATE CLUSTERED INDEX cx_collection_time ON ' + @destination_table + '(collection_time ASC)';
    EXEC (@SQL);
END;
GO

Readability Matters – Pretty HTML Emails

Once we start getting data into the long-running table I want to send email alerts. Raw query output doesn’t look good in email, so we need something to turn it into an HTML table. I looked at a number of simple solutions for this task and landed on this example from StackOverflow.com. The code below has a few tweaks and some CSS styles to help it look more readable in email.

[sql]

USE DBA_Stats
GO

PRINT ‘Creating Stored Procedure QueryToHtmlTable’;
GO

IF OBJECT_ID(‘QueryToHtmlTable’,’P’) IS NOT NULL
DROP PROCEDURE QueryToHtmlTable
GO
— Description: Turns a query into a formatted HTML table. Useful for emails.
— Any ORDER BY clause needs to be passed in the separate ORDER BY parameter.
— =============================================
CREATE PROC QueryToHtmlTable
(
@query nvarchar(MAX), –A query to turn into HTML format. It should not include an ORDER BY clause.
@orderBy nvarchar(MAX) = NULL, –An optional ORDER BY clause. It should contain the words ‘ORDER BY’.
@html nvarchar(MAX) = NULL OUTPUT –The HTML output of the procedure.
)
AS
BEGIN
SET NOCOUNT ON;

IF @orderBy IS NULL BEGIN
SET @orderBy = ”
END

SET @orderBy = REPLACE(@orderBy, ””, ”””);

DECLARE @realQuery nvarchar(MAX) = ‘
DECLARE @headerRow nvarchar(MAX);
DECLARE @cols nvarchar(MAX);

SELECT * INTO #tableSQL FROM (‘ + @query + ‘) sub;

SELECT @cols = COALESCE(@cols + ”, ””””, ”, ””) + ”[” + name + ”] AS ””td”””
FROM tempdb.sys.columns
WHERE object_id = object_id(”tempdb..#tableSQL”)
ORDER BY column_id;

SET @cols = ”SET @html = CAST(( SELECT ” + @cols + ” FROM #tableSQL ‘ + @orderBy + ‘ FOR XML PATH(””tr””), ELEMENTS XSINIL) AS nvarchar(max))”

EXEC sys.sp_executesql @cols, N''@html nvarchar(MAX) OUTPUT'', @html=@html OUTPUT

SELECT @headerRow = COALESCE(@headerRow + '''', '''') + ''<th>'' + name + ''</th>''
FROM tempdb.sys.columns
WHERE object_id = object_id(''tempdb..#tableSQL'')
ORDER BY column_id;

SET @headerRow = ''<tr>'' + @headerRow + ''</tr>'';

SET @html = ''<html>
<head>
<style type="text/css">
table {
color: #333;
font-family: Helvetica, Arial, sans-serif;
width: 2000px;
border-collapse:
collapse; border-spacing: 0;
}

td, th {
border: 1px solid transparent; /* No more visible border */
height: 30px;
}

th {
background: #DFDFDF; /* Darken header a bit */
font-weight: bold;
}

td {
background: #FAFAFA;
text-align: center;
}

/* Cells in even rows (2,4,6…) are one color */
tr:nth-child(even) td { background: #F1F1F1; }

/* Cells in odd rows (1,3,5…) are another (excludes header cells) */
tr:nth-child(odd) td { background: #FEFEFE; }
</style>
</head>
<body>
<table width="2000" border="1">'' + @headerRow + @html + ''</table>
</body>
</html>'';
';

EXEC sys.sp_executesql @realQuery, N'@html nvarchar(MAX) OUTPUT', @html=@html OUTPUT;
END;
GO

[/sql]
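Before wiring it into the alerts, it's worth smoke-testing the procedure on its own. A minimal, illustrative call might look like this (the sys.databases query is just an example, not part of the monitoring solution):

[sql]

-- Illustrative smoke test for QueryToHtmlTable
DECLARE @html NVARCHAR(MAX);

EXEC dbo.QueryToHtmlTable
    @query   = N'SELECT name, database_id, create_date FROM sys.databases',
    @orderBy = N'ORDER BY name',
    @html    = @html OUTPUT;

-- Inspect the rendered markup
SELECT @html AS html_output;

[/sql]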

Doing the Work – Finding Long-Running Queries

This solution runs sp_WhoIsActive every minute and saves the output into a global temp table. From there, I look for any processes that have been running longer than the low threshold setting. Any processes that haven't already been logged are inserted into the long-running table, rendered as an HTML table, and sent out in an email alert.

Next, I take a second look at the table for anything that has been running longer than the high threshold. If a second email alert hasn't been sent for those processes, I output the same data and send the email. If two alerts have already been sent, nothing else happens at the moment. One of the next updates to this script will send an alert to our DevOps notification system for anything running longer than some final threshold (or maybe just the high threshold).

[sql]

USE DBA_stats;
GO

IF OBJECT_ID('LongRunningQueriesAlert','P') IS NOT NULL
DROP PROCEDURE LongRunningQueriesAlert
GO

PRINT 'Creating Stored Procedure: LongRunningQueriesAlert';
GO

CREATE PROCEDURE [dbo].[LongRunningQueriesAlert]
@email_Subject VARCHAR(255) = 'Long-Running Queries on ',
@low_threshold_min VARCHAR(2) = '5',
@high_threshold_min VARCHAR(2) = '30',
@dbmail_profile VARCHAR(128) = 'DB Alerts',
@email_recipients VARCHAR(500) = '[email protected]'
AS
BEGIN

SET NOCOUNT ON;

DECLARE @server_name VARCHAR(255),
@schema NVARCHAR(4000),
@SQL NVARCHAR(4000),
@lrq_table VARCHAR(255),
@html NVARCHAR(MAX),
@low_threshold_subject VARCHAR(255),
@high_threshold_subject VARCHAR(255);

SET @server_name = @@SERVERNAME;
SET @email_Subject = @email_Subject + @server_name;

 

/*
Using a global temp table so that it's still visible to the dynamic SQL
that selects from it after it has been created and populated
*/
SET @lrq_table = QUOTENAME('##LongRunningQueries_' + CAST(NEWID() as varchar(255)));

--create the logging temp table
IF OBJECT_ID(@lrq_table) IS NULL
BEGIN;
EXEC sp_WhoIsActive @get_transaction_info = 1,
@get_outer_command = 1,
@get_plans = 1,
@format_output=0, -- Don't format output so that it works in an email
@return_schema = 1,
@schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, '<table_name>', @lrq_table);

EXECUTE sp_executesql @schema;
END;

-- Run WhoIsActive again and put results into the table
EXEC sp_WhoIsActive @get_transaction_info = 1,
@get_outer_command = 1,
@get_plans = 1,
@format_output=0,
@destination_table = @lrq_table,
@not_filter = 'PITTPROCWIN01', @not_filter_type = 'host';

/*
Insert any new long-running queries that haven’t existed before

The WHERE clause below is very specific at the moment and not very flexible.
Improvements to what we ignore and how we specify it are needed.
*/
SET @SQL = N'
INSERT INTO LongRunningQueries ([session_id], [sql_text], [sql_command], [login_name], [wait_info], [tran_log_writes], [CPU], [tempdb_allocations], [tempdb_current], [blocking_session_id], [reads], [writes], [physical_reads], [query_plan], [used_memory], [status], [tran_start_time], [open_tran_count], [percent_complete], [host_name], [database_name], [program_name], [start_time], [login_time], [request_id], [collection_time])
SELECT tempLRQ.*
FROM ' + @lrq_table + N' tempLRQ
LEFT JOIN LongRunningQueries LRQ ON
LRQ.session_id = tempLRQ.session_id
AND LRQ.login_name = tempLRQ.login_name
AND LRQ.database_name = tempLRQ.database_name
AND LRQ.start_time = tempLRQ.start_time
WHERE LRQ.session_id IS NULL
AND tempLRQ.start_time < DATEADD(MINUTE,-' + @low_threshold_min + N',GETDATE())
AND tempLRQ.database_name NOT IN (''master'',''msdb'',''tempdb'',''DBA_Stats'')
AND tempLRQ.program_name NOT LIKE ''%Service Broker%''
AND tempLRQ.program_name <> ''SQBCoreService.exe''';

EXEC sp_executesql @SQL;

/*
Now send the emails for any new long-running queries
*/
---- Using the new SPROC, format the output as HTML for the email
EXEC QueryToHtmlTable @query = N'SELECT id, LRQ.session_id, LRQ.sql_text, LRQ.blocking_session_id, LRQ.reads, LRQ.writes, LRQ.status, LRQ.host_name, LRQ.database_name, LRQ.program_name,
LRQ.start_time FROM dbo.LongRunningQueries LRQ
WHERE LRQ.email_sent = 0',
@orderBy = N'',
@html = @html OUTPUT

IF(LEN(@html) > 1)
BEGIN
SET @low_threshold_subject = @email_Subject + ' - >' + @low_threshold_min + ' minute(s)';
---- Now send the email
EXEC msdb.dbo.sp_send_dbmail
@profile_name = @dbmail_profile,
@recipients = @email_recipients,
@subject = @low_threshold_subject,
@attach_query_result_as_file = 0,
@importance = 'Normal',
@body = @html,
@body_format='html'

/*
Update the table to specify that all new queries have had a notification sent
*/
UPDATE dbo.LongRunningQueries SET email_sent = 1, email_time = GETDATE() WHERE email_sent = 0;
END

 

/*
Now get a list of queries that are still running after the second threshold time has elapsed.
Someone REALLY needs to get on these.
*/
CREATE TABLE #HighThresholdQueries (
id INT
);

SET @SQL = N'INSERT INTO #HighThresholdQueries ( id )
SELECT id FROM dbo.LongRunningQueries LRQ
INNER JOIN ' + @lrq_table + N' tempLRQ ON
LRQ.session_id = tempLRQ.session_id
AND LRQ.login_name = tempLRQ.login_name
AND LRQ.database_name = tempLRQ.database_name
AND LRQ.start_time = tempLRQ.start_time
WHERE tempLRQ.start_time < DATEADD(MINUTE,-' + @high_threshold_min + N',GETDATE())
AND LRQ.email2_sent = 0';

EXEC sp_executesql @SQL;

/*
Now send the emails for any long-running queries that have persisted
past the high threshold
*/
-- Reset the variable for reuse
SET @html = '';

---- Using the new SPROC, format the output as HTML for the email
EXEC QueryToHtmlTable @query = N'SELECT id, LRQ.session_id, LRQ.sql_text, LRQ.blocking_session_id, LRQ.reads, LRQ.writes, LRQ.status, LRQ.host_name, LRQ.database_name, LRQ.program_name,
LRQ.start_time FROM dbo.LongRunningQueries LRQ
WHERE LRQ.id IN (SELECT id FROM #HighThresholdQueries)',
@orderBy = N'',
@html = @html OUTPUT

IF(LEN(@html) > 1)
BEGIN
SET @high_threshold_subject = @email_Subject + ' - >' + @high_threshold_min + ' minute(s)';
---- Now send the second email
EXEC msdb.dbo.sp_send_dbmail
@profile_name = @dbmail_profile,
@recipients = @email_recipients,
@subject = @high_threshold_subject,
@attach_query_result_as_file = 0,
@importance = 'High',
@body = @html,
@body_format='html'

/*
Update the table to track that a second email has been sent for a query that has
been running for an extended period of time
*/
UPDATE LongRunningQueries SET email2_sent = 1, email2_time = GETDATE() WHERE id in (select id from #HighThresholdQueries) AND email_sent = 1 AND email2_sent = 0;

END

/*
Drop Temporary Tables
*/
DROP TABLE #HighThresholdQueries;

SET @SQL = N'DROP TABLE ' + @lrq_table;
EXEC sp_executesql @SQL;

END;
GO

[/sql]

Ready, Set – Schedule The Job

With all three pieces in place, set up a scheduled job to call the LongRunningQueriesAlert SPROC on a regular basis. I run it every minute and haven't seen any measurable impact so far, which means I'm alerted within a minute of a query crossing either threshold. Adjust the schedule as you see fit.
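For reference, here's a minimal sketch of what that Agent job could look like in T-SQL, assuming the stored procedure lives in DBA_stats; the job and schedule names are just placeholders:

[sql]

USE msdb;
GO

-- Sketch of a SQL Agent job that runs the alert proc every minute
-- (job/schedule names are placeholders; adjust to your environment)
EXEC dbo.sp_add_job @job_name = N'DBA - Long-Running Queries Alert';

EXEC dbo.sp_add_jobstep
    @job_name = N'DBA - Long-Running Queries Alert',
    @step_name = N'Run LongRunningQueriesAlert',
    @subsystem = N'TSQL',
    @database_name = N'DBA_stats',
    @command = N'EXEC dbo.LongRunningQueriesAlert;';

EXEC dbo.sp_add_jobschedule
    @job_name = N'DBA - Long-Running Queries Alert',
    @name = N'Every minute',
    @freq_type = 4,             -- daily
    @freq_interval = 1,
    @freq_subday_type = 4,      -- repeat on a minute interval
    @freq_subday_interval = 1;  -- every 1 minute

EXEC dbo.sp_add_jobserver
    @job_name = N'DBA - Long-Running Queries Alert',
    @server_name = N'(local)';
GO

[/sql]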

One of the biggest initial benefits has actually been on our development boxes. I've been notified a number of times when SQL run in development unknowingly leaves transactions open. That might seem mundane, but it's been very helpful as the team has grown.

Rinse and Repeat

Having run this for a few weeks, I plan to make the following changes to my scripts given what I know about our environment.

  1. Adjust the threshold times to be a little longer, maybe as high as 10 minutes for the first alert (see the sketch after this list). We have a few jobs that can legitimately run that long in big databases, and as expected, they trigger an alert email almost every day.
  2. Modify the schedule to skip our maintenance window. The script will catch anything that persists past maintenance as soon as it starts up again in the early morning.
  3. Make the filtering more easily injectable. sp_WhoIsActive provides filtering options that I'm not taking advantage of yet (also shown in the sketch below), and some of it will come from improvements to my own queries.
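As a rough illustration of items 1 and 3, the calls below show the direction I have in mind; the parameter values and the host name are hypothetical:

[sql]

-- 1. Call the alert proc with a longer first threshold (values are hypothetical)
EXEC dbo.LongRunningQueriesAlert
    @low_threshold_min  = '10',
    @high_threshold_min = '30';

-- 3. Push more of the filtering down into sp_WhoIsActive itself,
--    e.g. ignoring an entire host ('SomeNoisyHost' is a made-up example)
EXEC sp_WhoIsActive
    @get_transaction_info = 1,
    @get_outer_command    = 1,
    @get_plans            = 1,
    @format_output        = 0,
    @not_filter           = 'SomeNoisyHost',
    @not_filter_type      = 'host';

[/sql]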