Building community – PostgreSQL edition

Community.

I believe we’re all created for it. I believe most people thrive when they find it. In fact, for many people, community has been their lifeline over the last year.

Truth be told, however, community is hard to build and maintain.

When I jumped onto a new career path in September 2020 as a Developer Advocate with Timescale, I had dreamy ideas about how easy it was going to be to build community within the PostgreSQL world. I’ve been a database user for 20+ years, led a number of teams, had the fortune to be involved in some excellent tech communities over the years, and was growing in my rekindled fondness for Postgres. (Don’t get me wrong, the journey back wasn’t all roses.)

What I didn’t anticipate, however, was the lack of a Postgres “owner” to drive some cohesion within the community. Nor did I fully understand how the flexibility of the platform created so much diverse usage. Sometimes it’s easy to get so stuck in traditional database developer/architect patterns that you miss the bigger opportunities. You know, the old “can’t see the forest for the trees” problem.

With six months under my belt at Timescale, I’m learning… that I have a lot to learn about the Postgres community. 😃

While the umbrella that encompasses “PostgreSQL” is large, I’ve quickly noticed that I have a tendency to focus on one subset of the user base (new to the fold, new to SQL, looking to grow) because it’s where I’m most comfortable. Instead, I’d rather be thinking about it more holistically, working to understand where everyone started, where they are at currently, and where they are headed with Postgres.

That’s why I’m so excited about The State of PostgreSQL 2021 Survey that we just launched for everyone, far and wide, to participate in.

The State of PostgreSQL 2021 Survey

At Timescale, we believe that investing in and building up the Postgres community helps everyone. Understanding how developers and businesses use and interact with Postgres is key to strengthening the community at large.

https://twitter.com/TimescaleDB/status/1372163878516359171

The survey covers four key areas:

  1. Who are you as a developer or user?
  2. How did you start with Postgres?
  3. What tools and features do you use most?
  4. What communities are you a part of and how have they been helpful?

In total, the survey is ~35 questions and should take you about 10 minutes to complete. Most of the questions are multiple choice, while others allow you to provide free-form answers. For example:

  • What is the main reason you chose to use Postgres?
  • How easy do you find it to connect with the community?
  • What tools do you use and recommend to other users?
  • What’s the size of your database deployment?
  • What cloud provider have you used Postgres with?
  • Are you in the PostgreSQL or Postgres camp??? (you’ll notice I flow easily between them both! 😉)

Open Source results for everyone!

There’s no agenda here for Timescale. Within the community there are any number of groups or companies that could do (or have done) similar things in the past. We love to see the growth and excitement around PostgreSQL and want to understand the community better so that we can find more ways to help.

But… we want to make sure everyone else is empowered to help too!

Once the survey closes, all results will be anonymized and made available for anyone to review and slice as they see fit.

So, what are you waiting for?

Let your voice be heard, helping us to provide valuable insights to the entire community. Let’s make the Postgres community stronger, together!

Amazon Babelfish: First Thoughts

Amazon Babelfish promises to help customers seamlessly move from SQL Server to PostgreSQL. I’m not excited about it, but probably not for the reasons you think!

Amazon re:Invent 2020 is in full swing and although some of their announcements impact my current career more than others (still waiting to hear about Timestream updates…), one item did blow up my Twitter feed a bit on Tuesday (12/1/2020).

AWS goes after Microsoft’s SQL Server with Babelfish

Techcrunch

Want more PostgreSQL? You just might like Babelfish

Matt Asay, AWS

My first reaction was, “Whoa… this is a big deal!” Having spent a lot of my career with SQL Server (and still a big fan of the community), the jump back to PostgreSQL wasn’t as smooth as I wanted it to be. Two years later, things have changed dramatically and I’m feeling much more at home (and enthusiastic) about PostgreSQL and the community. So much so that I switched jobs and careers to join the awesome team at Timescale!

https://twitter.com/ryanbooz/status/1305694338417123328?s=20

So, back to Babelfish. A week into thinking about the announcement, I’m a lot less excited. Let me explain.

The problem isn’t Babelfish

The impetus for creating the tool is clear for AWS: provide a way for customers to easily connect a SQL Server app to Aurora Postgres, saving big on licensing fees and reducing total cost of ownership. Assuming the tool is successful at some level, I’m sure it will provide a revenue boost for Amazon and some customers might (initially) feel a win. No harm, no foul on Amazon for leading the effort. Free markets, baby!

No matter how clever Babelfish is, however, I just can’t see how this is ultimately a win for SQL Server or PostgreSQL… or the developers that will ultimately need to support these “hybrid” apps.

(Quick Disclaimer: I wasn’t able to attend the live streamed deep dive session from re:Invent on Aurora Postgres and Babelfish (I’m waiting for the recording to post), so I’m sure there are details I’m overlooking, which may impact what I’m about to say. And, one other caveat… I have no doubt that the team spearheading this project is smart, hardworking, and doing everything they can to make Babelfish as awesome as possible.)

For some Amazon customers, pricing may actually be reduced, at least initially, resulting in the promised financial win. Depending on the application and code, there might be a negligible performance impact (or maybe even an improvement?). Honestly, there might be plenty of apps that recreate the schema, flip the connection string, and cruise into the sunset, Piña Colada in hand.

Even if this all works better than expected, I still think there will be more lost than gained for one reason:

SQL illiteracy.

The problem is SQL illiteracy

I’ve been involved in software development for nearly 20 years and always with an affinity for databases and SQL in general. Even then, in a room of 100 SQL folks (SQL Server or PostgreSQL), my level of expertise is probably in the bottom 20%. (but my, how I do love learning more every day!) Still, in every job I’ve had, teaching and mentoring have been a focus that I’ve enjoyed and treasured.

While SQL may be the third most popular language in 2020, I haven’t run across many traditional developers that understand the nuances of each SQL platform, even one they’ve been “using” for their entire career. In most cases the culprit for this SQL illiteracy is an ORM. They are convenient and extremely helpful with the mundane tasks like selecting lists of objects or saving data without writing your own abstraction layer. And this is often really good stuff because, after all, we’ve been taught over and over to abstract away the mundane. ORMs help us do that.

But beyond select * from... and INSERT INTO..., things almost always start to fall apart at some point. It’s nearly impossible to create a well-written query that takes advantage of a specific SQL dialect and its features when you’re trying to be all things to all people. Too often DBAs are called in to debug a slow query (some bad enough that they bring down servers under load), only to find a query created by an ORM. Who doesn’t love un-nesting 15 levels of sub-select queries when a simple (and more readable) version using a CTE or CROSS APPLY (SQL Server) or LATERAL JOIN (PostgreSQL) would have done the job? Most of us have been there.
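
To make this concrete, here’s a toy version of the pattern, using hypothetical customers/orders tables (a simplified sketch, not code from any real app): first the correlated sub-selects an ORM might generate, then the same result expressed with PostgreSQL’s LATERAL join (CROSS APPLY plays the same role in SQL Server).

-- ORM-style: one correlated sub-select per column pulled from the latest order
SELECT c.customer_id,
       (SELECT o.total
          FROM orders o
         WHERE o.customer_id = c.customer_id
         ORDER BY o.created_at DESC
         LIMIT 1) AS last_total,
       (SELECT o.created_at
          FROM orders o
         WHERE o.customer_id = c.customer_id
         ORDER BY o.created_at DESC
         LIMIT 1) AS last_order_at
FROM customers c;

-- The readable version: one LATERAL sub-query per customer
SELECT c.customer_id, recent.total AS last_total, recent.created_at AS last_order_at
FROM customers c
LEFT JOIN LATERAL (
    SELECT o.total, o.created_at
    FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY o.created_at DESC
    LIMIT 1
) AS recent ON true;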

Let’s not even talk about PIVOT and translating to crosstab! 😃

So why do ORMs come to mind when thinking about Babelfish? Because I think it’s going to magnify the SQL illiteracy of most developers making this transition. This is primarily a marketing effort and a way to bring down spend and make CFOs happy… initially.

Babelfish won’t help SQL Server developers understand the features of PostgreSQL better, or why portions of the app don’t work well after the transition (let alone how to improve them). They won’t understand the pluggable extension architecture that PostgreSQL thrives on (trigram index anyone? Easy GIS queries? Time-series data?). Nor does it point them to the best resources and trainers on the internet for PostgreSQL (in the last month alone, I’ve seen multiple requests that start, “Who is the Brent Ozar of Postgres?”).
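
As a tiny taste of that extension ecosystem, here’s what a trigram index looks like (a sketch only, with made-up table and column names): one extension, one index, and leading-wildcard searches can stop being sequential scans.

-- pg_trgm ships with PostgreSQL's contrib modules
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A GIN index over trigrams of the column we search
CREATE INDEX idx_users_full_name_trgm
    ON users USING gin (full_name gin_trgm_ops);

-- Searches like this can now use the index instead of scanning the table
SELECT user_id, full_name
FROM users
WHERE full_name ILIKE '%booz%';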

There is also plenty of nuance that goes the other way, too… features that the SQL Server community spends a lot of time teaching about and building consulting careers around. Clustered indexes (vs. MVCC)? Columnstore indexes (“just use them!”)? Query Store (which AWS won’t replicate any time soon)? Built-in query plan visualization? In short, the SQL Server community has spent a lot of effort teaching and learning about the internals of the database query engine to create performant apps. Translating those features will take a lot of effort that probably doesn’t interest AWS at all.

Instead, Babelfish will simply return an error if it can’t correctly figure out what to do. I suspect a lot of developers will be getting those errors.

In the end, nobody will be happy

As a result, I think we’ll have at least three different problem sets when these apps start to implement Babelfish:

  1. ORMs creating complex, (already?) poorly performing queries for SQL Server being translated to (poorly) performing PostgreSQL queries. Lose/lose for everyone.
  2. Targeted, hand-crafted queries that perform well on SQL Server but are hard (or impossible) to translate to PostgreSQL.
  3. Stored Procedure-based applications that will require manual updates and still not take advantage of PostgreSQL features for improved query performance – features that SQL Server just doesn’t have. (👋 SQL Server folks, did you know you can tune the PostgreSQL planner’s estimates for functions and stored procedures? See the sketch just after this list.)
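
Here’s the kind of thing I mean by tuning planner values, sketched with a hypothetical function (the names and numbers are illustrative, not from any real schema). COST tells the planner how expensive each call is, and ROWS overrides the default row estimate (1000) for set-returning functions, which can dramatically change the plans built around them.

-- Declare the estimates up front...
CREATE OR REPLACE FUNCTION recent_orders(p_customer_id int)
RETURNS SETOF orders
LANGUAGE sql STABLE
COST 500   -- pricier than the default of 100 for SQL/PLpgSQL functions
ROWS 50    -- a far better estimate than the default of 1000
AS $$
    SELECT *
    FROM orders
    WHERE customer_id = p_customer_id
    ORDER BY created_at DESC
    LIMIT 50;
$$;

-- ...or adjust an existing function without redefining it
ALTER FUNCTION recent_orders(int) ROWS 25;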

Putting it all together, in the short-term (at least) we’ll have a few initial outcomes:

  1. More revenue for AWS and likely higher long-term costs for users as the need to scale up PostgreSQL instances increases in order to deal with inefficient queries.
  2. Frustrated SQL Server developers who assume PostgreSQL is an inferior database.
  3. A PostgreSQL community that thinks SQL Server developers are inferior and “stupid” for not understanding how things really work. (To be honest, a lot of SQL Server devs may be shocked that there’s not as much #SQLFamily community in the #Postgres world. Sorry.)

As I said, I haven’t seen the deep-dive from AWS yet and I’m not currently using Aurora Postgres to give it a test run, so these are preliminary thoughts.

Be the solution! Level-up your SQL

In the end, my greatest desire when I lead a team or teach about a technology is for developers to better understand the tools and technology they use, enabling them to create awesome applications. Nothing gives me greater joy than seeing others (or myself, honestly) gain new insights and come to understand a core technology.

If you’re a SQL Server dev that might get caught in this transition, spend some time getting to know PostgreSQL a little better. While I haven’t quite found Brent’s replacement yet, there are a lot of good places to start learning. Start with my old post on PostgreSQL tooling (Part 1 & Part 2) and have a look at the responses to this Reddit post.

If you’re a PostgreSQL dev or instructor, have patience with the folks that will be thrust into learning the basics of Postgres for the first time. Remember that they probably didn’t choose this transition for themselves!

Hopefully, as more details emerge and the Babelfish implementation is better understood, my performance worries will prove misplaced. Hopefully it drives a better understanding of both products – because they each have a lot to offer users, depending on the use case.

Still, I’m afraid we’ll miss an opportunity to level-up on skills, drive deeper understanding with developers, and increase the ROI for all parties.

What are your thoughts?

Exchanging an OAuth Token in Power BI Custom Data Connectors

Custom Data Connectors can support multiple kinds of authentication. For most modern web apps, OAuth is the obvious choice, and the Power Query SDK, coupled with the authentication examples (here and here), makes it easy to set up.

Unfortunately, two apps that I have recently created Custom Data Connectors for were partway through an OAuth implementation. While the circumstances were unique to each application, I had to authenticate the user with the OAuth Identity Provider (IdP) and then exchange the access_token for one that the application could use. This isn’t an unusual OAuth “flow”, particularly when the application uses a custom payload in the access_token JWT for application functions.

Whether you agree with an exchange flow or not, I had to figure out how to make the Custom Data Connector work within these constraints.

To accomplish this, I had to interrupt the normal OAuth response flow and exchange the IdP-provided access_token for one that the application would accept via Bearer authorization. I also needed to ensure that the refresh_token flow would continue to work when Power BI encountered an HTTP 401 (Unauthorized) response from an API, causing it to perform a refresh grant to retrieve a new access_token and keep data flowing without the user being interrupted.

It just wasn’t clear to me exactly where to do all of this work.

In other OAuth frameworks or SDKs, it’s usually pretty easy to see where you’d drop into the flow and make the exchange before moving ahead. Working from the Github authentication example, my mind wasn’t thinking functionally at the time (it’s still a challenge for me!), and it wasn’t clear that the response from the FinishLogin function was being consumed and stored by Power BI directly. Once I made that connection, it was easy to accomplish what I needed.

The Basic OAuth Setup

I started with the examples from the articles I’ve linked to above, enabling OAuth and setting up each of the functions.

...    
Authentication = [
        OAuth = [
            StartLogin=StartLogin,
            FinishLogin=FinishLogin,
            Refresh=Refresh,
            Logout=Logout
        ]
...

Inside of the FinishLogin function there is a call to the TokenMethod, which takes the authorization code and exchanges it with the IdP for an access_token.

TokenMethod = (code) =>
    let
        Response = Web.Contents("https://auth.example.com/oauth2/token", [
            Content = Text.ToBinary(Uri.BuildQueryString([
                client_id = client_id,
                client_secret = client_secret,
                code = code,
                redirect_uri = redirect_uri])),
            Headers=[#"Content-type" = "application/x-www-form-urlencoded",
                                 #"Accept" = "application/json"]]),
        LoginResult = Json.Document(Response)
    in
        LoginResult;

It took me a while to realize that the Json.Document(Response) was simply parsing the OAuth payload (id_token, access_token, and refresh_token) into a Power Query record that the connector/runtime was storing and using later. Therefore, I had to do my exchange somewhere in this function before returning the Record that Power BI needed.

Exchanging the Access Token

In order to exchange the trusted access_token, another function was needed to call the actual Exchange endpoint of each application. YMMV, but here’s what a basic function might look like.

ExchangeToken = (id_token as text) =>
    let
         // The exchange endpoint expects a JSON body containing the IdP-issued id_token
         NewToken = Web.Contents("https://auth.example.com/api/exchangeToken",[
            Content = Text.ToBinary("{""id_token"": """ & id_token & """}"),
            Headers = [#"Content-Type"="application/json"]]),
         LoginResult = Json.Document(NewToken),
         AccessToken = LoginResult[token]
    in
        AccessToken;

With that in place, I could modify the TokenMethod to do the exchange mid-stream for me like so.

TokenMethod = (code) =>
    let
        Response = Web.Contents("https://Github.com/login/oauth/access_token", [
            Content = Text.ToBinary(Uri.BuildQueryString([
                client_id = client_id,
                client_secret = client_secret,
                code = code,
                redirect_uri = redirect_uri])),
            Headers=[#"Content-type" = "application/x-www-form-urlencoded",
                                 #"Accept" = "application/json"]]),
        body = Json.Document(Response),
        result = if (Record.HasFields(body, {"error", "error_description"})) then 
                    error Error.Record(body[error], body[error_description], body)
                 else
                    // This is a code exchange, we expect a refresh_token
                    if (Record.HasFields(body, {"refresh_token"})) then 
                    [
                        refresh_token=body[refresh_token],
                        access_token = ExchangeToken(body[id_token])
                    ]
                    // This token was obtained by a Refresh request. No refresh_token             
                  else
                    [
                        access_token = ExchangeToken(body[id_token])
                    ]
    in
        result

The key was understanding that the result of the TokenMethod, which is what the FinishLogin method will ultimately respond with, needs to provide a Record with all of the fields necessary to continue the OAuth flow. In this case, simply replacing the access_token and refresh_token allowed Power BI to store this for later use.

Whenever the token expires, producing an HTTP 401 (Unauthorized) response, the Refresh logic kicks in, goes back through this same logic, and data keeps flowing.

Conclusion

Learning how to use Power Query to get data from REST APIs has been a fun (if not frustrating at times) learning experience. The documentation continues to improve and the Power Query team and community are very helpful. Each new challenge helps me better understand the building blocks they have provided and how to get to the next milestone.

Now, if I could just have something like a “finally” query that gets executed after an import to create the relationships I need and additional linking tables if needed. Wow, I’d be in heaven! 😉

Organizing Functions in a Power BI Custom Data Connector

The more complex a Power BI Custom Data Connector gets, the more cumbersome the main Power Query file becomes. Without the ability to make a true “project” structure and include or directly reference functions from other files at this point, I wanted to find a way to keep things a little more organized.

In the series of tutorials that the Custom Data Connector team put together, there’s an example of how to include external functions at the end of Tutorial 7. Although it’s not mentioned in this short explanation, I quickly realized that a Power Query project references all files by inclusion in the final build, not the physical path structure of the project or through namespaces. As long as the file is set to be compiled into the final solution, you can reference it by name regardless of where it is actually placed in the file structure of the project itself.

Therefore, to keep my Custom Data Connector projects a little more visually organized (and to help my future self remember how functions relate), I’ve gotten into the habit of organizing my projects using folders that group common functions together.

I’m still tweaking what I call my “standard” set of folders, but between projects the concepts are the same.

The Root

I try to only keep my Resource and Power Query files in the root of the project, making it easy to find as the number of functions grows.

Icons

Next, I always move the icons into a separate folder. In the generic Power Query project template, you end up with 8 icon files taking up space and named after your main PQ file. Clutter.

Old Snippets

This folder isn’t always named consistently, but I keep a folder for functions and snippets of code I don’t need anymore, usually because I found a better way to do something. I’ve found that it can be helpful for me to refer to old examples of how the project evolved in a way that looking at the commit history can’t solve. When snippets are in a file, it’s easy to include it as a function and see it work again.

Helper/Core

I’m still trying to figure out what I really want this folder to be called, but it contains the meat of my application function logic. Inside of this folder I make sub-folders that group common functions together.

  • Table – These are usually the base helper functions provided by the team at Microsoft with some custom modifications.
  • Data Transformations – The heart of a data connector is about transforming data into something usable for… the user. These are separate functions that generally translate to each Query in Power BI.
  • Authentication – Most data sources require some sort of Authentication. Most of these functions are based on authentication examples from the Microsoft team. I’ll tweak them as needed for the specific project.

Making it work

Whatever your specific folder structure becomes, how does this work out in practice?

Simple! Any file that you want to reference in connector code has to be set as Compile in the Build Actions. Once that is set, the file can be referenced by name only in a PQ function. If you ever forget to set a file to be included in the compiled Data Connector, you’ll quickly get a warning.

For files like the Icons, you don’t need to do anything special after setting the build action. As long as the name stays the same, the references in your Connector.Icon method will just keep working.

However, if you want to load one of the other functions, you’ll need to use the temporary workaround as described in Tutorial 7.

// 
// Load common library functions
// 
// TEMPORARY WORKAROUND until we're able to reference other M modules
Extension.LoadFunction = (name as text) =>
    let
        binary = Extension.Contents(name),
        asText = Text.FromBinary(binary)
    in
        Expression.Evaluate(asText, #shared);

With that function available, you can now load other functions with one or two lines of code. If the method requires no inputs, just initiate it to a function variable.

Table.ChangeType = Extension.LoadFunction("Table.ChangeType.pqm");

In some cases, however, your function might need access to another external function or parameter which has to be passed by reference because the scope has changed. In that case, you need to load the function first and then make it available through a second function variable that accepts the external references and any additional parameters.

TokenMethodFunction = Extension.LoadFunction("Auth.TokenMethod.pqm");
TokenMethod = (grantType, tokenField, code) => TokenMethodFunction(grantType,tokenField,code,OAuthConfig,ExchangeToken);

With this workflow, my overall Power Query file is much less cluttered and it helps bring more meaning to my source control commits when functions are broken down into smaller chunks.

The One Downside

I’d be remiss if I didn’t mention the one, hopefully obvious, issue with moving functions out of the main Power Query file into .PQM files.

The lack of syntax highlighting and intellisense.

It is annoying. I’ve learned to deal with it for small changes. If I need to work through a problem, I’ll copy the function over until it’s been modified and is working, and then wire it back up through the “LoadFunction”. In reality, once a function is working, it’s rare that I’ll actively modify it.

Programming SQL Server Triggers and Functions: My First Pluralsight Course

I’m thrilled to share that I’ve completed my first Pluralsight course, Programming SQL Server Triggers and Functions, and it has finally been published for subscribers to enjoy! I began the journey of authoring my first course with Pluralsight in September 2018 and officially started production of this course in February of this year. It was certainly a challenge to create a course of 3+ hours while changing jobs, having our sixth child, and speaking at a few conferences on other topics, but the overall experience was more rewarding than I expected and has only increased my desire to become a better trainer.

Why a Course on Triggers and Functions?

Seriously, right? Seems like a bit of an “oldie but a goodie” place to start my Pluralsight training career… eh?

I’ve grown up teaching and training in one capacity or another most of my life. My undergrad degree is in education and after teaching for a few years, I was blessed to follow my other passion and develop a great career in software and data. In many regards I often feel like I’m at the bottom of the class on most topics, struggling to keep up. But I also realized that no matter how much I have to learn about a subject, there’s someone else that needs to understand this same information for something they’re working on and I might be the one that can help them get to the next step.

After I auditioned for Pluralsight and was accepted as an author, I came to the first meeting with a handful of topics focused on the Microsoft Data Platform that I wanted to share with the world: SQL Server Database Projects, Power BI Custom Data Connectors, Service Broker.

But I was offered a bit of a “fast-track” to my first course if I chose a course from the list dealing with specific Learning Paths, one of which was SQL Server. Of the 10-ish SQL Server topics on the list, only a few were still available to claim and… well… I happen to be a fan of both Triggers and Functions when they’re right for the job. Pluralsight had already created a basic outline of what they wanted the course to cover and how long they thought it might be. I took 6 weeks to fill in the blanks (remember, I was growing my family and switching jobs) and finally began production in February.

What Does the Course Cover?

I’m glad you asked! I ended up covering a lot more ground than I thought I would. Honestly, if the outline hadn’t been provided for me as a starting point, I might have even recommended the course be broken into two different courses. There are just so many topics to cover when dealing with these two features that, in all honesty, it could have been twice as long. (I’m guessing that most authors feel the same way after recording a course like this)

1. DML Triggers

Of the 3+ hours, the first module on DML Triggers is the longest at 54 minutes. I’ve found over the years that there are a lot of misconceptions about the basics of how Triggers work, including some confusion between SQL Server and other databases like Postgres, so it felt like the detail and coverage of these specific topics was necessary. I discuss and demo all of the major parts of using DML Triggers including AFTER and INSTEAD OF, the INSERTED and DELETED virtual tables, and Execution Order. While the concept of how a DML Trigger works, regardless of the event that triggered it (INSERT, UPDATE, or DELETE), is easily transferable, I felt it was worth trying to provide at least one example of how and why someone might use each of them. I even talk about using INSTEAD OF Triggers on VIEWS, something many SQL developers just don’t understand.

2. DDL and LOGON Triggers

In this second module I focus on the anatomy of a DDL/LOGON Trigger, what makes them different from DML triggers, and when they tend to be useful. Most of the discussion and examples deal with protecting data and schema using these tools, both through logging and restricting access. Some technology might be slowly diminishing the need for DDL/LOGON triggers, but I still think there are valid use cases to consider.

3. Working Smarter with Triggers

Let’s be honest, it’s the year 2019 and many SQL Server professionals would caution you against using Triggers at all. Even Joe Celko, one of the SQL standard authors, thinks you should generally try to avoid them. I’m very aware of the strong opinions against using triggers and have often seen what happens when they are overused and unmaintained… leading to countless headaches.

But as a data professional, they can be a great tool to solve specific problems when used well! In this module I cover Execution Context and security, how MERGE interacts with Triggers, logging information outside of the transaction, avoiding work when possible, and even a small shout-out to Service Broker and the value of using asynchronous triggers.

4. Reusing Code with Functions

This module is simply an intro to Functions within SQL Server and the differences between Multi-statement and Single-statement functions, to prepare for the next two modules. Again, differences in other database technologies, and even misconceptions within the SQL Server community, can make a detailed discussion on Functions difficult without some initial groundwork.

5. Multi-Statement Scalar and Table-Valued Functions

Functions are a good, necessary tool for producing consistent and reusable code. For nearly two decades in SQL Server, most developers have written and used these types of functions, and yet many don’t fully understand the impact they could be having on their overall application performance.

Therefore, in this module I talk about how to write both Scalar and Table-Valued functions and how they can be useful. However, I also show why they can be problematic, what to look for and some recent enhancements, starting in SQL Server 2017, that can improve the performance of your current (and future) Multi-statement functions.

6. Single-Statement (Inline) Table-Valued Functions

The course ends with a module that focuses on Single-Statement Table-Valued functions, often referred to as Inline Table-Valued Functions. I discuss how they differ from Multi-Statement functions, how to create them, and then I discuss and demonstrate how to convert a Multi-Statement function into an Inline Table-Valued Function and what the advantages can be.

Am I a Trigger and Function Expert?

No! I’m not going to lie to you. There are a ton of other people in the SQL Server community that are more qualified than I am to teach a course like this. I happen to have dealt with both Triggers and Functions a lot over the last 15 years and I think I have a good foundation to teach from. I was surprised more than once at how much I didn’t know at the beginning of creating this course. Without a doubt, this certainly proved to be a reminder that “the best way to learn something is to teach it”.

I do think, however, that this course will provide a solid foundation and understanding about Triggers and Functions within SQL Server for most beginner/intermediate SQL Server developers… allowing you to do better work.

Will I Be Creating More Courses?

I hope so. I don’t have the best setup in our current house for producing courses, which sometimes required my family to take day-trips so that I could record with guaranteed quiet, and even then I often had to post-process the audio side of the course. I’ve also just taken on a leadership role in my new company which has shifted my focus a bit for now.

But as I said at the beginning of the article, I love teaching and training, and the staff at Pluralsight are top-notch and extremely helpful. I’d be silly not to try and do it again if I can find the right topic.

Because of the change in my current role, the next courses will likely shift away from SQL Server in particular to PostgreSQL, Power BI and maybe even some tooling topics like Azure Data Studio. I’ve committed to my wife that I won’t do a course unless it’s around a topic I already speak about or would be preparing to speak on. Having that material ready at the beginning significantly cuts down the time needed to create demos and slides.

I hope you have an opportunity to take the course or pass it on to others that are starting their SQL Server journey.

It’s a Wonderful Developer’s Life

I had the pleasure of presenting two sessions this past January at CodeMash 2019. This was my fifth year attending the conference and I can’t say enough good things about the team and presenters that give so generously each year. I’ve learned a lot just by watching them!

Four years ago, as I was considering my “mid-career” life, I attended the Lightning Talks that are presented midway through the conference. This is an “open mic” event where any attendee (and even some attendee family members in the past) can present on a topic of their choice in 6 minutes or less.

The talks that year were funny, heartfelt, inspiring, and encouraging. Between those talks and some other encounters at the conference, I started to pursue opportunities to teach and present. Although I’m not able to do it as often as I’d like (family and my day job come first!), it is always a blessing to take something I’ve learned and encourage others to dig deeper.

Fast forward a couple of years, having just struggled through the choice of leaving my long-time employer, and having our sixth child… and I was in the right place to be challenged and inspired in my professional life by some of our favorite Christmas movies.

So much so, that I thought it would make a perfect Lightning Talk. On a personal level, this is one of the most satisfying talks I’ve presented in a while, even at less than 6 minutes.

Slides can be found in my Github repo

I hope you enjoy it.

Learning to give back, because someone in your sphere needs your help!

PostgreSQL for the SQL Server DBA: Tooling, Part 2 – It Gets Better

This post is part of an impromptu series about PostgreSQL and things I am learning coming from a SQL Server background. A complete list of posts can be found on this post.

WOW! I honestly didn’t expect to get quite so much notice and feedback on the first iteration of this post. The power of social media at work. 😉

But I’m so thankful that I did! It allowed me to quickly see that the support for PostgreSQL is stronger than ever, there are many opinions on tooling, and I have a ton to learn. That’s a challenge I’m up for!

I particularly appreciated the post that Rob Conery created in response, gently reminding me that when something starts in the Open Source and Linux world, there’s often a plethora of command line tools ready to help. Much like Git (which is always used as the seminal ‘command line is more powerful’ tool example), something like ‘psql’ is really good at helping me get at the things I often need to know quickly. I need to remember to consider those roots as I try to grow my understanding deeper.

To that end, I’m all for using command line tools when I can and it makes sense, particularly in database administration. But when I’m knee deep in a new feature and trying to help a young team, I’m still on the lookout for a tool we can use together to better develop and manage the PostgreSQL database day-to-day.

And so, I thought a quick update would be valuable based on the feedback I received here and through Twitter. As always, I’m learning alongside everyone else, trying to be as productive as I can for my employer, my teammates, and the larger community. Hopefully this is a helpful part of that larger conversation.

Tooling – Part 2

As you’ll see in the previous post and comments, there are tooling options aplenty – something most of us have come to expect from an open source environment. Here’s where things are at two weeks later for me.

Azure Data Studio

What?? Azure… Microsoft… PostgreSQL?

YES!!

To be honest, most of this post was already written and focused on DBeaver before I got wind of the pending update to Azure Data Studio. I’ll have a second blog post in the next few days about my experience so far. That said…

As you might have noticed, at the end of the previous post I mentioned that Microsoft recently hired Shay Rojansky. I didn’t expect that he’d start working on tooling specifically, but it did signal to me that they were putting more investment and energy towards PostgreSQL.

Microsoft announced today, as part of the MVP Summit, the beta release of a PostgreSQL extension for Azure Data Studio. I was able to get a sneak peek a few weeks ago and I’m excited about the potential. It’s clear that the ultimate goal is to provide access to the same great features they are building into this cross-platform data management application. I’ll have a separate post soon about some of my first impressions, but here are two highlights that I think hold the most unique opportunity for ADS to become a great tooling option for PostgreSQL:

  1. Community Extensions: Much like Visual Studio Code, Azure Data Studio is open source and is building out a platform for extensions built by the community. Some initial quick wins could be extensions for better monitoring and tuning… and a graphical query plan visualizer!! (hint, hint…)
  2. SQL Notebooks: I got to see the first examples of these at PASS Summit last year when Vicki Harp and the team presented, and I think it’s an AWESOME idea. Having dabbled in and around data science for a few years now, the power of notebooks to store thought process while also displaying results in-line is so much fun. For the first time, you’ll be able to do this with SQL queries in SQL Server or PostgreSQL. YAY!

DBeaver

https://twitter.com/ryanbooz/status/1100612407934832640?ref_src=twsrc%5Etfw
Yes. I had the exact same thought!

After I got past the name, it turns out that DBeaver Community Edition is now my daily driver. I have no idea what the history is (time has been at a premium lately), but the overall layout and functionality comes close to my SSMS “expectations”. I’m not saying SSMS is the standard to reach for, simply that it helps me be more productive quickly because it’s more intuitive… for me.

The navigation layout follows a predictable pattern of feature separation, query editing just feels more intuitive (which is subjective for sure), and most important to me, DBeaver produces accurate DDL when I inspect database objects. As a developer and administrator ramping up on a new project, this kind of detail is especially important for me. Otherwise, I’m left wondering if my latest modifications were applied correctly.

As with most other PostgreSQL tools, DBeaver is cross-platform and able to connect to multiple SQL and NoSQL databases. I can’t currently speak to its performance on other databases, although I will be trying it with some of these platforms in the near future.

DBeaver is built on the Eclipse ecosystem, whatever your feelings on that are. One advantage is that the plugin architecture allows for some great free plugins, like export to Excel. Scoff as you might, it makes sending data to Product Owners, managers and users just a little bit easier when non-standard data is involved.

And finally, DBeaver is one of the few options aside from PgAdmin that has a free, open source Community Edition that is very capable, while providing a reasonably priced Enterprise Edition for additional features and support. Oh, I almost forgot! It can be installed with Chocolatey too!

A Few Highlights

Most of the other tools I’ve looked at have variations on these features, so I’m not trying to claim that they are unique to DBeaver at all. But again, for whatever reason, finding and using them is just more intuitive for me (at this point).

Intuitive Server/DB Navigation

Easy Navigation

It might be a small thing, but being able to navigate objects in the database and refer to them consistently with others on the team is immensely helpful. When I talk about Views, Indexes or Functions, there’s no guessing. Again, remember that I’m often thinking about how I can best help others learn to use the DB effectively and having an easy, common way to get about makes this easy.

Also notice a really small thing within each database in DBeaver: access to two simple queries for Sessions and Locks. Again, coming from the SQL Server world, there are so many community contributors teaching and preaching the benefits of knowing some of these essential pieces of day-to-day work. Other tools certainly have something like this I’m sure, but this gave me a quick way to start exploring performance problems on development databases and elsewhere.

DDL/Data Access

I know it seems like I harp on DDL a ton, but one of my main objectives in this new job is to help bring the database into order. If I can’t easily see the data, the schema definition, and know the state of various objects, that is generally a non-starter for me.

Easy access to all schema objects

Double-clicking on any object (table, view, triggers, etc.) will open up the edit window. As you can see, when it’s a table, properties and data preview are close at hand, and all parts of the table are visible through the tabs on the left.

Honestly, this seems like a good option for something like SSMS to have, rather than dealing with each of these objects separately… but I digress.

My one gripe at this point is adding columns to a table through the UI. I haven’t found a quick method to do it graphically. Generally I resort to straight SQL for these operations, but when I was creating a prototype this week, I was jumping back and forth wishing I could just quickly tab through some options. It’s minor.
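
For what it’s worth, the “straight SQL” fallback is short enough that it rarely slows me down. A minimal sketch (the table and column names are just for illustration):

ALTER TABLE prototype_widgets
    ADD COLUMN notes text,
    ADD COLUMN updated_at timestamptz NOT NULL DEFAULT now();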

Result Table Filtering

Inline results grid filtering

To be honest the first time I saw this was with DataGrip. I didn’t think it would be useful at all and mostly just assumed it was added as a checkbox to increase the feature count. But, I’m surprised how often I’m in a database and a co-worker asks if something exists and my first reaction is to double-click the table (display data) and then start typing a column and value to filter. It’s not my main method (I can still crank out “SELECT x FROM table WHERE…” as fast as the next guy…), but it is a useful little time saver now and again.

SQL Formatting and Intellisense

Another small detail that doesn’t work as well in other tools I’ve tried. And, it even bests the default SSMS capabilities (although, Red Gate’s SQL Prompt is a great tool that easily bests what’s built into DBeaver!)

When I’m typing, speed is generally more important to me, which means that I don’t use all CAPS for keywords in the SQL language. Yes, it doesn’t matter. But it does look better when things get complicated and I’m trying to show or teach someone about SQL. DBeaver is the first tool out of the box that allows me to set CAPS for auto-complete on intellisense and it generally just works. In other IDE’s, it will only go CAPS if I start the word with a capital letter. Arg.

The Others

A few other suggestions made it into the comments of the previous post (and hopefully this one too!). The short list includes:

Feel free to check them out and blog about them if you feel strongly about it. I’m sure someone in the community will be thankful you did!

At this point I don’t plan to look for others. Between DBeaver and Azure Data Studio, I can work more effectively with my team and provide better, faster service to my internal customers with PostgreSQL. I’m sure I’ll take a few minutes from time to time, seeing if any new contenders are making waves.

Stay tuned for more information and feedback about Azure Data Studio.

PostgreSQL for a SQL Server DBA: The Tooling Stinks

This post is part of an impromptu series about PostgreSQL and things I am learning coming from a SQL Server background. A complete list of posts can be found on this post.

UPDATE: In response to this article and much of the feedback, I’ve created a second post on two other tools. Check that out for more context and options… and keep watching for more information on PostgreSQL tools.

So here’s the deal.

Just. Stop. Looking.

No, really, stop looking for more user-friendly tools that provide even 25% of the features of SQL Server Management Studio, Azure Data Studio or even Visual Studio (for database projects). They don’t exist.

Obviously this is my current opinion. (YMMV)

Why Bad Tooling is a Problem

As I’ve mentioned at various times, I come from a teaching background and so I tend to be hyper-aware of things that unnecessarily interfere with learning something. When it comes to technology, that can be a lack of documentation, bad tooling, or good tooling that’s simply too hard to figure out. And yes, SSMS could fall into the latter category at times.

In the case of PostgreSQL, I’ve quickly come to the conclusion that bad tooling is one of the main reasons the uptake is so much more difficult and convoluted coming from the SQL Server community. Even the devs I’m currently working with that have no specific affinity for databases at all recognize that PostgreSQL just feels like more of a black box than the limited experience they had previously with SQL Server.

/* Begin Brief Soapbox*/
Honestly, this is by far one of my biggest gripes about Open Source software now that I’m older, busier, and don’t want to spin my wheels trying to make something simple work. When the tools make it hard to dig in and work effectively with the database, most developers and shops will default to code-first/ORM-only development. In nearly 20 years of software development and leading multiple teams, I’m still surprised how little most developers really care about effectively using a database of any kind. During most interviews only about 30% of applicants can ever answer a few basic SQL questions. And now I think I’m starting to understand why. Most of them have been relegated to an Open Source world with Open Source tooling when it comes to SQL. Yes, it’s cheap and allows projects to spin up quickly, but once those students get past their little pizza ordering app from CompSci 402, they’ll be lost in the real world.
/* End Brief Soapbox */

Quick Aside: Lest anyone misunderstand me, my gripe is not with Open Source software at all. I LOVE OPEN SOURCE SOFTWARE! Really! I’ve been using Linux since 1998, built my first kernel in 1999 on a bus headed to North Dakota (long story) and I literally wouldn’t be where I am without OSS. This conversation is specifically about tooling for SQL platforms in the OSS world.

The current contenders

All of the griping and soap-boxing aside, I really do want to provide a short list for others to consider that might be in a similar position. Below is a short list of tools I’ve tried or (eventually) purchased to make using and managing PostgreSQL easier and more straightforward. As with most OSS projects, there are certainly many more than this. But based on feedback from others and doing my own testing, this where things currently sit for me.

PgAdmin

PgAdmin is the web-based GUI for PostgreSQL and certainly the most often used tool. To “run” PgAdmin the application starts a local web server process and then opens a browser. From an enterprise perspective, it’s honestly pretty laughable to ask developers to do real day-to-day work managing a database through a web application. Sorry, I’m not going to sugar coat it. Too often I feel like it’s one refresh away from losing my work, and every time I use PgAdmin I’m reminded of an incident 15 years ago when I dropped tables in a client database using phpMyAdmin because of a refresh bug which reset the context in the database. And all these years later, I lost work three days into the new job because something caused things to refresh and everything was reset.

To be fair, PgAdmin is a pretty remarkable undertaking and has some useful features. Three quick highlights of basic things PgAdmin provides which I’d expect from any tool include:

  • Easy hierarchical navigation that actually shows you all of the major database object types in context (see all triggers/indexes/columns in context of the table).
  • Quickly displaying the DDL of objects. For instance, it’s the only tool that can easily show me the DDL of a Trigger after it is created, including the predicate.
  • A visual query planner of some sort. Yes, I know about EXPLAIN ANALYZE (a quick example follows this list), but honestly, I want to focus on making queries better, not remembering how to decipher the text plan each time.
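
For anyone following along from the SQL Server side, this is the incantation in question (the table and column below are hypothetical). The text output is what I’d rather not decipher by hand; the JSON format is what many of the visual plan tools consume instead.

-- Text plan with runtime and buffer statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM users
WHERE full_name ILIKE '%booz%';

-- JSON plan, the format most graphical visualizers expect
EXPLAIN (ANALYZE, FORMAT JSON)
SELECT *
FROM users
WHERE full_name ILIKE '%booz%';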

So yes, there is a surprisingly good set of features given the delivery method, but it’s just not enterprise ready by any stretch. One potentially bright spot is that a co-worker recently mentioned that there was rumor of some investment capital in the PgAdmin project in some way. Maybe there’s hope of something better? Maybe Microsoft will figure out how to allow Azure Data Studio to become part of the mix?

DataGrip

At this point, the best option I’ve found for Windows is JetBrains DataGrip. They happened to be at PASS Summit last year and a technical sales lead stated that their goal is to replace SSMS as the database management tool of choice. (I’m not even paraphrasing!). While that’s a noble goal, they have a long way to go… but it’s still much better than PgAdmin for most tasks.

A couple of “quick win” useful features include:

  • There are some handy features in DataGrip for viewing and filtering data with an easy double-click of the table.
  • Multiple-tab support is fairly good; however, seeing the settings for the connection on the current tab isn’t as obvious (again making me fear I’ll run something in the wrong database).
  • Managing backups is easier, overall, than PgAdmin or the command-line

One thing that I think is holding me back from using DataGrip more effectively is the lack of good training or documentation. I remember sitting at the JetBrains booth at PASS Summit thinking “those are some interesting features…”, but I can only remember one of them at this point. The only training video I found is a few years old and is essentially a 10-minute technical sales demo. If there were more of those, maybe I’d have a different opinion.

DataGrip is cross-platform, although I haven’t tried the Mac version yet. If you’re on a Mac, Brent Ozar recommended Postico when I asked him what tools his team uses to manage and work with their Aurora instance.

Navicat for PostgreSQL

I also gave Navicat a try for their 14-day trial. In the end, there wasn’t enough value to justify spending another $300 after I had already purchased DataGrip.

One thing I will say is that it is much faster and (seemingly) lightweight in a good way. DataGrip seems to run up against a JVM memory issue whenever I try to do larger tasks (like running a backup). I didn’t experience any of that slowness when testing Navicat. Unfortunately, there weren’t enough overwhelming wins to make that one issue a deal-breaker.

Why Hide the Schema?

One particularly annoying thing about both DataGrip and Navicat is that neither of them show all of the major schema objects within the navigation tree… and each shows a different set of objects.

DataGrip shows most of the major schema objects within the tree under each table as I’m used to. Unfortunately, interacting with the objects for editing or inspection never works out quite how you’d expect and it doesn’t consistently show the true DDL of the objects.

Navicat? Every object has its own section. What indexes do I have on table ABC? Yeah, you have to go to the “index” module and hope you named your indexes well because they aren’t used in context of the tree.

In total, this just feels like it strengthens my opening monologue about the usability and accessibility a tool provides.

A brighter future?

My one hope for improving the tooling is that Microsoft just hired the maintainer/owner of the .Net PostgreSQL driver, Shay Rojansky. While I know that doesn’t translate directly, I can only hope it means they’re thinking about how to improve the tooling to go along with the programming support.

What about you? Any tooling choices that have been a net positive in your transition to PostgreSQL?

PostgreSQL for a SQL Server DBA: A Series

I recently switched jobs, leaving a nearly 15-year career working with SQL Server. Although I have been using it that long, it has only been in the last 4 years that I finally realized how much I didn’t know about my favorite data technology stack, taking every opportunity to dig deeper and get more connected to the #SqlFamily.

Unfortunately, two months before I joined KCF Technologies, they transitioned to PostgreSQL because of the Aurora database offering from Amazon Web Services. The sales pitch behind Aurora is great and it has, in fact, saved them a ton of money. And at this point the database backing the application isn’t the complicated part of the application, but rather the millions of real-time data points that stream into the service every minute (using a separate time-series database).

That said, it has been quite a transition trying to re-learn (or unlearn as the case may be) the PostgreSQL way of doing things and actually missing the huge ecosystem of tooling that has grown up around the SQL Server community… tooling that’s largely missing in the open source DB world. Don’t get me wrong, I had some similar feelings 15 years ago when I started using SQL Server after spending a few years with PostgreSQL (of all things)… but this has certainly felt more challenging.

As I’m doing the learning, spending hours on Stackoverflow and a handful of other sites trying to get my bearings, I thought it would be good to write some of it down for my future-self and others that have the need to cross over at some level. PostgreSQL isn’t going away as an alternative, and both AWS and Microsoft have provided “one-click” startup options to use it.

There’s no specific agenda, but the first several posts will certainly share many of my frustrations trying to make the leap back(wards). But as I get my mind wrapped around each piece, I promise to share some positive things too. 😉


  1. PostgreSQL for the SQL Server DBA: Triggers
  2. PostgreSQL for the SQL Server DBA: The Tooling
  3. PostgreSQL for the SQL Server DBA: Tooling, Part 2 – It Gets Better

PostgreSQL for a SQL Server DBA: Triggers

This post is part of an impromptu series about PostgreSQL and things I am learning coming from a SQL Server background. A complete list of posts can be found on this post.

While Triggers have gotten a bad rap in many circles and are often discouraged (great read by Joe Celko), they are still a useful tool in a handful of situations. Trying to implement my first Triggers in PostgreSQL has been one of many learning adventures lately. Here are a few key points I want to remember for next time and share with others.

They are implemented as Functions (Stored Procedures)

This caught me off guard at first. I’ve been working with and dealing with triggers in SQL Server since day 1. They are first-class citizens… objects that have their own code blocks and rules.

PostgreSQL approaches it differently. Any reusable code block, regardless of its true purpose, is a Function of varying types. Triggers are no different. Therefore, you write the logic of your trigger in a Function and then call it by adding a trigger to the DML event of a table.

CREATE OR REPLACE FUNCTION user_log()
RETURNS TRIGGER AS
  $BODY$
  BEGIN
    INSERT INTO UserLog (....);
    -- Trigger functions must end with a RETURN; AFTER triggers ignore the value
    RETURN NEW;
  END;
  $BODY$
LANGUAGE plpgsql;

Once the function is available, you “attach” it to the DML event of the table by creating the trigger.

CREATE TRIGGER ti_users
  AFTER INSERT
  ON Users
  FOR EACH ROW
  EXECUTE PROCEDURE user_log();

Triggers can be run per-row

Again, this threw me a little coming from SQL Server, which executes triggers on a batch of rows, even if it’s only one row. This also means that when a trigger is the best place to check or modify data in an application, additional work often has to be done to iterate over the data inside of the trigger. In PostgreSQL these batch-level triggers are called statement triggers and don’t provide the same access to the batch of rows that were inserted/modified that SQL Server does with the INSERTED.* and DELETED.* special tables.

With row-level triggers, however, the function you name is called on every row that is modified. Obviously this comes with its own caution as well, but in the right circumstances it means that the data is easily accessible because the function is working on one row of data at a time. Row-level triggers also give you access to the row data that you’re used to, but in PostgreSQL you refer to the NEW.* and OLD.* records.

CREATE OR REPLACE FUNCTION user_log()
RETURNS TRIGGER AS
  $BODY$
  BEGIN
    INSERT INTO UserLog (Username, Message) VALUES (NEW.Username, NEW.Message);
    RETURN NEW;
  END;
  $BODY$
LANGUAGE plpgsql;

Triggers can have predicates!

This is actually one of the few things I like about triggers in PostgreSQL that SQL Server doesn’t have. While it might add confusion, it also makes it very easy to limit when the trigger function is actually called.

In SQL Server, the appropriate trigger is always called when the DML event is performed. We often have to jump through a bunch of hoops (SELECT… EXCEPT) to see if the data we care about has changed before we do some kind of work.
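
For reference, the “hoops” look something like this in T-SQL, a simplified sketch reusing the Users/UserLog tables from the surrounding examples (it is the SQL Server counterpart of the PostgreSQL trigger shown just below):

CREATE TRIGGER tu_users ON Users
AFTER UPDATE
AS
BEGIN
    -- Only log when FirstName actually changed for at least one row
    IF EXISTS (
        SELECT FirstName FROM inserted
        EXCEPT
        SELECT FirstName FROM deleted
    )
    BEGIN
        INSERT INTO UserLog (Username, Message)
        SELECT i.Username, 'FirstName changed'
        FROM inserted i;
    END
END;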

In PostgreSQL it is so much easier. When you create the trigger that attaches the function to the table DML, you add the appropriate predicate(s).

CREATE TRIGGER tu_users
AFTER UPDATE
ON Users
FOR EACH ROW
WHEN (OLD.FirstName IS DISTINCT FROM NEW.FirstName)
EXECUTE PROCEDURE user_log();

It’s that simple, and you can add more predicates to the WHEN clause. They just all have to be Boolean in nature.

It’s not all bad

In the end, I’m still missing most of what SQL Server offers in power, tooling and evolving technology. That said, Triggers in PostgreSQL have been one of the bright spots when they’re the right tool for the job.