As you probably know, T-SQL Tuesday is a monthly event where bloggers are invited to write about a set topic. Our host for April 2021 is Steve Jones (b|t) this time, and he has asked us to write about Jupyter notebooks: do we use them, would we want to; and why or why not?
Let’s start with the basics, because I’m not convinced that everyone reading this already knows what Jupyter notebooks are.
Simply put, a notebook is a single file where you can store a combination of blocks with formatted text and blocks with code, where the code can be multiple languages (e.g. powershell, T-SQL, etc). The code in the code blocks is actually executable; the results appear within the notebook, and they can even be included when you save the notebook.
Microsoft has implemented support for notebooks in Azure Data Studio (ADS). And if you’ve watched any online conferences lately, or other events where demo code is shown, you will have noticed that Microsoft staff is pushing the concept of “ADS with notebooks” hard: using them whenever appropriate, and often even when not appropriate at all. Basically an extension of how they have in the past years already been forcefully shoving ADS in front of our eyes whether it was warranted or not.
There are many good use cases for notebooks. Within the Data Platform / SQL Server world where I “live”, I have identified three main use cases.
Perhaps you think after reading this that I am overlooking important use cases. If so, then please let me know, that’s what the comments of this blog are for! (I’ll also use all other T-SQL Tuesday blogs about this topic with interest).
Notebooks as runbooks for production support
Imagine that you are a production DBA. The company you work for has some staff available at night to handle more standard tasks, but when things go wrong that are above their level, you will get the famous 3 AM call and you are expected to come online and fix it.
At one point you realize that there are some tasks that the on-site staff should be able to fix without calling you, if they have proper instructions. Training them will not work; there’s too much churn and the tasks happen too infrequently for the information to stick. You tried documenting things in the past but found that the staff was too prone to mis-typing the query code needed or clicking the wrong buttons.
Notebooks can be the key to solve this. You use the text boxes to explain the conditions and the steps. You include the query in code boxes. The on-site staff merely needs to press the run button, then compare the results to what you write in the next code box and then continue as appropriate. Sure, it will take some time to set up. But once done, the benefit might be a few more uninterrupted nights.
For me personally, this is not an interesting use case. Due to the work I do and the clients I have, this scenario simply doesn’t apply to me.
Notebooks for the data analyst
Based on what I’ve heard, data analysts love notebooks. I won’t try to explain in detail as I’m not a data analyst and I’m pretty sure you’ll find ample good explanations from other bloggers today.
The basic idea is that the notebook can “talk” a user step by step through a data analysis, showing intermediate results as numbers or in graphs. This is possible because notebooks support languages such as Python, that are very powerful for specifically the work of data analysts, and that enable relatively easy ways to plot graphs from the results of running the code. All of which can be done from the notebook. And because the notebook can be put into source control, it’s also easy to use this as a tool in a team, track changes, merge changes, etc etc.
But I am not a data analyst. So for me, this use case is not relevant either.
Notebooks as a presentation tool
And that brings me to the third and final use case for notebooks that I see. The only one that I would actually like to use. As a tool for presentations.
When I present at virtual or (one day in the future) live conferences, I tell a story. My words are the main story; it is supported by slide deck and demo code. In the demo code, the queries and the results shown are key, but I often add comments to make it easier to follow what the demos show even when someone downloads the demo code and runs it without attending the presentation.
Notebooks, due to their mix of code blocks and (formatted) text blocks, make it easier to make those comments stand out more. And perhaps replace some of the slides with explanations or keywords in the text part of the notebook.
I absolutely 100% see how notebooks could be a great tool for me to help make my presentations better.
Notice how I use the words “could be”, and not “are”? Yeah. Let’s get to that.
If you are one of those old-fashioned hardcode Microsoft fans who gets upset at critical words about anything Microsoft does, you might want to stop reading here. The rest of this post is not going to be pretty.
I am a fan of Microsoft too, or at least of their SQL Server / Data Platform activities and products. But I am the type of fan that believes in tough love. And the “love” I’m giving below is really, really, really tough indeed!!
Execution plans in notebooks
When I present, it is often about execution plans. Or about performance, which is impossible to cover adequately without showing execution plans. And even when I talk about other things, I do like to point out performance implications, simply because performance matters. So in short, when I present, expect to see execution plans.
Which brings me to the topic of execution plan support for notebooks in ADS
There is none.
“None?” you ask?
If I wanted to present using notebooks in ADS, I’d still have to copy/paste the code to a regular query window and execute it there to see the execution plan. Not a great experience. Breaks the flow of my talk. And reflects very poorly on the quality of the tool I use, and by extension of the company that made the tool.
Execution plans in Azure Data Studio
What makes the already clumsy experience above even worse is the current state of support for execution plans in ADS. And when I use the words “support for” above, I’m actually being incredibly generous.
The visual representation of execution plans in ADS is ugly (okay, that may be personal), inefficient (much too large icons making it impossible to see relevant parts of a plan in their entirety), incomplete (no icons for many operators), and bugged (in so many ways that I’m not even going to list them here).
Many feedback items have been given, by many users. If you click the link to inspect the list, don’t be misled by the happy “37 closed” mark. Many of those issues were not closed after being fixed, but closed with the comment “Please try SentryOne Plan Explorer extension, which should support this scenario. This extension is our recommended query plan viewer. Thanks!”
Really? Are you bleeping serious? You’re a billion dollar company. You release software that is bugged as hell. And your response to your customers is: “Yeah just install a third party tool because we can’t be bothered enough to fix our own shit, not to negotiate with that third party to get their stuff as standard functionality in our product”?
Oh, and let’s not forget that the SentryOne Plan Explorer extension doesn’t seem to work in recent builds of ADS. After a recent upgrade the extension didn’t work anymore for me, and I have not been able to get it fixed, nor to get anyone interested in even looking at the issue.
Is Azure Data Studio bloatware now?
As of recent, I also have a new issue with ADS. One that is more a matter of principle, although I also have practical issues with it.
Since October 2020, with the 18.7 release of SQL Server Management Studio (SSMS), whenever you install or update SSMS, ADS will be installed or upgraded too. Not as an optional extra. Not through one of those heinous small preselected tick boxes for crappy bloatware that will get installed if you don’t notice it and untick it. No, simply as a fixed addition that you cannot even opt out if no matter how much you want to.
Almost as if whenever you buy a loaf of bread you have to buy furniture as well, except you’re not told that; you only notice when you get home with your bread that your house is redecorated and your credit card is depleted.
So when I wrote in the section above that I broke the Plan Explorer extension in ADS when I installed an upgrade, I actually lied a bit. I didn’t upgrade ADS. I only upgraded SSMS, or rather that’s what I wanted. It was Microsoft that decided to conveniently break a previously working ADS version, because they like it better when I don’t have the ability to tune my queries.
I’m not the only one who is mildly upset by this. Erik Darling blogged about this, and opened a user voice feedback item that already has 321 upvotes, and lots of telling comments. If you want to vote for it, be quick: Microsoft is planning to stop using user voice and migrate to a new feedback platform. If past experience is anything to go by, this probably means that they’ll start with an empty slate, conveniently “forgetting” all past feedback. (Note that I would love to be proven wrong on this prediction!)
Slightly off topic for this post: I think it’s also time to retire the old standard advice to not run SSMS on a server but install it anyway just in case it’s ever your only option. My new advice to my customers is to not install SSMS on servers at all, because based on precedent I have no way of knowing what other crap the installer will put on their production server. Until Microsoft returns to the practice of giving us a build for SSMS that we can trust to only install what it says on the tin, I will treat it as a shady download that might do anything to the server.
Notebooks in ADS do not support execution plans.
ADS itself has a level of “support” for execution plans that is perhaps even worse than simply not showing them at all. And feedback items on these issues are ignored for years, then closed without fixing them, pointing to a third party tool that not even works properly.
Whether intended or not, the message sent by this is that Microsoft doesn’t want you to tune your queries. When you are using a non-Windows machine to do work on an Azure SQL Database, you simply have no way to tune your queries, other than going in the admin panel and moving that performance slider all the way to the right.
To Microsoft’s defence, I don’t think this is the message they intend to send. I regularly speak to many Microsoft people in the Data Platform space and I’m really convinced that they do want customers to be able to tune their workloads. They do want customers to use the slider for more performance (and more money) only when they really need more performance and there is no viable other, cheaper way to get there. But that only makes it sadder that they still do relay that very different message.
For me to start using ADS notebooks when presenting, several things need to change.
First, ADS itself has to get better. It has to be improved to a level where people will want to install it voluntarily. So that Microsoft no longer has to sneak it on our systems when all we want to do is upgrade our SSMS.
Second, execution plan in ADS has to be improved. By a lot. If working with execution plans is better, easier, and more compelling in SSMS than it is in ADS, then why would I ever move out of SSMS?
Third, that same execution plan support has to also be enabled in notebooks.
With those conditions met, we are looking at a fine future. Imagine a notebook that has text with information about the topic I present on. It then shows a query, and I can run it during the presentation to show the execution plan. But that same execution plan is also included in the text, with annotations and remarks about interesting things.
One can dream, right?