No, I can!

Allow me to interrupt my normal technical posts to share a personal story.

I can’t sing

Starting in my early youth and continuing all through my adulthood, people have been telling me that I cannot sing. Classmates, colleagues, random people I met. But also people I trust and love.

For example, my mother loves to talk about when I, my brother, and my sister were still young children and we used to have times when the three of us and mum and dad would sit together and sing songs. She loves the memory, and one part of it is that we sang “polyphonically” – five different voices and none of them on key, because she is the only one who can keep a key, but not in that company.

And my wife and I have joked, our entire marriage, that we both can’t sing. Both our children turned out to be great singers, resulting in even more jokes (usually in bad taste, but that’s just how our marriage rolls).

And my children, once they grew older, started participating in those jokes. Who can blame them? They’re just children, following the example their parents set. And once they grew a bit older … well, puberty is a bitch, right?

Can’t I sing?

Now I am sure that my mother, my wife, my children, and all those others never meant their words in a bad way.

I am sure that as a child I did really sing those children’s songs off-key, and that this amused my mother rather than bothered her. I am sure my wife means it just as jokingly as I do when we jibe about each other’s lack of singing skills. And I am definitely sure that my singing does sound terrible when compared to my children’s. And those classmates, coworkers, and random strangers probably just wanted some quiet so they could concentrate.

But all these people, unknowingly and unintentionally, sent me a message. They didn’t say that I was singing off-key, or that I was too loud. They said that I cannot sing. And here’s the thing. When enough people tell you something, you believe it.

So I stopped singing in public, or even around my family. And eventually I even stopped singing when I was alone, in the car, in the shower, or in any other location. And I started to mock myself to others, saying that a theatre would only hire me to be a better fire alarm: if the building needs to be evacuated fast, put me on stage, because people will run faster than when the fire alarm is tripped.

Can I sing?

As mentioned above, my children are actually excellent singers. My son is trying hard to pursue a professional career in musical theatre, as the proverbial “triple threat”. My daughter has chosen a different profession but has been performing in amateur musical plays for most of her life.

Last year both children played together in an amateur production. For reasons that are not relevant here I was present during several rehearsals. And I noticed how much fun this company was having. It reminded me of the fun I had acting during my youth, and it awoke an itch that I thought was long dead.

So after the final show I asked the director whether she was open to more cast members, and whether she would consider me. I explained that I can’t sing but that I would love to do a role that requires only acting and dancing but no singing. I still remember her reply: “Everyone is welcome, but you will need to go through the same audition process as everyone else. And that does include singing. No exceptions”.

I warned her that I would not accept liability for damage to her ears or brain after hearing me sing. And then I started to get nervous. Knowing I can’t sing, I could have chosen to not prepare at all. But I am a perfectionist. Whatever I do, I want to do the best possible job. So I took my singing audition just as seriously as anything else. I selected a song. I recorded a karaoke version of it on my MP3 player. And every time I was alone in my car, I would play the song and sing along. To learn the words. To learn the rhythm. And to learn the pitch.

I asked my children to give me feedback, and I kept practicing.

And though I had no idea whether or not I improved, and whether or not the crows would drop dead when hearing me, I did notice one thing: I loved doing this.

My entire adult life, more than 30 years already, I had forbidden myself to sing because others told me I couldn’t sing. I did not just stop singing in their presence, I stopped singing completely. And I missed out on 30 years of the fun of singing – even if it’s just during my commute.

I can sing!

Then came the time for the audition. I already knew that the entire group, including the director and the vocal coach, are very relaxed. They also did a great job of helping auditionees control their nerves.

We did some group activities, we played a short scene that we had prepared … and then came the time for the singing auditions. I did my song to the best of my capabilities, and then went home, hoping that I had shown sufficient acting skills to land a nice role, and fully expecting to be kept out of any singing activity.

Boy, was I surprised when the script and the roles were distributed. Not only was I given a much bigger role than I, as a new member of the group, would have expected. No, I was also expected to sing along in the ensemble songs. And I was even given a solo. Admittedly, a short solo. But still a solo.

I am still pretty sure that my audition song must have been terrible. But the vocal coach must have heard potential. She must have realized that she could teach me. (Or perhaps she is just very masochistic and looking for huge challenges?)

Through the year, we worked on all parts of the play, including the songs. I also found some time for a few extra private singing lessons with the vocal coach.

The result? I now feel confident in my singing skills. And while I am aware that I am not a good enough singer to ever hope to make a career out of it, I no longer feel that I should never sing in the company of others. (I’ll still shut up if you ask me to give you some quiet, though.)

Oh, and tonight I will, for the first time in my life, sing on stage. We open tonight, and then we have three more shows on Saturday evening, Sunday afternoon, and Sunday evening. I will act, dance, sing in ensembles, and sing my solo. And I am pretty confident that I will sing with at least acceptable quality. (Well, except perhaps for that one insanely high note in the song where I was moved from the low voice to the high voice just four weeks ago.)

The moral

Sometimes people tell you that something is not possible. And usually they have good reason for it. But that does not mean that you can and should never do it.

If something is your dream, then do not let other people’s opinions hold you back. Of course, this does not mean that everything you want to do is possible. No matter how drunk you are, you will never be able to fly off a 20-story building. No matter how long I train, I will never compete in the Olympics. Sometimes, when people tell you that you can’t do something, it’s wise to listen.

But don’t listen blindly. If you really want something, then wait until you are sober, make an honest assessment of what is and what is not a reasonable target to shoot for … and then just go for it!

Oh, and then there’s the other part of the story. Think about what you tell others. I am sure nobody had the intention of totally discouraging me from singing. But they did. Based on this experience, I am now trying to choose my words to others with more care.

But … can I really sing?

You might now wonder whether I have actually learned to become a good singer … or whether I am just delusional and surrounded by very polite people.

If you happen to be in the neighborhood, then there is an easy way to find out. There still are some tickets available! (The site is in Dutch, but then … so is the show…)

Removing multiple patterns from a string

Recently one of my clients had a request that was a challenge to do effectively. I don’t think it’s a very common requirement but I still decided to blog about it. Who knows, maybe one of you will actually benefit.

Pattern removal

Within an existing query, one of the columns in the result set holds string data that includes sequences and patterns that need to be removed in the output. Most of those patterns can occur more than once, and then all occurrences need to be removed. Some of these patterns overlap with each other, and then the longest one needs to be removed. One of the patterns can only occur at the end of the string; the rest can occur anywhere.

The actual use case and the list of patterns that I had to remove are considered a confidential competitive advantage by my client, so I will just make up a list here to illustrate the problem. In this fake requirement, the following patterns must all be removed wherever they appear in the string: “-@@%”, “@@%”, “-@%”, “@%”, “No.X#X”, and “^^^”. Also, “@X” needs to be removed when at the end of the string. In these patterns, @ is a placeholder for a numeric symbol, X for an alphabetic symbol, and # for an alphabetic or numeric symbol. All other symbols are to be taken literally. (So “%” is the percent mark, not the LIKE wildcard that matches any string of characters!).

Challenges

When looking at the requirements, there were a few solutions that I briefly considered and then rejected.

  • I had previously done substring removal with REPLACE, which has the benefit that it will automatically replace all occurrences it finds – but it is limited to finding fixed strings; it does not support pattern recognition.
  • I thought about doing a query with a PATINDEX expression and a join to a numbers table to find all matching patterns. This worked fine, until I started testing with a string that included two copies of the same pattern. With PATINDEX I was only able to find the first occurrence of a matching pattern, because it does not support a parameter to specify the start position for the search.
  • Conversely, CHARINDEX does support a starting position – but it is once more limited to finding fixed strings, and does not support pattern matching.
  • A fairly straightforward and obvious choice would have been to create a user-defined function (either a T-SQL user-defined function or even a CLR user-defined function). The code for such a function would be quite simple. However, the pattern removal had to be done on the result of a query, and adding a user-defined function in a query wrecks performance in more ways than you can count. So this was not an option for me.

I realized that I had to think “out of the box” for this one. The final solution I crafted effectively breaks each input string into its individual characters, then uses a WHERE clause to filter out each character that is part of at least one of the given patterns, and then concatenates together all the characters that remain.

Here is the query I used (along with a simple test frame). In the paragraphs below I will walk through and explain the key elements.
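(For confidentiality, the version shown here is a simplified sketch: it uses a mocked-up table, a few sample values, and the fake pattern list from above translated into LIKE patterns, but its structure matches the fragments discussed in the sections below.)

    DECLARE @Input TABLE (PartNo varchar(50) NOT NULL);
    INSERT INTO @Input (PartNo)
    VALUES ('AB345%6%-44%^^^^No.XXXNo.W8RNo.8R8D'),
           ('PART-1%C2%No.A1B^^^OK9Z');

    WITH Patterns AS
    (   -- Full LIKE pattern (wildcards included) plus the length of the text it matches.
        SELECT  Pattern, PatternLength
        FROM    (VALUES ('%-[0-9][0-9][%]%',        4),  -- "-@@%",   anywhere
                        ('%[0-9][0-9][%]%',         3),  -- "@@%",    anywhere
                        ('%-[0-9][%]%',             3),  -- "-@%",    anywhere
                        ('%[0-9][%]%',              2),  -- "@%",     anywhere
                        ('%No.[A-Z][A-Z0-9][A-Z]%', 6),  -- "No.X#X", anywhere
                        ('%^^^%',                   3),  -- "^^^",    anywhere
                        ('%[0-9][A-Z]',             2)   -- "@X", end of string only
                                                         -- (no trailing %, so PATINDEX only matches at the end)
                ) AS p (Pattern, PatternLength)
    ),
    Numbers AS
    (   -- Inline numbers table; large enough for a varchar(50) input.
        SELECT TOP (50) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM   sys.all_objects
    )
    SELECT  t.PartNo,
            CAST((SELECT SUBSTRING(t.PartNo, n.n, 1)
                  FROM   Numbers AS n
                  WHERE  n.n <= LEN(t.PartNo)
                  AND NOT EXISTS
                      (SELECT *
                       FROM   Patterns AS p
                       WHERE  PATINDEX(p.Pattern,
                                       STUFF(t.PartNo, 1,
                                             CASE WHEN n.n > p.PatternLength
                                                  THEN n.n - p.PatternLength
                                                  ELSE 0 END, ''))
                              BETWEEN 1
                                  AND CASE WHEN n.n > p.PatternLength
                                           THEN p.PatternLength
                                           ELSE n.n END)
                  ORDER BY n.n
                  FOR XML PATH('')) AS varchar(50)) AS CleanedPartNo
    FROM    @Input AS t;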

Specifying the patterns

I assume that the CTE where the patterns are defined needs no explanation. Note that for matching only the character % I use [%] instead of relying on an escape character. Both options are valid.

For one of my earlier attempts I had to ensure that for potentially overlapping patterns (e.g. @% and @@%, or @% and -@%) the longest pattern comes first. However, my final solution no longer relied on this order so patterns can be specified in any order.

You will notice that I specify a pattern length as well. This is required in the main query. It is probably possible to derive this by parsing the pattern, but that would just add complexity for no gain. It was easy enough for me to count the length of each pattern.

Splitting the string

The main query starts by joining the input data to a table of numbers. Each row in the result set represents a single character from the original string. You can easily verify this by replacing t.PartNo in the above fragment with a string constant.
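For instance, here is a quick stand-alone version of that check, with a string constant and a tiny inline numbers table:

    SELECT  n.n, SUBSTRING('AB-12%', n.n, 1) AS SingleCharacter
    FROM    (VALUES (1), (2), (3), (4), (5), (6)) AS n (n)
    WHERE   n.n <= LEN('AB-12%');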

Finding the patterns

The fragment above is where I test whether the current character of the string is included in any of the specified patterns. I use PATINDEX for that test, which returns either 0 if no match is found, or the starting position otherwise. However, as mentioned before, PATINDEX does not accept a starting position so there is no simple way to find a second occurrence of the same pattern.

To work around that, I use STUFF to remove all characters that precede the first character that we care about for this specific pattern. So for example, to test for pattern “-@@%” at position 9 in the string ‘AB345%6%-44%^^^^No.XXXNo.W8RNo.8R8D’, I remove the first 5 positions so that I search the pattern in the remainder of the string: ‘%6%-44%^^^^No.XXXNo.W8RNo.8R8D’. This will then match the location of “-44%” which was the second occurrence of this pattern in the original string, but is the first occurrence of it after stripping the first 5 characters.

How much to strip?

The STUFF function used above takes four arguments: the input string, the starting and ending position of the substring to replace (strictly speaking, the third argument is a length, but with a starting position of 1 that is the same as the ending position), and the replacement string. The latter is empty because I just want to remove part of the string. The starting position is 1 because I want to remove a substring starting at the first character. But what about the ending position? How much to remove?

When I am evaluating position 9 in the input string and I want to know if that position is part of the pattern “-@@%”, then I need to take into account that the pattern has 4 characters. So character 9 can be the first, second, third, or fourth character, if the pattern starts at position 9, 8, 7, and 6 respectively. The left-most starting position is character 6. This means I can remove characters 1 through 5. Or rather, 1 through (current position minus pattern length).

However, at the start of the string that expression results in negative numbers, and STUFF throws an error if the end position is negative. So I use a CASE expression to ensure that all these negative numbers are replaced by zero. STUFF does not mind starting position 1 and ending position 0, that simply does nothing – which is exactly what I want when testing patterns at the start of the string. When looking for pattern “-@@%” at position 3, I simply want to start searching at the left-most position.

Recognize a match

If PATINDEX finds no match at all, then it returns zero. If it does find a match, then it returns the starting position of that match (which is the first in case of multiple matches). So how do we use this to determine whether the position we are now looking at is a match?

Let’s return to the previous example where I look for pattern “-@@%” in position 9 of the input string. We have already removed the first 5 characters, so the original position 9 is now position 4 in the remainder of the string.

This means that if a match is found that starts at position 1, 2, 3, or 4, then this character is part of it. If a match is found at position 5 or later, then there is a pattern match somewhere to the right of the original position 9, but position 9 itself is not part of it. And if no match is found, zero is returned.

Bottom line: the current position is a match if, after removing the irrelevant left-hand part of the string, a match is found starting at a position between 1 and the length of the pattern.

Exception

The above method results in false positives at the start of the string. For a pattern of length 4, no left-hand part is stripped when looking at positions 1 through 4; for each of these the PATINDEX is applied to the original string. Now let’s say that a matching pattern starts in position 3. That means PATINDEX returns 3 for each of these 4 positions, but only positions 3 and 4 are actually part of the pattern.

This is the reason why we need the CASE expression above. For positions 5 and beyond, it returns 4 (the pattern length). But for positions 1 to 4, it returns the position itself. Which is correct in this case. Position 2 should be excluded if the pattern is found to start at position 1 or 2, but not at 3 or 4.

Putting it back together

The final step in the logic is to concatenate together, in their original order, all the characters that were not removed for being part of a pattern. That is what the above fragment does. I am not going to say a whole lot about it – this method of string concatenation is fairly well-known.

Do note that, because this “abuses” XML, you can get some weird results if special XML characters such as ampersands or less-than signs are in your data. You can ensure that they are not changed by using the TYPE directive on the FOR XML clause, which returns the data as XML; you will then also need to CAST the final result of this subquery back to an appropriate string data type.

(I use varchar(50) above because that is the data type of the input, and removing patterns can never increase the length).
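A quick stand-alone illustration of that entitization issue and of the TYPE fix (this is just a demo, not part of the query above):

    DECLARE @Demo TABLE (id int IDENTITY(1, 1), c varchar(10));
    INSERT INTO @Demo (c) VALUES ('a'), ('&'), ('<'), ('b');

    SELECT (SELECT c AS [text()]
            FROM   @Demo
            ORDER BY id
            FOR XML PATH('')) AS Entitized,                  -- a&amp;&lt;b
           (SELECT c AS [text()]
            FROM   @Demo
            ORDER BY id
            FOR XML PATH(''), TYPE
           ).value('.', 'varchar(50)') AS NotEntitized;      -- a&<b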

Conclusion

There’s a lot of complex stuff going on in this query, and if I encountered this query in inherited code without any comments or explanation, I would not be happy. But it does get the task done, in a set-based way and with very acceptable performance.

As uncommon as the requirement was to me, there is always a chance that someone else one day has to do something similar. If they do, then I hope that they’ll find this blog post so that they don’t need to spend any further time to find a solution!

T-SQL Tuesday #111: Why, tell me why

Time flies. It feels like the new year has just started, and yet we’re already at the second T-SQL Tuesday of the year. Our host this February is Andy Leonard (b|t), and his assigned topic is, simply: Why?

Or rather: What is your why? What motivates you, what makes you tick? And most of all: What makes you do the things you do?

The Execution Plan Reference

I do a lot of things. I have a one-person consultancy business, I have a wife and kids, I have a personal life with several hobbies, and I could talk about my motivation for all those choices, and more. But I decided to focus this post on one thing, and one thing only: why do I build the Execution Plan Reference?

Researching, verifying, and writing down all the information in the Execution Plan Reference takes time. A lot of time. And I get nothing in return. I am not paid by anyone for the time I work on this. The site doesn’t run ads, so I get nothing when people use it. In fact, I even have to pay the hosting cost out of my own pocket. It’s easy to understand why people would want to know what drives me to do this.

Inspiration

A full answer to any “why” question usually involves more than just a motivation. Yes, a motivation is the reason why you decide to do something once you have the idea. And what makes you continue to do it after the initial enthusiasm wears off. But what is it that gives you the idea in the first place? That can be something as simple as “someone asked me to”, but that is not where I got the idea for the Execution Plan Reference.

Training day

The first time I gave a pre-conference session / training day / full-day workshop (or whatever other name you want to give it) on execution plans was back in 2014. I have given it several times since, and I am still submitting it to conferences. Obviously, the content has changed significantly over the past five years. But the main outline has not. I am convinced that people are smart enough that they can read and understand all execution plans, when given the proper knowledge and tools.

There is this well known saying “give a man a fish and he has food for a day; teach a man to fish and he will never be hungry”. My philosophy takes this even one step further. I don’t show people the best fishing spots. I teach people how fish behave, how weather, climate, and other factors affect them, and how that determines where they’ll go – so they will always be able to work out the best fishing spots, even when conditions change.

When I teach about execution plans, I spend only a little time on execution plans as a whole. Most of the day is then spent on individual operators. Because that is key to understanding execution plans. If you don’t understand what an operator does, how it behaves, you’ll never fully understand the execution plans it appears in. And conversely, if you understand the behavior of all operators in an execution plan, you should always be able to work out how the execution plan as a whole works. And assuming you’re looking at the execution plan of a slow query (which for most people is the only reason to ever look at an execution plan), that should then also explain why it is slow.

Pluralsight

For those who never heard of Pluralsight, this is a company that sells access to online training videos. I have created a course on relational database design for them. And after the first few deliveries of my full-day training, I started talking to Pluralsight to see if they would be interested in creating a course on execution plans, based on my training day materials but even deeper and even more complete. There is only so much I can do in a single day, and I was excited by the idea of being able to go even deeper, be even more complete, in an online course.

Pluralsight and I did reach an agreement, and I started work, but due to reasons in my personal life I was unable to meet deadlines and then the contract expired. I later tried to renegotiate, but we never came to a mutually acceptable agreement. I was still in a rough spot in my personal life, so I could not commit to even the most generous deadlines Pluralsight offered. And at the same time, Pluralsight noticed that the market had much more interest in short courses and short clips than in the long and in-depth material I wanted to create, so their interest in my content waned.

Grant Fritchey

Once I realized that my proposed course would never go to Pluralsight, I did explore some alternatives for a short while, such as starting a YouTube channel – but I quickly realized that I lack the resources and skills to make good enough content to ever make that work. So I started to come to terms with the idea that perhaps I would never be able to make my training day material available outside of the limited circle of participants.

But at that time, I was already working in a tech editor role on Grant Fritchey’s excellent book “SQL Server Execution Plans, Third Edition”. And I have lost track of how often I was looking at a code sample and execution plan, reviewing Grant’s description, and then adding a comment that started with “Okay Grant, this is well beyond the scope of what we want in the book but you are going to love this. I noticed that …”

But I realized that Grant is not the only one who loves those details. There are more geeky people out there that love that kind of information. And there are even more people out there that don’t care about it, until they run into an issue and need exactly that information to help them over the bump. The book was not the right place to put that information. But if I kept it only in my head, or only between Grant and me, then the SQL Server community would never get to benefit from the effort I put into discovering all that stuff. I had to find a way to make this knowledge available to the community.

Exactly that moment, one of the many times I was editing Grant’s book and finding stuff that was too deep for the book but still very relevant, is when I got the idea of the Execution Plan Reference. It was not called that initially. I didn’t know yet what form it would take. But I did realize: I have knowledge in my head that not many people have. That information needs to be made available to the SQL Server community. And if I don’t do it, then who will?

That’s where I got the inspiration. Now let’s review the motivation.

Relevance

We all love good performance. We all hate slow queries. So when we see slow queries, we want to fix them. And it’s not just us. End users hate it if they have to wait for their screen to refresh. Managers hate applying for faster hardware. And if you think you can just migrate to the cloud and then adjust the performance slider when stuff is slow, you’ll soon learn that the CFO starts to hate you after receiving the next Azure bill.

There are lots of reasons why people want to tune slow workloads. And there are lots of ways in which people do this tuning. But not all are equally good.

We have all been in possession of devices that sometimes mysteriously stop working, but magically start working again after a good slam with a hammer. Well, usually. Not always. And if they still refuse after multiple slams, there is no alternative but to bring them in for repairs. A technician then opens the device, does some measurements and other smart research stuff, and sees a place where two wires are touching and causing a short circuit. A good slam would often separate them again, for a time. But the technician reattaches those wires in their proper place, and now they can’t even short-circuit anymore.

Many of the tuning tools and methods I find when I search the web, when I go to conferences, when I read blogs, or even when I look at official Microsoft material, are like slamming the device with a hammer. It works, often, but you have no clue why, and because you don’t really fix the root issue, it might resurface when you least expect it. Execution plans are for us what an ohmmeter is for the technician. Execution plans are the tool we can use to find out exactly what is going wrong. And once we know that, we can fix the root cause in a way that is more future-proof than just trying some indexes or doing some random rewrites of our query.

Books Online

Given the relevance of execution plans for understanding and fixing performance issues, one would expect that Microsoft gives this tool a lot of love. And in part, that is true – recent releases of SQL Server have added more and more information to the execution plans, making them even better (lots better!) as a tuning instrument than they were before.

But this love does not extend to documentation. The SQL Server Documentation (previously known as Books Online) is a great resource. It really is. But not when it comes to execution plans.

Books Online has just one single page on execution plan operators. Just one. And it’s not even good. It still uses the pre-2018 versions of the icons. Some of the information there is severely outdated. Some operators are missing. And the worst part: this page even includes a few completely false and misleading statements!

None of that by itself is sufficient motivation to build a “competing” site. There are feedback mechanisms that I could have used to try to get this information corrected. But the worst thing, the thing that really bugs me about the execution plan information in Books Online, is that this is the only information they have! There is so much information not included in Books Online at all that it is simply impossible to understand many execution plans based on that information alone.

Other sources

There are of course other sources. There is lots of information available on blogs. Much of it even on official Microsoft blogs from former and current employees. And even more on blogs from other people.

But if something is worth describing, it is worth describing in a simple, easily accessible location. If you are working on a slow query, and you see an operator in the execution plan that you don’t really understand, do you want to have to search the internet, read a dozen or so different blogs and whitepapers, try to filter out the good stuff from the mistakes, and then try to piece everything together? Or do you want all the relevant information at your fingertips?

I believe that Microsoft should have done a much better job of documenting execution plans. But they didn’t. And I can’t change that they don’t. I can, however, change that nobody does. So now I’m doing it.

Conclusion

When I read Andy’s question, two associations popped in my head. The first is a song that (unfortunately) was very popular in the Netherlands when it was released. I heard it way more often than I wanted to on the radio. I have deliberately not included a link because I don’t want anyone to blame me after clicking it. But I’ve had it stuck in my head for almost a week now, so thanks Andy!

But the question also made me think about a story I once heard, about George Mallory, an English mountaineer. In the years 1921 to 1924, he participated in three expeditions to Mount Everest, eventually dying on his third and last attempt. In 1922, when asked why he put all that effort into attempting to climb Mount Everest, his famous answer was: “Because it’s there”.

I am not a mountaineer. I have never liked the cold, and while I do like some hiking, it is in very moderate amounts. And I am by now starting to reach the age where I have to pause for a breather halfway up each flight of stairs. So I guess George Mallory and I are as far apart as two people can be. And that extends to motivation. My main reason for building the Execution Plan Reference is the opposite of George’s answer. I am building it, because it’s NOT there.

Tuning queries is important. The only really dependable tool for tuning queries is looking at, and understanding, execution plans. For that understanding, documentation is required. And that documentation is unfortunately not made available by Microsoft.

So why am I building the Execution Plan Reference? Because it’s not there. And because I believe it should be.

Plansplaining, part 10. Just passing through

Welcome to part ten of the plansplaining series. Each of these posts takes an execution plan with an interesting pattern, and details exactly how that plan (or pattern) works.

In this post we will look at a query and execution plan that may appear perfectly normal and unexpected at first sight, but that has some perhaps confusing execution counts.

Sample query

The sample query below will (as almost always) run in all versions of the AdventureWorks sample database. It returns a list of all staff, and for people that have a sales-related job title it adds the total of sales they made.
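(The exact columns selected, and the choice of SubTotal as the value to sum, are assumptions here; any query of roughly this shape shows the pattern discussed below.)

    SELECT     p.FirstName,
               p.LastName,
               e.JobTitle,
               CASE WHEN e.JobTitle LIKE '%Sales%'
                    THEN (SELECT SUM(soh.SubTotal)
                          FROM   Sales.SalesOrderHeader AS soh
                          WHERE  soh.SalesPersonID = e.BusinessEntityID)
               END AS TotalSold
    FROM       HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
          ON   p.BusinessEntityID = e.BusinessEntityID;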

The execution plan for this query (captured on SQL Server 2017) is shown below; as always I added numbers (equal to the NodeID of each operator) to the screenshot, to make it easier to refer to operators within this plan.

At first sight, this execution plan is not very surprising. At the top right, we see a scan on the Employee table that drives a Nested Loops join into a seek on Person – this is a very simple and direct implementation of the join in the query. The result then goes through another Nested Loops, into a branch that reads SalesOrderHeader, aggregates it, and then does a computation – this obviously has to be the subquery within the CASE expression. Finally, Compute Scalar #0 probably does the logical evaluation of the CASE. There, without even looking at a single property I already have a fair idea what this execution plan does.

The index spool

The above explanation doesn’t mention Index Spool #8. The most common cause for Index Spool operators in a plan is when SQL Server wants an index really bad. The SalesOrderHeader table does have an index on the SalesPersonID column but it is not covering for this query, so the optimizer was faced with the choice between either using that index and accepting the cost of a lookup, or using a scan … and then it decided to go for door number three, the Index Spool that basically builds the index on the spot and throws it away when the query is done.
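For reference, the index it effectively builds on the fly would, as a permanent index, look something like this (hypothetical name, and assuming the subquery aggregates SubTotal); with such an index in place the spool would most likely disappear from the plan:

    CREATE NONCLUSTERED INDEX ix_SalesOrderHeader_SalesPersonID_SubTotal   -- hypothetical name
        ON Sales.SalesOrderHeader (SalesPersonID)
        INCLUDE (SubTotal);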

That explanation is indeed correct for this case. But it’s not the whole story as you will see when you read on. But let’s first look at one other oddity in this plan.

The missing operator

When the optimizer assigns each operator a NodeID value, it always works left to right and bottom to top, always starts at zero, and always increments by one. And yet, the execution plan above appears to have a gap. There is no operator with NodeID 3. And that is always a sign that something interesting is going on. It means that an operator was removed from the plan in a final cleanup phase, after the plan had already been chosen.

In this case, the reason for the missing operator can be found in the properties of Nested Loops #2. The missing operator #3 is related to prefetching, an optimization used to improve the performance of physical I/O on the inner (bottom) input of the Nested Loops operator. SQL Server implements two types of prefetching: ordered and unordered. Neither of these result in a fixed order being imposed on the data, however ordered prefetching does guarantee that the order of rows from the upper (top) input is preserved, whereas unordered prefetching might change the order of rows.

The WithOrderedPrefetch and WithUnorderedPrefetch properties of a Nested Loops join only have an effect on performance if data to be read from the inner input is not yet in the buffer pool. Instead of waiting for the Nested Loops operator to activate its inner input, then request data, and then fetch it into the buffer pool, data flowing into the outer input of the Nested Loops operator is inspected in advance to check whether it will result in physical I/O. If it does, then an asynchronous I/O request is started right away, in the hope that the read will have completed and the data will already be in the buffer pool by the time it is actually needed.

When SQL Server adds prefetching to a Nested Loops operator, it adds a special “read ahead logic” operator in the execution plan. This operator will be positioned at the outer input of the Nested Loops, so in this case between Nested Loops #2 and Clustered Index #4. This special operator stays there until after the plan selection has been made; at that time a final cleanup phase will fold this special operator into the Nested Loops operator, leaving a gap in the otherwise consecutive NodeID numbering. So that’s why there is no operator #3 in this execution plan.

If you want to learn more about prefetching, I recommend reading Craig Freedman’s blog post, Paul White’s blog post, and Fabiano Amorim’s article. The rest of this blog post will focus on other areas of this execution plan.

How many executions?

If you look at all the various counters in the execution plan, you might see some unexpected numbers. The screenshot to the right shows the data flowing out of Nested Loops #2 into the outer input of Nested Loops #1. The Actual Number of Rows is 290. So naturally, you would expect the operators on the inner input of Nested Loops #1 to execute a total of 290 times, right?

Turns out … you’d be wrong!

If you look at the far right, at Clustered Index Scan #9, you will see that the Number of Executions is 1, and that this matches the Estimated Number of Executions. This part is not the surprise. Clustered Index Scan #9 is called by Index Spool #8, an eager spool. We know that this spool on its first execution will fetch all data from Clustered Index Scan #9, store it in a worktable with an index to support its Seek Predicate, and then on all later executions it will fetch data from that worktable without invoking Clustered Index Scan #9 again. So the single execution of this operator is no surprise.

But let’s now look at Index Spool #8 itself. As you can see in the screenshot to the right, this operator was estimated to execute 278.482 times, which matches the Estimated Number of Rows in the outer input of Nested Loops #1. But … the actual execution count of this Index Spool is not the expected 290; we only see a mere 18 executions! What happened to the other executions?

The first step is a quick reality check. Yes, I did not look in the wrong place, Nested Loops #1 really receives 290 rows on its outer input. It also returns 290 rows to its parent (Compute Scalar #0), and I can see that the final result set of the query has the full 290 rows.

The next step is to look at where the number changed. The properties of Stream Aggregate #7 show the same numbers for estimated and actual number of executions; the properties of Compute Scalar #6 also show the same estimate, and include no actual count at all (not unusual for a Compute Scalar; an in-depth explanation of this curious behavior is probably better left for a future post). For now, it appears that Nested Loops #1 is the root cause. Even though it processes and returns 290 rows, it only actually executes its inner input 18 times. Let’s look in a bit more detail.

Nested Loops #1

Hovering the mouse over Nested Loops #1 shows a property that is not often seen on Nested Loops operators. In fact, many people have never seen this property and are unaware of its existence. It is the Pass Through property, a property I have so far only seen on Nested Loops operators (and only with a Logical Operation of Inner Join or Left Outer Join).

When the Pass Through property is present, the Nested Loops will for each row it reads on the outer input first evaluate the condition listed in this property. If the condition evaluates to true, then it will, as the name of the property implies, just pass its input row on, unchanged, to the parent operator. On the next call, it will then immediately move on to the next row on its outer input.

So simplified, you could say that Pass Through means: if this condition is true, then you can just skip all the join logic for this row and pass it through, unchanged.

For the columns in the Output List that take their values from the inner input, no value can be given if the join was skipped due to the Pass Through property. I don’t know if they are always set to NULL in that case or if they are simply left at whatever they are and the rest of the plan makes sure to never look at that data.

In this case, the Pass Through property uses a function called “IsFalseOrNull”. This function can only be used internally within execution plans (I personally would not mind having it available in T-SQL as well, though!). It is intended to help in dealing with three-valued logic. The net effect of the entire condition is to check whether the JobTitle column includes the word “Sales”. If so, then that condition is true, which means that IsFalseOrNull returns False, which in turn means that the pass through logic is not invoked and the inner input of the Nested Loops operator executes as normal. However, when JobTitle does not include the word Sales, or when it is NULL, then the LIKE condition is either False or Unknown, and then IsFalseOrNull evaluates to True, which means that the pass through condition kicks in and the inner input is skipped.

If you look at the data produced by joining the Employee and the Person tables, you will notice that there are 290 rows total, but only 18 have a JobTitle that includes the word Sales. The Pass Through condition kicked in on the remaining 272 rows, and that is why the operators on the inner input of Nested Loops #1 executed only 18 times instead of the expected 290 executions.

We all love better performance!

The Pass Through expression in this execution plan is clearly intended to improve performance. And at first sight that is exactly what it does. If you remove this property and leave the rest of the plan unchanged, then Compute Scalar #6, Stream Aggregate #7, and Index Spool #8 will all execute 272 additional times. And each of those times they will produce a result that then subsequently is ignored – because of the CASE expression in the query, Compute Scalar #0 only actually does something with the value returned from these operators if “Sales” is included somewhere in the JobTitle, which is only the case for the 18 rows where the Pass Through property did not prevent execution of this branch.

And obviously, the reason that the optimizer even included this Pass Through property is that it, too, understood the significance of the CASE expression. The optimizer knew that the value returned by this branch will only be used in some cases, and hence needs not be computed at all in other cases. (What I do not understand, though, is why the Estimated Number of Executions for the operators in this branch do not reflect that understanding…)

But you might want to ask why the CASE expression is even there at all. It is of course possible that it is needed for correctness. Perhaps the application does sometimes allow a sale to be registered to an employee who does not have “Sales” as part of their job title. If we definitely do NOT want to see those sales included in the report, then the query definitely has to be written as it currently is. End of discussion.

But is it?

However, a query such as this is also very often the result of a developer looking at the query, looking at the data, looking at the business rules, and then deciding to help the optimizer a bit. In the case of the AdventureWorks sample database, all employees that have any sales to their name actually do have “Sales” in their job title. So for this database, the query could also have been written as this, and it would return the same result.
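Sketched on the same assumed columns as before, that version simply drops the CASE and keeps only the correlated subquery:

    SELECT     p.FirstName,
               p.LastName,
               e.JobTitle,
               (SELECT SUM(soh.SubTotal)
                FROM   Sales.SalesOrderHeader AS soh
                WHERE  soh.SalesPersonID = e.BusinessEntityID) AS TotalSold
    FROM       HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
          ON   p.BusinessEntityID = e.BusinessEntityID;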

Perhaps the query started like this. And then, perhaps, there was a developer looking at this query and deciding: “You know what, SQL Server is going to tally up SalesOrderHeader for all staff members, even though most of them are not in sales and will therefore not have any sales in there. I’ll just step in and add a CASE expression to help SQL Server skip the executions for all those non-sales folk!”

Now before I go on, I am actually happy with these developers. I really prefer developers that give thought to performance over those that don’t care. However, in this case the developer was wrong. Good intentions, but bad results. And when I find the original query in my production code, I’ll find the developer, compliment them on their attention to detail, and then remind them that they should definitely also test.

This is the execution plan I got when running the version of the query without the CASE expression. The logic to join Employee and Person is unchanged, but the logic to compute the total sold for each employee in a sales role has changed. Instead of a Nested Loops join with multiple passes, we now get a Merge Join with a single pass, and with early aggregation in a Hash Match. And the Index Spool has been completely removed.

I won’t go into the details of this execution plan. But if you compare the two versions of the query by running them with SET STATISTICS IO ON and SET STATISTICS TIME ON, you will see that both queries have the same amount of I/O on the base tables, but the first query also has a lot of logical reads on a worktable, due to the Index Spool. The second query does have both a worktable and a workfile – these are there in case the Sort or the Hash Match operators need to spill. But neither does spill, so there is no actual I/O on those.
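If you want to repeat that comparison yourself, the set-up is nothing more than:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Run the version with the CASE expression here, then the version without it,
    -- and compare the logical reads and elapsed time reported in the Messages tab.

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;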

Mainly as a result of all the additional I/O on that worktable, the original query (with a CASE expression in the query, and with a Pass Through expression and an Index Spool operator in the execution plan) also takes more time – 114 ms vs 46 ms on my laptop.

However, do not forget that all this is based on one very important assumption: that given the way the application works, we can be sure that the two versions of the query always return the same data. Based on the query alone that is not guaranteed, so unless the application gives us additional guarantees, the two queries are not equal, and you should always opt for the one that returns correct results, regardless of performance.

Conclusion

The main topic of this post is the Pass Through property. It is not a very widely known property, but it can be important to understand how an execution plan really works. After all, you would not want to invest a lot of effort to tune a branch of an execution plan only to find out later that it only executes for a fraction of the rows.

However, we also saw an example of what people might call “premature optimization”: a change to a query intended to help the optimizer but insufficiently verified as to whether it really meets that intended goal. Query optimization is complex, and changes that appear to be beneficial for performance do not always result in an execution plan that is always faster. And it is also very important to realize that in this case the two queries were only the same based on the data currently in the database; unless there are constraints in place (preferably in the database, but sometimes enforced by the application) to ensure that this is always the case, the two queries might return different results and only one of them should be considered correct.

Finally, we had a short discussion on Nested Loops prefetching and how that results in a gap in the NodeID numbers of the operators in an execution plan.

That concludes part 10 of the plansplaining series. I still have a few ideas left on my list for future episodes, but it is growing shorter – so please, help me out if you can! If you see an unusual and interesting pattern in an execution plan, let me know so I can add it to my list!


SQL injection

One of the many sites where I occasionally answer database-related (and sometimes other) questions is Quora – also known as “yet another question and answer site because we all know that what the world really needs is yet another question and answer site”.

On that site (and probably many others), some topics just keep resurfacing. One of those “evergreen” topics is SQL injection. The questions on this topic are not all the same, but they all do fall into a few broad categories. So instead of repeating the same reply over and over again, I have decided to write a blog that answers all commonly repeated questions on SQL injection, so I can then in the future link to it.

It is not really within the narrow target focus of SQL Server performance and SQL Server execution plans that I usually have, but I guess it’s safe to say that SQL injection is important enough that the topic is valid for everyone who works with SQL databases in any way.

SQL injection explained

The first question that I often see is “how would you explain SQL injection?” – often followed by a requirement such as “in non-technical terms”, “in a way that my grandmother would understand”, or “to a twelve-year-old”. That’s actually a good requirement to add, because that should result in an explanation that everyone understands – even the highly technical developers that are the most important target audience.

So here is my attempt to provide the ultimate explanation of SQL injection, in simple, non-technical terms to a twelve-year-old grandmother.

Setting the scene

Imagine I am setting up staff for a tax agency. I have decided to hire cheap, unskilled workers where possible. I found several people that turn out to be excellent at following procedures and instructions to the letter, and I have instructed them such that, despite their lack of experience and insight, they’ll get the job done. An important part of the job is to ensure that workers in other departments get the information they need from the archive, but (since tax and income data is sensitive) will not get access to data they are not authorized for. I have taken care to ensure that sensitive data is protected.

John works the front desk. When a worker needs information, they need to go through John. I have taught John how he can verify whether someone who makes a request is authorized for that type of request. If they are, then John will take a special form from the locked drawer in his desk, write out the request, and pass it to Angie.

Angie works in the archive room. She can access all data stored there. But she is instructed to never touch anything, never look at anything, never do anything … except for requests on the special form that only John has access to. Those requests need to be executed as instructed.

Obviously, I have also taken appropriate measures to ensure that nobody except John can get access to the special forms, and that nobody except Angie can enter the archive room.

The happy flow

Henry needs income information for a tax payer he is auditing. He approaches John to request a copy of their last three income statements. John checks to see that Henry is allowed access to income statements, then asks Henry to provide tax payer number, and last name for verification purposes. Henry says “tax payer number is: ‘1234’. Last name is: ‘Brown’”. John, following his instructions, grabs a form and writes “Angie, please verify that tax payer 1234 has last name Brown. If so, then get their last three income statements from the archive and send them over”. He sends that form to the archive and a few minutes later Angie shows up and hands Henry the requested information.

Next, Shirley approaches John’s desk. She needs to check whether a tax payer has moved out of the city. This is data she has access to, so John again asks for tax payer number and name, then grabs a form and writes “Angie, please verify that tax payer 5678 has last name White. If so, then look up the current address of residence and tell us the city”. When Shirley then proceeds to ask for the past income statements of this same tax payer, John politely reminds her that she is not authorized for that information, and refuses to fill a form.

The hacker

Shirley is not happy. She really wants the income information of Mr. White. And she is smart, so after observing how John works and realizing he is not the sharpest tool in the shed, she decides to try her luck. So she approaches John with one more request. “Okay John, I have one more request for residence information”. John smiles, confirming that she does have access to this data. “Tax payer number and name, please?”. “Okay, the tax payer number is: ‘9012’. And the name, sorry, it’s a bit long, but please write it down exactly as I put it. It is: ‘Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.’”

John follows his instructions. He grabs a form and writes on it: “Angie, please verify that tax payer 9012 has last name Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.. If so, then look up the current address of residence and tell us the city”.

Angie receives the form. It is the official form so she will do what it says. She checks whether the last name of 9012 is Williams, but since she is not instructed to care about the result of that comparison she then shrugs and moves on. The next step is to get some income statements and send them over, cool, she can do that. And then there is more but since the note says to ignore that part, she doesn’t care. And so, Shirley suddenly was able to get access to data she has no clearance for.

Oh, and if she is truly bad, then after seeing that this works her next step might be to get data for all tax payers in the city. Or to get her own due payments waived, and to enter tax returns for all her friends and family. Or even to just unlock the door to the archive room so Shirley can personally enter and then do as she pleases. John and Angie will continue to follow the instructions I have given them.

I thought I had a secure and safe system, but Shirley found a loophole.

How to prevent SQL injection

Another question I see often is: what can I do to prevent SQL injection? (Or variations, usually of the type “I have implemented countermeasure X, am I now safe?”.)

To find out the best way to prevent SQL injection, we first need to look at the root cause.

Root cause

The root cause of all this is my decision to use a simple form with a large text box where John has to write instructions (from a predefined set of possible tasks), and with data coming from outside sources inserted in the middle of those instructions. This allows Shirley to abuse the system, by providing John with data that looks like instructions to Angie.

In technical terms

Translating this back to technical terms, the root cause of actual SQL injection vulnerabilities is a front-end program (often, but not always, a web application) that allows the user to enter data, forms a query by concatenating that data into pre-made query strings, and sends the result to the backend database server.

The fix

Once you know the root cause, the fix becomes rather obvious. I need to find a way so that whatever Shirley tells John, once the information reaches Angie she cannot mistake it for instructions. The idea to accomplish that is quite simple (but unfortunately a lot of work). Instead of giving John one stack of special forms, I design several special forms. One for each type of request that can come in. He gets a stack of each. On those forms, the instructions for Angie are pre-printed, and there are some empty spaces, marked by a black border and a light yellow background color. These boxes are where John needs to write the specific data for each request.

Angie is instructed to always first verify that forms she receives are not modified. Extra data for each request should be written inside the yellow boxes. Any writing or other modification outside of those boxes voids the form, and she’ll ignore it.

Now when Shirley tries her devious scheme, John takes the form for “city of residence request” and starts writing in very small letters to squeeze the entire name as given by Shirley in the box. Angie then receives a form that reads (preprinted): “Angie, please verify that the tax payer with the number in the first box has the last name in the second box. If so, then look up the current address of residence and tell us the city”. The first box contains the value 9012; and the second box (she needs to squint to make it out) reads “Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.”

A few minutes later, Angie calls John and says “sorry, there is no tax payer with number 9012 and last name ‘Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.’, so I cannot give you any information”.

Shirley’s plan backfired. We are safe.

In technical terms – parameterization

Let’s again translate this to technical terms. The application program does not use preprinted forms where John writes stuff. Most application programs have several branches where, depending on what the user needs to do, a query is sent to the database to get the needed information, or make the required modification. And as stated above, the vulnerable code would simply concatenate the user-supplied information in that string. So the application code probably has code to concatenate the hardcoded “WHERE UserName = ‘”, the data entered in the UserName input field, and “’ AND Active = 1;”.
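To make that concrete, here is a small sketch of the vulnerable pattern, expressed purely in T-SQL for illustration (the dbo.Users table and its columns are made up; in reality the concatenation usually happens in the client code):

    DECLARE @UserName nvarchar(100) = N'nobody'' OR 1 = 1; --';   -- attacker-supplied input
    DECLARE @sql nvarchar(max) =
        N'SELECT * FROM dbo.Users WHERE UserName = ''' + @UserName + N''' AND Active = 1;';
    PRINT @sql;
    -- SELECT * FROM dbo.Users WHERE UserName = 'nobody' OR 1 = 1; --' AND Active = 1;
    -- The injected OR 1 = 1 would return every row; the -- comments out the rest of the query.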

To prevent SQL injection, the application developer has to do some extra work. They need to define what is called a parameter object (the link is to a Microsoft resource related to ADO and SQL Server, but the generic principle applies to and is supported by every decent library and every decent database management system). They can then use that parameter in the query string. So the query string changes to “WHERE UserName = @UserName AND Active = 1;”, and the front-end code then has to add a few lines to define @UserName as a parameter, set its data type, give it a value, and pass it to the database along with the query string.
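Again expressed purely in T-SQL, with sp_executesql standing in for the client-side parameter object (and the same made-up dbo.Users table), the parameterized version of that sketch looks like this:

    DECLARE @UserName nvarchar(100) = N'nobody'' OR 1 = 1; --';   -- the same malicious input
    DECLARE @sql nvarchar(max) =
        N'SELECT * FROM dbo.Users WHERE UserName = @UserName AND Active = 1;';
    EXEC sys.sp_executesql @sql,
                           N'@UserName nvarchar(100)',
                           @UserName = @UserName;
    -- The input is now passed as data: the database looks for a user whose name is
    -- literally "nobody' OR 1 = 1; --", and no extra rows can leak out.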

Limitations of parameterization

Most database management systems allow the use of parameters in specific places only. A parameter is typically used to represent data. It is not used to represent a table or column name. So when I tell people to parameterize their queries, I am often confronted by someone pointing at their code where the user has to input a table name (or part of it), or a column name.

Relational databases (which are the type I work with) always have a strong schema. I think that should apply to the code as well. So any code that can be used on different tables is in my opinion highly suspect, and my first inclination is to refactor the underlying database schema. Most of the time, there are serious design flaws.

But that is not always the case. Sometimes, we need to work with what we have. Or we just need to get it done for now and fix the underlying issues later. In those cases, we sometimes have to make do with a sub-optimal solution.

Cleansing the input – a false sense of security

I already mentioned that a lot of questions related to SQL Injection are of the type “I have implemented countermeasure X, am I now safe?”. And every time I see such a question, the specific countermeasure is what is generally called “input cleansing”.

There are, unfortunately, a lot of websites where input cleansing is suggested as a good way to prevent injection attacks. I say unfortunately because I do not consider this approach safe. Plus, it can have unwanted side effects. But let me first explain what I mean.

Back to my tax office example, another way for me to deal with the problem would have been to tell John to simply deny requests if certain characters or words are included in the name supplied in the request. I would then need to make a list of all words that might confuse Angie into thinking that what follows is an instruction. And then ban all of those.

One problem with this is that it is very hard for me to come up with a complete list. If Shirley knows or suspects that I use a list like this, she might be tempted to get her thesaurus and try to find a phrase I overlooked. But also, banning words can be dangerous. For instance, I might have told John that any request containing the letter sequence “and” is dangerous. Soon after that I’ll face angry customers because now suddenly John tells them that Grand Rapids is not a valid city name, so their address change cannot be processed.

In technical terms – blacklists and escaping

The technical equivalent to using a list of banned words is using a blacklist, a list of banned characters or character combinations. Have you ever been at a website that told you that O’Leary is not a valid last name? Or that airport Chicago O’Hare is invalid? This is a side effect of a lazy programmer who bans the single-quote character because that is one of the more common characters in SQL injection attacks.

It does not work. Hackers know other, more concealed ways to get their executable code within the query string. But it sure does work to chase away real existing customers!

Escaping is an alternative method. Escaping means writing code that looks at each character in the input and, if it is a dangerous character, does “something” so that the database server (Angie) knows how to deal with it. This would help to retain the O’Leary family as customers, but you still run the risk of missing a character or character combination that a hacker might abuse.
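
Purely as an illustration, a typical escaping approach in T-SQL doubles every single quote before concatenating (dbo.Users is again a hypothetical table). It keeps the O’Leary family as customers, but as said, it remains far easier to get wrong than parameterization:

DECLARE @LastName nvarchar(128) = N'O''Leary';
DECLARE @Sql nvarchar(max);

-- Escape embedded single quotes by doubling them before concatenation.
SET @Sql = N'SELECT * FROM dbo.Users WHERE LastName = N'''
         + REPLACE(@LastName, N'''', N'''''')
         + N''';';

EXEC (@Sql);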

The better solution – whitelisting

The alternative to blacklisting is whitelisting. This means that there is a specific set of terms that ARE allowed, and nothing else is. This can only be used in very specific cases. In the case where the user has to provide a last name, it is impossible to have the front-end program verify this against a whitelist of allowed names. But there is no need for a whitelist in this case; we already have a better solution: parameterize the query.

Whitelisting is an adequate solution in the cases where parameterization does not work. In cases where, due to a design that we cannot or are not allowed to change, the user has to indicate which column needs to be returned. Or which table needs to be queried. In these cases, most databases (as mentioned above) do not allow for parameterization.

But in those cases we also know pretty well what the allowed values are. If a user has to tell the system which table to check, it has to be one of the tables that we created in the system. So when they enter the table name, we can verify the entry against a list of all valid table names and reject their request if it doesn’t match. (Or take it one step further and build a dropdown so they can select the table without typing).
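
A minimal sketch of that whitelist check in T-SQL, assuming a hypothetical user-supplied value in @TableName, could look like this:

DECLARE @TableName sysname = N'Customers';  -- hypothetical user input
DECLARE @Sql nvarchar(max);

-- Only proceed if the supplied name matches an existing table in this database.
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @TableName)
BEGIN
    SET @Sql = N'SELECT COUNT(*) FROM ' + QUOTENAME(@TableName) + N';';
    EXEC sys.sp_executesql @Sql;
END
ELSE
BEGIN
    RAISERROR (N'Invalid table name.', 16, 1);
END;

Even better, as suggested above, is to populate the dropdown from that same catalog view, so the user never types a name at all.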

Do I really need to care?

The third category of questions I see on the topic of SQL injection is quite broad, ranging from “how common are vulnerabilities and/or attacks?” and “is this really such a big issue?”, all the way to “how is it possible that this is still an issue?”

Is this really such a big issue?

SQL injection is not new. The issue has been known for at least two decades now. The risks are known, and so are the countermeasures.

Surely, most websites and applications are safe by now, right?

Well, I can write a long reply here. Or I can just leave this link that I got from Erik Darling: a list of companies that had their data compromised because they were vulnerable to SQL injection. If you look over the list, you will see big names. Both commercial companies and government. And you’ll see huge amounts of leaked data.

So yes, this is still a big issue.

Why is unsafe code still being written?

So you followed the link above? And you have now recovered from the shock, and are left wondering how this is possible?

Well, I think there are a few reasons for this.

  1. Copy/paste coding. When a developer needs to solve a problem, they find code that solves a similar problem in the existing code, copy it, and modify what is needed for the specific problem. This is not always a bad thing. There often are multiple ways to solve a problem, and code that consistently uses the same approach is easier to maintain than code that uses a different method each time. But that only works on the presumption that the existing code you copy is fully safe. If the existing code happens to have a SQL injection vulnerability that has not yet been found, then copying it introduces more vulnerable code.
  2. Bad resources. A lot of code samples found on the internet, and even some code in textbooks and other official courseware, show vulnerable code. Sometimes because the author doesn’t know better (which by the way is no excuse). And sometimes because the author wants to focus on something else and thinks that adding a parameter object would detract from the message. I think that might sometimes be okay, but only if you plaster the sample code with warnings. And that is typically not done, so people will use those code samples as the basis for their code and believe they are following a trustworthy resource.
  3. Lack of education. Some developers lack the basic training. Either their bosses throw them in at the deep end and make them do work they are not trained for, or they have applied for a job with a pimped resume and now need to pretend that they can do what they promised. Either way, they are in way over their head and now need to cope. And yes, there are lots of resources available online. And most of them do give good advice. But believe it or not, some people still don’t use the web as a learning tool.
  4. And finally, cutting corners. Once you know it, parameterizing queries is not even very hard. But yes, it does require a few extra lines of code. And some developers are just plain lazy; they underestimate the danger and think they can get away with a quick and dirty solution. Or they build a quick and dirty solution for a proof of concept or for a prototype, and then their manager tells them to deploy to production and move on, while ignoring the protests and warnings from the developer.

For most cases where vulnerable code makes it into production, I guess a combination of one or more of the above is to blame.

How can I attack a vulnerable site?

The last category of questions I want to cover is the category of “please help me abuse vulnerable sites”. Questions like what tools can I use, how can I find vulnerable sites, how can I check for vulnerabilities and then use them to my advantage?

Well, sorry, but I’m not going to go there. If someone leaves a window open, they make it easier for you to burgle their homes. But the act of burglary is still illegal. And if I tell you how to squeeze yourself through a narrow open window, I expose myself to a charge of accessory to your burglary.

Hacking someone else’s database is illegal. The exact laws that apply vary between countries, but if you want to be safe and within legal boundaries … just don’t do it! And for that same reason, I am not going to help you with it.

I will, however, try to help those who work hard to prevent their systems from being hacked. Hence this post.

Conclusion

Even one system that is vulnerable to SQL injection is already too many. And unfortunately, the world out there has not one, but lots of systems that are vulnerable.

This is really a shame. SQL injection is not a very complex and weird attack method that is hard to defend against. It is in fact very easy to carry out such an attack. But, and that is the good news, even easier to defend against.

And whereas I list multiple reasons above why systems are still vulnerable to SQL injection, none of these reasons is a good excuse. There is no good excuse. The information has been known for decades. The countermeasures are very easy to implement.

If you deploy unsafe code, or see unsafe code and fail to change it, then I believe that you are personally, at least partially, responsible. Don’t do this. Ever.

Collation change script

Normally when I blog, I spend a lot of time making sure that I get as close to perfection as I can. This post is an exception. The problem I am talking about is too far outside my main specialization area for me to want to spend more time on it than I already have, yet what I have so far appears to be valuable enough to warrant posting. Even in its current, unfinished, and not fully tested state.

The problem

One of my customers called me to look at a problem where they were unable to upgrade a third party application due to some mysterious “collation conflict” error messages during the upgrade process. Those error messages originated from the SQL Server database used as the backend for the third party application. And since I am supposed to be the SQL Server expert, they were all looking at me for a solution.

I won’t bother you with the long process we went through for troubleshooting. Let’s just say that with better service providers, we probably would have found the issue earlier. (In fact, with better service providers it probably would never have happened). So let me jump straight to the root cause.

About two years ago, they had moved the database backend for this application from their old physical hardware to hosted hardware (not true cloud, a virtual cloud from a hosting provider). For weird reasons they had not asked me to set up the SQL Server instance, but instead asked the hosting provider. Who apparently felt confident that just clicking next should be good enough. They installed SQL Server, restored a backup of the old database, and called it a day.

Problem is: the new instance was set up with a default collation that was different from the default collation of the old instance. And hence different from the database default collation. And when that resulted in errors, they responded by changing the database default collation to match the new instance. Now everything ran and they were happy.

Fast forward two years. The database now has lots of “old” tables that use the original default collation for all string columns. But also lots of “new” tables that have their string columns using the new default collation. And the upgrade failed because somewhere in the new code is a comparison between two character columns that now have incompatible collation.
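
For those who have never run into this, here is a small self-contained repro of the kind of error involved (using two arbitrary collations for illustration); comparing the two columns fails with a “Cannot resolve the collation conflict” error unless an explicit COLLATE clause is added:

DECLARE @Old table (Name varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS);
DECLARE @New table (Name varchar(50) COLLATE Latin1_General_CI_AS);

SELECT *
FROM   @Old AS o
JOIN   @New AS n ON o.Name = n.Name;  -- fails: collation conflict in the equal to operation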

The fix

I contacted the vendor to find out what collations are recommended and/or supported. I rebuilt the master database to set the default instance collation to the recommended collation, then switched the database default collation to match it. But I had one problem remaining: thousands of columns defined with the wrong collation.

Finding the columns with incorrect collations is easy enough. Scripting the ALTER TABLE statements to change their collation is also not a really big problem. But to change the collation, one has to first drop all indexes that contain the column, and drop all foreign key constraints that reference the column. That’s where it became challenging.
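
As a sketch, a query along these lines finds the affected columns by comparing each character column’s collation to the database default (the full script at the end of this post does this automatically):

-- List all character columns whose collation differs from the database default.
SELECT s.name AS SchemaName,
       t.name AS TableName,
       c.name AS ColumnName,
       c.collation_name
FROM   sys.columns AS c
JOIN   sys.tables  AS t ON t.object_id = c.object_id
JOIN   sys.schemas AS s ON s.schema_id = t.schema_id
WHERE  c.collation_name IS NOT NULL
AND    c.collation_name <> CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), N'Collation'));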

So what does an IT person do when faced with a challenge? Yes: they turn to Google. I found several sites with various tips and scripts. The one that looked most promising was found at StackOverflow. The script posted by Philip C appeared to do everything I needed, and probably even more.

But we all know the golden rule of the internet: never trust anything or anyone blindly. So obviously I did not just run this on production. I restored a copy of the database to a test server, so I could experiment with the script. And boy, was I glad I did. There were various issues with the script. I tweaked a bit here, made a small change there … and then realized there was a basic problem that actually required me to do a quite major overhaul. I still was happy to have found the script because it absolutely gave me a head start, but it did end up taking more time than I had hoped for.

The bugs

The first bug I fixed was not actually a real bug. It was a suboptimal choice. The script generates lots of SQL statements in a temporary table, then at the end sends them all to the SSMS window where you can select the output, copy and paste it into a new query window, and then execute it. When outputting the statements, it sorts them by statement type to ensure that objects are first dropped, then collations are changed, and then objects are recreated, in an order that ensures that objects with dependencies are generated after the objects they depend on.

But I disagree with some of the ordering choices. For instance, when recreating objects the script first creates indexes, then unique constraints, and then primary key constraints. That is not wrong. But in my experience, the primary key is most often the clustered index. And when you build multiple indexes on the same table, it is smart to start with the clustered index – if you first build a nonclustered index, then that index has to be rebuilt when you later add a clustered index.

I did not try to add logic to look at which indexes are actually clustered and nonclustered to get the guaranteed best order. But I did switch the order. My version of the script now first generates primary key constraints, then unique constraints, and then the remaining indexes. So in most cases, this will build the clustered index first. Oh, and I did a similar change (using the reverse logic) in the part where constraints and indexes are dropped.

Another change I made quite quickly was to fix a small sloppiness. When recreating the foreign key constraints that originally were enabled and trusted, the code generated would force SQL Server to check the existing data twice. I removed the extra check.

I also did a small usability change. The script as posted returns two columns of output to SSMS. I removed the column with the sort order so that it outputs just the generated T-SQL statements. This makes it very easy to copy those statements and paste them in a new query window where I can review and then execute them.

After making those tweaks, I started testing the script. And that’s when the actual bugs popped up. I didn’t track all of them. One I remember is that wrong code was generated when a multi-column unique constraint overlaps with a multi-column primary key. After fixing that, I got errors where e.g. the same constraint was generated twice if it was a multi-column constraint (not sure if that’s an original bug or introduced by me when I fixed the first bug).

As I noticed more and more small issues, almost always related to multi-column constraints, I realized that the problem is that the script was originally completely column-based. For each column, it would check indexes, constraints, etc. That results either in duplication within the script (often, but not always, removed by using DISTINCT), or complex code to try to prevent the duplication.

So I decided that the best way to fix everything was to keep the outline of the script, but make the changes needed to do everything per table instead of per column, while still making sure not to do unneeded work (e.g. indexes on non-affected columns are never dropped and rebuilt).

After completing that work, I had a script that did what I needed: change the collation of all affected columns, including the dropping and recreating of objects that would prevent this collation change. I ran it on my test database and then used SQL Compare (I think this action alone already earned back the entire license fee) to verify that the two databases were now exactly the same except for the collations.

The script

Below you will find the final version of the script that I ended up using. There are some important points that you should be aware of. Please take heed when you consider using the script:

  • The original version of the script had some weird exceptions that I didn’t really understand, and that were not always consistent. Probably due to the various sources that Philip C used when he assembled the script. I removed most of them.
  • The original version of the script also had logic to drop and recreate manually created statistics objects. I didn’t have any of those in the database I was working on, so to save some time I simply commented out that part of the code and didn’t try to figure out if there were any bugs there. Since I changed some of the logic in the loop within the script, this commented-out part of the code almost certainly doesn’t work anymore. So if you need to use this script on a database that does have manually created statistics, you will have to uncomment these parts of the script and then make the necessary changes to get them to work.
  • I have tested this script on the database that I needed to fix. This database is quite complex and big, but I have no guarantee that all possible situations are covered. Please, do not ever run this script on important production databases without first testing it on a test copy of that database, and without having a good backup available.
  • The script is currently set to automatically find all columns that have a collation that is different from the database default collation. This is quite easy to change if your requirements are different.
  • The script posted below only generates the code to make the changes; it does not automatically execute it. The generated output is sent to SSMS as a single result set. I recommend setting SSMS to use “results to text” (Ctrl-T), and checking the query options to make sure that the maximum number of characters per column is sufficiently large – SSMS defaults this to 256, which is insufficient; I always set this to the maximum allowed value, which is 8192.
  • As mentioned above, I cannot justify investing more time in this script now. The problem is fixed, so I can’t improve it on customer time. And I need my own time to prepare my presentation for SQLBits next month.

The code

So without further ado, here is a script you can use, with the aforementioned precautions, to find all columns that do not have the database default collation and change them to that collation; including the dropping and recreating of objects as needed for this change.

Off Topic: Charity and snooker

So this will be a very unusual post for this blog. It will be completely off topic.

One of the things I do in my spare time is to play snooker. Mind you, I am not good at the game. But I enjoy it, and it’s a nice break from sitting at my laptop.

Marathon

So why do I share this tidbit about my personal life? Why do I think anyone cares?

The reason is that I need some help.

Next month, on February 15 and 16, 2019, I am participating in a 24-hour snooker marathon. And with my participation in that marathon, I am trying to raise money for charity. I have set up a GoFundMe campaign and I hope to get lots of people who chime in to sponsor my marathon and contribute to War Child.

War Child

Perhaps you wonder why I chose War Child. There are lots of charities. All of them are always deserving of more money. I had to pick one. So why this one?

Obviously, hundreds of other charities qualify just as well. Bottom line, it’s my choice and the most important reason is that I really like the work War Child does.

But there are some reasons beyond just a personal hunch that made me select War Child for this campaign:

  • War Child is non-religious. Lots of charities are firmly inspired by religion. I am okay with that. At the end of the day, if you do good things for the world I do not care whether you do them out of religious or other motives. It’s what you do that counts. But there are people that do care about religious affiliation. People that don’t want to contribute to charities that belong to the “wrong” religion. By choosing a neutral charity, I hope to avoid excluding anyone for religious reasons.
  • War Child is unique. There are lots of areas where multiple charities are active (and, luckily, usually cooperating fine). But I have never seen any other charity that does the kind of work that War Child does.
  • War Child works to protect children. A lot of charities work on behalf of groups or areas where some potential donors might argue that the group or area does not need protection, that it can take care of itself, or that there are worthier goals. But I cannot imagine anyone making that argument about children.
  • War Child fights a horror that is made by humans, and that is one of the worst horrors one can imagine: children, always taken to be the dictionary definition of innocence, being stripped of that very innocence by placing them in the middle of war, and often even forcing them to participate in those wars.

War Child tries to rescue children that are forced to fight as child soldiers. War Child also tries to help countries pass and enforce legislation to prevent children from being forced to fight. But War Child also works with children that have been damaged by war, supporting them and helping them regain their innocence. Or, simply put, War Child helps children be children again.

Please help

Please help me raise money for War Child. Go to the GoFundMe page and donate. After the snooker marathon is over, I will first get some sleep – but then I will withdraw the funds and donate every single penny to War Child.

I know people are always concerned with overhead. And yes, there is some overhead. As per their standard rules, GoFundMe will take 2.9% off the donations to help cover the costs of running their service, plus a $0.30 processing fee for each transaction. But that’s all; there are no other costs at all. Everything that remains after these GoFundMe fees will go to War Child.

I hope to see lots of donations. Thanks in advance!

T-SQL Tuesday #110 – “Automate All the Things”

It is once more the second Tuesday of the month, better known as T-SQL Tuesday. This month, we got our writing assignment from Garry Bargsley (b|t). He asks us to write about automation: what automation are we proud of or do we want to complete, and using what technology?

However, I found that I was more triggered by the title of his request post than by the content, so I’ll allow myself the liberty of applying a very loose interpretation of the topic.

All the things?

Garry titled his invitation “Automate All the Things”, and I plagiarized that title because it is the word “all” that I want to focus on. As Garry points out, automating everything appeared to be a major focus area (or, if you prefer, buzzword) in 2018. But is it really possible to automate everything? And if it is, then where does that leave us, the workers? And more specifically, what does it mean for me, and my geeky hobby of dissecting execution plans?

Automation and me

I often say, somewhat jokingly, that a good developer never does the same thing twice. If I need to do a repetitive task, I will always look for ways to automate it. And I’ll admit that this sometimes backfires – investing three hours to avoid repeating a five-minute task twenty times is not really the best possible time investment. (Though it does reduce the risk of errors, and it’ll keep me interested in my job for longer).

I won’t go into the tools I use. I typically just pick whatever I have available (based on both what is installed and what I already have knowledge of). Probably very often not the best pick for the job, so I feel no need to brag about this. Plus, as already mentioned, I want to focus on another aspect of automation. But I did want to include this “about me” bit just so you understand that I do love automation. I really do.

Automation and execution plans

Tuning queries has for a long time been purely manual work. Skilled workers, also known as tuning wizards, would look at indexing-related DMVs, pore over execution plans, go into a trance-like state of meditation for a while, and then they’d have the magical combination of a new index and a rewrite of your query to make it go five thousand percent faster. (Or five percent – but any gain is useful, right?)

Early years

Microsoft have been trying to assist here by providing automated tools, and they have been doing this for a long time. Anyone remember the “Index Tuning Wizard”? Don’t be ashamed if you don’t; I myself am still trying to erase it from my memory. Let’s just call it a pre-alpha release and move on.

But the Index Tuning Wizard got replaced by the Database Tuning Advisor – a new name but also a completely new tool, trying to do the same. And while still far from perfect, it is definitely a much better tool. And it wasn’t just Microsoft anymore: based on the data available in DMVs such as sys.dm_db_missing_index_details, sys.dm_db_index_usage_stats, and others, community members and tool vendors started building their own methods to automate the work of finding suggestions for adding, removing, or changing indexes. Not all equally successful – but it’s the thought that counts!

Azure

Things really started taking off with Azure. Azure SQL Database, or whatever the name was back then, became popular. Microsoft now saw themselves hosting not dozens, not thousands, but probably millions of databases. That had two immediate results. The bad result (for Microsoft) was that queries that were slow and consumed lots of resources started hurting them way more than ever before. The good result was that they were able to start collecting loads upon loads of metric data. Data they could analyze (using automation, of course) to find common patterns in those slow workloads.

Microsoft employs a huge number of extremely smart people. As soon as they have both a problem caused by bad performance and the data to help find a fix, it is only a matter of time before they will find solutions. And they have. And luckily, they are not keeping them for their own use only – the fruits of their labor have started to see the light.

In SQL Server 2016, the Query Store was added to the box product. In SQL Server 2017, Adaptive Query Processing was implemented – granted, very limited at that time, but again a step forwards. SQL Server 2019 takes that to the next level and renames it to Intelligent Query Processing. Also added in SQL Server 2017 is a feature called “Automatic Tuning” (which in reality only does automatic plan forcing based on Query Store data, plus some verification after the fact). Finally, and currently only available in Azure SQL Database, there is an option called “Automatic index management”, which will create new indexes and drop unused ones.
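
As a small illustration of that “Automatic Tuning” feature, this is roughly how it is switched on for a database (it requires the Query Store to be enabled), and where its recommendations can be inspected:

-- Requires Query Store; available in SQL Server 2017 and later, and in Azure SQL Database.
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);

-- Recommendations, and the actions taken on them, are exposed in this DMV.
SELECT name, reason, score, details
FROM   sys.dm_db_tuning_recommendations;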

Future

So what does this mean for the future? Did I make a bad choice when I chose to spend time on creating the Execution Plan Reference?

I think not. I think that all those tools will be very good at finding and solving the 90% of performance issues that I myself, and many others, would be able to solve relatively easily. It isn’t hard to find a candidate for a covering index, and it doesn’t need much understanding of execution plans to do so. It isn’t rocket science to find unused indexes or to identify almost-overlapping indexes that can be combined. There is little challenge in finding that a query is slow because the fixed cardinality estimate of a multi-statement table-valued function is far off from reality. If Microsoft, or other vendors, can automatically find those cases, that is fine with me. I would build such tools for myself if I were smart enough and had access to the data that Microsoft has.

But based on my experience with execution plans I have seen in my career, I am convinced that there are still lots of execution plans out there that do … weird stuff. Things that are so uncommon that no tools, no machine learning algorithms, will be able to find a root cause and a fix from looking at those execution plans. Perhaps, one day, that will change. And I know that the pace of change has been increasing over the past century and still increases. But I still think that it will take a long time before machines have sufficient fake intelligence that they would be able to tune each and every query.

For the years (and decades?) to come, I expect that automation of the query tuning process will mean that we get to look at far less problem queries than before. But the ones we look at will be more complex, more interesting, and harder to get to the bottom of. So once you do get a query on your desk, being able to look at the execution plan and understand what is really happening will be even more important than it was in the past.

I’m not wasting time on the Execution Plan Reference. In the future, it is possible that fewer people will need it. But those that do will need it even more than they do now!

Conclusion

Okay, I didn’t really stick to the topic that Garry suggested. But his title and intro gave me stuff to think about, and then to write about, so I still consider this a T-SQL Tuesday post.

Thanks for the inspiration, Garry!

Looking back to 2018

We are approaching the end of 2018. It is December 30 as I start writing this (I’ll probably finish and publish it December 31), traditionally a time to look back at the past year. I often refrain from doing what everybody does because a certain date is on the calendar (my dad would sigh and suggest that I’m still not over puberty, and he’d probably be right). But in this case I’ll make an exception.

For me, 2018 was mainly the year I started building my own website. The site where I publish my blog posts, but more important: the site where I build the SQL Server Execution Plan Reference.

This website officially went live December 14, 2017, so actually already a bit more than 1 full year ago – but close enough to call today a good day to look back. At first, the website was used for my blog posts only. But I was then already working on my bigger project, my main reason to migrate – and exactly 7 months ago, on May 31, 2018, I officially announced and opened the Execution Plan Reference. Small and incomplete back then, but growing ever since.

Execution Plan Reference

When I went live with the Execution Plan Reference, I only had nine pages of content: four that describe an operator, four with generic information, and one background page. I had not originally planned to go live with so little content, but a T-SQL Tuesday challenge made me change my mind.

As part of that same T-SQL Tuesday challenge, I also committed to adding at least one page every month. I added pages on Segment in June, Assert in July, Table Spool and Index Spool in November, and Bitmap in December. I also published a long article on spools in October; this article was the primary reason why I failed to meet my objective of new content in August and September. That’s not a valid excuse – but it is the explanation.

I am currently working on the Adaptive Join page. I don’t expect to be able to finish it in December, but that obviously does mean that I will easily meet my goal of at least one new page per month in January 2019.

I am not happy with the progress of the Execution Plan Reference so far. When I set my goal of at least one new page per month, I was deliberately easy on myself; my unpublished goal was to go much faster. There currently exist over 70 operators. Not all of them warrant a separate page, but adding just one per month feels too slow – I would like to get the reference at least fairly close to complete one day, and not on a five-year plan.

Blogging

Since I migrated my blog to this site, I published a total of 25 new blog posts (this one included). Eight of them were inspired by T-SQL Tuesday writing assignments. Though the distinction is sometimes hard to make, I’d qualify 16 of my blog posts as “technical”.

I didn’t set myself a goal for how often to blog. However, I think that two per month, and at least one technical post per month, is okay. Especially my technical posts can get very deep and very long, so it would not be realistic to expect much more than this.

I did not meet my self-imposed goal of participating in all T-SQL Tuesday assignments. Some of the selected topics simply didn’t resonate with me, and I didn’t want to write something I myself would not like to read, just to meet some self-imposed goal.

I also did not meet my self-imposed goal of publishing a Plansplaining post on the first Monday of every month. This did go well at first, until I realized that I really needed to finish my article on spools first so I could then use that as the basis for the 9th episode. That article turned out to take a lot of time and effort to get thoroughly researched and hence I had to delay episode 9.

Traffic

My worry when I started this site is probably the same that every site owner has: will I get traffic? Producing the content I publish requires a lot of time investment, and my worst fear was that nobody would ever benefit from it. Luckily, it turned out that my fears were unwarranted.

I was truly amazed that I already had 1000 views in the first month, December 2017. Since then, the number went up slowly for a few months – until I announced the Execution Plan Reference. Traffic went through the roof for a few days, with an amazing 1649 views on one single day as the top day so far. June 2018 came close to a total of 6000 views. After that traffic normalized, but now well above 2000 per month – and, even more important, the numbers just kept on growing. I broke through 4000 views in November, and despite the expected lower traffic because of Christmas, December is already ahead of November with still one more day to go!

I must say that I am both humbled and grateful when I see these numbers. Thank you to all those who visited this site. Thank you to those who subscribed, or who bookmarked the site, or who use the RSS feed to know when I add or update content. Thanks to those who helped spread the word, by sharing a link or otherwise alerting potentially interested people to find my site.

When I started this site, I had absolutely no idea what to expect. I never really looked at traffic when I still blogged at sqlblog.com. So I obviously had not set any targets for myself, other than “oh I hope people will find my site and like it”.

But you, all you collective readers of this site, have proven that it is possible to exceed expectations even when you have no expectations! Again: THANKS!

Public speaking

About mid 2017, I realized that I’d have to cut down on, and later fully stop, my public speaking. This was for personal reasons, and though the choice was painful, it was not a difficult one to make.

But fortunately, during 2018 it became apparent that my personal situation would allow me to return to attending conferences. I started submitting to lots of events scheduled for the second half of the year, expecting to be turned down by many. And then I got accepted by so many that I had to quickly remove my submissions for a whole bunch, to avoid having entire months without ever being home in the weekend.

I really had a lot of fun going to and speaking at so many events. I am happy that organizers still have the confidence to select me, that the audience still chooses to sit in my sessions, and that I have so many fantastic interactions with attendees, organizers, and other speakers at those events. Although it’s always great to return home after being away for a long time, it never takes long before I start to feel that itch again, and start looking forward to the next conference.

Conferences seem to always come in waves. Apparently some periods in the year are more popular than others. Fall is always peak season, winter and summer are low season. But I nevertheless already have a few speaking slots confirmed for the near future. Both SQL Konferenz and DataGrillen are already fully confirmed and public, at least two others are already confirmed but not yet public, and some more are in the pipeline.

New goals

As I was writing this overview, I realized that I had not set myself many specific targets. That has a very obvious benefit: without targets, I will never fail my targets. But I know that once I do set targets, especially when I share them in public, I can use them to motivate myself.

So for 2019, I will now set myself some targets, and share them here.

  • Execution Plan Reference: I absolutely want to keep up with the previously set target of at least one new page each month. But as I already indicated, that is a defensively low target. I considered doubling it, but seeing how much work a single page can be and how much other things sometimes go on in my life, that would be unrealistic.
    However, I do increase my target in another way.
    For 2019, I want to add at least 1 new page to the Execution Plan Reference each month, and at least 15 new pages in the entire year.
  • Blogging: I no longer set myself the target of doing one “Plansplaining” post each month. I still have a few ideas for this series on my list, but not enough for 12 episodes; and so far I have not received any suggestions from readers. But I do like it to be on a schedule, so I now opt for once every two months – every even-numbered month. It will still go live on the first Monday of the month, at 17:00 Dutch time (CET or CEST).
    However, I do not want the amount of technical content to go down, so I will write other technical posts instead. I also want to continue contributing to the T-SQL Tuesday events, though my participation will depend on the subject.
    For 2019, I want to write at least one technical blog post per month. In even-numbered months, this will be a “Plansplaining” post.
    I’ll attempt to participate in the T-SQL Tuesday challenge each month, though I allow myself to opt out if the subject doesn’t resonate with me.
  • Traffic: Since this is hard to directly influence, any target I set here is more something I will try to achieve than something I can ensure I achieve. As such, I set my targets at a moderately low level.
    In 2019, I want traffic to sqlserverfast.com to continue to increase month over month. I hope to double my total traffic over the year, so in December 2019 I hope to have a total of 8000 monthly views.
  • Public speaking: I love doing this, and at times I wish I could go to a conference every week. But I also love my wife. I love my kids, who have moved out of the house and often only visit on weekends. And I love sleeping in my own bed. Plus, most conferences I go to have no budget to compensate speakers. I often pay for travel and hotel out of my own pocket, and I lose billable hours when travelling and attending the conferences. So I have to strike a balance.
    I feel that approximately once per month should be a good balance. More than that would probably come at a too great cost for my personal life, and my business financials. But I cannot spread them out exactly like that, so my goal will be based on the year.
    I also hope to be able to present a few pre-cons this year. Though exhausting, it is also very rewarding to be able to teach about execution plans for a full day. And I cannot deny that the compensation speakers receive for teaching a pre-con makes it easier to justify the cost of attending conferences.
    My aim for 2019 is to speak at 10 to 15 conferences, and to present at least 2 pre-cons.

Conclusion

For me, 2018 was a good year for my SQL Server community activities. Setting up my own website and moving my blog has worked out well. The Execution Plan Reference was well-received in the community. Returning to public speaking was really fun. And though not mentioned above, completing the technical review work on Grant Fritchey’s “SQL Server Execution Plans, 3rd Edition” and finally seeing the final version completed was extremely rewarding.

For 2019, I have set myself targets on the ongoing community work I do, and I have published them here. Hopefully, that will help keep me from slacking!

 

Dear reader, thank you for visiting my site, thank you for reading my ramblings, thank you for contributing to the little statistics that tell me my community work is not in vain.

I wish everyone reading this an excellent (but safe!) New Year’s Eve celebration, and a fantastic 2019!

Plansplaining, part 9. Recursive CTEs

I had to skip three months, but finally it is here: part nine of the plansplaining series. Each of these posts takes an execution plan with an interesting pattern, and details exactly how that plan works.

I am pretty sure that (almost) everyone reading this blog knows that a CTE (Common Table Expression) is an independent subquery that can be named and then referenced (multiple times if needed) in the main query. This makes CTEs an invaluable tool to increase the readability of complex queries. Almost everything we can do with a CTE can equally well be done by using subqueries directly in the query but at the cost of losing readability.

However, there is also one feature that is actually unique to CTEs: recursion. This blog post is not about what this feature is, what it does, or how to write the code. A brief description is given as part of the complete explanation of CTEs in Books Online, and many bloggers have written about recursive CTEs as well.

For this post, I’ll assume the reader knows what a recursive CTE is, and jump into the execution plan to learn how this recursion is actually implemented.

Sample query

The query below is modified from one of the examples in Books Online. This query can be executed in any version of the AdventureWorks sample database. It returns the exploded Bill of Materials for the “Road-550-W Yellow, 44” bicycle (which has ProductID 800).
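
For readers who want a mental picture of the overall shape of such a query (shown purely as an illustration – the details of the actual query, such as the date filtering mentioned later, differ slightly), a recursive Bill of Materials query has an anchor part, a UNION ALL, and a recursive part that joins back to the CTE:

WITH BOM_cte (ProductAssemblyID, ComponentID, PerAssemblyQty, Lvl) AS
(
    -- Anchor part: the direct components of the requested assembly.
    SELECT b.ProductAssemblyID, b.ComponentID, b.PerAssemblyQty, 1
    FROM   Production.BillOfMaterials AS b
    WHERE  b.ProductAssemblyID = 800
    UNION ALL
    -- Recursive part: the components of each component found so far.
    SELECT b.ProductAssemblyID, b.ComponentID, b.PerAssemblyQty, cte.Lvl + 1
    FROM   Production.BillOfMaterials AS b
    INNER JOIN BOM_cte AS cte ON b.ProductAssemblyID = cte.ComponentID
)
SELECT ProductAssemblyID, ComponentID, PerAssemblyQty, Lvl
FROM   BOM_cte;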

The execution plan for this query is quite complex, as you can see below. (As in earlier episodes, I have added the NodeID numbers into the execution plan to make it easier to reference individual operators).

This execution plan does several things that are fairly easy to understand. And then there is some other, much more interesting stuff to make the magic of recursion happen. To prevent this post from becoming too long, I will describe the simpler parts very briefly. Feel free to run this query on your own system to dive deeper in the parts I skip. Make sure you understand them, and don’t hesitate to leave a question in the comments section if anything is unclear to you!

The top line

As always, execution of the query starts at the top left, with operators calling their child operator (from the first input in the case of Concatenation #1), until Clustered Index Seek #4 is invoked. The properties of this operator show that it uses an index on (ProductAssemblyID, ComponentID, StartDate) to quickly find all rows that match the predicate of the anchor query. It processes them in order of ascending ComponentID, returning the three columns from this table that are used in the query.

As always, it is important to remember that an operator never just returns all its rows at once; it waits until it is called, does the work needed to produce and return the next row, and then saves state and halts until it is called again. In this case we will see that the operators on the top row of the execution plan keep calling each other until all matching rows are processed, so it is not especially relevant here – but it’s still important to always be aware of this!

Compute Scalar #3 receives rows from Clustered Index Seek #4. It adds one extra column, named Expr1001, that is always set to 1 (one). We will later see that this Expr1001 column will eventually be returned to the client as the Lvl column; the fixed value of 1 in this operator confirms what we already suspected, that the top line evaluates the anchor query.

Compute Scalar #2 then adds yet another column, Expr1012, that is 0 (zero). We will later see what this column is used for.

The rows are then returned to Concatenation #1, which implements the UNION ALL expression in the query. Like any Concatenation operator, it simply reads rows from its first input and returns them unchanged. Once the first input is exhausted it moves to the second input (which I will discuss further down). Concatenation operators can have more than two inputs, but in this execution plan it has just two.

A side effect of a Concatenation operator in an execution plan is that it renames all columns, which can make it harder to follow what is going on. I often end up writing down the column names in a file on a second monitor, or just on a simple scrap of paper. In this case, the most important thing to remember is that the “new” column Expr1015 refers to the original Expr1012 (from Compute Scalar #2) for rows from the first input, and Expr1014 for the second input.

The other four columns, Recr1008 to Recr1011, correspond to the four columns returned by the query.

Index Spool #0

The final operator on the top line of the execution plan is Index Spool #0. Looking at its properties we see that the With Stack property is True, which means that this is actually a “stack spool”. As you can see in the description of this processing type on the Index Spool page in the Execution Plan Reference, this means that it uses a worktable indexed by the first column in the Column List, which supposedly represents the recursion level. This is Expr1015, originating from Expr1012 and always zero for the top line – this bit will get more interesting later on. For now, because all rows have the same value the engine will add a uniqueifier to each row in the worktable to ensure uniqueness and to guarantee that rows can be retrieved in the correct order.

The logical operation is Lazy Spool; this means that it returns its input rows immediately, after storing them in the worktable.

The Output List property says that all five columns returned by Concatenation #1 are stored in the worktable and returned to the calling operator. However we know from the query and its results that only the four columns Recr1008 to Recr1011 are actually returned to the client; Expr1015 is not and is apparently filtered out in the top-left SELECT operator, though this is not exposed in the execution plan.

This construction is due to a limitation of the Index Spool operator: the Output List property defines both the structure of the worktable and the columns returned. In this case the Expr1015 column is not needed by any parent of the operator and hence doesn’t need to be in the output. But it IS needed in the worktable (we’ll see why further down), and hence needs to be in the Output List property.

Top line condensed

Taking all the operators of the top line together, we can see that each call goes all the way from SELECT to Clustered Index Seek #4, which reads and returns a row that then propagates all the way back to SELECT and to the client. This process repeats a total of 14 times, reading and returning the 14-row result set of the anchor part of the recursive CTE. If we could pause execution at this time we would see a result set as shown to the right.

The worktable created by Index Spool #0 contains these same rows at this point, but with two additional columns: one for Expr1015 (equal to zero for all these rows) and one with a four-byte uniqueifier on all rows except the first, to ensure that the clustered index of the worktable keeps these rows in place. (See this article for more details on the exact internal storage of the various spool types).

After returning the 14th row the execution once again propagates all the way through to Clustered Index Seek #4, but this time it returns an “end of data” signal because no more rows satisfy the conditions in its Seek Predicates and Predicate properties. The two Compute Scalar operators simply relay this “end of data” signal to their parents. Once Concatenation #1 receives it, it responds by moving to its next input, calling Assert #5.

The bottom part

Most operators, when called to return a row, start by calling their own child operator for a row. The operators on the second input of Concatenation #1 are no exception, so once Assert #5 is called the first time, control immediately passes through Nested Loops #6 and Compute Scalar #7 to Table Spool #8, the second spool operator in this plan.

Table Spool #8

Since this Table Spool has no inputs, it must work as a consumer. This is confirmed by the Primary Node ID property, which points to Index Spool #0 as the builder of the worktable that is read by this Table Spool. You can also see that the With Stack property is set to true (as expected – a “stack spool” always consists of a builder Index Spool combined with a consumer Table Spool).

The behavior of the Table Spool part of a stack spool combination is to remove the previously read row from the worktable and then read the row with the highest key value. Remember that the key is on the Expr1015 column which is supposed to be the recursion level, and that the system uses an internal uniqueifier to distinguish between duplicates which means that within each Expr1015 value the last added row will have the highest uniqueifier. So by reading the highest key value, the operator effectively reads the most recently added row within the highest recursion level.

You may also note that the column names in the Output List property are different from the column names in the Output List of Index Spool #0 where the worktable is created. This is normal behavior. The columns are matched by order. So for example, the third column listed in the Output List of Index Spool #0 is Recr1009 (which was the name assigned by Concatenation #1 to the ComponentID column from both the first and the second input). The third column listed in Table Spool #8 is Recr1004, so now Recr1004 contains the ComponentID value.

For this first call, there is no previously read row so there is nothing to delete. The operator reads the row with the highest key value – all rows currently in the worktable have Expr1015 equal to zero, so the last added row will be read and returned to the calling operator. This is the row with ComponentID 994.

The join logic

Table Spool #8 passes this row to its parent, Compute Scalar #7. Its Defined Values property shows that this operator computes a new expression, Expr1014, and sets its value to Expr1013 + 1. Since Expr1013 was 0, Expr1014 will be 1.

The modified row is returned to the parent operator (Nested Loops #6). This operator uses the Outer References property to push several values from its outer (top) input into the inner (bottom) input. These values are then used by Clustered Index Seek #10 to quickly find rows with ProductAssemblyID equal to Recr1004, which we already saw to be the new name for the column holding the ComponentID. Clustered Index Seek #10 also has a residual Predicate property for the additional filtering on EndDate IS NOT NULL. Based on my knowledge of the data, I expect that for the current Recr1004 value of 994 this Clustered Index Seek #10 will find two rows; on the first call it will return the first of these two matches, a row with ProductAssemblyID 994, ComponentID 3, and PerAssemblyQty 10.00.

This row is passed through Compute Scalar #9, where Expr1007 is computed as Recr1006 + 1; if you are following along with a full mapping of all column names you will realize that this is the computation for the Lvl column in the query. Nested Loops #6 then combines this row with the current row from its outer input and returns the result (to be precise, only the columns that are listed in its Output List property) to its parent operator.

Assert #5

An Assert operator is used in execution plans to check for conditions that require the query to be aborted with an error message. Based on the query you may not have expected to see an Assert in this execution plan, since the query includes nothing that calls for a potential runtime error.

Let’s check the properties of this operator to see if we can find out why it’s there. As you see, it does a simple check on the Expr1014 column: if this column is larger than 100, the query will end with a runtime error. In the current row this column has the value 1, so processing continues. But seeing this condition should help you realize why it’s there: not because of something I included in the query, but because of something I omitted. If you are familiar with recursive CTEs, you will know that by default they support a maximum recursion level of 100. This is to protect us from ourselves – without it, a coding error that results in infinite recursion would cause the query to actually run forever, using more and more resources along the way. The maximum recursion level can be changed, or removed (at your own peril!). If you add the query hint OPTION (MAXRECURSION 0) to the query, the protection is disabled and the Assert operator is removed from the execution plan.
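
As a small self-contained illustration of this hint, the following query would stop with a runtime error after 100 recursions if the OPTION clause were removed:

WITH Numbers (n) AS
(
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 200
)
SELECT n
FROM   Numbers
OPTION (MAXRECURSION 200);  -- raise the limit; MAXRECURSION 0 removes it (and the Assert operator)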

The place of this operator is very important. The check is not done immediately when the new recursion level is computed, but only after they are joined to matching rows from the BillOfMaterials table. This means that if Table Spool #8 reads a row that already has recursion level 100, Compute Scalar will immediately compute the next recursion level at 101, but execution will not stop. First Nested Loops #6 will execute and look for matches. If there are no matches, then it will never return this row with Expr1014 = 101 to its parent operator, Assert #5, and execution will not end with a runtime error. However, when the Nested Loops operator does find a matching row, then  the combined data is returned to Assert #5 and execution stops before this row is ever returned to the client (or even stored in the worktable).

Back to the top

The row that was just produced by the second input of Concatenation #1 is now passed through that operator to Index Spool #0, which adds it to the worktable and then passes it to the client. The fifteenth row of the result set is now added to the data in the results pane of SSMS. This row is in the worktable as well. Because Expr1015 was set to 1 for this row, it is added at the end of the worktable (based on the logical structure imposed by the clustered index – we obviously have no idea and do not care on which page within tempdb this row is stored).

Rinse and repeat

After processing the fifteenth row the operators once more call each other, and this time the call propagates to Nested Loops #6 quickly. But here the path changes. The last thing Nested Loops #6 did was request and return a row from its inner input. It will now do that again, so control is passed through Compute Scalar #9 to Clustered Index Seek #10. The second row with ProductAssemblyID 994 is now read and returned – this is a row with ComponentID equal to 525. This row is then processed exactly the same as the previous row. So at the end, this is the sixteenth row in both the results and the worktable.

If we were to pause execution at this point, the result set would look as shown to the right. The worktable would be equal, though with the addition of Expr1015 (0 for the first fourteen rows, 1 for the last two) and the hidden uniqueifier to ensure that the rows remain properly ordered.

For the 17th call the control will pass through the operators in the same way as before. But now Clustered Index Seek #10 finds no more rows to return; its “end of data” signal is relayed through Compute Scalar #9 to Nested Loops #6, which then requests the next row from its outer (top) input.

This call flows through Compute Scalar #7 to Table Spool #8. As explained above, the Table Spool of a stack spool combination first removes the row it previously read, row 14 in the picture above. After removing this row it then proceeds to fetch the row with the highest key value, which is row 16 in the screenshot above. For this row, Expr1014 is computed as 2 (because it was stored with Expr1015 equal to 1), and then the data is passed to Nested Loops #6. However, since there are no rows in the base table with ComponentID 525, the inner input of this Nested Loops operator returns nothing so Compute Scalar #7 and Table Spool #8 are invoked again.

Now row number 16 in the picture above is removed, since this is the row last processed. After that, the row with the highest key is row 15 in the screenshot, so now this row is passed to Nested Loops #6. And this time it will get matches from its inner (bottom) input, so now more rows will be produced, stored in the worktable (again at the end), and returned to the client.

The same process repeats, over and over again, until all 89 rows of the result set are produced. Each time, Table Spool #8 removes the row it read last from the worktable and then reads the row with the highest key value, which is always the row added last. And whenever that row produces new results in the join, they are added at the end of the worktable by Index Spool #0.

Eventually, Table Spool #8 will read the last remaining row from the worktable. After that, either new matches are found and added to the worktable so the process can continue, or no new matches are found. In that last case, Nested Loops #6 will one final time try to get a row from its outer (top) input. Table Spool #8 will remove that final row that it just read, and then see no rows left for it to read and return. It will send an “end of data” signal which then propagates all the way through all operators to the top-left SELECT, and execution of the query stops.
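To make the pattern of the last few sections more concrete, here is a rough T-SQL imitation of what Table Spool #8 does, using the conceptual #Worktable sketched earlier. This is an illustration of the logic only, not anything SQL Server actually executes:

    DECLARE @Expr1015 int, @Uniqueifier int, @ComponentID int;

    WHILE 1 = 1
    BEGIN
        -- First remove the row that was read on the previous iteration
        -- (on the first iteration the variables are NULL, so nothing is deleted).
        DELETE FROM #Worktable
        WHERE  Expr1015 = @Expr1015
          AND  Uniqueifier = @Uniqueifier;

        -- Then read the row with the highest key value,
        -- which is always the row that was added last.
        SELECT TOP (1)
               @Expr1015    = Expr1015,
               @Uniqueifier = Uniqueifier,
               @ComponentID = ComponentID
        FROM   #Worktable
        ORDER BY Expr1015 DESC, Uniqueifier DESC;

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to read: the "end of data" signal

        -- (At this point the real plan joins the row to BillOfMaterials and inserts
        --  any matches at the end of #Worktable, with recursion level @Expr1015 + 1.)
    END;

Note how a row is deleted only on the next iteration, after it has been processed – a detail that becomes relevant in the “Not quite a stack” section below.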

Stack spool

Some readers might already be familiar with the concept of a stack. For those who are not, here is a short explanation.

When two components of a system exchange data, and the producing side may sometimes generate more data than can be handled immediately, there are two basic structures for buffering that data: queue-based and stack-based.

The queue-based system, also known as FIFO (“first in, first out”), gets its name from the British concept of queueing politely, without elbowing in front of other people. The first to arrive at a bus stop will stand there, the next gets in line behind them, and when the bus arrives the passengers board in the order in which they arrived. A queue within information systems is used to ensure that data is processed in the same order in which it is produced.

The stack-based system, also known as LIFO (“last in, first out”), gets its name from the stacks of plates in a self-service restaurant. When a customer arrives, they’ll grab the top plate of the stack. When the dishwasher brings a new plate (or, more typically, a bunch of new plates), they are simply put on top of the stack. The next customer will grab the plate that was added most recently. A stack within information systems is used when the data that was added last needs to be processed and removed from the stack first.
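For readers who prefer code over bus stops and plates, here is a minimal, purely illustrative T-SQL sketch of both structures. A table with an ever-increasing key can serve as either; the only difference is which end you read from:

    CREATE TABLE #Buffer
        (id      int IDENTITY PRIMARY KEY,
         payload varchar(20) NOT NULL);

    INSERT INTO #Buffer (payload)
    VALUES ('first'), ('second'), ('third');

    -- Queue (FIFO): the oldest element is read first.
    SELECT TOP (1) payload FROM #Buffer ORDER BY id ASC;   -- returns 'first'

    -- Stack (LIFO): the newest element is read first.
    SELECT TOP (1) payload FROM #Buffer ORDER BY id DESC;  -- returns 'third'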

Not quite a stack

The combination of an Index Spool and Table Spool, both with the With Stack property set to True, is called a stack spool. That is because it conceptually implements a stack that transports data produced in one part of the execution plan to another part, where it is processed. However, it is not a 100% true stack. A true stack implementation would remove a row as it is read, not only after it has been processed and more rows have been added. That would be like a restaurant customer adding food to their plate while the dishwasher is already stacking new plates on top of it, so that the customer then has to pull their plate (with the now squashed food) out of the middle of the stack, leaving the plates that were added later on top of it.

I am not sure why Microsoft chose to implement this the way it is. After all, it’s probably even easier to build a system that removes the row immediately. However, I do have two theories:

  1. Perhaps there are some scenarios with a very complex recursive query where the same row needs to be read multiple times from the same worktable. This is of course not possible if the row is already deleted. However, I have not been able to set up any such scenario; most of my attempts were rejected for violating the rules of recursive CTEs.
  2. The more likely explanation is that operators in execution plans are designed to be as efficient as possible, and that one such optimization is that operators that read data from a table (including worktables) do not actually copy the data from the buffer pool to working storage, but just use pointers to the location within the buffer pool. If this were done with data read from the worktable of a stack spool, then deleting that row as it is read would mark that area of the buffer pool as free for reuse – and now there is a risk of other processes overwriting the data while it is still needed.

Conclusion

Recursion may sound like a simple technique, but in reality it rarely is. In this blog post we looked at all the nuts and bolts that come together to implement recursion in the execution plan for a query with a recursive CTE.

The absolute worker bee is the combination of an Index Spool and a Table Spool, both with the With Stack property present and set to True. This combination is also known as a stack spool, because it implements a stack that allows rows to be stored in one place in the plan and then read and processed in another place in the plan, using the so-called “LIFO” (last in, first out) method. The last row added will always be read and processed first; the first row added will be read last. Understanding this mechanism helps to explain, for instance, the order in which rows are returned from a recursive CTE. However, as always, the order of the results of a query is only ever guaranteed if an ORDER BY clause is used – so please don’t go and build logic that relies on the order described in this post. It is subject to change without notice if Microsoft decides to implement a different way to evaluate recursive queries.

But the role of the various Compute Scalar operators should not be underestimated either. These are where the recursion level is computed. It is zero for the anchor query, because no recursion is used in that part. For rows produced in the recursive part, it is set to the recursion level of the input row plus one. We saw this recursion level being used by an Assert operator, to ensure that execution stops when the maximum recursion level is exceeded. But the more important use of the recursion level is that it defines the clustered index for the worktable. That’s why the computation of the recursion level is not removed even if you disable the maximum recursion check.
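As an example of that last point (this is a hypothetical bill-of-materials style query against AdventureWorks, not the exact query analyzed in this post): the MAXRECURSION hint controls the check that the Assert operator performs, but based on the behaviour described above, the Compute Scalar that computes the recursion level stays in the plan even when the check is disabled:

    WITH BOM AS
    (
        SELECT bom.ProductAssemblyID, bom.ComponentID
        FROM   Production.BillOfMaterials AS bom
        WHERE  bom.ProductAssemblyID = 994          -- arbitrary starting assembly for this sketch
        UNION ALL
        SELECT child.ProductAssemblyID, child.ComponentID
        FROM   Production.BillOfMaterials AS child
        INNER JOIN BOM
           ON  child.ProductAssemblyID = BOM.ComponentID
    )
    SELECT ProductAssemblyID, ComponentID
    FROM   BOM
    OPTION (MAXRECURSION 0);   -- 0 disables the check; 1 to 32767 sets a different limit
                               -- (the default, without the hint, is 100)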

I hope to be able to continue this series. Whether the next episode will be ready within a month, I do not know. But I will work on it. However, I am starting to run out of ideas. There are still some notes on my shortlist, but I do need reader input here as well! If you see an unusual and interesting pattern in an execution plan, let me know so I can perhaps use it in a future post!
