Announcing: The SQL Server Execution Plan Reference

As some of you might already know, Grant Fritchey is working hard on updating his book on execution plans. The refreshed and much-improved third edition should release “real soon now”. And I happen to be involved in this project, as technical editor.

While working on that book, I often ran into interesting and intriguing details in the execution plans. Details that, more often than not, are well beyond the scope of the book. So they will not be included, and rightly so.

But it felt like a shame to have that information not available anywhere. Or scattered across the internet on various sites. So one day, while looking at yet another very interesting operator interaction in yet another execution plan, I decided that I would start to assemble all the information I can find on execution plans, and bring it all together in one single location: The Ultimate SQL Server Execution Plan Reference – or EPR for short.

The plan

I didn’t choose the term “Execution Plan Reference” by accident. The core of the EPR will be a full description of all that is known about every operator known to exist in execution plans: what it does, how it functions, what properties it can have, how those properties affect its behavior, and any additional information that may be relevant to understand the operator. This section will be one page for each operator. Of course, some operators are fairly simple while others are more complex, so those pages will vary in length.

Apart from that core content, I planned some generic pages. It makes no sense to repeat the explanation for properties such as Estimated Number of Rows or Number of Executions on each operator’s page, so instead I wanted to have a single page to list and describe all common properties. I also wanted an introduction page that explains the basics of reading an execution plan, lists the properties for plans as a whole, and debunks some common misconceptions.

And there will be articles with additional background. Instead of having to explain what exactly an “anti semi join” is on each of the four pages for the four join operators, I decided to create a single page describing all the logical join types. When working on Hash Match, the page, was already very long and complex before I even had a chance to start on the details of the “dynamic destaging” process that handles memory spills, so I decided to leave that for a future page. As I continue to work on the EPR, I will probably continue to create or plan separate pages for content that applies to multiple operators, or that I consider too deep and too advanced for the operator’s normal page.

I realize that this is a major undertaking that will take a long time to finish – and given the rate at which new versions of SQL Server are currently released, probably never finish at all. The plan was to make a start and go live once I had “sufficient content”. With no exact idea of how to define “sufficient”.


After a quick start during a two-week period of little other work, real life reared its ugly face and progress changed from a fair pace to a slow crawl. I had already hinted at my plans a few time, but didn’t feel ready for go live yet – and given the slow progress, I didn’t dare to set a date either. Until Riley Major picked a T-SQL Tuesday topic that triggered me.

Upon reading that post, I decided that my vague “I’ll go live when it’s ready” and “I’ll work on it when I have time” would not cut it. I needed to set a deadline. So I set myself a deadline: the EPR will go live before June 2018. Whatever content I have completed at that time will be out in the open, and I will post an announcement on my blog.

And to make sure I would not get second thoughts and weasel my way out of this, I committed to this timeframe in public. No way back.


After making that commitment I got swamped with other work even more than before so I now have even less content than planned. I just barely managed to finish the Hash Match page in time (which, admittedly, was a cubic buttload of work), and had to spend a few hours today for a final cleanup sweep across the pages I already finished.

Bit I did manage to have complete descriptions of a few of the more complex operators: three of the four join operators (Nested Loops, Merge Join, Hash Match), both aggregation operators (Stream Aggregate and the aforementioned Hash Match), and a few more generic pages are now live for everyone to see. And with Hash Match out of the way, I really hope to add new pages at a faster pace.

And now …. it is time. I did decide to remove the word “Ultimate” from the title; I might add it back later but for now it is not justified. So now is the time to proudly present … (drumroll)

The SQL Server Execution Plan Reference

The link above takes you to the main page of the EPR. The “Reading guide” paragraph then provides links to the three most important other locations: the generic information (which I recommend as a must-read for everyone; even if you already know a lot about execution plans there might still be some surprises there), the list of operators (which has further links for all operators I already described), and the page that describes properties that are the same for all (or most) operators and that are therefore not included in the property lists of the individual operators.

I hope you like it. If you do, tell your friends and coworkers. If you don’t, tell me why. I might be able to fix it!

“Can I help?”

There are of course a lot of people in the community who know a lot about execution plans. Some of you may know things I don’t. If this applies to you, then yes: you can help. Let me know what you know. Tell me if you see anything that is incorrect or incomplete. With your help, I can make this site better.

But you do not have to be an expert to help me. Say you are looking at an execution plan in one window and the EPR in another window. And the execution plan shows something that the EPR fails to describe. I want to know that too. I do not only want to hear from people that recognize my errors for what they are, I also want to hear from people who are struggling to find an answer and don’t find it on the EPR. Perhaps I am able to figure out the missing information, and I’ll add it. Perhaps I have no clue either but I can then at least document that the issue exists. That might at least safe some other poor souls some time when they run into the same issue, somewhere in the future. Or it might trigger another reader, stir a memory, and then they can tell me how they fixed it.

Or perhaps you don’t see errors, you don’t have unanswered questions, but you simply would like to give some input to help me set priorities. Or you want to point out a spelling error, request a style change, or … well, anything basically!

Whatever your feedback is: you can help me make this site better by sharing it. There is a link to the contact form on every page. I might not always respond immediately. I might sometimes not even respond at all. But I do read every mail I get.

Still here?

Why are you still here? What are you doing reading this final paragraph. Go ahead, click that link and start reading … the SQL Server Execution Plan Reference.

T-SQL Tuesday #102: Announcing a new site

Back in 2009, Adam Machanic (b|t) started an initiative that is still going strong: T-SQL Tuesday. This is a monthly blog party. Every second Tuesday of each month, lots of people blog about the same topic, selected by the host for that month.

The May 2018 edition, better known as #102, is hosted by Riley Major (b|t). His challenge for this month: give back to the community. Pick an activity to help the community, explain your choice, and then commit to it. And include an ETA. If ever I saw a scary T-SQL Tuesday challenge, this has to be it.


Why is this invitation scaring me, you might wonder? The issue is, even though I do produce content for the community, in several ways, I tend to struggle with planning. Sometimes I plan to prepare my next blog post “this week”, and then it takes over a month to even get started. Sometimes I get invited to speak at a conference, and instead of preparing my slides months in advance like all other speakers do (heh!), I am working frantically on the last demos right until the start of the conference. And sometimes I plan to increase my activity on (fill in your Q & A site here), dive in at the deep end, and then just drop off that site’s radar after a few months

And then, last year, I bought this domain, set up my blog here, and told everyone that I would “soon” announce my other plans for this site. I did start working on it enthusiastically. But as so often, reality overtook me. Other stuff happening, other priorities – the usual. I never stopped working on the project, it just slowed down to a snails pace. And I kept postponing the official announcement because I felt there is not enough content yet.

First step

So here is my commitment. Open in public for all to see, and for Riley to keep bugging me when I don’t live up to my plans. No later than the end of this month (May 2018), I will officially announce what other content I will host on this site. And at the same time, I will put all the content I have at that time live. It will be less then it probably should be for the start of a big project, but I blame Riley for that – his challenge pushed me to take this step now instead of waiting and postponing for, well, perhaps forever.


But announcing the site and putting it live is just the first step. I also commit to adding content to that site, at a page of at least one new page per month. I will try to grow the site at a faster pace than that, but I know that life can have its surprises, so my official commitment will not go beyond one page per month.

More details

I addressed three of the four steps Riley prescribed: I picked something, I told you how I am going to do it, and I told you when. I skipped on the why. However, it is impossible to explain my reasoning without raising at least a corner of the veil, so you can consider this a pre-announcement. Or, if you wish, a teaser.

Most of my community activity over time was focused around query tuning. In the last few years this has been focusing more and more on execution plans. I occasionally teach pre-conference workshops on execution plan, and several of my regular sessions are on that subject too; I am working as the technical editor for the third edition of Grant Fritcheys (b|t) excellent book on execution plans (by the way, do not buy the second edition but wait for the third, it should be ready real soon now, and it is ten thousand times better), and my blog posts tend to focus on this subject as well.

Doing this learned me, among lots and lots of other things, that I know more about execution plans than I can fit in a single workshop; that my knowledge goes deeper than is appropriate for Grants book, and that not all my knowledge is suited for blog posts. But I still want to share all that knowledge I have gathered. Gathered from listening to and reading from many sources (I am truly standing on the shoulders of giants), from experimenting, diving deeper and deeper to understand what I see, and from putting all that input together.

I managed to find a form that I think it good for its purpose. I’m building a site that I believe will cater to the needs of a relevant subset of our community. Not all: I know that not everyone is as interested in understanding every detail of execution plans as I am. But those that are should hopefully benefit from my endeavors.


So there’s my motivation. A desire to share all the things I have learned (for a large part from the community) about execution plans with our community. And that will result in a new website, hosted here at, focusing on execution plans. It will go live somewhere in this month, May 2018. And after that (starting June) I will add at least one new page each month.

Thanks, Riley, for giving me that nudge that I needed to go live and to commit myself to upping the pace!

Plansplaining, part 5. Bitmaps

This is the fifth post in the plansplaining series. Each of these blog posts focuses on a sample execution plan that exposes an uncommon and interesting pattern, and details exactly how that plan works.

In the first post, I covered each individual step of each operator in great detail, to make sure that everyone understands exactly how operators work in the pull-based execution plans. In this post (and all future installments), I will leave out the details that I now assume to be known to my readers. If you did not read part 1 already, I suggest you start there.

In this post I will take a look at bitmaps. Bitmaps are most commonly seen in execution plans for so-called star join queries.

Sample query

The query for this post is very similar to the query we used last month. In fact, the only thing I did was to add two filters to the query; other than that the query is unchanged.

If you run this query in ConsosoRetailDW and inspect the execution plan, you will see that the small change in the query has resulted in a small change to the execution plan. The picture below represents the right-hand side of the execution plan, where the data is retrieved, joined, and filtered. (The left-hand side is responsible for grouping and aggregating, a detailed discussion of that process is out of scope for this post).

If you compare this picture to the one in the previous post, you will see more similarities than differences. Once you start digging into the properties you will see a few more differences but most of the execution plan is actually the same. Which means I can save a lot of time by not explaining the things that are unchanged.

Where are my filters?

Based on the additional WHERE clause in the query, you might have expected to see one or more Filter operators added to the plan. Or the use of Index Seek operators with appropriate indexes on the EmployeeCount and ColorName columns. But there are no such indexes so the optimizer has no better option than the Clustered Index Scan, and the reason that you see no Filter operator is because the optimizer loves to push predicates as deep into the tree as possible.

In this case it was able to push the predicates completely into the Clustered Index Scan (the screenshot shown here is for Clustered Index Scan #7, but #12 is very similar). It is still a scan. The storage engine still has to read all rows from the table – 306 in this case as evidenced by the Number of Rows Read property. But the storage engine itself then checks the predicate, and only he 187 rows that match this predicate are then returned to the Clustered Index Scan operator (which then, as usual, returns them to its parent operator).

If you look carefully, you may notice that the width of the arrows into and out of the two Parallelism operators (#7 and #11) are not the same. In this case you need to look hard to see it, but I have seen execution plans where the difference in width was far more obvious. It appears as if the Parallelism operators return less rows than they receive. That is not the case, though. What you see is the result of a visual bug in Management Studio: when an operator exposes both the Actual Number of Rows and the Number of Rows Read properties, then the arrow width is based on the Number of Rows Read. It represents the amount of work done by the parent operator, but not the amount of data flowing between the operators. Since arrow width normally always represents the amount of rows, I consider this to be highly confusing and I have reported it as a bug.

The bitmaps

If you look back at the execution plan I showed in the previous post, you might realize that just adding these two pushed-down predicates into the two Clustered Index Scan operators would have been sufficient. No other change is needed, and the joining and filtering part of the query would have been very straightforward: read only the small stores and the black products, then read all FactOnlineSales data and join everything. A lot of the rows read from FactOnlineSales would not be for black products or for small stores, so those rows would be removed from the intermediate results after the joins (which are all defined to do the Inner Join logical operation, which removes non-matching data from the results). The end result would be totally correct.

And yet, the optimizer has decided to do something else. It introduces two extra operators, called Bitmap, with “Bitmap Create” as the logical operation. That smells like extra work, and whenever I see something in an execution plan that smells like extra work I want to know exactly what it does so I can understand why the optimizer introduces it.

Bitmap #5 – creating a bitmap

As implied by its logical operation, “Create Bitmap”, the Bitmap operator creates a bitmap. (I assume that this is the only logical operation that the Bitmap operator supports, since I have never seen a Bitmap operator with a different logical operation). A bitmap is a structure that stores Boolean values for a consecutive range of values in a small amount of memory. E.g. the range from 1 to 8000 has 8000 possible values. These can be represented as 8000 bits in just 1000 bytes. For each value, the location of the corresponding bit can be computed by dividing the value by 8; the dividend is the location and the remainder determines which of the bits to use. The value of that specific bit can then be tested, or it can be set to false (zero) or true (one).

The bitmap is named Opt_Bitmap1005 in this case. This name is not exposed in the quick property popup, but you can find in in the full property sheet as shown here, in the Defined Values property. You will also note that this bitmap is not included in the Output List property. That’s because this bitmap is not created for each individual row; there is a single bitmap that accumulates information from all rows. Other operators in the execution plan can reference this bitmap by name, even though it is not passed to them in their input rows.

You will also notice this operator uses hashing (sorry to those who are still recovering from last month…). It has a Hash Keys property, which is set to StoreKey in this case. The hash function here is not the same as the one used in Parallelism operator #6 (see previous post), as the hash function for a bitmap operator needs a much larger range than the eight-value range used to direct rows to threads. It may or may not be the same as the hash function used in Hash Match operator #4; these details are not published by Microsoft and not exposed in the execution plan in any way.

So here is what this operator does. When it is initialized, it creates a bitmap that is large enough for the range of possible results of the hash function, and it sets all bits to false (zero). Then, every time GetNext() is called it reads a row from its input, hashes the StoreKey to find a location in the bitmap, and sets the corresponding bit to true (one). It then passes the row, unchanged, to its parent operator. This repeats until all rows are processed.

Simple example: Let’s say that only three stores in our database have less than 30 employees. These stores have StoreKey 17, 38, and 82. The first row the Bitmap operator receives is for StoreKey 17; it hashes the number 17 and the result is 29, so bit 29 in the bitmap is set to one. It then reads the next row, for StoreKey 38; hashes the number 38 which results in 15; and sets bit 15 to one. The third row it reads is for StoreKey 82; due to a hash collision this number also hashes to 29 so it will once more set bit 29 to one (effectively not changing anything). There are no more rows so the end result will be a bitmap with bits 15 and 29 set to one and all other bits set to zero.

The bitmap is not used here. At this point it appears that SQL Server is doing lots of extra stuff for no good reason at all. Trust me, the reason for this extra work will become clear later.

One final note. I described the above as if this is a single-threaded operation. However, the execution plan runs in parallel. This means that the actual implementation is a bit more complex then suggested here. In fact, I do not even know exactly how a bitmap in a parallel execution plan works. One possibility is that there is a separate copy of the bitmap for each thread; each instance of the Bitmap operator then writes to its own copy without interference from the other threads, but operators that read the bitmap must then combine the bitmaps from all threads. Another possibility is that there is a single copy of the bitmap that is used by all threads at the same time; this necessitates additional complexity to ensure that the threads do not run into race conditions as they update the bitmap, but makes reading from the bitmap a lot simpler.

Clustered Index Scan #15 – using the bitmaps

I will now skip a lot of the steps in the execution plan. The two Hash Match operators, the Clustered Index Scan on the DimProduct table, and all the Parallelism operators, are all already explained in the previous post. And Bitmap operator #10 is doing effectively the same as Bitmap operator #5, except that in this case the hash is computed off the ProductKey column and the resulting bitmap is called Opt_Bitmap1004.

To understand why these two Bitmap operators are included, I will focus on the Clustered Index Scan operator (#15) that actually uses these operators. If you look at its properties, you see that this operator now also has a Predicate property. You would have expected this if the query contained any direct filter on any columns from this table but that is not the case. And the predicate looks different from what you normally see. There are two conditions here. After removing some extra parentheses and brackets and simplifying the tablename references, the second one reads “PROBE(Opt_Bitmap1005, fos.StoreKey, N'[IN ROW]’)”. But what exactly does this mean?

A PROBE function call in a predicate means that the operator has to verify a bitmap for each row. The first parameter specifies the bitmap. In this case it is Opt_Bitmap1005, which was created by Bitmap operator #5. The second parameter indicates which column should be hashed to find a bitmap location. If the third parameter is “IN ROW”, this means that the actual processing of this PROBE check is done within the storage engine, before returning the row to the operator. Without this IN ROW indication, all rows would be returned from the storage engine to the operator and the operator would then check the bitmap to determine whether or not the row is passed.

Earlier in this post I walked through the bitmap creation process with a simplified example using just three stores with less than 30 employees: 17, 38, and 82. This resulted in a bitmap with bits 15 and 29 set to true and all others set to false. Let’s now see how this bitmap is used as some of the rows from the FactOnlineSales table are read. The first row has StoreKey value 29. This value is hashed and the result is 46; the storage engine checks bit 46 in the bitmap and sees that it is false. This row is rejected and not returned to the scan operator; the storage engine immediately moves to the next row. This second row has StoreKey 38. We already know that 38 hashes to 15, and since bit 15 is set this row qualifies. Assuming it also qualifies the second PROBE predicate, it will be returned to the Clustered Index Scan, which then returns it to its parent. This row will now traverse through all the join operators and end up in the aggregation part of the execution plan, as it should.

A PROBE predicate can also produce false positives. Let’s continue our example with the third row, which is for StoreKey 65. There happens to be a hash collision between 38 and 65, so the value 65 also has 15 as its hash result. This row, too, will qualify and be returned (again assuming the second PROBE predicate allows it to pass as well). But this will not result in incorrect results. The row will be joined to some Product data in Hash Match #9, but in Hash Match #4 there will be no row from DimStore that matches StoreKey 65, so the row will still not be returned.

False negatives are not possible. If a FactOnlineSales row has a StoreKey value of 17, 38, or 82, the result of the hash function will always be 15 or 29; these bits are set so these rows will all qualify. And similarly, if they are for black products then they will also qualify the other PROBE predicate, so these rows will certainly be returned,

So what the two PROBE predicates in this query effectively do is: use a hash on both the StoreKey and the ProductKey to find a location in the two bitmaps, then check to see if that location is a hash value with at least one qualifying store or product. All qualifying rows will always match this filter. Some non-qualifying rows might also match, due to hash collisions, so the rest of the query still has to ensure that these are removed. (In this case that happens automatically due to the inner joins; sometimes an explicit Filter operator is used for this). But most non-qualifying rows will be removed in the scan.

As you see in the screenshot above, only 1.9 million rows were returned from the Clustered Index Scan, instead of the 12.6 million that would have been returned if the optimizer had chosen not to use bitmaps in this case. (And in case you wonder about false positives: I checked and there were exactly 9 of them for this quey – that is how infrequent hash collision are in the actual hash functions used).

In the screenshot above you may also notice a confusing Number of Rows Read property. The operator is a scan; it really reads all rows from the table. If you run the query with SET STATISTICS IO ON, you will see a number of logical IOs that matches the number of pages in the table. Trust me on this, the Clustered Index Scan operator #15 really reads all 12.6 million rows that are in this table and the reported number is wrong. I have filed a bug report for this.

Wrong estimate?

The wary reader might have noticed in the last screenshot that there is a huge difference between the estimated and the actual number of rows. The estimate is 12.6 million – that is, it is equal to the number of rows in the table. Looking at the execution plan it appears as if the optimizer estimates that every row in the table will match the bitmaps and be passed on to the parent operators. But based on that estimate, why would the optimizer even bother? If that estimate were correct, the bitmaps would not provide any cost saving, while still introducing extra overhead in the execution plan.

But there is more. If you look at Parallelism operator #13, you will see that this operator has a much lower estimated number of rows: just 1.5 million. But there is no predicate on this operator, so how can the optimizer estimate that this operator, that without predicate would always pass along each row unchanged (just on a different thread) would reduce the number of rows from 12.6 million to 1.5 million?

The answer to these questions have to do with the exact sequence of steps the optimizer takes when creating an execution plan. Most of the decisions are taken during the main optimization phases. But once a plan has been chosen, a final “cleanup” phase runs. And one of the things this cleanup phase does is: pushing PROBE filters down into a scan where this is possible.

So here is what the execution plan would have looked like if the cleanup step had not pushed this probe filter into the clustered index scan (sorry for the switch to the old execution plan artwork, but I spent two hours doctoring up a fake screenshot a few years back and I don’t feel like repeating it).

The Clustered Index Scan reads the entire table and returns all rows, because no filter is pushed down (yet). The estimate matches that (to six digits accuracy). The filter then applies the bitmaps to reduce the rowcount to 1.9 million, with an estimate of 1.5 million, which is within the normal margins of error. The Parallelism operator does not affect the estimated or actual row count at all, as expected.

But the cleanup step does execute. It realizes that the probe in the Filter operator can be pushed into the Clustered Index Scan. And now the Filter itself does nothing so it is removed. And the Microsoft developers who built this didn’t take the extra steps to change the original estimate of 12.6 million rows for the Clustered Index Scan to the 1.5 million rows that were actually estimated to be remaining after the filter. Probably because they figured that the estimated number of rows is only relevant for plan selection, which by this time has already concluded. Maybe someone should tell them that consultants actually look at estimated and actual row count as part of their tuning?

Credit where credit is due: I did not know the above, and I probably never would have found out, until a few years ago Paul White (b|t) explained the process. I do not know if this was in a blog post, in an email, or in any other way. I have tried to find the source (and link to it if possible) but I was unable to. That’s why I decided to reproduce it. If Paul’s explanation of this process is available somewhere, then please indicate this in a comment and I will update this post with a link.


In this post I looked at the execution plan of a query that joins two dimension tables to a fact table, with filters on the dimension tables. Because the filters are not directly on the fact table, normal methods of pushing down predicates work only for the dimension tables. But the fact table is the large table, with millions of rows – this is where you WANT pushdown to happen!

In order to make that possible, SQL Server uses bitmaps. The Bitmap operators that create the bitmaps use hashing to map the values that are needed to a hash value, then set a bit in the bitmap to mark that value as “required”. Now the Clustered Index Scan on the large fact table can use the same hash function on the corresponding columns, check the bit in the bitmap, and thus filter out a huge amount of data very early. There is still the risk of “false positives” due to hash collisions, so the rest of the execution plan still has to implement any filters that the query requires.

There is a cost associated with this pattern. There is some additional startup cost, reason why you will never see this pattern when querying small tables. The Bitmap operators take time, and the bitmap itself uses up memory. Plus, the probe action in the clustered index scan also uses CPU cycles as the hash function has to be executed for each row. But if sufficient rows are filtered out early, then the cost saving in the rest of the plan, where now far less rows are processed, greatly outweigh that cost.

This post illustrates the most common usage of bitmaps (and hence the Bitmap operator) in SQL Server execution post. There are other situation where bitmaps are used. The PROBE function is often used in the predicate of a scan, but I have seen it on other operators as well, so make sure to find exactly where each bitmap is created and where it is then used if you want to understand what a bitmap is used for in each specific plan.

This post was a small diversion from the normal style of my plansplaining posts, because it focuses mostly on a single operator. Next month I will return to the normal style. I will look at a query that uses the OVER clause to get aggregated data and detailed data in a single result set. Each of the operators used in its execution plan by itself is relatively easy to understand, but their interaction is very complicated.

But let me repeat this: I do not want this series to be about only my interesting plans. Reader input is greatly encouraged! If you have ever run into a pattern in an execution plan that appeared to make no sense until you dug into the gory details, or even into a pattern that still makes no sense to you, let me know and I will consider it for inclusion in a future post.


The Segment operator, an error in Books Online

If you are as obsessed with execution plans as I am, you must be familiar with this page in Books Online: Showplan Logical and Physical Operators Reference. I must admit that I have mixed feelings about it. My problem with this page is the confusing way in which it mixes physical operators, logical operations (called logical operators on that page to add to the confusion), and operators that only exist in intermediate phases of query optimization and that we will never see in actual execution plans. But I do like it for presenting all operators that exist, with a short description. And though the descriptions are very short and incomplete, they are at least correct. Well … usually.

The Segment operator

Let’s first take a look at a simple query on the AdventureWorks sample database:

The execution plan for this query looks simple enough:

The Index Scan operator, which has its Ordered property set to True, is guaranteed to return rows in index key order. That means that the rows come from this operator ordered by TerritoryID and CustomerID, which is the ideal ordering for this query. The rows flow through a Segment operator that we will look at shortly, and then to a Sequence Project operator that computes the ROW_NUMBER() expression (this can be seen in its Defined Values property). The final results are then returned to the client.

The reason that a Segment operator is needed is that Sequence Project is very good at counting rows to find the ROW_NUMBER, but that it has no idea how to apply the PARTITION BY specification that we requested. That’s what the Segment operator is for.

Books Online

The Segment operator, like all operators, is described at the Books Online page mentioned above. Here is the description, quoted verbatim:

Segment is a physical and a logical operator. It divides the input set into segments based on the value of one or more columns. These columns are shown as arguments in the Segment operator. The operator then outputs one segment at a time.

Looking at the properties of the Segment operator, we do indeed see the argument mentioned in this description, in the Group By property (highlighted in the screenshot). So this operator reads the data returned by the Index Scan (sorted by TerritoryID, which is required for it to work; this is why the Index Scan operator is ordered to perform an ordered scan), and divides it into segments based on this column. In other words, this operator is a direct implementation of the PARTITION BY spefication. Every segment returned by the operator is what we would call a partition for the ROW_NUMBER() function in the T-SQL query. And this enables the Sequence Project operator to reset its counters and start at 1 for every new segment / partition.

The problem

Now that we have seen what the operator does, let’s discuss why I consider the description in Books Online to be inaccurate and misleading. The problem is in the last sentence: “The operator then outputs one segment at a time”. It may be a language thing (English is not my native language) in which case this entire post becomes a non-issue – but to me, when I read that sentence I can’t interpret it in any other way as to mean that the Segment operator processes input data until it has a complete segment, and then returns all rows belonging to that segment before returning to the input. So this would mean that it would first read all 3,520 rows that have TerritoryID = 1 from Index Scan before returning the first, then return the remaining 3,519 rows from that segment without calling Index Scan, and then repeat that pattern for the next territory. In other words, Books Online describes Segment as a partially blocking operator.

By default, every operator in an execution plan has some memory available for storing the  current state, the current row, and some other housekeeping data. That memory is not enough to store 3,520 rows. So if the Books Online description is accurate, the Segment operator would need to store these rows somewhere else.

… but where?

Execution plan operators have only two options for storing data beyond their standard small working area. These options are: in memory, or in tempdb.

If an operator has to store extra data in memory, then the optimizer will estimate the amount of memory needed and record that in the execution plan as the memory grant. In an actual execution plan, this memory grant is visible on the top left node. The property is always shown what at least one operator in the execution plan needs additional memory beyond the normal working area.

When an operator uses tempdb to store data, it allocates either a worktable or a workfile in tempdb. All I/O to this structure is then visible if you run a query with SET STATISTICS IO enabled. This is not conditional – when an operator as much as has the option to need to use tempdb, it will always show the worktable or workfile in SET STATISTICS IO. Even when, at run time, the work area was not needed. That’s why any execution plan with a Sort operator always shows a worktable in SET STATISTICS IO, with all numbers equal to zero unless the Sort had to spill to tempdb.

The execution plan for the query used in this post does not show a Memory Grant property on the SELECT node. And there is no worktable or workfile in the SET STATISTICS IO results. So the only logical conclusion is that the operator does NOT store the 3,520 rows of territory 1. Which in turn means that is has to return them as it reads them – Segment is not in any way blocking; it is definitely a fully streaming operator.

In case the above does not convince, here is another way to explain this. The video above shows the query above executing with the Live Query Statistics feature, slowed down extremely so it is easier to follow. You can clearly see that the Segment operator does not do any buffering at all. All rows it receives are immediately passed to the Sequence Project operator; the entire execution plan is flowing and there are no blocking or semi-blocking operators.

What it actually does

Now that we have established that the text in Books Online does not accurately describe how the Segment operator works, let’s try to find out how it does work. A firm clue is available in the property list of this operator, as shown in the screenshot.

The property “Segment Column” reports a column name. And this column, Segment1003 in this case, is included in the Output List property of the operator; this means that this column is included in the rows that are returned to the Sequence Project operator.

This column is the secret key. When the Segment operator reads a row, it compares the values in the Group By columns to those of the previous row. If they have changed since the previous row (or if there is no previous row because this is the first row processed), the Segment Column is set to a specific value (probably 1). For all other rows, the Segment Column has a different value (e.g. 0). The result is that the Segment Column can be used to very easily identify the row that marks the start of a new segment.

The Sequence Project is one of a number of operators that are “segment aware”. Being segment aware means that the operator responds in a smart way to segment columns that are in the input stream. When the value in the segment column marks the start of a new segment, those operators respond to that in a smart way. In the case of Sequence Project, the response to the start of a segment is that it resets its internal counters and restarts the row counting at 1, which results in the ROW_NUMBER() column resetting in the result of the query.

Improved text

Based on this, the text in Books Online should be changed to better reflect how this operator actually works and to avoid confusion over this operator being blocking or not. Here is my proposed replacement text:

Segment is a physical and a logical operator. It divides the input set into segments based on the value of one or more columns. These columns are shown as arguments in the Segment operator. The operator then marks the start of each new segment in a specific Segment Column that is added in the output stream.


Books Online is a fantastic resource. But, like any man-made product, it may contain errors. I recommend using Books Online whenever appropriate. But as with anything you read anywhere: no matter how authoritative the source is, you should never take it at face value. Always use your common sense and never forget that to err is human. Trust, but verify!

T-SQL Tuesday #101: A few of my favorite tools

Back in 2009, Adam Machanic (b|t) started an initiative that is still going strong: T-SQL Tuesday. This is a monthly blog party. Every second Tuesday of each month, lots of people blog about the same topic, selected by the host for that month. And now, in the 99th installment, I decided to finally join in!

The April 2018 edition is hosted by Jens Vestergaard (b|t). His challenge for this month: write about the tools, other than SSMS and VSDT, that we use for our work.

Essential tools

The full invitation even includes words such as “essential” tools, and “depend on said tool”. Well, that narrows it down pretty extensively in my case. Sometimes I get to work on my own machine, when customers give me access to their database or just send me a copy, or send me only an execution plan to work on. But often I have to work on the tools that the customer has installed, and I do not want to force any customers to first have to install my favorite tools before I can even start helping them.

I have always made sure that I can do my work using just the standard tools. That does not mean that I don’t like using tools. There are a few tools on my own computer that I will install on any new machine because of how much time they save me. But I do not depend on them, they are not essential. When a customer has different tools, or even no tools at all, I am still able to do my job – though, perhaps, sometimes a bit slower.

Code formatting

When I am asked to tune a slow running query, I mostly need to look at two things: the query and the execution plan. And while the T-SQL language can be hard to learn for someone not yet familiar with the concept of set-based processing, the actual number of keywords is quite small. So once you get the hang of it, reading queries is not that hard. When they are properly formatted.

But for some reason, not all queries are properly formatted. Some developers just don’t understand the One True Formatting Style ™. (In fact, I think only a single person in the world uses the OTFS and that’s me). That can make their code a bit harder to read. Or, depending on the style they do use, even more than just a bit.

Please don’t do this at home

Or even worse, you have code that is (hopefully!) generated by some tool and does not care about formatting at all. I have seen entire queries on a single line!

SQL Prompt

My standard method to quickly unravel the secrets of ill-formatted and nigh on unreadable queries is to paste the query in my SSMS window and hit Ctrl-K – Ctrl-Y. This is the keyboard shortcut for SQL Prompt to reformat the code. SQL Prompt is a tool that has a lot of functionality, but its code formatting is what I use most and what saves me the most time. Now I will grant you that this tool, too, does not understand the OTFS. But any formatting style is a huge improvement over some of the code I receive.

Other tools

But as I said in the introduction, I do not depend on just a single tool. If a customer wants me to work on their own infrastructure, I will. And that means I often do not have SQL Prompt available.

In those cases, when I am confronted with an ill-formatted query, I typically open an internet browser, go to a search engine and type “SQL formatter”. There are several webpages that allow me to paste in some SQL and have it reformatted for me. I just click one and go with what I get back. As mentioned before, when I need a query to be reformatted I do not care much about what style it gets, as long as the readability improves I am happy.

Execution plans

For digging into the execution plan, I usually just use SSMS. There are lots of tools that are targeted at making it easier to work with execution plans but I am mostly happy with what SSMS has to offer and I find that I have little need for tools.

Plan Explorer

There is, however, one tool that I do install on every machine I own, and that I would mention to customers if they asked me for a recommendation: SentryOne’s excellent Plan Explorer. I like this, and I regularly switch to it, because it has some very nice features that are not present or harder to find in SSMS.

However, unlike some other people I know Plan Explorer is not my default tool. Even on machines that have Plan Explorer available, I still use SSMS as the standard tool for working with execution plans and only switch to Plan Explorer when I need one of its improved features. And if I had to choose one execution plan tool for the rest of my life, it would be SSMS.


The reason that I prefer SSMS over Plan Explorer is that, if I want to really understand an execution plan, I need to look at the properties of the operators. All of them.

Both tools have a window that pops up when I hover over an operator, and that shows some of the properties. Not all. And I cannot select which properties are shown, that is chosen by the tool’s creators.

Both tools also allow access to all other properties, but they do so in a different way. In Plan Explorer, I need to switch away from the Plan Diagram tab to another tab (Plan Tree or Top Operations), then go to the column chooser, add properties I want to see, and then try to find the operators I am looking for in the list. And I need to repeat that for every property (of which there are far too many to just leave all of them always included in the list). This interface is absolutely fantastic for comparing the value of the same property for different operators in the execution plan, much easier than with SSMS. But for exploring and understanding the plan, it is cumbersome.

In SSMS the process is far simpler. I right-click an operator, select “Properties” and then pin the properties window. I then only need to click on a different operator and the window immediately changes to reflect the properties of that operator. Even the ones I did not yet know exist for that operator. For looking at an execution plan and trying to understand how exactly it works, this is a very simple and accessible method. But, obviously, it does not offer the at-a-glance value comparison that Plan Explorer has.


In this post I talked about some of the tools that I use for my daily work. But they do not qualify as “essential” for me. I do not want to depend on any tool, because that would become annoying if I ever need to work for a client where that tool is not available.

I never spent a lot of time looking for tools. The tools I use (and describe above) are not tools I selected after comparing several candidates. They are the tools that, at one point in my career, I “ran into” and then liked enough that I kept using them. Perhaps there are other tools out there, way better than the ones I use, and I just don’t know it yet.

That’s why I am happy with Jens’ choice of subject for this T-SQL Tuesday. I will definitely read through many of the other posts and see if there are any gems out there that I do want to add to my toolbox.

However, no matter how good they are, I will always refuse to become dependent on any tool.

Plansplaining, part 4. Let’s repartition the streams.

This is the fourth post in the plansplaining series. Each of these blog posts focuses on a sample execution plan that exposes an uncommon and interesting pattern, and details exactly how that plan works.

In the first post, I covered each individual step of each operator in great detail, to make sure that everyone understands exactly how operators work in the pull-based execution plans. In this post (and all future installments), I will leave out the details that I now assume to be known to my readers. If you did not read part 1 already, I suggest you start there.

In episode #3, I promised that today’s post would be about bitmaps. However, bitmaps work only in parallel plans, and as I was working on that post I noticed that I needed to include a lot of explaining of the parallelism itself. So much, in fact, that just covering the parallelism issues by itself already makes for a lengthy post. So I decided to slightly change the example and focus on only parallelism today. Next month I will really cover those bitmaps!

Hashing explained

A lot of the operators I touch upon in this post use some form of hashing. So here, for those who do not know this already, is a very short (and mostly simplified) explanation of what hash functions and hash tables are.

A hash function is a function that takes any input and converts it to a number in a predefined range, in such a way that the same input always results in the same number. Different inputs will sometimes also result in the same number; this is called a hash collision. Hash functions with a large range are typically optimized to minimize the chance of hash collisions. For hash functions with a small range, hash collisions are unavoidable. These hash functions are typically optimized to result in a balanced distribution.

A hash table is a table where rows are stored, organized by the result of applying a hash function to one or more columns. For each possible result in the range of the hash function, a container for rows (called a hash bucket, or just bucket) is allocated. All rows for which the hash function resolves to 1 are stored in bucket number 1; rows for which the hash function is 2 go to bucket 2, etcetera. Rows with the same values in the input of the hash function will always land in the same bucket. Rows with a different input for the hash function will usually land in different buckets, but may still end in the same bucket if there is a hash collision.

When SQL Server uses hash tables, they are stored in memory. If the amount of data exceeds the allocated memory, tempdb will be used as a temporary holding place for overflow data; this is called spilling and it generates “hash warnings” that are shown on the execution plan and can be caught in Extended Events or Profiler Trace sessions. A detailed discussion of hash table spills is beyond the scope of this post.

Sample query

I normally like to take all my examples from the AdventureWorks sample database. But that database is too small to get parallel execution plans without setting the parallelism threshold to some weird non-representative low value. So I decided that for this post I am going to use one of Microsoft’s other standard demo databases, the ConsosoRetailDW sample database. This will also allow me to add in the bitmap operators in the next episode with only a small change in the query, something I could probably not achieve in AdventureWorks.

If you run this query in ContosoRetailDW, you will see a rather large execution plan. In this post I will focus on the right-hand side, where the three tables are read and joined. (The left-hand side is where the aggregation happens, using a local-global aggregation pattern that I already blogged about in 2016).

Below is what the relevant part of the execution plan looks like on my laptop. (I added the small numbers in red, so I can easily reference each individual operator; they correspond to the Node ID property of each operator).

Since this query is processing over 12 million rows from the FactOnlineSales table, joining them to the two dimension tables and then doing aggregation on them, you will be happy to see this run as a parallel plan. But are you also happy with the four “Parallelism (Repartition Streams)” operators in this plan segment? Do you know what they do and why they are there?

Here is how Books Online describes what Parallelism (Repartition Stream) does:

The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed.

Ummm, yeah. So basically, this operator swaps the rows around between threads in a parallel plan. Rows are in one thread and are now kicked onto another thread. But this does not really explain why this has to be done four times in between just five other operators that are doing the actual work. And considering that the estimated costs indicate that 20% of the total query effort is expended here, you probably want to know if this is really needed.

Step by step

Let’s dive in, try to understand how all the operators in this execution plan cooperate, and see if that explains the Parallelism operators. I’ll start at the top left of the plan section shown above: Hash Match #4. As you probably know, the Hash Match operator runs in multiple phases. The first phase, called “build phase”, runs when the operator is initialized. During the build phase, the operator requests rows from its build input (shown on top in the graphical execution plan) and stores them in memory, in a hash table. However, before rows are even returned to this operator, a lot of things have already happened. So let’s first look at that.

Clustered Index Scan #6

To understand the logic of the build input, I’ll follow the data flow. It starts with Clustered Index Scan #6. By itself, this operator is not really that interesting. It simply reads all rows from the DimStore table and passes them to its parent operator. And, as always, this does not happen by itself, it is pull-based so it will return a row only when asked.

What’s different this time is that this operator is running in a parallel section of the execution plan. And the scan operator is “parallelism-aware”, which means that it changes its behavior when running in a parallel section. (Note that most operators are not parallelism-aware; scan operators, as well as seek operators that do range scans, are really the exception.)

To understand the behavior change, let’s first recall what a scan operator does in a serial plan. When called it returns the next row from the page it is currently working on. When it has processed all rows from a page and is requested to return another row, it reaches out to the storage engine to request the next page, from which it can then return a row. The calls to the operator are once per row, but the operator itself reaches out to the storage engine just once per page.

In a parallel plan, there are multiple copies of this same Clustered Index Scan operator, all working at the same time. Each of them channels their requests to the Storage Engine, and because this is a parallel plan these requests now go through the so-called Parallel Page Supplier. The Parallel Page Supplier simply always hands the next available page to the first thread that requests a new page. Putting this together, the calls to the scan operator are still once per row, but now on each of the individual threads. The calls to the storage engine are still once per page, coming from multiple threads in parallel. And the Parallel Page Supplier coordinates between these threads to ensure that every page is eventually handled by exactly one of the threads.

This effectively results in a demand driven page-by-page distribution of rows among threads. Because it is demand driven, it is impossible to predict what rows will go to which thread, so the distribution of rows among threads can be considered as “sort of random”. The screenshot I included shows how many rows were on each thread when I executed the query. But if I run the query again, the number of rows per thread will be different.

Parallelism #5 – shuffling the rows

This random distribution of data over threads is rarely acceptable. The optimizer usually needs to exert more control over how rows are distributed among the available threads. That’s what Parallelism operator #5 in this execution plan is for. As explained above, this operator reads rows from all the input threads and returns them on the “correct” thread.

Which thread is correct is defined in the plan properties. In this case, the Partitioning Type shows that a hash-based distribution is used, and the Partition Columns property shows that the hash function uses the StoreKey column from the DimStore table as its input. In most cases where an execution plan uses hashing it is impossible to find out any specifics about the hash function used. The Parallelism operator is an exception to that, though. Since the result of the hash function determines what thread the row goes on, and since this query runs on eight threads, this operator has to use a hash function with a range of eight values. On your system, the hash function’s range will be determined by your degree of parallelism.

This operator changes the distribution of rows from (semi-)random to deterministic. No matter how often I run the query on my laptop, the distribution of rows among the eight threads will always be the same. That by itself though is not sufficient reason to introduce this operator. After all, why would I care which of my CPUs processes each individual row, as long as the end result is the same? I have seen situations where a Parallelism (Repartition Streams) operator was added for the sole purpose of flattening out a very skewed distribution of rows between threads, but that is not the case here.

Hash Match #4

To recapture the explanation so far, I have described how Hash Match #4, during its initialization, repeatedly requests rows from its build input. This causes the Clustered Index Scan to read rows from the DimStores table. These are initially assigned to threads in a demand-driven way, but the Parallelism then forces each row to a specific thread as determined by the hashing the StoreKey value, for a reason that should become clear later.

The Hash Match operator itself stores all the rows it receives from its build input in memory, in a hash table. The Hash Keys Probe property shows that a hash function on the DimStore.StoreKey column is used for this. When the build phase is completed, all stores retrieved from the build input are stored in the hash table but no rows are returned yet.

It is important to note that, even though Hash Match #4 and Parallelism #5 both use a hash function on the same column, it is not the same hash function. The Hash Match operator prefers a hash function with a rather large range, in order to minimize the chance of hash collisions. The hash function used in Parallelism #5, with a range of just eight values, would never be chosen for a Hash Match.

The second phase, called “probe phase” then starts when the Hash Match operator receives its first GetNext call, typically immediately after the Initialize call returns. During the probe phase, the operator requests rows from its build input (the bottom input), applies then applies the same hash function as in its build phase but now to the value of FactOnlineSales.StoreKey (as seen in the Hash Keys Probe property). It then goes to the bucket identified by the hash and looks for matching rows in that bucket only. Matching rows are returned; when no more matches are found the next row from the probe input is tried. (And as always, control returns after returning a row, and resumes where it left on the next call).

The same StoreKey value, regardless of which table it is coming from, always results in the same hash value. So the algorithm only has to search for matches in that single bucket. That means it has to look at a few rows only – the “return on investment” for the (rather expensive) creation of the hash table. In most cases the Hash Match has to do an additional test on the join predicate, found in the “Probe Residual” property, to prevent bad matches being returned due to hash collisions. In this specific case there is no Probe Residual because, for integer columns such as StoreKey, the optimizer can find hash functions that are 100% collision free.

However, the description above is for a Hash Match operator in a serial plan. In this case the plan is running in parallel. Does this change how the Hash Match operator works? The answer to that question is “no”. Like most other operators, Hash Match is not “parallelism aware”. So it does not care or even know, that there are seven other copies of this operator, on the other seven threads, doing exactly the same work. Each of the eight instance of this operator simply creates its own hash table, fills it with the data it receives from the its own build input, then tries to find matches in it for the rows it receives from its own probe input thread. And it never sees any of the data handled by the other threads.

At this point you might be wondering how this can return correct results. Let’s say that a row from the FactOnlineSales table with StoreKey 185 is being processed on thread 3. The Hash Match operator that is running on thread 3 will read this row during its probe phase, but it will only find a match if the corresponding row from DimStore was also processed on thread 3. If that row happens to be processed on thread 5, then the Hash Match operator on thread 3 is blissfully unaware of its existence and would reject the probe input as not matching.

Since there are eight threads available, it appears that each match now only has a 12.5% chance of being detected. Obviously, since there are no headlines in the media about hugely incorrect results being produced by SQL Server, the optimizer has found a way to increase this chance to the 100% it should be. But we will if we look at one more operator: the Parallelism operator on the probe input.

Parallelism #7 – more shuffling

It is at this point not relevant how exactly the input for Parallelism operator #7 is produced. I’ll just start at Hash Match #8. Like most of the plan, this operator runs in parallel. And apparently the optimizer is unhappy with how the rows are distributed over the threads, so it has inserted the Parallelism operator to repartition the streams as it sees fit.

As seen in the properties, this Parallelism operator also uses hashing to determine the proper thread for each row, just as its brother #5. And this hashing function also uses a StoreKey as its input, but in this case it is the StoreKey from the FactOnlineSales table.

What is important to know (and you really need to know this, as this is not exposed in any way in the execution plan) is that this Parallelism operator uses the exact same hashing function as its brother, Parallelism #5. This fact is key to understanding what is happening here.

Going back to the example above, I used an example row from FactOnlineSales that has StoreKey 185 and that “happens to be” processed on thread 3. But it is now clear this thread assignment is not just coincidence. When Hash Match #4 asks for a row from its probe input, it is first being reassigned to a thread (in reality Parallelism operators are more complicated then this suggests, but that is out of scope for this post), and this thread is determined by hashing the StoreKey value. This means that all rows from FactOnlineSales that have StoreKey 185 are guaranteed to all go to thread 3. And, even more relevant: the row from DimStore that has StoreKey 185 was also shuffled to another thread, in Parallelism #5. But that operator used the same hash function, which we already know results in 3 when the input is 185, so this row from DimStore is on thread 3 as well.

Back to Hash Match #4

In other words: two parallelism operators (#5 and #7) both shuffle rows between threads. But they don’t shuffle at random. For rows from DimStore, the thread is determined, using some complex function, by their StoreKey. And for rows from FactOnlineSales (or rather, from the results of joining FactOnlineSales and DimProduct), the thread is determined by applying the exact same function to FactOnlineSales.StoreKey. As a result, it is 100% sure that all rows that can match based on the join condition between FactOnlineSales and DimStore are allocated to the same thread.

The Hash Match operator can just do its own thing, in its own world, without having to worry about other instances on other threads. The optimizer has guaranteed, by inserting some additional operators, that Hash Match will always find the rows it needs on its own thread.

The rest of the plan

In the explanation above I have skipped past everything that happens in Hash Match #8 and the operators it calls. But you should now see that a very similar pattern is used here. The build input of Hash Match #8 reads rows from DimProduct, then uses a hash function on ProductKey to assign them to a known thread. In that thread it is added to the hash table by Hash Match #8. The probe phase then reads FactOnlineSales and reshuffles the rows based on the ProductKey in this row, to ensure that these rows are on the same thread where a matching DimProduct row would be. The Hash Match can then find, join, and return the matching rows.

The results of Hash Match #8 will still be on the thread that is determined by hashing ProductKey (from either table,  as they are the same in the joined results). There is no known relationship between ProductKey and StoreKey, so it is unlikely that this thread is the correct thread for Hash Match #4. That’s why Paralellism #7 is needed.


As should be clear now, all parallelism operators are actually needed. Parallelism operators #9 and #11 are needed because both their inputs are distributed semi-random, and Hash Match #6 needs its inputs to be distributed based on ProductKey. Parallelism operators #5 and #7 are needed because one of them has a semi-random imput distribution and the other has an input distributed based on Product Key, but Hash Match really needs both inputs to be distributed based on StoreKey. Leave out just one of these parallelism operators, and the execution plan would produce incorrect results.

The parallelism operators do contribute to the total cost of the execution plan. But that is not unusual in parallel plans. Coordinating the work always takes extra effort. The total CPU usage of a parallel execution plan is almost always higher than the total CPU cost of an equivalent serial plan. But because more threads work on it concurrently, the elapsed time does go down and the query returns quicker. The query shown in this post uses over 16.4 seconds of CPU time on my laptop but completes in less than 4 seconds. When I force it to go serial, the  CPU usage goes down to 9.5 seconds, but the execution time is now also 9.5 seconds.

The overhead of parallelism is not always as explicitly visible as here. But it does always exist and you should be aware of it and make a conscious choice. Is this query important enough that having it finish in 4 instead of 9.5 seconds is worth the extra resource usage (that will hurt all other concurrent workloads)? Or do you value overall performance more, and decide to take the hit on this query? The correct action depends on who executes this query, how often, for what reason, at what time, and what else is running at that time. This is the part of our job where we really need to understand the business!

As promised in the introduction, plansplaining part five will finally look at the Bitmap operator that I already promised for this episode. However, I still like to see your input too! If you have ever run into a pattern in an execution plan that appeared to make no sense until you dug into the gory details, or even into a pattern that still makes no sense to you, let me know and I will consider it for inclusion in a future post. (When possible, please make sure that you can provide either a repro script or a copy of the actual execution plan, saved as a .sqlplan file).


T-SQL Tuesday #100: Looking forward

The monthly blog party that is known as T-SQL Tuesday has hit 100th episode (and I only missed 98 of them!). In true Olympic fashion, it returns to its roots for its centennial celebration. In other words, our host this month is Adam Machanic (b|t) himself. And even though 100 is a perfectly valid number to look back, he decided that looking ahead is more fun. He asks all bloggers to whip out the crystal ball and predict what our world will look like in another 100 months, when (hopefully) T-SQL Tuesday #200 will be celebrated.

Looking ahead

I must admit that, looking at the news, looking at current trends in the world, it is easy to get very pessimistic about the future. But as a father of two fantastic children, I decided long ago that I can no longer afford to wallow in pessimistic visions of the future. So let’s all assume that we as the collective human race will not destroy our world in any of the dozens of ways we could do that. Let’s assume we will all be here in another 100 months, and be able to look back at these posts and laugh at how wrong we were.

So I will not look at politics, environment, social movements, or any other global trends. I will focus on trends in the nice shielded world of data professionals. And within that subset, I have decided to pick four key points, three because they are or have been trending in the last few years and one because that’s where my heart is.

Cloud computing

Cloud computing definitely has taken off. I will admit that I have been on the wrong side of this for way too long. I fully expected that most companies would prefer not to have their valuable data stored on someone else’s computer. But the benefits of cloud computing are too real, so cloud computing has taken off big time. Having your application and/or your data in the cloud means that your employees can connect to it from wherever they are, using whatever device they have, without you having to provide the infrastructure. This is great for enabling sales people, drivers, or support staff that gets sent to a client site. It also makes it a lot easier to offer remote working to your employees.

Because of those benefits, I now expect this trend to continue even more than it already has. In all the more technologically advanced areas of the world, I expect that in a few years we will all expect wireless internet signal to be available everywhere. This will be used by personal devices (we’d call them phones now but their name may change) that everyone has as a replacement for many different things we now need to keep: phone, car keys, house keys, agenda, wallet, tickets, remote control, etc. The device itself will have limited storage and computing; it will rely on that fast internet connection that is assumed to always be available. Internet outages will become REALLY interesting!

More and more companies will move towards serverless. The computers used in the offices will connect to cloud computers where all application logic is done and all data is stored.

There will be some exceptions to this. For companies that have really large volumes of data and really complex programs running, the cost of buying all that “as a service” will be prohibitive and it will be cheaper to have on premises hardware and the accompanying support staff. There may also be companies where the sensitivity of the data continues to stand in the way of moving to cloud. And some companies will keep critical applications on their own hardware because they do not want to run the risk of downtime when an internet provided has issues.

Overall, however, I predict that in 100 months most applications that are now hosted and maintained on premises will have moved to the cloud or someone is working on moving it there.

Artificial intelligence and machine learning

There has been a lot of progress in the AI area. One of the areas where I was recently really impressed is in voice recognition. I was in the car, navigation app open on my phone, which was on the dashboard. I clicked the speak button and muttered the name of a company and a city. In between the background noise of the engine already running and the radio playing, it recognized what I had said, found the address of the company’s offices, and provided me with driving directions (avoiding a traffic jam along the way).

However, I also noticed its limitations. When I got a call that the meeting had been cancelled, I said “Hey Google! Bestemming wissen” (which is Dutch for “erase destination”). It computed a new route for a three hour drive. I later found that this route was to the German city Wissen (written like the Dutch word for erase, but pronounced differently). Clearly, the system has not been programmed to recognize normal Dutch instructions to erase a destination and responded with the best match it could find.

These problems will disappear over time. Siri and Cortana already beyond speech to text transformation and are starting to get better at interpreting the meaning of spoken instructions. This trend too will continue. Remember those personal unified devices I mention in the previous paragraph? They will not have an on-screen keyboard anymore; they will be purely voice driven. And they will become smarter and smarter. Application developers may need to start looking for other employment as software becomes smart enough to understand a spoken request and find on its own the best way to fulfill it. There will still be demand for people to code the more complex applications but standard reports and dashboards will become entirely machine-generated.

But AI and ML is much more than just voice recognition and voice-based interaction. There is also a huge trend nowadays towards using machine learning algorithms to analyze huge collections of data to train a predictive model that is then applied to new cases to predict the most likely outcome. And in this area, I expect problems. Not because of the algorithms itself, but because of the people. When a computer pops up a conclusion, people tend to trust the algorithms and blindly follow the recommendations. So sales people will no longer call all leads, but only the 30% marked as most likely to respond. And when the prediction is wrong and a very viable prospect is never called, no one will ever know so this problem will not heal itself. Until things really start to go wrong. Until an AI algorithm at an airport points out the people in line that need to be inspected and fails to point out someone who then turns out to be a bad person with bad plans. Or until something else terrible happens and everybody points at anonymous algorithms in unknown computers that base their conclusion on recognizing patterns in thousands of known inputs.

The picture to the right (click to enlarge) is a screenshot that shows the result of feeding the T-SQL Tuesday logo to a publicly available visual recognition website. Yes, it recognizes Tuesday with 99% certainty. But all other weekdays are also mentioned with high 90s certainty. This is still a beta and scientists are working day over day to improve it. One day we will have software that uses the results as visual recognition as input to a prediction model trained by Machine Learning. And we will all blindly follow the recommendations – until the stinky stuff starts hitting the fan because either the recognition was incorrect, or the model has a bad prediction, or both.

My prediction is that, 100 months from now, many data professionals will be licking their wounds, managers will be doing damage control, and the world will start to exhibit a healthy distrust from machine-generated predictions for which we do not see exactly what reasoning has caused the prediction. Machine learning and predictive models will remain in use but only in areas where they are “safe”. In other areas their results will at best be treated as gentle recommendations.

Data protection and privacy

A subject that has always been at interest to data professionals but is now also getting a lot of attention at management level, thanks to the GDPR legislation, is protecting data from unauthorized access and protecting the privacy of people you collect data about. At this time a lot of companies are struggling to get GDPR compliant. Some will finish in time, some will finish a bit late, and a few are not even trying. Once the EU sets a few examples by dishing out harsh penalties for non-compliance, the remaining companies will scurry to either get compliant or rigorously shut out all EU customers so they don’t have to.

But other countries will follow the EU example of setting stricter guidelines (not all; some countries continue to believe in self-regulation). Customers will start to demand more privacy guarantees even in areas where less strict laws apply. And press coverage of large scale data breaches and privacy violations will continue to fuel that fire. In a 100 months from now, every serious company will have better data protection plans then they have now, and be more privacy-friendly in their terms and conditions.

But that does not mean that there will be no more scandals. Those, too, will continue. Just like neither laws nor change in cultural perception will ever fully root out drunk driving, there will always be incidents where data is leaked or privacy is violated. Sometimes because people tend to cut corners (usually for all the right reasons, but it can still backfire). Sometimes because companies try to save money (usually for the wrong reasons, because short-term gain is easier to explain to shareholders or directors than long-term safety).

I also expect that by that time, the GDPR restrictions that are now considered to be very strict will have been surpassed. At that time other countries will be working on legislation (or maybe already have it completed) that is stricter then GDPR. One area where I mainly expect additional rules is in the use of data collected about in individual as input to train predictive models. Because of the incidents described in the previous paragraphs, large percentages of the public will have become aware of machine learning. Many of them will not accept that “their” data is used without their consent to train these models, and politicians will start to push for regulation – either because they agree with those people, or because they want their votes for the next election.

Performance tuning and execution plans

So here is my precited future. Computing is done in the cloud, through personal devices that everyone is carrying and that are always connected. They use voice recognition, and most requests are interpreted by smart algorithms the way that currently the SQL Server optimizer figures out how to perform a task based on a query. Data is stored in that same cloud but subject to strict rules as to who can use what data for what purpose. Some companies will still have on-premises computing but it’s a minority. What does this mean for the future of someone like me: someone who specializes in query tuning, someone who loves to pour over an execution plan, find the reason of a slow response, and fixes it. Will I still have work?

I think I will, but my work will be different from what it is now. And I think a lot of people currently doing other development work will be moving to “my” new line of work.

The parallel I drew between computers responding to voice commands and SQL Server compiling an execution plan for a query is very deliberate. There will be large similarities between the two processes. When you say “Hey Cortana! My parents will be over tonight so come up with a good meal that they will like and have the shop deliver the ingredients to my home by 4PM”, the computer will have to parse the request to make sure it understands what you say, then come up with a plan to do this efficiently (identify who your parents are, fetch data accessible to you on food preferences and allergies of these people, select recipes that qualify, identify store near you that can deliver on time, place order). That plan may sometimes be flawed, either causing incorrect results or just processing slower than it should, and in those cases a root case needs to be found and a fix be deployed. That will be my job, and I will still be looking at execution plans – using operators that are slightly different from what we now see in SQL Server execution plans, but probably not even that different. Identifying your parents is probably still an Index Seek in a public data collection!

So the next question is: who will we be working for. Well, for the public at large, the service of connecting to the internet and asking it questions will be based on a monthly fee. It may be a flat fee or it may be usage based, but it will not be based on the compute power involved with your requests. Just the number of requests. So it will be the provider that benefits from serving the highest number of requests using the least amount of resources. So for requests as described above, it will be the service providers who employ optimization specialists.

Companies will have different contracts. They will pay for the amount of data retrieved from various sources, for the amount of data stored, and for the compute resources used to satisfy the requests that the CEO blurts into his reporting tool. These companies will need to employ their own optimization specialists, to help the CEO phrase his request such that it will not break the bank. And finally, the few companies that still have their own computers will obviously also buy licenses to use these same algorithms and will therefor also have a need for optimization specialists.


The road ahead of us will be interesting. I really do not actually know how much of the changes I predict will actually come true. But one thing is sure: there will be change. And the pace of change will continue to increase.

If you do not want to fall behind, better make sure you stay on top of the change, and on top of emerging new technologies.

Plansplaining, part 3. How repeating work saves time

This is the third post in the plansplaining series. Each of these blog posts focuses on a sample execution plan that exposes an uncommon and interesting pattern, and details exactly how that plan works.

In the first post, I covered each individual step of each operator in great detail, to make sure that everyone understands exactly how operators work in the pull-based execution plans. In this post (and all future installments), I will leave out the details that I now assume to be known to my readers. If you did not read part 1 already, I suggest you start there.

Sample query

Today I will look at a pattern that can be seen when working with partitioned tables. I will not use a standard demo database but create my own set of sample tables to get maximum control. Here is the code to create two partitioned tables and load them with some data. Execute this code in your own “playground” database if you want to follow along.

This may take some time to run (about 2:45 minutes on my laptop). Since it’s a one-time deal, I did not bother to optimize this. Do make sure to run this with the Include Actual Execution Plan option disabled, or you will generate 60,000 execution plans (which will probably crash your SSMS instance).

Once the tables are created and the data is loaded, let’s run a simple query that joins two of the demo tables:

The execution plan for this query is very simple and holds no surprises. There is no WHERE clause and the tables are fairly large, so a Nested Loops join would be a bad idea. A Merge Join would be an option, but the equality part of the predicate is on column a only, which is not unique in both tables; so this would be a many to many Merge Join, with all the associated overhead in tempdb. A Hash Match join is a better choice in this case, and that is indeed what the optimizer picks:

Now let’s see what happens if we make one small change to the query. Instead of joining tables Part_1a and Part_2 (both partitioned but on different partition functions), we’ll now join Part_1a and Part_1b (both partitioned and using the same partition function). Here is the changed query:


With basically the exact same query you would probably expect to see the exact same execution plan. But that would not make for a good Plansplaining post, so fortunately we do actually get a different execution plan this time:

Comparing this execution plan to the one above, we see that the exact same pattern of a Hash Match join processing two Clustered Index Scans is used here. But in this case that pattern is on the inner side of a Nested Loops join, which means it executes once for each row in the outer input (which is a Constant Scan operator that we’ll examine in more detail shortly). The properties of the Hash Match operator show us exactly how often this section of the plan was executed:

Upon seeing this, you would be forgiven for thinking that the optimizer has gone absolutely bonkers. Why is the optimizer creating a plan that, by the looks of it, appears to do the same work four times? Granted, since the aggregation asks for a MAX only there will be no concerns over correctness of results in this case. But repeating the same task four times? Really? Why?

The good news is that there actually is a good reason for this. And that the execution plan is not actually doing the same work four times. However, this is not visible by looking at a picture of the graphical execution plan; we need to dig deep into the details to understand what’s going on.

The devil is in the details

Before diving into the apparently repeated section of the plan, let’s first examine the Nested Loops join operator and the Constant Scan on its outer input. The Constant Scan operator is by itself actually already a mystery here. Looking at the query, there is no obvious reason at all to have a Constant Scan in this plan, and yet here it is.

Constant Scan

Whenever we see a non-obvious Constant Scan operator, out first stop should always be its properties, specifically the Output List and Values properties. These expose the data that the Constant Scan returns. Though these properties are exposed in the property popup, the formatting of the Values property makes it hard to parse. The full properties window has the same data in an easier to understand format:

The Output list shows that this Constant Scan will return rows with just a single column, called Expr1005. Note this column name, because we need to find which operators use this column in order to understand the role of this Constant Scan. The Values property shows that four rows will be returned (not surprising, since we already saw that the inner input of the Nested Loops operator executes four times, and that the value of Expr1005 will be 1 for the first row, 2 for the second, then 3, and finally 4. None of these values appear in the query so they make no sense yet. We really need to find where these values are used!

Nested Loops

Looking at the Nested Loops operator, we see a few interesting things. We know that it receives four single-column rows from its outer input, and then executes its inner input for each of these rows. Let’s look at its properties, more specifically at the Output List and Outer References properties:

Let’s start at the bottom, with the Outer References property. We have already seen this property in previous plansplaining posts, but as a quick reminder: Outer References means that a value from the outer input is pushed into the inner input; the inner input then ensures that only matching rows are returned which is why there is no Predicate property on this join operator. In this case Expr1005, the column that is set 1, 2, 3, and 4 for each of the rows from the Constant Scan, is pushed into the inner input.

The Output List does not include Expr1005. Apparently, the Nested Loops operator in this case doesn’t actually join its two input sources; it merely uses the outer input to drive four executions of the inner input, and then returns data from the inner input only. The values returned from the Constant Scan are not returned by the Nested Loops operator, which also means that we now know that the Stream Aggregate doesn’t do anything different than in the first plan – it doesn’t receive Expr1005, so it cannot act upon it.

Clustered Index Scan

The next part of the execution plan to look at is the section with a Hash Match and two Clustered Index Scan operators. I am not discussing the Hash Match here, because there is nothing of interest on this operator. However, both of the Clustered Index Scan operators have a property that most people would not expect to see, ever, on a Clustered Index Scan (or, in fact, any Index Scan) operator: a Seek Predicates property! (I show a screenshot of the second Clustered Index Scan here, but they both have a very similar Seek Predicates property).

Normally, a Seek Predicates property is only found on (Clustered) Index Seek operators, and not on a (Clustered) Index Scan. An Index Scan is not designed to seek; it is designed to read a whole table. So how can this scan suddenly apply a Seek Predicates? And also, what exactly is the predicate here? A column called PtnId1002, that we never created in any of our tables, is compared to the Expr1005 column, the data coming from the Constant Scan and pushed into the Nested Loop’s inner input.

Luckily the optimizer loves to use mnemonic codes when introducing its own columns. The name PtnId1002 is a so-called “partition id”. Remember, all the tables used in this example are partitioned tables. And partitioned tables happen to be the only (as far as I know) context where you can actually see a Seek Predicates property on a scan operator. It is used to limit the scan to one or more selected partitions only. In this case a single partition. Which partition? Well, that is determined by Expr1005.

Remember, Expr1005 is set to 1 for the first row (and hence the first execution of this section of the execution plan), then to 2 for the second, and so on. So for the first execution, the Clustered Index Scan operators will read data from partition 1 only. This applies to both Clustered Index Scan operators. The Hash Match then combines this data, returning the joined results for partition 1, which Nested Loops then passes to Stream Aggregate. After that, the second row from Constant Scan is read, and the inner loop restarts, this time reading and joining data from the second partition of both tables. Once all four partitions are processed in this way, Stream Aggregate returns the maximum value it found and execution stops.

Helicopter view

Moving back to the helicopter view, we now know that the section consisting of one Hash Match and two Clustered Index Scans only appears to be the same in the two plans. In reality, the first plan actually processed the entire tables in the Clustered Index Scan operators, causing a massive join between 20,000 rows from each input. The second plan used a Constant Scan to enumerate the partitions, a Nested Loops to repeat the section for each partition, and a surprising Seek Predicates property on each of the Clustered Index Scan operators to process only a single partition in each execution. So while the logic is indeed executed four times, each execution now only had to join 5,000 rows from each input. And though the scans execute four times, they do not scan the full table four times. Each execution scans only the requested partition. In the end, the amount of IO for the Clustered Index Scan operators in this plan is exactly the same as in the first plan.

This is an optimization pattern known as “join collocation”. The most obvious place where you can see the effect of this optimization is in the memory grant (visible as a property of the SELECT operator in each execution plan). The first query requests 6112 KB, needed for the Hash Match operator to store 20,000 rows in the in-memory hash table. Because the second query processes the same 20,000 rows in four independent chunks of 5,000 rows each, its memory grant is a mere 1536 KB. This memory will be reused for each of the four executions.

On my laptop, this is not very relevant. But imagine a busy system, with lots of users, plus a buffer pool, a plan cache, and other resources, all competing over the same precious memory. Now a 75% reduction in the memory footprint of a query suddenly becomes very valuable!

Why not both?

At this point you may be wondering why SQL Server doesn’t always apply this pattern? Why did we not get a similar “join collocation” plan for the first query? Or for every join ever done between two large tables?

The answer is simple. Join collocation needs a few conditions to be met before it can be used. The conditions are:

  1. Both tables in the join need to be partitioned.
  2. The join condition has to include equality of the partition columns in the two tables.
  3. Both tables in the join need to use the same partitioning function. Or rather, the partitions need to align exactly (if you use two different partition functions with the exact same definition, you will still get join collocation).

The combination of these three requirements enables the join collocation pattern. Because the tables are partitioned, the (Clustered) Index Scan operators can use a Seek Predicates to return only a subset of the data without overhead. Because the partitions align and the join is on equality of the partition columns, we know that data that needs to be joined from both sources is processed in the same execution of the inner input.

In the case of the first query, the partitions did not align completely: they use the same boundary values, but one uses RANGE LEFT and the other RANGE RIGHT. Now it is possible that a row that is in partition 1 of the first table may need to be joined to a row in partition 2 of the second table, but these rows would not be processed during the same execution so the results can be incorrect.


Details are always important. Just because two execution plan fragments look the same in the graphical execution plan, does not mean they are the same. You always need to look at all the details that are available in the property sheet.

In this post, it appeared at first sight as if the same logic was being executed multiple times. But by looking at the properties we were able to ascertain that these four executions each worked on a quarter of the data. We were also able, by looking at the properties of a Constant Scan and following the generated column through the execution plan, to make sense of an operator that at first appeared to make no sense at all.

In plansplaining part four, planned for April, we will look at an execution plan that uses Bitmap operators to optimize a star join. However, I do not want this series to be about only my interesting plans. Reader input is greatly encouraged! If you have ever run into a pattern in an execution plan that appeared to make no sense until you dug into the gory details, or even into a pattern that still makes no sense to you, let me know and I will consider it for inclusion in a future post. (Note that I WILL ask you for a copy of the actual execution plan, saved as a .sqlplan file, or for a repro script).


SSMS hidden gem: Edit Query Text

When I am tuning a query, I usually start with the query text. I execute it to retrieve an execution plan, and then the fun starts. However, very often I first need to find the specific query (or queries) to tune. There are multiple ways to do this, depending on what monitoring tools and which version of SQL Server the customer uses. In some cases, I at first find only an execution plan. Now I do like to see execution plans when tuning, but I also need to have the query!

The old method

So here is an example of what your SSMS screen may look like when you open an execution plan:

As you can see, the top of the screen shows the query text. Or rather, the start of it. Even with large, high resolution monitors, and even with very small fonts, the single line allocated to this is typically too short to show the full query. In most cases, I see SELECT and the first 10 or 15 columns names, but the FROM is already invisible.

This is just a display issue. The actual execution plan contains the first 4000 characters of the query (which luckily in most cases equates to the entire query), it just doesn’t fit in the SSMS window. And until recently, I did not know a simple way to get to this string. So what I had to do was right-click somewhere in the plan area, select “Show Execution Plan XML”, then in the XML look for the property StatementText=“xxxx”. This then shows the entire query string, with line breaks represented as “
” and a few similar XML-friendly (but person-unfriendly) replacements.

This is usually enough to locate the original query in the source code of the application. If that fails, I can copy the entire content of the StatementText property, paste it in a new window, use search and replace to change all that weird XML stuff back to normal text, and then I finally have the query and I can start tuning.

Bottom line: It works, but it ain’t pretty.

Enter: Edit Query Text

However, there is good news. I recently discovered an incredibly useful new function in SSMS 17.4. I do not know when it was first introduced. Perhaps it really is completely new, or perhaps it has been around for a few years already and I just never noticed. Who knows.

Look at the little button to the right of the query text. This is the new function I discovered. If I hover my mouse over it, it reveals its name: “Edit Query Text”. But that’s not what excites me. What does make me happy is what it does. When I click this button, a new SSMS window opens, with the query text that is stored within the execution plan. And no weird 
&#xA shenanigans either – just normal query text, ready to be executed, or to be modified and tuned.

Now obviously, the 4000 character limit does still apply. Only the first 4000 characters of a query are stored in the execution plan. SSMS can not present you data it doesn’t have. But for any query of normal length, this tiny button offers an amazingly simple and useful way to quickly get the text of the query that resulted in this execution plan.

Thank you, Microsoft!

Execution plans 101: (back to) The basics

I recently received an email from a reader of this blog. Specifically of the plansplaining posts. He wrote that he enjoyed reading it, but he still had some questions. Those questions made me realize that I made a mistake when I started this series. I dove in at the deep end. I assumed a lot of prior knowledge and experience with reading execution plans, but not every reader has that knowledge and that experience.

So I decided that for this one post, I will move from the deep end of the pool to the shallow water. In this post I will cover the basics. Note that this post is titled “plansplaining” – that moniker remains reserved for the deep end! This post, though, will star with the true basics. I am guessing that most readers will be familiar with the first parts but I am still including it just to make sure I cater to everyone. Throughout this post the level will move up and at the end you will be ready to swim for yourself.

Getting an execution plan

Most of the time when you are working on code that needs tuning, you will have the code in a query window in SQL Server Management Studio (SSMS). There are two buttons on the toolbar that you can use to get an execution plan; each has a corresponding item in the query menu and a keyboard shortcut.

When you select the “Display Estimated Execution Plan” option, the text in the query window (or a subset if you have highlighted a part of that window’s contents) will be submitted to the query optimizer to create an execution plan which will then be displayed in graphical format. The query will not actually be executed. This can be a good option to quickly look at an execution plan for a very long-running query, or for queries that would modify data.

When you select the “Include Actual Execution Plan” option, you toggle this option on or off. When it is on, you instruct SSMS to capture actual execution plans for every query executed from that window, and display them in graphical format. This option adds extra information than the estimated execution plan, but you will have to wait for the query to end.

Note that the terms “Estimated” and “Actual” execution plan are actually misleading; both represent the same plan, but an “actual” plan includes additional information that is gathered while the query executes (such as how many rows were actually processed in each stage of the plan).

An example

If you want to follow along, now would be a good time to open a copy of SSMS and connect it to a test instance that has a copy of the AdventureWorks sample database loaded. Open a query window and paste in this simple query:

Now toggle the “Include Actual Execution Plan” in the on state and then run the query. If your SSMS is set to the default option of returning results in a grid, you will now find a total of three tabs in the results pane, as shown below. The “Results” tab is the part that normal people are interested in, and the “Messages” tab can contain interesting information such as error messages or how many rows were returned. However, in this post (and, actually, in almost all content you will find on this site), we are mostly interesting in the third tab, marked “Execution Plan”. Here is what I see when I try this:

In case you have never gone through these steps before, congratulations are due: you have just requested and gotten your very first execution plan. On the other hand, if none of the above was a surprise you could have skipped this section. Let’s see if the next brings anything new for you.


In the execution plan above, you see two basic elements. The first are the various graphical icons, each with two or three short lines of text below them. The second basic element are the arrows; I’ll discuss them shortly but for now I’ll focus on the icons.

Each icon represents what is called an operator. Every operator is similar to a sub-program. Some code inside a larger program that is invoked when needed to perform a specific task. SQL Server execution plans have a rather limited set of distinct operators – less than 70 last time I counted, and most of them are actually very rare; there are only about 15 or 20 that you need to know to understand 99% of all execution plans.

Each operator performs its own specific task. When the optimizer creates an execution plan, it can use as many or as little copies of each operator as it sees fit. The execution plan above contains seven operators: two copies of the Index Scan operator, and one copy each of Sort, Hash Match, Merge Join, Clustered Index Scan, and Compute Scalar. In case you wonder why I didn’t include the icon named SELECT in this list, the reason is that all execution plans have one such icon at the top left that is not really an operator; it is the gateway between the internals of the execution plan and the externals of the client submitting a query and receiving results.

Every operator can be called in three ways (or to phrase it technically, each operator supports three method calls). The Initialize and Close calls are hardly ever relevant for understanding how an execution plan works, so in this post I will focus only on the GetNext call. This call asks the operator to produce a single row of data and return it. After returning a row, control is passed back to the caller. The operator does maintain state, so when the next GetNext call comes in it “wakes up” and resumes where it left off. When the operator has no more rows to return it returns an “end of data” signal; after that the GetNext method cannot be called again, unless the operator is first reset by a new Initialize call.

Operator properties

Every operator has a long list of properties. Some of those properties control exactly how the operator behaves (for instance, a Sort operator will always sort its input data, but the properties define by what column(s) the data will be sorted). Other properties provide estimations of how many rows the operator is expected to produce, or a count (made when executing the query) of how may rows it actually produced.

Accessing the properties of an operator in an execution plan is easy. There are even two methods. The fastest is to hover your mouse over an operator – after a second or so a yellow popup window will appear displaying some (but not all!) of the operator’s properties. The alternative, which allows access to all properties, is to right-click an operator and select “Properties” from the context menu. (If you leave the properties window open at all time, you can bypass the right-clicking and just left-click an operator once to give it focus).

The picture above shows the two versions of looking at the properties of, in this case, the Compute Scalar operator in the plan above. In the properties window (shown on the right) you have the ability to expand or collapse some nodes. In the screenshot the Defined Values property is shown in collapsed state, but I expanded the Output List property by one level to show some more details. If you compare the two pictures, you can see that this Output List property is not visible on the left-hand side, but the Defined Values property is. This is not configurable; Microsoft decides which properties to include in the pop-up. This can vary from version to version. I have also seen situations where, depending on the size of the information and the available screen size, the same property gets included in one case and omitted in another case. The full properties window is a few clicks extra to get, but has the advantage of always exposing all available information.

Data flows

After talking about the icons, representing operators, it is now time to shift our attention to the arrows. What do they represent? Most people will say that they represent how data flows between operators. And while that is true, the real answer is actually a bit more complex than that. It is really important to always keep in mind that an operator doesn’t act by itself, it only does something when a GetNext call activates it. In other words, an execution plan is “pull-based”, not “push-based”. When people think of data “flowing”, the image tends to shift towards a push-based model, where operators produce rows all the time and push them to other operators, instead of doing nothing until asked to do something.

Let’s look at an example. In the execution plan above, there is an arrow pointing from Hash Match to Sort. To understand the actual logic of this plan, we must read this arrow in reverse order: as the Sort operator calling the GetNext method of Hash Match whenever it needs a row. The Hash Match operator responds by doing some work and then returning a row; this row traverses in the direction of the arrow. The arrows are drawn this way because in many cases, the key to understanding the logic of an execution plan is to follow the data. But you should always keep in mind that this is not an autonomous steady stream of data. The data movement is activated by the recipient of the data requesting a row when it needs one. Not by the producer creating rows whether they are needed or not.

Anyone who has sufficient background in math will recognize the graphical representation of an execution plan as a graph. Don’t worry if you never learned graph theory in high school (or if you forgot about it after graduating). You don’t need to understand this in order to work with execution plans. I only bring this up because, due to this mathematical background, operators in execution plans are often called nodes. The arrows establish a hierarchy between nodes. Looking again at the example above, the arrow pointing from Hash Match to Sort defines Sort as the parent of Hash Match, and conversely Hash Match is a child operator of Sort. As you see, some operators (Hash Match and Merge Join in the example above) have more than one child, where some others (e.g. Index Scan and Clustered Index Scan) have no children at all. Most operators always have exactly one child. And every operator always has exactly one parent, except for the operator at the top left. This operator is called the root node (or simply root) of the execution plan.

Data flow properties

Just as you can inspect the properties of operators by either hovering your mouse over them or by opening the properties window with that operator in focus, you can do the same with the arrows that represent the data flows. (In this case you will always see the exact same set of properties in the popup and in the properties window) .However, this can be slightly misleading. You see, the source of each execution plan is a large chunk of XML (which in turn is a representation of the internal structures SQL Server uses). And in this XML, there is nothing that represents these arrows. The hierarchy of the operators is represented in the XML by how the corresponding elements are ordered. SSMS parses this data and then uses it to represent it as arrows, which is obviously far easier to understand for us humans.

But if the arrows are not actually included in the execution plan XML, where then does SSMS take the properties you see when you hover your mouse? The answer is simple: the properties you see on an arrow are a subset of the properties you see on the operator where the arrow originates. Hover your mouse over the arrow between Sort and Hash Match and you will see the popup shown below and to the left. It exposes only the actual and estimated number of rows, estimated row size, and estimated data size (which is not a true property; it is the result of multiplying estimated number of rows by estimated row size, which SSMS shows as a convenience to the user). Hover your mouse over the Hash Match operator itself and you will see the popup at the right. All the three “real” properties you see on the left are included there as well, plus many more. There is no need to ever look at the properties of a data flow (apart perhaps from ease of access if you only need these specific properties, or if you don’t want to do a simple multiplication yourself).

It is also important to point out that the properties that SSMS shows do not include what I personally consider to be the most important properties of data flows: what data exactly is in the rows  that the child operator passes to its parent, and where it is coming from. Have you ever had a manager request you to “produce a report showing a breakdown of sales by region” without any additional specification? Trying to understand an execution plan without knowing exactly what data the operators pass to each other is like trying to build that report without asking the manager which metrics should be included and by which measures the report should be broken down. It’s doomed to fail.

Data flow contents

Looking at a data flow, I always want to know two things: what data is included in the rows that the child operator returns to its parent, and where does this data come from. Let’s start by showing what data is included. This is easy to find if you know where to look. Let’s look at an example. The picture below is the property window for one of the two Index Scan operators (the bottom-most one, to be precise). There are a lot of interesting properties here that provide additional insight in what this operator does, but for the purpose of understanding the data it returns to the Compute Scalar operator that calls it we need to look only at the Output List property. I have expanded it in the window and highlighted in the screenshot.

Every operator in every execution plan has an Output List property. This property lists all the columns that the operator returns. In this case, you see that three columns are returned: the BusinessEntityID, FirstName, and LastName columns, all from table Person.Person in database AdventureWorks2017. This is important information. I know from seeing the Index Scan operator and the table name Person below it in the graphical representation that this operator is reading and returning data from the Person table. But by looking at the properties, I now know that it is not passing the entire row as it is stored in the table, or even in the index. No, the operator picks only three columns from however many are available, and only passes those values. The rest of the data available in the index is, apparently, not needed for this execution plan.

The origin of data

As mentioned before, I do not only want to know what data is passed in each data flow. I also want to know where all that data comes from. In the case of the Index Scan above, this is pretty obvious. The operator doesn’t call any child operators and it does read from an index on the Person table, so obviously all three returned columns hold data that is read from the Person table (or rather its index). That is also pretty easy to conclude from just looking at the column names. Things get more interesting in some other operators. The picture below represents the properties of the Compute Scalar operator, that requests rows from the aforementioned Index Scan and returns rows to the Hash Match operator.

As you can see from the Output List property, this operator also returns rows with three columns each. But they are not the same columns as what Index Scan returns. Two columns are familiar: BusinessEntitityID and LastName (for brevity I omit the full qualification of the column names here). The FirstName column that was passed into this operator is not passed on, but we do see a new column which is named Expr1003. These are the three columns that Compute Scalar passes to Hash Match. The first two columns originate from Index Scan and are simply passed unchanged. But Expr1003 is new. It does not come from Index Scan so it has to be created by Compute Scalar itself. This is confirmed by looking at another property: Defined Values. Unfortunately, SQL Server does not use this property as consistently as I would like. There are operators that include columns in a Defined Values property that are not computed in the operator but taken from somewhere else. There are operators that include columns in the Defined Values but do not bother to include a definition (usually because the definition is obvious if you understand the operator, but it would still be nice for consistency – and for those who do not understand the operator yet – to include the definition anyway). And there are operators that compute new columns and add them in the Output List but do not include a Defined Values property at all.

I have already selected this property in the screenshot above, to show that for properties with a long text as their content SSMS will show an ellipsis button you can click to open a new window with the full content. When I click that button, I see this window open:

With all the parentheses and brackets, this can be hard to parse. I sometimes copy & paste it in an editor and remove brackets and parentheses until I see the light. In this case, though, all I need is some patience and perseverance until I recognize this as an overly complicated representation of the expression for the FullName column in my query: p.LastName + N', ' + p.FirstName .

The optimizer could of course have called this column LastName already at this stage in the execution plan. But that is not how the optimizer works. Whenever a new column is introduced that is not directly taken from a table or index, it is given an artificial name. These names are always made up of a mnemonic prefix (“Expr”, for “expression”, is the most common but not the only prefix you will see) and a four-digit number. Numbering always starts at 1001, but sometimes numbers are assigned to expressions in intermediate phases of optimization and then later removed. Don’t despair if you see Expr1001 and Expr1003 but no Expr1002. This is normal, you are not overlooking anything.

Now I know everything I need to understand what is going on in this section of the execution plan. The Index Scan delivers rows to Compute Scalar with three columns. The BusinessEntityID column is not used by Compute Scalar, but passed unchanged, so I guess this column is needed elsewhere in the execution plan. The FirstName and LastName columns are both used in the expression to compute the Expr1003 column that matches my LastName column in the results. LastName is then also passed to the Hash Match operator so this column is apparently needed in at least one more location in the execution plan. However, the FirstName column was used for this computation only; it is not needed anymore so it is removed from the rows that are being passed between the operators.

Do not fall for the idea that a column named Exprnnnn in the Output List property of an operator is always generated by that operator. That is not the case. A column with that list is computed somewhere but not necessarily in that same operator. For instance, we have already seen that the Hash Match operator receives rows with a.o. the Expr1003 column from Compute Scalar; the Output List of that operator shows that Expr1003 is passed on, unchanged, to Sort. And Sort then passes it to SELECT, the gateway to the client. Which makes sense, of course: we already established that Expr1003 corresponds directly to one of the expressions in the  SELECT  of our query, so of course all operators have to pass this column to their parent in order for the data to reach the client.


The intention of this post was to highlight some of the basic things you need to look at when you want to read an execution plan. I have not explained all the details of the specific execution plan; that was not the intention. It only serves as an example to highlight how elements in an execution plan interact.

The key takeaways for this post are:

  • The icons in an execution plan represent operators. Each operator is a miniature program that performs a very specific task. The properties of each operator expose a wealth of information on what exactly the operator does. It is almost always impossible to understand an execution plan without looking at the properties of the operators.
  • The arrows in an execution plan represent data flows. Data moves from one operator (the child operator) to another operator (the parent operator) in the direction of the arrow. However, be aware that data is not constantly and autonomously streaming. The actual process is that the parent operator calls its child operator when it needs a row, and only then will the child operator do what is needed to produce and return a row. It is possible in some execution plans for an operator to never be called; in such cases that operator does no work at all.
  • In order to understand how the operators in an execution plan interact, it is insufficient to look at which operators are included in the plan, and how they call each other (or in other words, how data flows through the operators). You will also need to look at exactly which columns are used in each data flow, and where these columns come from. Each operator exposes all the columns it passes to its parent in the Output List If you encounter columns in that list that are not included in the Output List of the operators children, then this column must be computed by the operator itself; in most cases you can find details on how it is computed in the Defined Values property.

This post may have been a bit basic for some of my readers. But for others, it might be just what they needed to make more sense of my other posts. However, it is not always easy to judge whether an explanation is spot-on, too basic, or too advanced. So here is what you can do. Use the form below to leave a reply. If you are still struggling with the concepts, tell me where you are stuck and I will try to help. Do not hesitate for fear of being laughed at. I can guarantee you that I will not ever laugh at someone for asking basic questions. I am very much aware, and so should you, that I once started knowing nothing. I have gone through phases where I had nobody to ask questions too, and I still shudder when I think of the things I did in those days. I also had periods where I did have the option to ask questions, and learn. The only reason that I am where I am today in my career is that I never hesitated to ask questions, but also never took the answers for granted. The best way to learn is to ask a question, get several (different is ideal!) answers, then set up some sample tables and on your playground database and run some code to verify whether the answers are correct. And if they are, then change the code, observe the effect of the change, try to understand how and why the change had that effect, and ask more questions if you can’t.

The first step for all learning is to ask questions!

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.