When I look at an execution plan I sometimes right away see how the operators work together. Other times I need to dive into the details and put in effort before I really understand it. And occasionally, I think I understand but then am proven wrong after I start looking at the details. However, understanding all the details of an execution plan is really important when you want to optimize performance. If I see an execution plan where I do not understand the role of each and every operator, I know I do not truly understand how the query is executed. Yet.
This post is the first in a series. In every part, I will look at a query and its execution plan, highlight possibly confusing, misleading or simply non-obvious elements in the plan, and explain why those operators are there and how they interact to return the requested results. It may be a bit long sometimes, it may be a bit dry. But if you put in the effort to read this and try to understand, you will become a better performance tuner.
A simple query
In this post I will be looking at the execution plan for a very simple query (using the standard AdventureWorks sample database):
SELECT p.BusinessEntityID, p.FirstName, p.MiddleName, p.LastName, (SELECT pp.PhoneNumber FROM Person.PersonPhone AS pp WHERE pp.BusinessEntityID = p.BusinessEntityID) AS PhoneNum FROM Person.Person AS p;
The execution plan for this query is shown below. It is from SQL Server 2017, but it is probably the same on all versions of SQL Server. At first sight it doesn’t look bad. I see an index scan for the Person table. Since there is no WHERE clause, that is to be expected. The subquery that grabs the phone number is implemented via a join operator. The logical join type is Left Outer Join, so that a row from the Person table that does not exist in the PersonPhone table is not dropped from the results.
But if I look a bit further, I also see things that are more surprising. An Assert operator is very normal in a plan for modification (insert, update, delete), for constraint checking. But this is a normal select, so why is that operator here? And why has SQL Server added an aggregation operator?
And there is one more thing. The Person table has 19,972 rows. Why did the optimizer choose to use a Nested Loops join? Would a Merge Join or Hash Match join not be better in this case? When I execute the query with SET STATISTICS IO ON added to the batch, I get a confirmation that the Nested Loops join operator is in fact causing overhead. The 211 logical reads on the Person table are expected for this type of query; the 40,242 logical reads on PersonPhone appear to be excessive.
Table 'PersonPhone'. Scan count 19972, logical reads 40242, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Person'. Scan count 1, logical reads 211, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Step by step
It is tempting to immediately jump into tuning mode and try to find ways to change the join type, and perhaps get rid of the Assert and Stream Aggregate operator that I never asked for. But the optimizer usually has good reasons for what it does, so I exercise restraint. I first need to know why those operators are here. And that means that I need to make sure I understand their role within the execution plan. The best way to do that is to work out exactly what happens when the execution plan runs. That can take a lot of time, but you can consider that an investment. Once you know why this plan looks the way it looks, you can apply that knowledge every time you see a similar plan.
Execution plans always start with the top-left operator (SELECT) calling its child operator (Compute Scalar). This is the GetNext() call, requesting a row. Most operators cannot return a row before they receive one, so most operators respond to the first GetNext() call by immediately calling their child operator. The operators in this plan are no exception. When execution starts, SELECT calls Compute Scalar, which calls Nested Loops, which in turn invokes Index Scan (the child operator on its top, or “outer” input).
The Index Scan operators is the first one that actually does something interesting. The Output List property shows which columns it returns: BusinessEntityID, FirstName, MiddleName, and LastName. There is no Predicate property set, so it returns all rows. Looking at the original query, this is not surprising. But of course GetNext() requests just a single row, so the Index Scan will for now only navigate to the first page, grab the first row from that page, and return the values from that row to its parent (Nested Loops). It passes control back to Nested Loops, while maintaining state so that it can continue where it left off when called again.
After receiving this row from Index Scan, Nested Loops switches to the bottom input by calling the GetNext() entry point of the Assert operator.
The bottom input
The Assert operator responds to this GetNext() request by immediately calling the Stream Aggregate operator, which in turn requests a row from the Clustered Index Seek.
The properties of the Clustered Index Seek reveal that this operator will use the structure of the clustered index to navigate directly to the page that stores the phone numbers of the person just read from the Person table. At this point, there are two possibilities: a matching row is found, or not. Let’s for now assume that the operator does find a row. The Output List property shows only the PhoneNumber column, so this value is now returned to Stream Aggregate.
The Stream Aggregate operator in this case does not have the Group By property. This means that it produces a scalar aggregate: a single row with the aggregation results of all input rows. To do this it must first read all rows from the input. So after Clustered Index Seek returns its first row, Stream Aggregate updates some internal data structures and them immediately calls Clustered Index Seek again. However, I happen to know that there are no persons with more than one phone number in the PersonPhone table. So when Clustered Index Seek is called again, it will not find a second matching row and instead return an “end of data” indicator.
Looking again at the properties of Stream Aggregate, I see that it returns a row with two columns, called Expr1004 and Expr1005. Expr1004 is defined as Count(*), the number of rows, so this will be 1. The Expr1005 column is defined as ANY(PhoneNumber). ANY is an aggregate function that we cannot use in our T-SQL, but the optimizer can use it in execution plans; it means “just give me one of the values from the input, I don’t care which one”. In this case, with just one row in the input, there is actually little choice.
This row, with Expr1004 set to 1 and Expr1005 set to the phone number for the person from the current row in the Person table, is then passed to the Assert operator. Assert evaluates an expression (shown in its Predicate property) and then passes the row unchanged when that expression evaluates to NULL, or stops execution with an error when it evaluates to another value. In this plan, the Predicate property of the Assert is “CASE WHEN Expr1004 > 1 THEN 0 ELSE NULL END” (edited slightly for readability). Expr1004 is equal to 1, so this expression evaluates to NULL. Assert passes the row. However, its Output List property shows that it passes only Expr1005 (the phone number). Expr1004 is not needed anymore.
Returning the first row
After Assert returns Expr1005 to the Nested Loops operator, it can combine the data from both inputs. There is no Predicate property on this Nested Loops, so whatever is returned from the bottom input is assumed to be a match. (A safe assumption because the bottom input used a value from the top input to return only matching data; this can be seen in the Outer References property of the operator). The Nested Loops operator now returns a row with the four columns from the Person table, plus Expr1005. Which, as we remember, is “any” of the phone numbers found for the person, which was just a single one in this case.
The Compute Scalar operator then receives this row, and computes a new column, Expr1003, which is set to be equal to Expr1005. This sounds like an utter performance waste. And it would be if SQL Server would actually do this. However, the properties in an actual execution show no actual row count and no actual execution count for this operator. What we see here is an artefact of how plans are generated. Redundant Compute Scalar operators are sometimes left in the plan, but do not actually execute. They are removed from the plan in a final cleanup stage, after the plan was already selected, but this is not reflected in the plan we see. Seeing an operator with no values for the “Actual Number of Rows” and “Number of Executions” properties is a dead giveaway that the operator was removed because it was either redundant or its functionality was collapsed into another operator. In this case, it was redundant. So in reality, the row formed in the Nested Loops operator is returned to SELECT, which represents the client application in the execution plan.
An unmatched row
The SELECT operator will immediately request the next row, and as we now know it doesn’t request this of Compute Scalar but directly of Nested Loops. Nested Loops then needs to check to see if the bottom input has more rows, so it requests the next row from Assert, which relays the request to Stream Aggregate. But Stream Aggregate in this case was set up as a scalar aggregate that returns a single row from the entire input, so it can never return a second row. It returns the “end of data” indicator, that Assert faithfully passes on the Nested Loops.
Since the bottom input is now exhausted, Nested Loops moves back to its top input and requests the next row from the Index Scan. This operator will continue where it left off last time, so now the second row from the indicated index is returned. After receiving this second row, Nested Loops reinitializes its bottom input and the entire process starts over.
We already know what happens when the Clustered Index Seek does find a row. In the sake of keeping this interesting, let’s assume the second row does not have a match in PersonPhone. After Nested Loops kicks off the process, the operators call each other until Clustered Index Seek searches for a matching row and doesn’t find anything. So it can’t return a row to Stream Aggregate. It immediately returns the “end of data” indicator.
The Stream Aggregate operator usually does not return anything if it doesn’t receive any rows itself. But in this plan it is doing scalar aggregation, and that is the exception. Scalar aggregation always returns one row. In this case, Expr1004, the Count(*), will be set to 0; Expr1005 will be NULL. Assert receives this row, checks its Predicate, and since Expr1004 is not more than one, it will pass the row to the Nested Loops operator. This operator adds the NULL value in Expr1005 as the phone number, and returns this to its caller, to be sent to the client. And that’s how rows without phone numbers are handled.
Obviously, this process repeats over and over until all rows from the Person table have been handled and the query finishes. We now know exactly how the query handles all cases in our test data: both persons with and persons without a phone number. We know exactly what the Assert and Stream Aggregate operator do. But we do not yet understand WHY. So far, they have not affected the results. Why are they here?
What about multiple matches?
The Predicate expression of the Assert operator gives a clue as to why the operator is in the plan. Obviously, the optimizer wants to do something different when a person has multiple phone numbers. I know that this is not the case in the data in AdventureWorks, but SQL Server doesn’t know that. So it needs to handle this situation, and apparently it needs handling in this specific case. Let’s look. First in theory, and then test to verify.
Nobody needs more than one phone!
So now let’s assume that we added a row to PersonPhone, for a person that already had a row there. At one point the Index Scan will read this person. It restarts its bottom input and requests a row; the operators call each other and Clustered Index Seek finds the first matching row and returns it to Stream Aggregate. Nothing new so far, this is the same as we saw for the first row. However, this time Stream Aggregate’s request for another row is successful. It updates its internal data structures and, again, requests the next row. Clustered Index Seek does not find a third phone number for this person, so “end of data” is returned.
After it has read all the input rows Stream Aggregate now uses its internal data structures to return the correct values. Expr1004, the Count(*), is equal to 2; Expr1005, the ANY expression, returns one of the two phone numbers that were input. Which one is impossible to tell; the way ANY chooses its returned value is undocumented. But we do know that it will be one of the phone numbers for this person.
However, that phone number will never make it to the client. Since Expr1004 is now 2, the Predicate of the Assert operator evaluates to 0, not NULL. Any non-NULL result causes the Assert operator to immediately stop processing of the execution plan and throw an error. Which error that is can’t be seen from the execution plan. So perhaps this is a good time to run a test and verify what happens.
In order to test this, I will add a row to the PersonPhone table. I can insert it and then delete it later, or use a transaction that I roll back when done. I chose the latter:
BEGIN TRANSACTION; GO INSERT Person.PersonPhone (BusinessEntityID, PhoneNumber, PhoneNumberTypeID, ModifiedDate) VALUES (9647, '555-111-222', 1, CURRENT_TIMESTAMP); GO SELECT p.BusinessEntityID, p.FirstName, p.MiddleName, p.LastName, (SELECT pp.PhoneNumber FROM Person.PersonPhone AS pp WHERE pp.BusinessEntityID = p.BusinessEntityID) AS PhoneNum FROM Person.Person AS p; GO ROLLBACK TRANSACTION; GO
When I run this query, I get partial results. The query returns it first 12,415 rows, but then an error occurs and processing stops. The messages pane shows an error message:
Msg 512, Level 16, State 1, Line 12 Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
And there we have our explanation. The SQL language does not like “random” results. When we use a subquery in a place where just a single value is expected, it gives us the benefit of the doubt. “My developers know what they are doing, they will know that this will never return multiple rows”. But true to the paradigm “trust, but verify”, it does add two operators, Stream Aggregate and Assert. These two operators ensure that an error message is thrown if it turns out the developer did not actually know what they were doing.
Can we tune this?
We now know why the Assert and Stream Aggregate operators are there. And this also “sort of” explains why a Nested Loops join algorithm was used. Merge Join and Hash Match both do a single pass over both inputs. The trick of aggregating all rows and seeing if there is more than one would not work if we read all rows instead of only rows matching the current person.
That doesn’t mean there are no other ways to achieve this result. The optimizer could have opted for a plan with a Merge Join or Hash Match by using a variation on this trick. It could do a Clustered Index Scan of all rows from PersonPhone, then feed those rows into a Stream Aggregate with a Group By property. This would return one row for each BusinessEntityID, with one of the phone numbers and a count. The same Assert can then stop execution when the count is too high. That would implement the same test for persons with multiple phones with just a single pass over the input, so now the two streams can be combined in a Merge Join or Hash Match join. Why didn’t the optimizer make that plan?
There can be multiple answers. Perhaps that plan introduces other overhead, making it more expensive. Perhaps the assumptions built into the costing rules make the optimizer think it is more expensive. Perhaps the set of rules used by the optimizer does not include a path from the actual plan to the theoretic one above. Or perhaps the optimizer stopped searching for better plans before it found this option.
Sometimes rewriting a query can work to nudge the optimizer into a different plan. That is not guaranteed to work; I have been outstubborned by the optimizer more often than I care to admit. However, in this case it turns out that it’s fairly easy to convince the optimizer. I’m looking for a plan that reads the entire PersonPhone table and aggregates it by Person. Just putting that (in T-SQL form) in a CTE, and then joining that CTE to the Person table, turned out to work (in this case).
Now I can’t directly use an Assert in my query, but I can use a CASE expression that will force a run-time error when a person has more than one phone. I also do not have an ANY aggregation function. But since I will only actually return a row when the number of rows is 1, I can pick almost every aggregation function. In this case I decided to use MIN.
WITH PersonPhoneDupDetection AS (SELECT BusinessEntityID, COUNT(*) AS Expr1004, MIN(PhoneNumber) AS Expr1005 FROM Person.PersonPhone GROUP BY BusinessEntityID) SELECT p.BusinessEntityID, p.FirstName, p.MiddleName, p.LastName, CASE WHEN pp.Expr1004 > 1 THEN SUBSTRING('', 1, -pp.Expr1004) ELSE pp.Expr1005 END AS PhoneNum FROM Person.Person AS p LEFT JOIN PersonPhoneDupDetection AS pp ON pp.BusinessEntityID = p.BusinessEntityID;
The query above returns the same data as the original query. It also generates a runtime error, just as the original query, although the error message is different. And it results in the type of plan I was hoping for:
This plan uses Hash Match to join the two data streams. The top input is PersonPhone, aggregated by BusinessEntityID. The bottom input is Person. Both inputs are scanned only once, and a lot of I/O is saved this way as demonstrated by the output of SET STATISTICS IO:
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Person'. Scan count 1, logical reads 211, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'PersonPhone'. Scan count 1, logical reads 152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The estimated cost of the execution plan for this query is 0.959. For the original plan, that was 3.921. This plan actually has a much lower estimated cost, and saves a huge amount of I/O. It does introduce a Hash Match operator, making it blocking and more memory hungry. I am going to guess that the optimizer would have picked this plan if it had found it while exploring alternatives. But that is just a guess.
Which one to use?
We now have two versions of the query. It is tempting to say that we should always use one with the lowest estimated cost, and the least I/O. But that is not always the case. The cheaper query does need more memory (in my tests it requests a memory grant of 8416KB), which can be problematic on servers that are already starved for memory. The Hash Match operator causes the plan to be partially blocking, as opposed to a fully streaming plan for the original query. And when there are persons with more than one phone, the original query stops executing as soon as it encounters one such person; the new version first finishes the entire build phase of the Hash Match and will only stop when it encounters the first problem row in the probe phase. Three reasons that might convince you, based on the requirements and limitations of your data and your server, to choose the original query.
Plus, obviously, the original query is a lot easier to understand and maintain; I would never allow the result of my rewrite attempt to go in production code without a long comment to explain what’s going on.
In this post I looked at a sample execution plan and dove into the details of each of the operators to understand their function relative to each other and to the plan as a whole. This is what every SQL Server professional should do when looking at an execution plan.
In this case, looking at all the details of the execution plan helped us find a way to speed up the query (with some downsides). That will not always be the case. Still, when you look at an element in an execution plan you do not understand, you always have to do the work to build that understanding. The result may not be a way to speed up the query, but you only know that for sure after you put in the work.
This post was very verbose, and very detailed. That’s because I also used this post to illustrate some basics of reading execution plans. That’s not how my mental process normally works when analyzing execution plans. I would look at the plan and blend out the parts I already understand. (I would still return to them if needed if needed to make sense of the rest). For this plan, that would leave the Assert, the Stream Aggregate, and the choice of physical join type. I would start with those operators. Upon seeing the Predicate of Assert, I would try to find where Expr1004 is computed, which would bring me to the Defined Values property of the Stream Aggregate operator. And at that point, I would understand the whole logic.
For future posts in the Plansplaining series, I plan to assume more basic understanding of execution plans than what I did for this post. They will be somewhere between the very long description in this post, and the very short description in the first paragraph.
(EDIT June 13 and June 15, 2020: Fixed several small errors and improved some confusing sentences. Thanks, Steve!)