This is part seven of the plansplaining series. Each of these posts takes an execution plan with an interesting pattern, and details exactly how that plan works.
In this post we look at a deceptively simple query: a simple
SELECT with an
ISNULL to show either a row returned or a placeholder value. And yet there is more going on under the covers than one might expect.
The query below can be executed in any version of the AdventureWorks sample database. It returns a code string representing the version number on the 2016 and 2017 versions; on older versions it instead returns the hardcoded text “Older version”.
SELECT ISNULL((SELECT TOP (1) abv.[Database Version] FROM dbo.AWBuildVersion AS abv WHERE abv.[Database Version] >= N'13'), N'Older version');
So what do we have here? A straightforward single-table query with a very simple predicate, embedded in an
ISNULL function call. You would probably expect some scan or seek plus a Top operator to evaluate the subquery, plus a Compute Scalar for the
ISNULL. In reality the plan is not quite as simple, as shown below. (As in earlier episodes, I have added the NodeID numbers into the execution plan to make it easier to reference individual operators).
We do indeed see a Top (#3), a scan (#4), and a Compute Scalar (#0), but there are also two extra, unexpected operators: Nested Loops (#1), and Constant Scan (#2). Why are those in the execution plan?
Constant Scan #2
In part 3 of this series I write: “Whenever we see a non-obvious Constant Scan operator, out first stop should always be its properties, specifically the Output List and Values properties”. Well, this Constant Scan is definitely not obvious, so let’s look at its properties.
Now there is an Output List property, but it is completely empty. And there is not even a Values property at all. That’s … odd! Does this mean that this Constant Scan returns nothing? Not exactly. A Constant Scan always returns at least one row. It normally returns one or more rows, with columns as defined in the Output List property and values as defined in the Values property. But when those properties are missing, it will still return a single row.
That single row is special. It is a type of row that we would never be able to use in our queries; it can only exist in intermediate phases in an execution plan, because it has zero column. This can be seen in the properties if you look at the Estimated Row Size property, which is 9 bytes. That is exactly the size of the row header within execution plans, a small area where metadata of each row is described (very similar to the row header you will find on data pages for rows stored in tables). The Actual Number of Rows property does indeed confirm that a single row was actually returned by this operator.
If you look at many execution plans you will notice this usage of Constant Scan more often, usually combined with a Compute Scalar. Constant Scan cannot do computations, but Compute Scalar needs an input row to start working and to store its result in; the combination of the two works fine: Constant Scan generates an “empty” row and Compute Scalar then adds data to it.
In this case there is no Compute Scalar to the direct left of Constant Scan though. So let’s see what happens next.
Nested Loops #1
If you read the previous parts you already know that execution doesn’t start at Constant Scan. It starts at the far left at SELECT, which calls Top; Top then calls Nested Loops, which calls Constant Scan. As soon as Constant Scan returns its first row (we already know that it’s the only row but Nested Loops does not know or care about that yet), Nested Loops calls into its inner (lower) input by requesting a row from the Top operator.
Top #3 and Clustered Index Scan #4
This part of the execution plan is pretty obvious so I’m not going to cover it in detail. When Top is called, it first calls the Clustered Index Scan operator. This operator has a pushed down search predicate for [Database Version] >= N’13’. Since the AWBuildVersion table in AdventureWorks always contains exactly one row, this scan will either return that row, or no data at all (in which case an “end of data” signal is returned instead).
If a row is returned, Top will pass that row unchanged to Nested Loops. If no row was returned from Clustered Index Scan to Top, then Top will obviously also not return a row to Nested Loops.
Nested Loops #1 (continued)
At this point we know that the Nested Loops operator was called, it then first called its outer input, Constant Scan, to receive an empty row; it then proceeded to call its inner input, Top, from where it may or may not (depending on the version of AdventureWorks) have received a row.
The next thing to do is to check the join condition (in the Predicate property). In this case that property is not present, which means that every row is considered a match. So if Top returns a row, it is considered a match for the empty row returned from Constant Scan and a row is returned to Compute Scalar; if Top returns no row then it is not a match but because the Logical Operation is a Left Outer Join it will still return a row, using Null values for columns originating from the inner input.
The Output List shows that the row returned contains just a single column: [Database Version]. On a newer version of AdventureWorks, this will indeed be the database version from the AWBuildVersion table; on older versions the inner input returns no row so the Nested Loops will provide a Null value in this column.
The Compute Scalar operator (#0) uses a simple isnull expression to replace Null values with the ‘Older version’ text from the query while letting the actual database version on newer builds pass unchanged. (Because of this simplicity I decided not to add a screenshot for this). This row is then returned to the client. Because the client does not know that this will always be a single-row result, it then calls Compute Scalar again, which calls Nested Loops. At this point there are two options.
On newer versions, the inner input of Nested Loops returned a row, so Nested Loops calls it again to check if there are more matching rows. The first operator in that branch is Top, and its Top Expression property is (1). Since it already returned a row it will not bother to call the Clustered Index Scan operator again but immediately return “end of data”. At that point Nested Loops will switch back to the outer input (the Constant Scan operator). But because Constant Scan has no Values property, it will only return a single empty row, not multiple – on this second call it will return “end of data”, so now Nested Loops also returns “end of data” which then propagates through the rest of the plan.
On older versions of AdventureWorks, the inner input had already returned “end of data” after the first call, so in this case the Nested Loops operator will immediately return to the outer input. Obviously with the same end result.
But … why?
All the above explains how the execution plan for this query works. But it doesn’t really explain why these extra operators were needed. Wouldn’t SQL Server have been able to return the same result by simplifying the execution plan?
The picture above is not captured from SQL Server; I created it using copy and paste on the screenshot of the original execution plan, to illustrate the type of execution plan many people would have expected for our sample query. Why was this much simpler execution plan not chosen?
If you somehow manage to force SQL Server to run the above execution plan in an AdventureWorks2016 (or 2017) database, you will actually get the expected result. And that makes sense if you follow along: operators call each other left to right until the Clustered Index Scan starts, which finds a row to return. Top passes that row because its only task is to stop execution AFTER the first row. Compute Scalar then applies the isnull expression and your database version is returned; Top then ensures that no second row is returned even if more data would exist in the AWBuildVersion table.
But this same execution plan on AdventureWorks2014 (or older) would behave differently. The Clustered Index Scan operator would not find a row to return. This means that “end of data” flows from the scan through Top, Compute Scalar, back to SELECT and you would get an empty result set instead of the single row with the text “Older version”.
At this point you might be wondering “but what about the isnull expression in the Compute Scalar operator?” Understandable. But here is a very important thing to remember: almost all operators in execution plans operate only on the rows they receive. Scan and seek operators receive rows from the storage subsystem; Constant Scan receives rows from its own properties; all other operators receive rows from their child operators in the plan. The only exceptions I know of are a Constant Scan with no Values property (which returns a single empty row), and a Stream Aggregate operator with no Group By property(which still needs an input but returns a single row if its input is empty).
Compute Scalar is not an exception. Compute Scalar computes its expressions for each row it receives, adds the computed values and then returns that row. If it never receives a row, then it will not compute its expressions, because there is no input data to use in those expressions and no row to add these values to. That’s why the faked execution plan above would not return the expected result.
A Constant Scan operator normally returns one or more rows with one of more columns of data, as defined in the Output List and Values properties. However, Constant Scan can also be used with an empty Output List and no Values property. In that case it will return a single row that has no columns.
This empty row can be used to generate a placeholder that other operators (often Compute Scalar) can then store values in. It can also simply be used to ensure that other operators actually receive a row of data, since that is for most operators the only way to get them to do any actual work.
The example in this blog post combined both: the Nested Loops operator needed a row from its outer input to make it start reading from its inner input, and then the data returned from that inner input was added to that empty row and passed to other operators to do the rest of the work.
I am still open for suggestions on topics to cover in this series. If you see an unusual and interesting pattern in an execution plan, let me know so I can consider using it in a future post.
If no suggestions come in, then episode 8 will probably focus on recursion.