Over the past weeks, I have published several posts about Actual Number of Rows, Estimated Number of Rows, how they are reflected in the arrow width in the graphical execution plan, and several scenarios where this can be wrong or misleading. And now, after telling you in my previous post that this was the last in this series, I am back with yet another one.
I discovered this only very recently. It was not planned for the series, because I was not aware of it. Until I ran into it while preparing a demo, got fooled by it, investigated it, and (with a little help of the ever amazing Paul White) found an explanation. And then I realized that I really need to write one more blog to cover this situation as well.
Back to the missing nodes
I already described, in a previous post, how sometimes the optimizer can create an execution plan that uses a Filter operator to evaluate a specific predicate, but then a post-optimization rewrite finds a way to push that predicate down into another operator, as a Predicate property, and then removes the Filter operator. When this happens with a bitmap filter, the Estimated Number of Rows is not adjusted, which can be quite confusing.
But for the issue in this post, the root cause was the same, but the error surfaces completely differently.
A Lookup with a predicate
In SQL Server 2005, both the Key Lookup and the RID Lookup operator did not support the Predicate property. That changed in SQL Server 2008. However, Microsoft chose not to add extra logic within the optimizer to teach it to use this new option; instead they chose to use a post-optimization rewrite for this.
Let’s look at this in more detail with this sample query:
SELECT dl.DatabaseLogID, dl.PostTime, dl.DatabaseUser FROM dbo.DatabaseLog AS dl WHERE dl.DatabaseLogID <= 100 AND dl.Event = 'ALTER_TABLE' ORDER BY dl.PostTime;
The DatabaseLog table has no clustered index, so it is a heap. It does have a nonclustered index on the DatabaseLogID column, and no other indexes. So for this query, the optimizer has limited options. It can use an Index Seek operator to find all rows with DatabaseLogID <= 100, fetch the additional columns with a RID Lookup, then use a Filter operator to retain only the rows with Event = ‘ALTER_TABLE’. Or it can use a Table Scan operator, with both filters pushed down in its Predicate property, to return only the required rows (at the price of reading all of them). For a higher cut-off point in the DatabaseLogID filter, it would use that second option. But for the query as shown, the Index Seek + RID Lookup is estimated to be cheaper. So this is the execution plan that is produced for this query:
On SQL Server 2005, that would have been the final version of the execution plan. But on SQL Server 2008 and higher, the post-optimization rewrite recognizes that the Predicate property of the Filter operator can be pushed into the RID Lookup operator. So it adds a Predicate property to the RID Lookup, and then removes the now useless Filter operator.
And while doing that, something realty curious happens with the Estimated Number of Rows property (shown as Estimated Number of Rows Per Execution in the execution plan, since I am running SSMS 18.5).
For the Nested Loops operator, the Estimated Number of Rows is unchanged. This is in fact the exact same issue as what happens when a bitmap filter is pushed, as described in my earlier post. However, for the RID Lookup something even worse happens. Here, the Estimated Number of Rows property does get set to the same value as the Estimated Number of Rows of the Filter operator. However, the Estimated Number of Executions (which is 100, based on the estimated 100 rows returned from Index Seek) is not at all taken into account. So the Estimated Number of Rows for RID Lookup is now changed from 1 (per execution, remember) to 27.0755.
And Management Studio, as of the latest update, obviously displays that as the Estimated Number of Rows Per Execution. And then goes on to also present you a pre-calculated Estimated Number of Rows for All Executions, by multiplying with the Estimated Number of Executions.
And note that, even though the example here uses a heap and a RID Lookup, the exact same issue could happen with a clustered index and a Key Lookup operator.
This specific case is, once you know about it, relatively easy to spot. Due to their specific function, both RID Lookup and Key Lookup can never return more than 1 row per execution, so the Estimated Number of Rows Per Execution (or Estimated Number of Rows, depending on the client tool you use) can never be more than 1. If a Predicate is pushed into a lookup and the estimate is pushed along with it, then that estimate can never be less than 1. So if you ever see a Key Lookup or RID Lookup where the Estimated Number of Rows (Per Execution) is more than 1, you now know the root cause.
There’s also some good news. When I talked about this specific issue with a Microsoft employee, they told me that they have “a planned fix for vNext and for Azure SQL Database under the new database compatibility level once in preview”. They gave no specifics, so I do not know whether that fix will only change the Estimated Number of Rows for the lookup operator itself, or also that of the associated Nested Loops operator.
I personally do hope, once the fix is live, that Microsoft will then also backport it to older versions. After all, it’s easy to see how the current data, especially after the SSMS 18.5 improvements, can be extremely confusing and misleading.