This is the fifth and final post in my May 2020 series about the Actual and Estimated Number of Rows properties in execution plans. In the previous posts, I talked about how you can use them, the confusion a Nested Loops operator can cause, how sometimes the graphical representation is based on input rather than output, and how missing nodes can catch one off guard.
This last part focuses on the issue that is perhaps the most common of all: the lack of data reported by a Compute Scalar operator, and how this affects the graphical execution plan
Compute Scalar operator is probably the most common of all operators. I hardly ever see an execution plan that doesn’t have at least a few occurrences of this operator. The task of the Compute Scalar operator is a simple one: to use some of the data in its input and, based on that, produce new data that is then added as extra columns in its output.
Because of the simplicity of this task, the actual execution of that task is often done by one of the other operators in the execution plan, and the Compute Scalar operator itself doesn’t actually execute. A side effect is that it can’t track how many rows it processes, because it doesn’t process anything at all. The result is that, even in an execution plan with run-time statistics (aka “actual execution plan”), no run-time statistics will be reported by a Compute Scalar operator when all its computations are performed by other operators. (See also the note in this (retired) Books Online article).
This can result in interesting situations, such as shown in the execution plan fragment above. Just looking at the arrow width, you might assume that Compute Scalar outputs a lot more rows than it reads. But Compute Scalar never adds rows that it doesn’t read, just as it never removes rows from the data stream. Every row that a Compute Scalar receives from its child will also be returned to its parent.
Looking at the numbers makes it clear what happens. The Estimated Number of Rows to be returned from Compute Scalar’s child operator to Compute Scalar, and hence also estimated to be returned by Compute Scalar to its parent, is high: almost 20,000 rows. But this is an execution plan plus run-time statistics, so the child operator also reports the Actual Number of Rows, which in this case is only 6. A huge cardinality estimation error. Management Studio uses the best value it has available, the Actual Number of Rows to determine the width of the arrow, which is therefor rendered fairly narrow.
For the output of the Compute Scalar, no Actual Number of Rows is reported. Because the actual computation was done elsewhere, the operator never really executed, so it can’t report how many rows it processed. And in the absence of the Actual Number of Rows, Management Studio can only use the Estimated Number of Rows to determine the arrow width.
Unlike the other bugs and weird choices in how SSMS renders arrow width that I discussed in the previous posts, this is one that I can actually sometimes use to my advantage. If I look at an execution plan plus run-time statistics, and I see anywhere in that plan a Compute Scalar operator with a difference in the width of the incoming and outgoing arrow, I know that can only be because that Compute Scalar didn’t really execute, so the outgoing arrow is based on the Estimated Number of Rows, the incoming arrow is based on the Actual Number of Rows, and the two are different. There has to be an incorrect estimate in at least that area of the execution plan!
But do keep in mind that not every Compute Scalar is only in the execution plan as a leftover that doesn’t do any real work. There are also cases where the work cannot be done by other operators, and in those cases the Compute Scalar will execute, and will report its Actual Number of Rows. Which will then be used for the arrow width. So a Compute Scalar with the same arrow width on both sides does not necessarily indicate that there is no incorrect estimate in that area.
Despite the usefulness of this weirdness, I still believe that execution plans would be easier to use if a Compute Scalar that does no real work is actually removed from the execution plan. We already saw before that there is a post-execution rewrite phase; that phase should (in my opinion) be used to actually remove the Compute Scalar operators that do no real work, and instead add Defined Values properties to the operators that do actually compute them.
It won’t surprise you to read that I created a feedback item for this suggestion too. Again, vote for it if you agree with my suggestion.
I’ve written a lot this month about the Actual Number of Rows property in an execution plan plus run-time statistics (as well as in a live execution plan) and about the Estimated Number of Rows property in every execution plan. These properties are invaluable tools. They can be used to identify the busiest sections of an execution plan, to find where bad cardinality estimations might have resulted in suboptimal plan choices, and where speed of parallel processing may have been hampered by skew. The visual representation of these properties, in the form of narrow or wide arrows, helps to visualize this information and draw attention to it.
But I have also pointed out several special situations where these properties, or the visualization, are not handled the way they should be. In those cases, unless you are aware of what is happening, the properties and their graphical representation can actually backfire, cause you to focus on areas of the execution plan where nothing bad is happening, and overlook actual issues. Knowing these issues can help to prevent this.
Better yet would be if these issues were fixed. This is why, for each of the issues I pointed out, I have also posted an improvement suggestion on Microsoft’s feedback site. If you agree with me, then voting for these suggestions is really important. While Microsoft does not blindly implement the highest voted suggestions, they do look at the number of votes as a quick gauge of how many of their customers are affected by an issue, or how many feel they would benefit from a suggested improvement.
This concludes the “May 2020 month of Actual and Estimated Number of Rows” for me. I hope reading this was useful for you, and that you learned something.
[…] Hugo Kornelis finishes a series on row counts and arrow widths with a look at Compute Scalar operato…: […]