Compute Scalar

Introduction

The Compute Scalar operator is used to compute new values based on other columns in the same row. These new values are then added as extra column(s) in the output rows.

The expressions used to compute the new values can only refer to constant values and to columns in the input rows of the Compute Scalar operator. Other than that, there are, to my knowledge, no restrictions. The expressions can vary from very simple to extremely complex. The expressions can even include references to scalar user-defined functions, to CLR user-defined functions, and to built-in CLR functions.

One rather unique behavior of the Compute Scalar operator is that it very often does not actually execute. Depending on the expressions specified in the Defined Values property, SQL Server can and will often defer the evaluation of the expressions to other operators. When this happens, the Compute Scalar operator has no work to do; in this case it doesn’t execute at all. This is unfortunately not directly visible in execution plans, although you can find evidence of this in an execution plan plus run-time statistics, or predict that it’s the case based on the expressions used in the Defined Values property.

Visual appearance in execution plans

Depending on the tool being used, a Compute Scalar operator is displayed in a graphical execution plan as shown below:

SQL Server Management Studio

Azure Data Studio

Plan Explorer

(version 17.4 and up)

(until version 17.3)

Algorithm

The basic algorithm for the Compute Scalar operator is as shown below:

Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.

Also note that when the Compute Scalar doesn’t execute at all (as explained above and below), then this flowchart is not used at all, since the operator is never called.

Evaluate expressions

If evaluation is not deferred, Compute Scalar computes, for every expression in the Defined Values property, the result value based on the values in the input row. It then stores the results of these computations, assuming no run-time error occurs, in the appropriate column(s) in the output row.

When evaluation is deferred, Compute Scalar is only the container for the definition of the expression. Another operator evaluates the expression, when its result is needed. That operator then uses the result of the expression, but doesn’t store it. This is different from other operators having a Defined Values property of their own: when that’s the case, that other operator will evaluate those expressions and store them in the output row.

Evaluation of a scalar user-defined function, CLR user-defined function, or built-in CLR function is never deferred. These are always evaluated by the Compute Scalar. This evaluation starts by invoking the SQL or CLR code of the function, with the values from the current row passed in as its arguments; Compute Scalar then resumes work when the call to the function returns its result.

Most of the expressions that are used in the Defined Values property use standard SQL operators and keywords, though excessive use of parentheses, brackets, and fully qualified column names can make it hard to read the expressions. There also are several built-in functions that are not valid in SQL, but that are supported and understood by Compute Scalar. Most of these are also supported by other operators, either for deferred evaluation or as part of a Defined Values property on that other operator itself.

The following is a list of built-in functions that can be used in Defined Values but not in standard SQL. This list is almost certainly still incomplete! If you encounter an execution plan where Compute Scalar uses a keyword not listed here, please save it as a .sqlplan file and send it to me.

Function nameParametersDescription
CONVERT_IMPLICITdata_type
expression
third
Converts expression from its current data type to data_type.
The third parameter is an integer value; its function is currently unknown; I suspect that it is similar to the style parameter of the CONVERT function.
ConvertSVToXmlcolumn_set
second
Converts column_set from the internal storage format used for sparse columns to XML.
The second parameter is an integer value; its function is currently unknown.
ConvertXmlToRESVxml
second
Converts xml from XML to the internal storage format used for sparse columns.
The second parameter is an integer value; its function is currently unknown.
ForceOrderexpressionUnknown; probably used to ensure that expression is evaluated without reordering elements for performance gain.
getancestornode_id
second
Used in XQuery evaluation to find the ancestor node of a node in an XML fragment.
The node_id parameter represents a node in the internal node table of an XML fragment. The value returned is the node_id of the ancestor of node_id.
The second parameter is an integer value; its function is currently unknown; I suspect that it is used to specify the number of levels to go up in the hierarchy of the XML fragment.
getdescendantlimitnode_idUsed for XQuery evaluation to find the highest numbered descendant node of a node in an XML fragment.
The node_id parameter represents a node in the internal node table of an XML fragment. The value returned is the highest numbered node_id that is a descendant of node_id.
GetSparseColumnValuecolumn_set
column_no
Extracts the value of a single sparse column from column_set, which is stored in the internal storage used for sparse columns.
The column_no parameter is the column_id of he column to extract (as stored in sys.columns).
GetRangeThroughConvert
(not properly displayed in Management Studio)
begin_expression
end_expression
seek_flags
This function is not properly displayed in Management Studio. When the Defined Values property includes a GetRangeThroughConvert function, it shows as empty.
This function converts a range specified in one data type to a range in another data type; this is typically done to allow the use of an Index Seek operator when the data types of the indexed column and the value to find do not match.
The seek_flags parameter is a set of flags, encoded in an integer value, to fine-tune the exact behavior of the seek operator.
GetRangeThroughConvert returns a vector, a set of three values. Two of these are the lower and upper bound of the range in the target data type; the third appears to be always equal to the seek_flags parameter.
GetRangeWithMismatchedTypes
(not properly displayed in Management Studio)
begin_expression
end_expression
seek_flags
This function is not properly displayed in Management Studio. When the Defined Values property includes a GetRangeWithMismatchedTypes function, it shows as empty.
This function converts a range specified in one data type to a range in another data type; this is typically done to allow the use of an Index Seek operator when the data types of the indexed column and the value to find do not match.
The seek_flags parameter is a set of flags, encoded in an integer value, to fine-tune the exact behavior of the seek operator.
GetRangeWithMismatchedTypes returns a vector, a set of three values. Two of these are the lower and upper bound of the range in the target data type; the third appears to be always equal to the seek_flags parameter.
LikeRangeEndexpressionReturns the high end of a range that encloses all values that can match a LIKE expression predicate, so that matching rows can be found using an Index Seek or other seek operator.
LikeRangeInfoexpressionReturns a set of flags, encoded in an integer value, to define certain seek properties for an Index Seek or other seek operator to find all rows in a range defined by a LIKE expression predicate.
LikeRangeStartexpressionReturns the low end of a range that encloses all values that can match a LIKE expression predicate, so that matching rows can be found using an Index Seek or other seek operator.
manufactureconstructordpathfirst
node_id
third
Used in XQuery evaluation; it is currently not known what exactly this function does, or what it returns.
The first parameter is a binary value; its function is currently unknown. I suspect that it holds binary encoded flags to control the behavior of the function.
The node_id parameter represents a node in the internal node table of an XML fragment.
The third parameter is an integer value; its function is currently unknown.
manufacturesubnodeordpathnode_id_1
node_id_2
node_id_3
Used in XQuery evaluation; it is currently not known what exactly this function does; the data is returns is typically passed to a UDX operator.
The three parameters node_id_1, node_id_2, and node_id_3 each represent a node in the internal node table of an XML fragment.

Actual execution versus deferred evaluation

In most (but not all!) cases where you see a Compute Scalar operator in an execution plan, it doesn’t actually execute; instead the operator merely functions as a placeholder to store the definitions of columns that can be computed through scalar expressions. The actual evaluation is done by other operators, as and when they need the expression result; this also means that the actual number of times the expression is evaluated is not related to the number of rows flowing through the Compute Scalar. See Paul White’s post for a more detailed explanation.

This does not always happen. There are also cases where the Compute Scalar operator does actually execute, to compute the Defined Values expression(s) for each row that passes through and add the results as extra columns in the rows returned. This seems to be triggered by the actual functions used in the Defined Values property. So far, actual execution of Compute Scalar as opposed to deferred evaluation by other operators seems to be triggered by the use of the GetRange­Through­Convert or GetRange­With­Mismatched­Types functions (see table above), or the use of any scalar user-defined function in the Defined Values property.

Though a Compute Scalar can define more than one column in its Defined Values property, it appears that it never mixes expressions for which the evaluation can be deferred with those that it has to compute during actual execution. If a query uses for example both a scalar user-defined function and a simple expression, the execution plan will contain two Compute Scalar operators; one that will actually execute (for the user-defined function) and one that is merely a container for the definition of a deferred evaluated expression.

In an execution plan plus run-time statistics, it is easy to tell whether a Compute Scalar was actually executed, or whether evaluation was deferred. In the case of deferred evaluation, none of the run-time statistics normally associated with an execution plan plus run-time statistics will be reported for this Compute Scalar. A Compute Scalar that actually executes will report all these properties as normal.

When a Compute Scalar runs in batch mode execution, it always evaluates all expressions as part of its operator. Deferred evaluation is not supported for batch mode.

Operator properties

The properties below are specific to the Compute Scalar operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Segment operator are marked with a *.

Property nameDescription
Actual everything *As mentioned in the main text, Compute Scalar often only defines expressions that qualify for deferred evaluation; the actual evaluation is then done in other operators when the result is needed. In that case the Compute Scalar does not execute, hence it will not return any of the properties for run-time statistics.
Defined Values *A Compute Scalar operator always has a Defined Values property. In some cases, the definition of the expression uses functions that Management Studio is unable to display. In such cases, the Defined Values property will appear empty. Looking at the execution plan XML is the only way to find the actual definition in these cases.
Estimated CPU Cost *For Compute Scalar, the Estimated CPU Cost is always equal to the Estimated Number of Rows, multiplied by 0.0000001 (one millionth). Neither the number of expressions in Defined Values, nor their complexity has any effect on this. Even if scalar user-defined functions are invoked with a known higher cost, they do not affect this cost estimate.
Estimated I/O Cost *For Compute Scalar, the Estimated I/O Cost is always equal to zero. Even if scalar user-defined functions are invoked that are known to do I/O, they do not affect this cost estimate.
Number of Executions *As mentioned in the main text, Compute Scalar often only defines expressions that qualify for deferred evaluation; the actual evaluation is then done in other operators when the result is needed. In that case the Compute Scalar does not execute, hence it will not return any of the properties for run-time statistics.

Implicit properties

This table below lists the behavior of the implicit properties for the Compute Scalar operator.

Property nameDescription
Batch Mode enabledThe Compute Scalar operator supports both row mode and batch mode execution. In batch mode, deferred evaluation is not supported.
BlockingThe Compute Scalar operator is non-blocking.
Memory requirementThe Compute Scalar operator does not have any special memory requirement.
Order-preservingThe Compute Scalar operator is fully order-preserving.
Parallelism awareThe Compute Scalar operator is not parallelism aware.
Segment awareThe Compute Scalar operator is not segment aware.
Menu

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close