Stream Aggregate

Introduction

The Stream Aggregate operator is one of the two operators that perform aggregation on their input. It has a single input and a single output, and typically the number of rows in its output is far less then in its input because it returns just a single row for each group of rows in the input.

Stream Aggregate is the most effective of the two aggregation operators. However, it requires its input data to be sorted by the grouping columns. Often this means that a Stream Aggregate can’t be used without adding an extra Sort operator. This extra sorts increases the total plan cost. In such cases, the optimizer tends to choose the alternative aggregation operator, Hash Match.

Visual appearance in execution plans

Depending on the tool being used, a Stream Aggregate operator is displayed in a graphical execution plan as shown below:

SQL Server Management Studio Azure Data Studio
(until version 17.3) (version 17.4 and up)

Algorithm

The basic algorithm for the Stream Aggregate operator is as shown below:

Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.

The flowchart shows how the operator processes the data from the input. The rows for a group are accumulated in internal counters and returned as soon as the group is complete. The optimizer ensures that the input data is sorted by the Group By columns. That means that all rows for a single group are adjacent in the input stream; as soon as a group is ended its data can be returned and a new group started. The end of a group is detected by encountering either a new value in a Group By column or (obviously) be reaching the end of the input data. At that point, the accumulated aggregations for the “previous” group are returned and the counters are reset in preparation for a new group.

When the operator has no Group By property, a scalar aggregation (a single row with the aggregation of all input rows) is requested. Processing is mostly the same as for “normal” (grouped) aggregation; obviously since all data is now a single group, the group only ends at the end of all input. It is important to notice that the results of a local aggregation are returned in a slightly different place then for a grouped aggregation. This is to ensure that the expected row with the aggregation results is returned even if the input stream returns no rows at all. (For a grouped aggregation, an empty input results in an empty results, which would be incorrect for a scalar aggregation).

Counter details

The flowchart above shows when “internal counters” are initialized, updated, and returned as a row. In these internal counters, the following data is tracked:

  1. The value of each of the columns in the Group By property for the group of rows currently being processed. In “Initialize counters”, these counters are set equal to the values in the current row. In “Update counters”, they are not changed. In “Same group?”, they are compared to the values in the current row to determine whether the current row belongs to the same group. (And because the ANSI standard states that for the purpose of grouping all null values are to be considered part of a single group, null values are considered equal in this specific case).
  2. The “work in progress” values of all the aggregate functions that need to be computed by the operator, as specified in the Defined Values How these counters are set in “Initialize counters” and updated in “Update counters” is specified in the table below.

Supported aggregate functions

The list of aggregate functions that is supported by Stream Aggregate is not a direct match of the list of aggregate functions supported by T-SQL. Some aggregate functions that are allowed in a query are not computed directly in an execution plan, but through a workaround. For instance, a query that uses the AVG function can result in an execution plan where a Stream Aggregate computes two values, a COUNT and a SUM; a Stream Aggregate operator is then used to compute the requested average from these two values.

On the other hand, there are also aggregation functions that the optimizer can use in a Stream Aggregate but that are not allowed in T-SQL, such as ANY.

The following is a list of the aggregate functions currently known to be supported by Stream Aggregate. (If you encounter an execution plan that includes a different function, then please save it as a .sqlplan file and send it to me).

Function nameDescriptionInitialize countersUpdate counters
ANY (expression)Returns one of the values seen for expression in the group.
It is not known which value is returned, I expect it to be the first value.
Currently unknown.
(Probably: set to expression)
Currently unknown.
(Probably: leave unchanged)
CHECKSUM_AGG (expression)Returns a checksum values computed of all non-null values of expression within the group.Set to null.Currently unknown.
COUNT (*)Returns the total number of rows in the group, regardless of input values.Set to 0.Increment by 1.
COUNT (expression)Returns the total number of rows in the group where expression is not null.Set to 0.Increment by 1 if expression is not null; leave unchanged otherwise.
FIRST_VALUE_IS_NULL (expression)Returns true if the first value of expression within the group is null.Set to true if expression is null; set to false otherwise.Leave unchanged.
FIRST_VALUE_IGNORE_NULLS (expression)Returns the first non-null value of expression within the group.Set to null.Set to expression if counter is null; leave unchanged otherwise.
LAST_VALUE (expression)Returns the last value of expression within the group (regardless of whether or not it is null).Currently unknown.Set to expression.
MAX (expression)Returns the maximum non-null value of expression in the group.Set to null.Set to expression if counter is null or counter < expression.
MIN (expression)Returns the minimum non-null value of expression in the group.Set to null.Set to expression if counter is null or counter > expression.
STRING_AGG (expression, separator)Returns a string consisting of all non-null values of expression within the group, separated by separator.Set to null.Set to expression if counter is null; set to counter + separator + expression otherwise.
SUM (expression)Returns the sum of all non-null values of expression in the group.Set to 0.Increment by expression if expression is not null; leave unchanged otherwise.
Any CLR user defined aggregate functionReturns values as determined by the CLR code.Executes the Init() method of the CLR user defined aggregate.Executes the Accumulate() method of the CLR user defined aggregate, passing it the value of expression.

Operator properties

The properties below are specific to the Stream Aggregate operator, or have a specific meaning when appearing on it. For all other properties, see Common properties.

Property nameDescription
Defined ValuesNames and definition for each of the aggregations that is computed within the Stream Aggregate operator. When this property is empty, no aggregation is done and the Stream Aggregate operator effectively performs a DISTINCT operation.
Group ByLists the columns from the input stream that define the group boundaries. The output will contain one row for each distinct combination of values in these columns. When this property is missing, a single row with scalar aggregation results is returned.
Logical OperationAlways equal to Aggregate.
Output ListThe Output List of a Stream Aggregate is always equal to zero or more of the columns in the Group By property, followed by all of the columns in the Defined Values property.

Implicit properties

This table below lists the behavior of the implicit properties for the Stream Aggregate operator.

Property nameDescription
Batch Mode enabledThe Stream Aggregate operator supports row mode execution only.
BlockingThe Stream Aggregate operator is non-blocking.
Note that obviously it has to read all rows for a group before it can return the single row for that group, but because it does not return any other rows this is not considered blocking.
For a scalar aggregation, Stream Aggregate will (obviously) block because it is impossible to return correct results without processing the entire input.
Memory requirementThe Stream Aggregate operator does not have any special memory requirement. The internal counters are stored in the standard working memory that all operators get assigned.
Order-preservingThe Stream Aggregate operator is fully order-preserving.
Parallelism awareThe Stream Aggregate operator is not parallelism aware.
Segment awareThe Stream Aggregate operator is not segment aware.
Menu

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close