The Stream Aggregate operator is one of the two operators that perform aggregation on their input. It has a single input and a single output, and typically the number of rows in its output is far less than in its input, because it returns just a single row for each group of rows in the input.
Stream Aggregate is the more efficient of the two aggregation operators. However, it requires its input data to be sorted by the grouping columns. Often this means that a Stream Aggregate can’t be used without adding an extra Sort operator, and this extra sort increases the total plan cost. In such cases, the optimizer tends to choose the alternative aggregation operator, Hash Match.
Visual appearance in execution plans
Depending on the tool being used, a Stream Aggregate operator is displayed in a graphical execution plan as shown below:
SQL Server Management Studio
Azure Data Studio
(version 17.4 and up)
(until version 17.3)
The basic algorithm for the Stream Aggregate operator is as shown below:
Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.
The flowchart shows how the operator processes the data from the input. The rows for a group are accumulated in internal counters and returned as soon as the group is complete. The optimizer ensures that the input data is sorted by the Group By columns. That means that all rows for a single group are adjacent in the input stream; as soon as a group ends, its data can be returned and a new group started. The end of a group is detected by encountering either a new value in a Group By column or (obviously) by reaching the end of the input data. At that point, the accumulated aggregations for the “previous” group are returned and the counters are reset in preparation for a new group.
When the operator has no Group By property, a scalar aggregation (a single row with the aggregation of all input rows) is requested. Processing is mostly the same as for “normal” (grouped) aggregation; obviously, since all data now forms a single group, the group only ends at the end of all input. It is important to note that the results of a scalar aggregation are returned in a slightly different place in the flowchart than for a grouped aggregation. This ensures that the expected row with the aggregation results is returned even if the input stream returns no rows at all. (For a grouped aggregation, an empty input results in an empty result, which would be incorrect for a scalar aggregation.)
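The algorithm described above can be sketched in a few lines of Python. This is an illustrative model only, not the actual implementation: the counters here are a hypothetical COUNT(*) and SUM over a `val` column, standing in for the operator’s Defined Values, and the input is assumed to be pre-sorted on the grouping columns.

```python
def stream_aggregate(rows, group_by, scalar=False):
    # Sketch of the Stream Aggregate algorithm (not the real implementation).
    # rows must already be sorted on the group_by columns.
    counters = None                    # (group key, count, sum)
    for row in rows:
        key = tuple(row[c] for c in group_by)
        # Python's None == None mirrors the ANSI rule that all NULLs
        # form a single group for grouping purposes
        if counters is not None and key != counters[0]:
            yield counters             # group ended: return its row
            counters = None
        if counters is None:
            counters = (key, 0, 0)     # "Initialize counters"
        counters = (key, counters[1] + 1, counters[2] + row["val"])  # "Update counters"
    if counters is not None:
        yield counters                 # flush the final group
    elif scalar:
        # a scalar aggregation returns a row even for empty input:
        # COUNT(*) = 0, SUM = NULL
        yield ((), 0, None)
```

Note how the grouped and scalar cases share the same loop; only the empty-input behavior differs, which corresponds to the different return point in the flowchart.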
The flowchart above shows when “internal counters” are initialized, updated, and returned as a row. In these internal counters, the following data is tracked:
- The value of each of the columns in the Group By property for the group of rows currently being processed. In “Initialize counters”, these counters are set equal to the values in the current row. In “Update counters”, they are not changed. In “Same group?”, they are compared to the values in the current row to determine whether the current row belongs to the same group. (And because the ANSI standard states that for the purpose of grouping all null values are to be considered part of a single group, null values are considered equal in this specific case).
- The “work in progress” values of all the aggregate functions that need to be computed by the operator, as specified in the Defined Values property. How these counters are set in “Initialize counters” and updated in “Update counters” is specified in the table below.
Supported aggregate functions
The list of aggregate functions that is supported by Stream Aggregate is not a direct match of the list of aggregate functions supported by T-SQL. Some aggregate functions that are allowed in a query are not computed directly in an execution plan, but through a workaround. For instance, a query that uses the AVG function can result in an execution plan where a Stream Aggregate computes two values, a COUNT and a SUM; a Compute Scalar operator is then used to compute the requested average from these two values.
On the other hand, there are also aggregation functions that the optimizer can use in a Stream Aggregate but that are not allowed in T-SQL, such as ANY.
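The AVG workaround described above can be sketched as follows. The function name is hypothetical, and Python stands in for the plan operators: the first half plays the role of the Stream Aggregate, the second half the role of the Compute Scalar that derives the average.

```python
def avg_via_count_and_sum(values):
    # Stream Aggregate part: a COUNT and a SUM over the non-null values
    non_null = [v for v in values if v is not None]
    cnt, total = len(non_null), sum(non_null)
    # Compute Scalar part: SUM / COUNT, with NULL when the count is zero
    return None if cnt == 0 else total / cnt
```

Splitting the work this way also handles the edge case correctly: when no non-null values exist, the division is skipped and the result is NULL, just as AVG requires.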
The following is a list of the aggregate functions currently known to be supported by Stream Aggregate. (If you encounter an execution plan that includes a different function, then please save it as a .sqlplan file and send it to me).
|Function name||Description||Initialize counters||Update counters|
|ANY (expression)||Returns one of the values seen for expression in the group. It is not known which value is returned; I expect it to be the first value.||Set to expression.||Leave unchanged.|
|CHECKSUM_AGG (expression)||Returns a checksum value computed over all non-null values of expression within the group.||Set to null.||Currently unknown.|
|COUNT (*)||Returns the total number of rows in the group, regardless of input values.||Set to 0.||Increment by 1.|
|COUNT (expression)||Returns the total number of rows in the group where expression is not null.||Set to 0.||Increment by 1 if expression is not null; leave unchanged otherwise.|
|FIRST_VALUE_IS_NULL (expression)||Returns true if the first value of expression within the group is null.||Set to true if expression is null; set to false otherwise.||Leave unchanged.|
|FIRST_VALUE_IGNORE_NULLS (expression)||Returns the first non-null value of expression within the group.||Set to null.||Set to expression if counter is null; leave unchanged otherwise.|
|LAST_VALUE (expression)||Returns the last value of expression within the group (regardless of whether or not it is null).||Currently unknown.||Set to expression.|
|MAX (expression)||Returns the maximum non-null value of expression in the group.||Set to null.||Set to expression if counter is null or counter < expression; leave unchanged otherwise.|
|MIN (expression)||Returns the minimum non-null value of expression in the group.||Set to null.||Set to expression if counter is null or counter > expression; leave unchanged otherwise.|
|STRING_AGG (expression, separator)||Returns a string consisting of all non-null values of expression within the group, separated by separator.||Set to null.||Set to expression if counter is null; set to counter + separator + expression otherwise.|
|SUM (expression)||Returns the sum of all non-null values of expression in the group.||Set to 0.||Increment by expression if expression is not null; leave unchanged otherwise.|
|Any CLR user defined aggregate function||Returns values as determined by the CLR code.||Executes the Init() method of the CLR user defined aggregate.||Executes the Accumulate() method of the CLR user defined aggregate, passing it the value of expression.|
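A few rows of the table above can be encoded compactly as (initialize, update) function pairs. The dictionary and helper below are hypothetical, not part of any real API; the initialize function receives the first row’s expression value because, per the flowchart, both “Initialize counters” and “Update counters” run for the first row of a group (which is exactly what the ANY rule relies on).

```python
AGGREGATES = {
    # name: (initialize(first_value), update(counter, value))
    "COUNT(*)": (lambda v: 0,    lambda c, v: c + 1),
    "COUNT":    (lambda v: 0,    lambda c, v: c + 1 if v is not None else c),
    "SUM":      (lambda v: 0,    lambda c, v: c + v if v is not None else c),
    "MIN":      (lambda v: None, lambda c, v: v if v is not None and (c is None or c > v) else c),
    "MAX":      (lambda v: None, lambda c, v: v if v is not None and (c is None or c < v) else c),
    "ANY":      (lambda v: v,    lambda c, v: c),   # set to expression; leave unchanged
}

def aggregate_group(name, values):
    # Apply one row of the table to the values of a single (non-empty) group
    init, update = AGGREGATES[name]
    counter = init(values[0])          # "Initialize counters"
    for v in values:                   # "Update counters" for every row
        counter = update(counter, v)
    return counter
```

Writing the rules this way makes the null handling explicit: COUNT, SUM, MIN, and MAX all simply leave the counter unchanged for a null input value.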
Stream Aggregate and Window Spool
A Stream Aggregate operator adapts its behavior if its direct descendant is a Window Spool operator. This pattern is used for aggregates that use an OVER clause with an ORDER BY expression (and a ROWS or RANGE specification) to define aggregation over a window of rows. In this case:
- the Group By expression is always the WindowCount column returned from the Window Spool;
- the Output List property contains many columns from the input, which are defined in the Defined Values property using the ANY aggregate function; and
- the Output List property also contains a few new expressions defined using other aggregate functions.
In this case, the first row received within each group skips the “Update counters” step. This is an extra row produced by Window Spool for the columns that will not be aggregated; the counter for the ANY function of those columns stores their values and then doesn’t change anymore. The “Update counters” step needs to be skipped in this case because this row is an extra copy; updating the counters for this row would result in a double count.
If fast-track optimization is enabled, then the “Initialize counters” step processes as normal, resetting all counters, only for the first group in a segment. For later groups in the same segment, the counters for columns defined with the ANY function are initialized again, but all other counters are left unchanged.
Without fast-track optimization, the “Initialize counters” step always processes as normal, resetting the counters for all columns.
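The behavior described above can be modeled with a small sketch. Everything here is hypothetical: the column names, and a single ANY counter plus a single SUM counter standing in for the Defined Values. Each group’s first row is the extra Window Spool row, which seeds the ANY counter but skips “Update counters”; with fast-track enabled, the SUM counter carries over between groups in the same segment, producing a running total.

```python
def segment_aggregate(groups, fast_track):
    # groups: the groups of one segment, in order; each group is a list of
    # rows, and rows[0] is the extra Window Spool row
    results, counters = [], None
    for group in groups:
        first, rest = group[0], group[1:]
        if counters is None or not fast_track:
            counters = {"any": first["col"], "sum": 0}      # full reset
        else:
            # fast track: reinitialize only the ANY counter
            counters = {"any": first["col"], "sum": counters["sum"]}
        for row in rest:            # the extra first row skips this step
            counters["sum"] += row["val"]
        results.append(dict(counters))
    return results
```

With `fast_track=False` each group is aggregated independently; with `fast_track=True` the non-ANY counters accumulate across the segment, which is what makes running aggregates cheap.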
The properties below are specific to the Stream Aggregate operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Stream Aggregate operator are marked with a *.
|Defined Values *||Names and definition for each of the aggregations that is computed within the Stream Aggregate operator. When this property is empty, no aggregation is done and the Stream Aggregate operator effectively performs a DISTINCT on the columns in its Group By property.|
|Group By||Lists the columns from the input stream that define the group boundaries. The output will contain one row for each distinct combination of values in these columns. When this property is missing, a single row with scalar aggregation results is returned.|
|Output List *||The Output List of a Stream Aggregate always consists of zero or more of the columns in the Group By property, followed by all of the columns in the Defined Values property.|
The table below lists the behavior of the implicit properties for the Stream Aggregate operator.
|Batch Mode enabled||The Stream Aggregate operator supports row mode execution only.|
|Blocking||The Stream Aggregate operator is non-blocking. Note that it obviously has to read all rows for a group before it can return the single row for that group, but because it never needs to read beyond the current group before producing output, it is not considered blocking. For a scalar aggregation, Stream Aggregate does (obviously) block, because it is impossible to return correct results without processing the entire input.|
|Memory requirement||The Stream Aggregate operator does not have any special memory requirement. The internal counters are stored in the standard working memory that all operators get assigned.|
|Order-preserving||The Stream Aggregate operator is fully order-preserving.|
|Parallelism aware||The Stream Aggregate operator is not parallelism aware.|
|Segment aware||The Stream Aggregate operator is not segment aware, except when Stream Aggregate has a Window Spool as its child. In that case, the segmented input is used as described above.|