Stream Aggregate
Introduction
The Stream Aggregate operator is one of the two operators that perform aggregation on their input. It has a single input and a single output, and typically the number of rows in its output is far less then in its input because it returns just a single row for each group of rows in the input.
Stream Aggregate is the most effective of the two aggregation operators. However, it requires its input data to be sorted by the grouping columns. Often this means that a Stream Aggregate can’t be used without adding an extra Sort operator. This extra sorts increases the total plan cost. In such cases, the optimizer tends to choose the alternative aggregation operator, Hash Match.
Visual appearance in execution plans
Depending on the tool being used, a Stream Aggregate operator is displayed in a graphical execution plan as shown below:
SQL Server Management Studio |
Azure Data Studio |
Plan Explorer |
|
(version 17.4 and up) |
(until version 17.3) |
||
![]() |
![]() |
![]() |
![]() |
Algorithm
The basic algorithm for the Stream Aggregate operator is as shown below:
Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.
The flowchart shows how the operator processes the data from the input. The rows for a group are accumulated in internal counters and returned as soon as the group is complete. The optimizer ensures that the input data is sorted by the Group By columns. That means that all rows for a single group are adjacent in the input stream; as soon as a group is ended its data can be returned and a new group started. The end of a group is detected by encountering either a new value in a Group By column or (obviously) be reaching the end of the input data. At that point, the accumulated aggregations for the “previous” group are returned and the counters are reset in preparation for a new group.
When the operator has no Group By property, a scalar aggregation (a single row with the aggregation of all input rows) is requested. Processing is mostly the same as for “normal” (grouped) aggregation; obviously since all data is now a single group, the group only ends at the end of all input. It is important to notice that the results of a local aggregation are returned in a slightly different place then for a grouped aggregation. This is to ensure that the expected row with the aggregation results is returned even if the input stream returns no rows at all. (For a grouped aggregation, an empty input results in an empty results, which would be incorrect for a scalar aggregation).
Counter details
The flowchart above shows when “internal counters” are initialized, updated, and returned as a row. In these internal counters, the following data is tracked:
- The value of each of the columns in the Group By property for the group of rows currently being processed. In “Initialize counters”, these counters are set equal to the values in the current row. In “Update counters”, they are not changed. In “Same group?”, they are compared to the values in the current row to determine whether the current row belongs to the same group. (And because the ANSI standard states that for the purpose of grouping all null values are to be considered part of a single group, null values are considered equal in this specific case).
- The “work in progress” values of all the aggregate functions that need to be computed by the operator, as specified in the Defined Values How these counters are set in “Initialize counters” and updated in “Update counters” is specified in the table of all aggregate functions.
Stream Aggregate and Window Spool
A Stream Aggregate operator adapts its behavior if its direct descendant is a Window Spool operator. This pattern is used for aggregates that use an OVER clause with an ORDER BY expression (and a ROWS or RANGE specification) to define aggregation over a window of rows. In this case:
- the Group By expression is always the WindowCount column returned from the Window Spool;
- the Output List property contains a lot of columns from the input, that are defined in the Defined Values property using the ANY aggregate function; and
- the Output List property also contains a few new expressions defined using other aggregate functions.
In this case, the first row received within each group skips the “Update counters” step. This is an extra row produced by Window Spool for the columns that will not be aggregated; the counter for the ANY function of those columns stores their values and then doesn’t change anymore. The “Update counters” step needs to be skipped in this case because this row is an extra copy; updating the counters for this row would result in a double count.
If fast-track optimization is enabled or when a RANGE specification is used, the “Initialize counters” step processes as normal, resetting all counters, for only the first group in a segment. For later groups in the same segment, the counters for columns defined with the ANY function are initialized again, but all other counters are left unchanged.
For a ROWS specification without fast-track optimization, the “Initialize counters” always processes as normal, resetting the counters for all columns.
Operator properties
The properties below are specific to the Stream Aggregate operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Stream Aggregate operator are marked with a *.
Property name | Description |
---|---|
Defined Values * | Names and definition for each of the aggregations that is computed within the Stream Aggregate operator. When this property is empty, no aggregation is done and the Stream Aggregate operator effectively performs a DISTINCT operation. |
Group By | Lists the columns from the input stream that define the group boundaries. The output will contain one row for each distinct combination of values in these columns. When this property is missing, a single row with scalar aggregation results is returned. |
Output List * | The Output List of a Stream Aggregate is always equal to zero or more of the columns in the Group By property, followed by all of the columns in the Defined Values property. |
Implicit properties
This table below lists the behavior of the implicit properties for the Stream Aggregate operator.
Property name | Description |
---|---|
Batch Mode enabled | The Stream Aggregate operator supports row mode execution only. |
Blocking | The Stream Aggregate operator is non-blocking. Note that obviously it has to read all rows for a group before it can return the single row for that group, but because it does not return any other rows this is not considered blocking. For a scalar aggregation, Stream Aggregate will (obviously) block because it is impossible to return correct results without processing the entire input. |
Memory requirement | The Stream Aggregate operator does not have any special memory requirement. The internal counters are stored in the standard working memory that all operators get assigned. |
Order-preserving | The Stream Aggregate operator is fully order-preserving. |
Parallelism aware | The Stream Aggregate operator is not parallelism aware. |
Segment aware | The Stream Aggregate operator is not segment aware, except when Stream Aggregate has a Window Spool as its child. In that case, the input must be segmented, as described above. |