Window Aggregate
Introduction
The Window Aggregate operator, which was introduced in SQL Server 2016, is a batch mode only operator that provides fast computation of window aggregate functions, as well as several other functions with a window specification.
As the name implies, Window Aggregate can be used to compute “window aggregates” – aggregate functions with a window specification (OVER clause). It supports both frameless windows (OVER clause with only a PARTITION BY as well as window frames (OVER CLAUSE with an ORDER BY and a ROWS or RANGE specification), though not all possible specifications are supported.
Visual appearance in execution plans
Depending on the tool being used, a Window Aggregate operator is displayed in a graphical execution plan as shown below:
SSMS and ADS |
Legacy SSMS |
Plan Explorer |
Paste The Plan |
Algorithm
According to information found here, the Window Aggregate operator does not have a single code path, but uses “a dedicated code path for each function”. I interpret this statement to mean that it uses a different code path depending on the type of window it has to process, not a different code path for each function (e.g. SUM, MAX) that is listed in the Defined Values. This interpretation is supported by the fact that multiple different aggregates can be computed by a single Window Aggregate operator, as long as they all use the same window specification. However, multiple different window and frame specifications require independent instances of Window Aggregate.
The Window Aggregate does of course need to “know” the type of window and frame that are specified. However, this information is not available anywhere in the graphical execution plan, nor in the execution plan XML. My assumption is that there are properties for this in the actual internal representation of the execution plan, but that, due to an oversight, these properties are not converted to the XML representation of the plan.
Window Aggregate has been observed to be used for the following window types:
Frameless windows
A single Window Aggregate operator can compute one or more aggregates over a frameless window (an OVER clause with a PARTITION BY specification, but without an ORDER BY and a ROWS or RANGE specification), as long as those aggregates all use the same window specification. The input has to be sorted by the PARTITION BY column(s).
Effectively, the Window Aggregate internally replicates the combined logic of the Segment, Table Spool, Nested Loops, and Stream Aggregate operators, as explained here, but optimized for batch mode.
In order to produce the results, Window Aggregate has to store all rows of the current segment (partition). These are stored in memory, but if there are more rows than fit in the available memory, the operator spills to tempdb.
As far as currently known, Window Aggregate supports the following aggregate functions for frameless windows: SUM, MIN, MAX, COUNT, COUNT_BIG, and Count(). It always produces one extra column with a Count(); this extra column is always the first generated column in the Output List property, and it is never specified in the Defined Values property.
Row-framed windows
A single Window Aggregate operator can compute one or more aggregates over a row-framed window (an OVER clause with an ORDER BY and a ROWS specification, with or without a PARTITION BY specification), as long as those aggregates all use the same window specification. The input has to be sorted by the PARTITION BY and ORDER BY column(s).
The Window Aggregate operator only has support for a single row-framed expression: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Queries that use ROWS BETWEEN CURRENT ROW AND UNBOUDED FOLLOWING can still result in an execution plan with a Window Aggregate function, because the optimizer can change the sort order. Queries that use ROWS BETWEEN UNBOUDED PRECEDING AND UNBOUDED FOLLOWING will internally be converted in the equivalent frameless window, by removing both the ORDER BY and the ROWS specification. However, for queries that n PRECEDING, n FOLLOWING, or the (nonsensical but legal) ROWS BETWEEN CURRENT ROW AND CURRENT ROW, Window Aggregate cannot be used. Execution plans for these queries will use row mode Window Spool and Stream Aggregate operators instead.
Effectively, the Window Aggregate internally replicates the combined logic of the Segment, Sequence Project, Window Spool, and Stream Aggregate operators (with fast-track optimization enabled), as explained here, but optimized for batch mode.
Because Window Aggregate is only used for ROWS BETWEEN UNBOUDED PRECEDING AND CURRENT ROW, there is no need to store any data beyond the current row and the counters for the aggregation results. For this code path of Windows Aggregate, there is no unusual memory requirement, and no risk of spilling to tempdb.
As far as currently known, Window Aggregate supports the following aggregate functions for row-framed windows: SUM, MIN, MAX, COUNT, COUNT_BIG, and Count(). It always produces one extra column with a Count(); this extra column is always the first generated column in the Output List property, and it is never specified in the Defined Values property.
Range-framed windows
A single Window Aggregate operator can compute one or more aggregates over a range-framed window (an OVER clause with an ORDER BY and either an explicit or the implied default RANGE specification, with or without a PARTITION BY specification), as long as those aggregates all use the same window specification. The input has to be sorted by the PARTITION BY and ORDER BY column(s).
The Window Aggregate operator only has support for a single range-framed expression: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Queries that use RANGE BETWEEN CURRENT ROW AND UNBOUDED FOLLOWING can still result in an execution plan with a Window Aggregate function, because the optimizer can change the sort order. Queries that use RANGE BETWEEN UNBOUDED PRECEDING AND UNBOUDED FOLLOWING will internally be converted in the equivalent frameless window, by removing both the ORDER BY and the ROWS specification. Queries that use RANGE BETWEEN CURRENT ROW AND CURRENT ROW will internally be converted in the equivalent frameless window, by adding the ORDER BY columns to the PARTITION BY clause and then removing both the ORDER BY and the ROWS specification.
Effectively, the Window Aggregate internally replicates the combined logic of the Segment, Window Spool, and Stream Aggregate operators (with fast-track optimization enabled), as explained here, but optimized for batch mode.
Because Window Aggregate is only used for RANGE BETWEEN UNBOUDED PRECEDING AND CURRENT ROW, there is no need to store all data in the frame. However, the operator still does need to store all rows with the same PARTITION BY and ORDER BY values as the current row. These are stored in memory, but if there are more rows than fit in the available memory, the operator spills to tempdb.
As far as currently known, Window Aggregate supports the following aggregate functions for range-framed windows: SUM, MIN, MAX, COUNT, COUNT_BIG, and Count(). It always produces one extra column with a Count(); this extra column is always the first generated column in the Output List property, and it is never specified in the Defined Values property.
Ranking functions
A single Window Aggregate operator can compute one or more ranking functions, as long as those ranking functions all use the same window specification (an OVER clause with an ORDER BY, with or without a PARTITION BY specification). The input has to be sorted by the PARTITION BY and ORDER BY column(s).
Effectively, the Window Aggregate internally replicates the combined logic of the Segment and Sequence Project operators, but optimized for batch mode.
When Window Aggregate is used for ranking functions, there is no need to store any data beyond the current row and the previous row’s values for the counters. For this code path of Windows Aggregate, there is no unusual memory requirement, and no risk of spilling to tempdb.
As far as currently known, Window Aggregate supports the following ranking functions: row_number, row_number_ignore_nulls, rank, and dense_rank. In this case, no extra column with a Count(*) is returned.
LAST_VALUE()
A single Window Aggregate operator can compute the result of one or more LAST_VALUE functions, as long as those ranking functions all use the same window specification (an OVER clause with an ORDER BY, with or without a PARTITION BY specification). The input has to be sorted by the PARTITION BY and ORDER BY column(s).
Effectively, the Window Aggregate internally uses the same logic as for row-framed windows, but a LAST_VALUE() computation will not be combined with other row-framed window expressions. This also means that there is no unusual memory requirement and no risk of spilling to tempdb.
In this case, no extra column with a Count(*) is returned.
The Window Aggregate operator has thus far only been observed to compute LAST_VALUE functions for queries that specify the LAG or LEAG analytic functions. When a LAST_VALUE function is explicitly used in a query, the optimizer will not choose the Window Aggregate operator, but fall back to the traditional row mode method instead.
Operator properties
The properties below are specific to the Window Aggregate operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Window Aggregate operator are marked with a *.
Property name | Description |
---|---|
Defined Values * | Names and definition for each of the aggregations that is computed within the Window Aggregate operator, except the first. In some cases, as described in the main text, there is no specification for the first generated column; in those cases that column is always computed as Count(*). The optimizer does not always detect this as a duplicate when the execution plan also requires a Count(*) to be used by the parent operators, so you may find an additional, explicitly defined, Count(*) expression here as well. |
Output List * | The Output List of a Window Aggregate is always equal to zero or more of the columns in the Group By property, followed by all new columns to be generated by the operator (as specified in the Defined Values property). |
Implicit properties
This table below lists the behavior of the implicit properties for the Window Aggregate operator.
Property name | Description |
---|---|
Batch Mode enabled | The Window Aggregate operator supports batch mode execution only. |
Blocking | As far as currently known, the Window Aggregate operator is non-blocking. Since it processes sorted input, it can produce output as soon as all required rows have been read. This might involve some read-ahead processing, which in extreme cases can result in effectively blocking partially behavior. |
Memory requirement | The Window Aggregate operator requires enough memory to store the rows it needs for its operation (see descriptions above). Unlike other memory-consuming operators, Window Aggregate does not expose the Memory Fractions Input and Memory Fractions Output properties. It is currently unknown whether this is due to oversight or by design. |
Order-preserving | The Window Aggregate operator is, as far as currently known, fully order-preserving. However, the optimizer does not seem to recognize it as such. |
Parallelism aware | Just like al batch mode operators, the Window Aggregate operator is parallelism aware. |
Segment aware | The Window Aggregate operator is not segment aware. It can, when needed, use internal processing that is effectively similar to segmenting and then doing segment aware processing. |