Aggregate Functions
Introduction
SQL Server currently supports three operators that can compute aggregations: Hash Match, Stream Aggregate, and Window Aggregate. These operators all use the same basic principle of maintaining internal counters as rows are processed, so that the final value of those internal counters is the expected value.
Supported aggregate functions
The list of aggregate functions that is supported by the aggregation operators is not a direct match of the list of aggregate functions supported by T-SQL. Some aggregate functions that are allowed in a query are not computed directly in an execution plan, but through a workaround. For instance, if a query uses the AVG function, the execution plan computes two values, a COUNT and a SUM, and then uses a Compute Scalar operator to compute the requested average by dividing these two values (often with some special logic to handle empty sets).
On the other hand, there are also aggregation functions that are used in aggregation operators but not allowed in T-SQL, such as ANY.
The table below lists all currently known aggregate functions, with a description, and details of the initial value assigned to the internal counters and how those counters are updated as rows are processed. In some cases there are multiple ways to achieve the same result; since Microsoft has not released any official documentation I have listed the most likely method in those cases.
Function name | Description | Initialize counters | Update counters |
---|---|---|---|
ANY (expression) | Returns one of the values seen for expression in the group. It is not known which value is returned, I expect it to be the first value. Though technically any of the values for expression from the input rows could be considered valid, the actual implementation of this function always returns the value from the first row it receives. | Set to expression. | Leave unchanged. |
APPROX_COUNT_DISTINCT_ACCUM (expression) | Returns a set of values, split based on a hash of expression, with for each set the highest position of the leftmost “1” bit in the hash of expression in that set. (See: hyperloglog). Note that the returned result is a single value, probably encoded by combining the values for each set into a single value. Introduced in SQL Server 2019. | Set to null for each set. | Compute the hash of expression; use the right-most bits to determine the set; compute the position of the leftmost “1” bit in the hash; update the counter if this is higher than the previous value. |
APPROX_PERCENTILE_ACCUM (expression) | Returns a data sketch that represents the input data, or a random sampling of it when the total input data size is too large. The data sketch is a sorted list of each distinct non-null input value, with a “weight” to represent the cardinality of that value. (See: KLL sketch). Note that the sketch is returned as a single value, probably encoded by combining all values and weights into a single value. Introduced in SQL Server 2022. | Set to an empty sketch. | If expression is not null, then increase its weight if it’s already in the sketch, or add it with weight 1 otherwise. If this pushes the sketch past its maximum size, remove all entries in either odd- or even-numbered positions (chosen at random) and double the weight of all remaining entries. |
CHECKSUM_AGG (expression) | Returns a checksum value computed from all non-null values of expression within the group. | Set to null. | Currently unknown. |
Count (*) | Returns the total number of rows in the group, regardless of input values, as a bigint. | Set to 0. | Increment by 1. |
COUNT (expression) | Returns the total number of rows in the group where expression is not null, as an int. | Set to 0. | If expression is not null, then increment by 1. |
COUNT_BIG (expression) | Returns the total number of rows in the group where expression is not null, as a bigint. | Set to 0. | If expression is not null, then increment by 1. |
FIRST_VALUE_IS_NULL (expression) | Returns true if the first value of expression within the group is null. | Set to null. | If counter is null then set to true if expression is null or set to false if expression is not null. |
FIRST_VALUE_IGNORE_NULLS (expression) | Returns the first non-null value of expression within the group. | Set to null. | If counter is null then set to expression. |
LAST_VALUE_IGNORE_NULLS (expression) | Returns the last non-null value of expression within the group. Introduced in SQL Server 2022. | Set to null. | If expression is not null, then set to expression. |
LAST_VALUE (expression) | Returns the last value of expression within the group (regardless of whether or not it is null). | Set to null. | Set to expression. |
MAX (expression) | Returns the maximum non-null value of expression in the group. | Set to null. | If counter is null or expression > counter, then set to expression. |
MIN (expression) | Returns the minimum non-null value of expression in the group. | Set to null. | If counter is null or expression < counter, then set to expression. |
STATMAN (expression) | Used in execution plans for UPDATE STATISTICS. expression is the column for which statistics are updated. It is currently unknown what data type is returned by STATMAN, nor what data is represented in the returned value. | Currently unknown. | Currently unknown. |
STATMANMERGE (expression) | Used in parallel execution plans for UPDATE STATISTICS. expression is the output of the STATMAN aggregation function on one of the threads. STATMANMERGE combines the results of all threads in a serial section of the execution plan. It is currently unknown what data type is returned by STATMANMERGE, nor what data is represented in the returned value. | Currently unknown. | Currently unknown. |
STRING_AGG (expression, separator) | Returns a string consisting of all non-null values of expression within the group, separated by separator. Introduced in SQL Server 2017. | Set to null. | If counter is null then set to expression; else set to counter + separator + expression. |
SUM (expression) | Returns the sum of all non-null values of expression in the group. | Set to 0. | If expression is not null then set to counter + expression. |
Any CLR user defined aggregate function | Returns values as determined by the CLR code. | Executes the Init() method of the CLR user defined aggregate. | Executes the Accumulate() method of the CLR user defined aggregate, passing it the value of expression. |
Missing any function?
There is no official documentation (that I know of) of the supported aggregate functions. The list above has been compiled based on what I have observed in execution plans that I have seen. It is very likely that this list is not complete yet.
You can help me complete this list! If you encounter an execution plan where Compute Scalar uses a keyword not listed here, please save it as a .sqlplan file and send it to me.
Thanks!