Aggregate Functions
Introduction
SQL Server currently supports three operators that can compute aggregations: Hash Match, Stream Aggregate, and Window Aggregate. These operators all use the same basic principle of maintaining internal counters as rows are processed, so that the final value of those internal counters is the expected value.
Supported aggregate functions
The list of aggregate functions that is supported by the aggregation operators is not a direct match of the list of aggregate functions supported by T-SQL. Some aggregate functions that are allowed in a query are not computed directly in an execution plan, but through a workaround. For instance, if a query uses the AVG function, the execution plan computes two values, a COUNT and a SUM, and then uses a Compute Scalar operator to compute the requested average by dividing these two values (often with some special logic to handle empty sets).
On the other hand, there are also aggregation functions that are used in aggregation operators but not allowed in T-SQL, such as ANY.
The table below lists all currently known aggregate functions, with a description, and details of the initial value assigned to the internal counters and how those counters are updated as rows are processed. In some cases there are multiple ways to achieve the same result; since Microsoft has not released any official documentation I have listed the most likely method in those cases.
Function name | Description | Initialize counters | Update counters |
---|---|---|---|
ANY (expression) | Returns one of the values seen for expression in the group. It is not known which value is returned, I expect it to be the first value. | Set to null. | Set to expression if counter is null. |
APPROX_COUNT_DISTINCT_ACCUM (expression) | Returns a set of values, split based on a hash of expression, with for each set the highest position of the leftmost “1” bit in the hash of expression in that set. (See: hyperloglog). Note that the returned result is a single value, probably encoded by combining the values for each set into a single value. | Set to null for each set. | Compute the hash of expression; use the right-most bits to determine the set; compute the position of the leftmost “1” bit in the hash; update the counter if this is higher than the previous value. |
CHECKSUM_AGG (expression) | Returns a checksum values computed of all non-null values of expression within the group. | Set to null. | Currently unknown. |
COUNT (*) | Returns the total number of rows in the group, regardless of input values. | Set to 0. | Increment by 1. |
COUNT (expression) | Returns the total number of rows in the group where expression is not null. | Set to 0. | Increment by 1 if expression is not null; leave unchanged otherwise. |
FIRST_VALUE_IS_NULL (expression) | Returns true if the first value of expression within the group is null. | Set to null. | If counter is null then set to true if expression is null or set to false if expression is not null; leave unchanged otherwise. |
FIRST_VALUE_IGNORE_NULLS (expression) | Returns the first non-null value of expression within the group. | Set to null. | Set to expression if counter is null; leave unchanged otherwise. |
LAST_VALUE (expression) | Returns the last value of expression within the group (regardless of whether or not it is null). | Set to null. | Set to expression. |
MAX (expression) | Returns the maximum non-null value of expression in the group. | Set to null. | Set to expression if counter is null or counter < expression. |
MIN (expression) | Returns the minimum non-null value of expression in the group. | Set to null. | Set to expression if counter is null or counter > expression. |
STRING_AGG (expression, separator) | Returns a string consisting of all non-null values of expression within the group, separated by separator. | Set to null. | Set to expression if counter is null; set to counter + separator + expression otherwise. |
SUM (expression) | Returns the sum of all non-null values of expression in the group. | Set to 0. | Increment by expression if expression is not null; leave unchanged otherwise. |
Any CLR user defined aggregate function | Returns values as determined by the CLR code. | Executes the Init() method of the CLR user defined aggregate. | Executes the Accumulate() method of the CLR user defined aggregate, passing it the value of expression. |
Missing any function?
There is no official documentation (that I know of) of the supported aggregate functions. The list above has been compiled based on what I have observed in execution plans that I have seen. It is very likely that this list is not complete yet.
You can help me complete this list! If you encounter an execution plan where Compute Scalar uses a keyword not listed here, please save it as a .sqlplan file and send it to me.
Thanks!