Sequence Project
Introduction
The Sequence Project operator computes values for the “ranking functions”: functions where the results depend on other rows in the result set, such as ROW_NUMBER, RANK, DENSE_RANK, and NTILE.
A Sequence Project can be considered as somewhat similar in function as Compute Scalar: both operators add new columns to the data based on expression. But Compute Scalar works on expressions other columns from the same row and constant values as input. Sequence Project computes expressions that are based on preceding rows in the data stream as their input.
A Sequence Project processes data that is “prepared” by other operators in the execution plan: it is ordered by first the PARTITION BY expressions (if any) of the ranking function, then by their ORDER BY expressions. It also includes certain special columns, produced by other operators, as outlined below. The Sequence Project can then use this prepared input to keep running counters of its input, reset these as needed, and compute the desired results from those running counters.
Visual appearance in execution plans
Depending on the tool being used, a Sequence Project operator is displayed in a graphical execution plan as shown below:
SSMS and ADS |
Legacy SSMS |
Plan Explorer |
Paste The Plan |
Algorithm
The basic algorithm for the Sequence Project operator is as shown below:
Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.
Input requirements
As mentioned above, the input to the Sequence Project operator needs to be ordered by first the PARTITION BY expressions (if any), then the ORDER BY expressions for the ranking functions to be computed.
The input always includes a segment column, created by a Segment operator, to mark the start of each new partition in the ordered input stream. When this segment column is set, the internal row counters are reset to ensure a clean start of the next partition. Even when no PARTITION BY clause is used, this segment column must still be included. Since the entire input is considered a single partition in this case, it will be set on only the very first row of the input.
If one of the functions “rank” or “dense_rank” is specified in the Output List property, then the input also contains a second segment column, generated by another Segment operator. This second segment column marks a change in the ORDER BY columns; it is set when any of the expressions in the ORDER BY expressions change, but also on any change in the PARTITION BY expressions.
If the Output List includes the “ntile” function, then the input needs to include a column named AggResultnnnn (where nnnn is a four-digit number). This column contains the number of rows in the current partition.
Internal counters
Which internal counters the Sequence Project operator tracks, and how it initializes and updates them, depends on the requested functions, as specified in the Defined Values property. The table below lists all supported functions and how they work. (Some of these functions can be implemented by different algorithms; the one list below is what I consider the most likely, but it is possible that Microsoft actually used a different algorithm, to produces the same result).
A single Sequence Project operator can return just one of the listed functions, or multiple. However, multiple functions can only be computed by a single Sequence Project if they all use the exact same PARTITION BY and ORDER BY expressions.
Function name | Description | Initialize counters | Update counters |
---|---|---|---|
dense_rank | Returns the rank of each row, equal to the number of distinct ordering values in the partition that sort before the current row, plus one. | Set to 0. | Increment by 1 if the second segment column is set; leave unchanged otherwise. |
ntile | Assigns each row to one of a specified number of groups within the partition. (See note below). | See below. | See below. |
rank | Returns the rank of each row, equal to the number of rows in the partition that sort before the current row, plus one. | Set to 0. | Set equal to row_number if the second segment column is set; leave unchanged otherwise. |
row_number | Generates a unique ascending number for each row in the partition. | Set to 0. | Increment by 1. |
row_number_ignore_nulls | Generates a unique ascending number for each row in the partition, ignoring rows that have a null in a specific column. | Set to 0. | Increment by 1 if expression is not null. |
Note that the row_number_ignore_nulls function doesn’t correspond directly to any T-SQL function. The optimizer can use it to support the computation of other expressions, such as for instance PERCENTILE_DISC or PERCENTILE_CONT. The execution plan does not in any way expose which of the columns in the input needs to be checked for null values, neither in SSMS, nor in the underlying XML. I assume that this is due to a bug where some information stored in the actual internal representation of the execution plan is lost during the transformation to the XML format SQL Server uses to expose execution plans.
Ntile
For the ntile function, the Sequence Project operator uses three internal counters: the ntile counter for the actual results, a row_number counter (exactly the same as also used for the row_number function), and a rows_remaining counter to track when ntile needs to increase. It also uses the AggResultnnnn column from the input rows, which holds the total number of rows for the current partition, as well as an integer_expression to represent the number of groups to form (the input parameter n to the NTILE(n) function in the query). This integer_expression is not exposed in the execution plan, neither in SSMS, nor in the underlying XML. I assume that this is due to a bug where some information stored in the actual internal representation of the execution plan is lost during the transformation to the XML format SQL Server uses to expose execution plans.
At the start of each partition, in Initialize Counters, all three counters (ntile, row_number, and rows_remaining) are set to zero.
When processing a row, in Update Counters, the operator first checks if rows_remaining is zero. If so, then it sets rows_remaining to the number of rows for the next group, which is computed by dividing the number of remaining rows in the partition (AggResultnnnn minus row_number) by the number of remaining groups (integer_expression minus ntile), rounding up. After that, it increases the ntile counter by one.
Finally, regardless of whether rows_remaining was zero or not, the operator increases row_number by one and decreases rows_remaining by one.
The result of this is that all rows will be divided among groups of equal size, with the remainder (if any) assigned by adding one extra row to the first groups.
Operator properties
The properties below are specific to the Sequence Project operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Sequence Project operator are marked with a *.
Property name | Description |
---|---|
Defined Values * | For a Sequence Project operator, the Defined Values property can only use the special functions described above. |
Implicit properties
This table below lists the behavior of the implicit properties for the Sequence Project operator.
Property name | Description |
---|---|
Batch Mode enabled | The Sequence Project operator supports row mode execution only. |
Blocking | The Sequence Project operator is non-blocking. |
Memory requirement | The Sequence Project operator does not have any special memory requirement. |
Order-preserving | The Sequence Project operator is fully order-preserving. |
Parallelism aware | The Sequence Project operator is not parallelism aware. |
Segment aware | The Sequence Project operator always requires at least one segment column in the input, which causes all counters to reset. If the Defined Values property includes at least one rank or dense_rank function, then a second segment column is required to signal a change in the relevant ordering column(s). |