The Segment operator is used to insert a “segment column” in a data stream. The segment column is used to mark the start of a new segment (where a segment is a set of consecutive columns with the same value in one or more predefined columns, called the grouping columns).
Other operators use the segment column to change their behavior when a new segment starts in the data. An example of this is Sequence Project, which resets its internal counters when a new segment starts. Only segment-aware columns use the segment column, and a Segment operator is never used in an execution plan that does not have at least one segment-aware column to use the segment column.
The Segment operator expects the input data to be sorted by the grouping columns. This can be achieved by using an explicit Sort operator or by relying on order-imposing and order-preserving properties of other operators.
Visual appearance in execution plans
Depending on the tool being used, a Segment operator is displayed in a graphical execution plan as shown below:
SQL Server Management Studio
Azure Data Studio
(version 17.4 and up)
(until version 17.3)
The basic algorithm for the Segment operator is as shown below:
Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.
The flowchart shown above is based on two assumptions:
- In theory, the Segment operator can be built in two ways: it can mark either the start or the end of each segment (first row or last row). Both versions can be used to explain all operations of all segment-aware operators. However, there are a few cases where a segment-aware operation is harder to implement when end of a segment is marked, a lot where it doesn’t matter, and none where the implementation is harder when the start of the segment is marked. It is also easier for the Segment operator itself to mark the first row instead of the last. For that reason, I assume that Microsoft has chosen to mark the start of each segment. That operation is shown in the flowchart above.
- It is currently unknown what exact values are used in the segment column. There are strong indicators that support the theory that NULL represents the continuation of the previous segment and a non-null value marks the start of a new segment. What exact value is used is not known at all; the flowchart above assumes it’s 1.
Detecting segment change
The logic to detect when the segment changes is based on comparing the values in the current row to the values held in in-memory variables. These variables are then set to the values in the current row, for the comparison of the next row. These comparisons are based on the standard SQL definition of “distinctness”, not “inequality”. (The difference is that two NULL values are considered unequal but not distinct).
For the first row that the operator processes after being initialized, the in-memory variables are not yet initialized; this is always considered as a segment change.
The properties below are specific to the Segment operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Segment operator are marked with a *.
|Group By||Lists the columns from the input stream that define the segment boundaries. Whenever the value in one or more of the Group By columns is distinct from the values in the previous row, the row will be marked as the start of a new segment. All other rows are not marked.
It is possible for the Group By property to be present but empty. In this case, the entire input is considered a single segment.
|Output List *||The Output List of a Segment operator typically includes zero or more of the columns from its input, and always adds exactly one new column. This new column is detailed in the Segment Column property; it is not defined or described in a Defined Values property.|
|Segment Column||This property specifies the name of the column that is added to the output by the Segment operator. This column always has a name of the form Segmentnnnn and it is always included in the Output List property.|
This table below lists the behavior of the implicit properties for the Segment operator.
|Batch Mode enabled||The Segment operator supports row mode execution only.|
|Blocking||The Segment operator is non-blocking.|
|Memory requirement||The Segment operator does not have any special memory requirement. The internal variables used to detect a group change are stored in the standard working memory that all operators get assigned.|
|Order-preserving||The Segment operator is fully order-preserving.|
|Parallelism aware||The Segment operator is not parallelism aware.|
|Segment aware||The Segment operator is not segment aware.|