The Segment operator is used to insert a “segment column” in a data stream. The segment column is used to mark the start of a new segment (where a segment is a set of consecutive columns with the same value in one or more predefined columns, called the grouping columns).
Other operators use the segment column to change their behavior when a new segment starts in the data. An example of this is Sequence Project, which resets its internal counters when a new segment starts. Only segment-aware columns use the segment column, and a Segment operator is never used in an execution plan that does not have at least one segment-aware column to use the segment column.
The Segment operator expects the input data to be sorted by the grouping columns. This can be achieved by using an explicit Sort operator or by relying on order-imposing and order-preserving properties of other operators.
Visual appearance in execution plans
Depending on the tool being used, a Segment operator is displayed in a graphical execution plan as shown below:
SQL Server Management Studio
Azure Data Studio
(version 17.4 and up)
(until version 17.3)
The basic algorithm for the Segment operator is as shown below:
Detecting segment change
The logic to detect when the segment changes is based on comparing the values in the current row to the values held in in-memory variables. These variables are then set to the values in the current row, for the comparison of the next row. These comparisons are based on the standard SQL definition of “distinctness”, not “inequality”. (The difference is that two NULL values are considered unequal but not distinct).
For the first row that the operator processes after being initialized, the in-memory variables are not yet initialized; this is always considered as a segment change.
The properties below are specific to the Segment operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Segment operator are marked with a *.
|Group By||Lists the columns from the input stream that define the segment boundaries. Whenever the value in one or more of the Group By columns is distinct from the values in the previous row, the row will be marked as the start of a new segment. All other rows are not marked.
It is possible for the Group By property to be present but empty. In this case, the entire input is considered a single segment.
|Output List *||The Output List of a Segment operator typically includes zero or more of the columns from its input, and always adds exactly one new column. This new column is detailed in the Segment Column property; it is not defined or described in a Defined Values property.|
|Segment Column||This property specifies the name of the column that is added to the output by the Segment operator. This is a bit column, that is set to 1 on the first row of each segment, and to 0 on all other rows. The Segment Column always has a name of the form Segmentnnnn and it is always included in the Output List property.|
This table below lists the behavior of the implicit properties for the Segment operator.
|Batch Mode enabled||The Segment operator supports row mode execution only.|
|Blocking||The Segment operator is non-blocking.|
|Memory requirement||The Segment operator does not have any special memory requirement. The internal variables used to detect a group change are stored in the standard working memory that all operators get assigned.|
|Order-preserving||The Segment operator is fully order-preserving.|
|Parallelism aware||The Segment operator is not parallelism aware.|
|Segment aware||The Segment operator is not segment aware.|