Segment

Introduction

The Segment operator is used to insert a “segment column” in a data stream. The segment column is used to mark the start of a new segment (where a segment is a set of consecutive columns with the same value in one or more predefined columns, called the grouping columns).

Other operators use the segment column to change their behavior when a new segment starts in the data. An example of this is Sequence Project, which resets its internal counters when a new segment starts. Only segment-aware columns use the segment column, and a Segment operator is never used in an execution plan that does not have at least one segment-aware column to use the segment column.

The Segment operator expects the input data to be sorted by the grouping columns. This can be achieved by using an explicit Sort operator or by relying on order-imposing and order-preserving properties of other operators.

Visual appearance in execution plans

Depending on the tool being used, a Segment operator is displayed in a graphical execution plan as shown below:

SQL Server Management Studio Azure Data Studio
(until version 17.3) (version 17.4 and up)

Algorithm

The basic algorithm for the Segment operator is as shown below:

Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.

The flowchart shown above is based on two assumptions:

  1. In theory, the Segment operator can be built in two ways: it can mark either the start or the end of each segment (first row or last row). Both versions can be used to explain all operations of all segment-aware operators. However, there are a few cases where a segment-aware operation is harder to implement when end of a segment is marked, a lot where it doesn’t matter, and none where the implementation is harder when the start of the segment is marked. It is also easier for the Segment operator itself to mark the first row instead of the last. For that reason, I assume that Microsoft has chosen to mark the start of each segment. That operation is shown in the flowchart above.
  2. It is currently unknown what exact values are used in the segment column. There are strong indicators that support the theory that NULL represents the continuation of the previous segment and a non-null value marks the start of a new segment. What exact value is used is not known at all; the flowchart above assumes it’s 1.

Detecting segment change

The logic to detect when the segment changes is based on comparing the values in the current row to the values held in in-memory variables. These variables are then set to the values in the current row, for the comparison of the next row. These comparisons are based on the standard SQL definition of “distinctness”, not “inequality”. (The difference is that two NULL values are considered unequal but not distinct).

For the first row that the operator processes after being initialized, the in-memory variables are not yet initialized; this is always considered as a segment change.

Operator properties

The properties below are specific to the Segment operator, or have a specific meaning when appearing on it. For all other properties, see Common properties.

Property nameDescription
Group ByLists the columns from the input stream that define the segment boundaries. Whenever the value in one or more of the Group By columns is distinct from the values in the previous row, the row will be marked as the start of a new segment. All other rows are not marked.
Logical OperationAlways equal to Segment.
Output ListThe Output List of a Segment operator typically includes zero or more of the columns from its input, and always adds exactly one new column. This new column is detailed in the Segment Column property; it is not defined or described in a Defined Values property.
Segment ColumnThis property specifies the name of the column that is added to the output by the Segment operator. This column always has a name of the form Segmentnnnn and it is always included in the Output List property.

Implicit properties

This table below lists the behavior of the implicit properties for the Segment operator.

Property nameDescription
Batch Mode enabledThe Segment operator supports row mode execution only.
BlockingThe Segment operator is non-blocking.
Memory requirementThe Segment operator does not have any special memory requirement. The internal variables used to detect a group change are stored in the standard working memory that all operators get assigned.
Order-preservingThe Segment operator is fully order-preserving.
Parallelism awareThe Segment operator is not parallelism aware.
Segment awareThe Segment operator is not segment aware.
Menu

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close