Merge Interval
Introduction
The Merge Interval operator reads dynamic seek range specifications, checks to see if their specified ranges overlap, and if so combines the overlapping ranges into one new range.
One typical use case is for a query that uses multiple BETWEEN specifications, connected with OR. When these ranges overlap, they must be combined into a single range. This saves performance, but more important is that it prevents rows that satisfy both range specifications from being returned multiple times. When the boundaries of the BETWEEN are given as constants, the optimizer analyzes for overlaps and combines ranges if needed when compiling the query. But when the boundaries of the BETWEEN specifications are only known at run-time (variables, column references), the Merge Interval operator is used for this task.
In order for the operator to work correctly, the input data needs to contain a specific set of columns, and it must be sorted. Details of this are specified below.
Visual appearance in execution plans
Depending on the tool being used, a Merge Interval operator is displayed in a graphical execution plan as shown below:
SSMS and ADS |
Legacy SSMS |
Plan Explorer |
Paste The Plan |
Algorithm
The basic algorithm for the Merge Interval operator is as shown below:
Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.
Input and output columns
Based on execution plans seen, a Merge Interval operator seems to always have the same six columns in its input. They do not have fixed names, but they appear to always be presented in the same order:
- Start value: This value defines the lower bound of the range. It is NULL if the range has no lower bound.
- End value: This value defines the upper bound of the range. It is NULL if the range has no upper bound.
- Seek_flags: This single byte stores 5 bits to specify if the index has a lower bound, if it has an upper bound, if each of these boundary values should be included or not, and how NULL equality should be handled for this range. See Dynamic seek range for the details.
- “Interval starts at NULL”: This is a Boolean (bit) column. It is True (1) if Seek_flags indicates that the range has a lower bound, and Start value indicates that this lower bound is equal to NULL. Depending on ANSI- or non-ANSI-behavior of the comparison (as indicated in the Seek_flags column), this either indicates a range that matches no rows, or a range that starts at the very start of the table (since nulls sort before all other values in SQL Server), making it effectively equal to a range with no lower bound.
- “Interval starts somewhere”: This is a Boolean (bit) column that is True (1) if Seek_flags indicates that the range has a lower bound. If this is False (0), the range is a “less than” or “less than or equal to” specification.
- “Interval ends somewhere”: a Boolean (bit) column that is True (1) if Seek_flags indicates that the range has an upper bound. If this is False (0), the range is a “larger than” or “larger than or equal to” specification.
The first three of these columns define the range. The next three can all be derived from the first three, but they are computed and materialized as explicit columns because they are used to define the required sort order for the operator’s input. It is unknown whether they are actually used in the operator itself.
The output of the Merge Interval operator is always a set of three columns (Start value, End value, and Seek_flags) to define a combined range.
Merge Interval requires its input to be sorted in such a way that overlapping intervals are guaranteed to be adjacent. This sort order is as follows:
- “Interval starts at NULL”, descending. All ranges that (depending on Seek_flags) can be discarded or considered as having no lower bound come first.
- “Interval starts somewhere”, ascending. All ranges that have no lower bound come first, because they all overlap with each other (and with the intervals that start at NULL with non-ANSI behavior, since these effectively have no lower bound either).
- Start value, ascending. For ranges without lower bound, this is always NULL so it won’t affect the sort order. For the ranges with a lower bound (that all come after those without), this ensures that they are handled in ascending order.
- “Interval ends somewhere”, descending. Within each group (ranges with no lower bound / ranges with the same lower bound), this ensures that ranges with no upper bound come first.
Read-ahead buffer
The only way to determine whether the range in the “current” row overlaps with that in the “next” row is by reading the next row. This has to be before the current row can be fully processed.
This means that the Merge Interval operator, unlike most other operators, usually holds two rows in its working memory. The current row is always available in the normal working area; the next row is saved in a temporary holding area, the “read-ahead buffer”.
The operator requires some extra memory for this read-ahead buffer, but because it is never more than a single row this extra memory is still considered part of the normal operator working memory; it is not included in the memory grant computation for the execution plan.
Overlapping and combining ranges
Based on the sort order imposed on the input, there is a limited amount of possible combinations of the ranges in the current row and the next row in the read-ahead buffer. These are the possible combinations, and their expected result:
- Current row starts at NULL or ends at NULL, and NULL comparison is set to ANSI-behavior. This is effectively an empty range. It by definition overlaps with the range in the next row. They are combined by setting the current range equal to that of the next row.
- Next row starts at NULL or ends at NULL, and NULL comparison is set to ANSI-behavior. This is effectively an empty range. It by definition overlaps with the range in the current row. They are combined by leaving the current range unchanged.
- Both current row and next row has unbounded start or starts at NULL with non-ANSI-behavior. The ranges overlap. The combined range has unbounded start. It has an unbounded end if one of the combined ranges has an unbounded end. Otherwise it ends at the highest End value of the two ranges. If they have the same End value, then the combined Seek_flags specifies that End value is included if that is the case for at least one of the two original ranges.
- The current row has unbounded start or starts at NULL with non-ANSI-behavior; the next row does have a lower bound. The rows overlap if and only if the Start value of the next row is less than the End value of the current row, of if these two values are equal and at least one of the two Seek_flags specifies that the boundary value is included. If the ranges overlap, the combined range has unbounded start. It has an unbounded end if one of the combined ranges has an unbounded end. Otherwise it ends at the highest End value of the two ranges. If they have the same End value, then the combined Seek_flags specifies that End value is included if that is the case for at least one of the two original ranges.
- The current row and next row both have a lower bound. The rows overlap if and only if the Start value of the next row is less than the End value of the current row, of if these two values are equal and at least one of the two Seek_flags specifies that the boundary value is included. If the ranges overlap, the combined range starts at the Start value of the current range. It has an unbounded end if one of the combined ranges has an unbounded end. If the ranges have the same Start value, then the combined Seek_flags specifies that Start value is included if that is the case for at least one of the two original ranges. If they have the same End value, then the combined Seek_flags specifies that End value is included if that is the case for at least one of the two original ranges.
GetNext() into buffer / Get next from buffer
Every row, except the first, is read by the operator while still working on the previous row, because it has to check whether the ranges in these rows overlap. Therefore, all rows (except the first) are first read in the read-ahead buffer. This is still a regular call to the GetNext() method of the child operator, the only difference is that the returned row (or end of data signal) is not stored in the normal working area, but in the read-ahead buffer. It can then look at the values in the read-ahead buffer to determine whether or not the range overlaps with the range of the current row, in the normal working area.
If the range specified in the row in the read-ahead buffer overlaps with the range in the current row, the operator will change the range in the current row to the range defined by the overlap of the two (as detailed above), then continue to check whether the row after this also overlaps. If the row in the read-ahead buffer does not overlap, the operator returns the current row as it is at that time, then moves the data from the read-ahead buffer to the current row data so if can continue its processing for the remaining rows.
Operator properties
The properties below are specific to the Merge Interval operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Merge Interval operator are marked with a *.
Property name | Description |
---|---|
Output List * | As described in the main text, the Output List of the Merge Interval operator is always a combination of three columns, that hold the Start value, End value, and Seek_flags for the combined ranges. These columns have the same name as the first three columns in its input rows. |
Implicit properties
This table below lists the behavior of the implicit properties for the Merge Interval operator.
Property name | Description |
---|---|
Batch Mode enabled | The Merge Interval operator supports row mode execution only. |
Blocking | The Merge Interval operator is non-blocking. |
Memory requirement | The Merge Interval operator uses a small bit of extra memory for the read-ahead buffer, but this is still considered part of the normal operator memory and not included in the execution plan’s memory grant. |
Order-preserving | The Merge Interval operator requires a specific order in its input, and preserves that order in its output. See the main text for the details. |
Parallelism aware | The Merge Interval operator is not parallelism aware. |
Segment aware | The Merge Interval operator is not segment aware. |