Concatenation

Introduction

The Concatenation operator reads and returns all rows from all its inputs, in order, and without modification.

This operator is most commonly used to execute queries that use UNION or UNION ALL. In the former case, other operators are required to remove the duplicates, because Concatenation doesn’t provide that functionality. You may also find the Concatenation operator in queries on partitioned views.

The Concatenation operator has two or more inputs. There is no known maximum. The inputs are processed in the order in which they appear in the execution plan XML, which corresponds to the order in which they are shown (top to bottom) in the graphical execution plan.

Visual appearance in execution plans

Depending on the tool being used, a Concatenation operator is displayed in a graphical execution plan as shown below:

SSMS and ADS (current versions)	Legacy SSMS	Plan Explorer	Paste The Plan

Algorithm

The basic algorithm for the Concatenation operator is as shown below:

Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.

Also note that while the flowchart might suggest that Concatenation can work with just a single input, there is no way to guarantee it actually does because the optimizer never produces an execution plan where Concatenation has less than 2 inputs.

Input and output mapping

The columns in the output of a Concatenation operator, as specified in its Output List property, are typically given a generated name. As all generated column names in execution plans, these names are formed by combining a mnemonic (usually “Union” for a Concatenation operator) and a 4-digit number that is unique within the execution plan.

The Defined Values predicate of Concatenation specifies, for each of the columns in the Output List, what column from each of the inputs is mapped to that output column. In the execution plan XML, the format of the Defined Values property for Concatenation is one DefinedValue node per output column, and within that node are n + 1 ColumnReference nodes. The first ColumnReference specifies the name of the output column; ColumnReference 2 through n + 1 specify the mapped input column from input 1 through n. All input columns that are mapped to the same output column must have the same data type. In Management Studio, the Defined Values predicate it displayed in a slightly more readable way, as output column = (input column 1;…; input column n).

As mentioned above there is no known upper limit to the number of inputs. In a lab environment, I have successfully tested with over 30,000 inputs.

Order preservation

Even though the Concatenation operator returns all rows in the order it receives them, the output of a Concatenation operator is always considered unordered. This is easy to understand by looking at an example.

If a Concatenation processes two inputs, each ordered, then the first input might be (1, 5, 8, 12); the second input might be (3, 8, 13, 15). In this case the output of Concatenation will be (1, 5, 8, 12, 3, 8, 13, 15). This is of course no longer ordered.

There are situations where each next input only has larger values than the previous input. If for example two inputs to Concatenation have (1, 5, 8, 12) and (13, 15, 18), then the operator will return the rows in the order (1, 5, 8, 12, 13, 15, 18); this is of course still in order. But it relies on all the data in the second input being larger than all the data in the first. Without knowledge of constraints and query logic, this can never be guaranteed.

Based on experiments I did, it seems that the optimizer also has no rules to check for and detect cases where it does have information available that would guarantee that each next input would always sort after the previous. So even in those theoretic cases, where enough information is available to state with certainty that Concatenation will retain the logical ordering, this is not registered as such by the optimizer. My guess is that Microsoft decided not to build a lot of complex logic for what frankly is an outlier edge case anyway, and just mark the Concatenation operator as producing unordered output.

Operator properties

The properties below are specific to the Concatenation operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Concatenation operator are marked with a *.

Property name	Description
Defined Values *	The Defined Values property of a Concatenation includes, for each column in the Output List property, an XML element that lists the name of the output column, followed by the name of the mapped input column from each input. See also the main text.

Implicit properties

This table below lists the behavior of the implicit properties for the Concatenation operator.

Property name	Description
Batch Mode enabled	The Concatenation operator supports both row mode and batch mode execution. The descriptions on this page are for row mode execution. For batch mode execution, the same process applies, except that now entire batches are read and returned instead of single rows. The Concatenation operator does not look at, nor change, the contents of the batches it processes.
Blocking	The Concatenation operator is non-blocking.
Memory requirement	The Concatenation operator does not have any special memory requirement.
Order-preserving	Even though the Concatenation operator does not change the order of the rows from its input, its output is still considered unordered. See also the main text.
Parallelism aware	The Concatenation operator is not parallelism aware.
Segment aware	The Concatenation operator is not segment aware.