Logical Join Operations

Introduction

SQL Server has four different operators available for joining two data sets: Nested Loops, Merge Join, Hash Match, and Adaptive Join. Each of these uses a very different algorithm to achieve basically the same result. They all compare rows from the two input data sets, based on some join condition (or predicate), and then produce results based on whether or not there was a match.

What these join operators do exactly in the case of a match, or of a non-match, depends on what is called the “(logical) join operation”, or “join type”. There are many different join operations (most join operators only implement a subset). Some join operations have the same name, and the same function, as the T-SQL join types with the same name. Others only exist as a join operation and cannot be directly used in a query.

Reading guide

On the rest of this page are the descriptions of each of the logical join operations. These descriptions use the simplest way to describe and understand how that join type works and how it differs from the other join types. The join operators typically use other algorithms, designed to return the same results with less effort

For each logical join operation, you will also see sample results. All of these sample results are based on the input data shown here. You should not draw any conclusions from the order in which the sample results are shown, these will vary depending on which of the physical join operators is used.

Many logical join types use “left” and “right” in their names to refer to the inputs that are displayed as the top and bottom inputs in a graphical execution plan. Just remember: left = top, and right = bottom.

Inner Join

The logical operation Inner Join is the same as the INNER JOIN keyword in T-SQL. It compares each row from the left input to each row from the right input; any matches found are returned. Rows from either input that do not match are not returned.

In our sample data, there are apples and oranges in both of the inputs. Because there are two oranges in each of the inputs, all four possible combinations are included in the results. Pears and bananas are each in only one of the inputs; no matching data is found in the other input so these fruits are eliminated from the results.

Left Outer Join

The logical operation Left Outer Join is the same as the LEFT OUTER JOIN keyword in T-SQL. It compares each row from the left input to each row from the right input; any matches found are returned. Rows from the right input that do not match are not returned; however rows from the left input that do not have a match are still included in the result, with NULL values in the columns that normally come from the right input.

The results are mostly the same as for the Inner Join operation. However, there is one extra row. The pear, which is only in the left input, is now included as well. Since there is no matching data in the right input, columns from that input have no value (represented as NULL). The unmatched banana from the right input is still not in the results.

Right Outer Join

The logical operation Right Outer Join is the same as the RIGHT OUTER JOIN keyword in T-SQL. It compares each row from the left input to each row from the right input; any matches found are returned. Rows from the left input that do not match are not returned; however rows from the right input that do not have a match are still included in the result, with NULL values in the columns that normally come from the left input.

The results are mostly the same as for the Inner Join operation. However, there is one extra row. The banana, which is only in the right input, is now included as well. Since there is no matching data in the left input, columns from that input have no value (represented as NULL). The unmatched pear from the left input is still not in the results.

Full Outer Join

The logical operation Full Outer Join is the same as the FULL OUTER JOIN keyword in T-SQL. It compares each row from the left input to each row from the right input; any matches found are returned. Rows from either input that do not have a match in the other input are still included in the result, with NULL values in the columns that normally come from that other input.

These results can simply be described as the combined effect of a Left Outer Join and a Right Outer Join: the same rows as the Inner Join, with extra rows for both the pear and the banana, each with NULL values for the columns from the other input.

Left Semi Join

The logical operation Left Semi Join does not have a corresponding keyword in T-SQL. However, it is similar to EXISTS with a subquery. It compares each row from the left input to the data in the right input. If at least one match is found, the row from the left input is returned (unchanged). If no match is found the row is not included. No data at all from the right input is ever returned, this data is only used to check for existence.

With our sample data, the apples and oranges (and corresponding names) from the left input are in the results. Even though the oranges have multiple matches in the right input, they are still each returned just once. The pear is unmatched, and therefore not included. The fruits (and names) from the right input are not returned at all.

Left Semi Join (probed)

A probed left semi join is not considered a separate logical operation in execution plans, it is listed as a normal left semi join. But some of the join operators do have properties that change the result of the operator sufficiently to consider this a distinct join type.

Where a normal left semi join is like WHERE EXISTS, a probed left semi join is like CASE WHEN EXISTS: it doesn’t filter any rows from the left input, but adds a column that indicates whether or not a matching row exists. As such, a probed left semi join compares each row from the left input to the data in the right input. If at least one match is found, the row from the left input is returned, with the probe column set to a value that signals existence. (The exact value used is currently unknown). If no match is found, the row is also returned, but now with a NULL value in the probe column to signal that there is no match. No data at all from the right input is ever returned, this data is only used to check for existence.

The results from our sample data include all the rows from the left input, unchanged, but with one extra column for the probe result. There are one or more apples and oranges in the right input, so for these rows the probe column has a value. But since the right input has no pears, the probe column is NULL for Joey’s pear. The fruits (and names) from the right input are not returned at all, regardless of whether or not they were matched.

Left Anti Semi Join

The logical operation Left Anti Semi Join does not have a corresponding keyword in T-SQL. However, it is similar to NOT EXISTS with a subquery. It compares each row from the left input to the data in the right input. If at least one match is found, the row from the left input is ignored; it is only returned (unchanged) if no match is found. No data at all from the right input is ever returned, this data is only used to check for existence.

With our sample data, the apples and oranges (and corresponding names) from the left input are not in the results, because they have one or more matching rows in the right input. The pear, though, does not have any match, so this row is in the results. The fruits (and names) from the right input are not returned at all.

Right Semi Join

The logical operation Right Semi Join does not have a corresponding keyword in T-SQL. It is exactly equivalent to a Left Semi Join with the inputs reversed (the optimizer may swap inputs and change the join type when that results in a cheaper plan). It compares each row from the right input to the data in the left input. If at least one match is found, the row from the right input is returned (unchanged). If no match is found the row is not included. No data at all from the left input is ever returned, this data is only used to check for existence.

With our sample data, the apples and oranges from the right input (and corresponding names) are in the results. Even though the oranges have multiple matches in the left input, they are still each returned just once. The banana is unmatched, and therefore not included. The fruits (and names) from the left input are not returned at all.

Right Anti Semi Join

The logical operation Right Anti Semi Join does not have a corresponding keyword in T-SQL. It is exactly equivalent to a Left Anti Semi Join with the inputs reversed (the optimizer may swap inputs and change the join type when that results in a cheaper plan). It compares each row from the right input to the data in the left input. If at least one match is found, the row from the right input is ignored; it is only returned (unchanged) if no match is found. No data at all from the left input is ever returned, this data is only used to check for existence.

As far as known, SQL Server supports only the “normal” right semi join, not the probed version.

With our sample data, the apples and oranges (and corresponding names) from the right input are not in the results, because they have one or more matching rows in the left input. The banana, though, does not have any match, so this row is in the results. The fruits (and names) from the left input are not returned at all.

Concatenation

The logical operation Concatenation is not actually a true join type, because it doesn’t compare all rows from one input to all rows from the other input. The effect of a Concatenation operation is best described as equivalent to using UNION ALL to combine the results of two query results. The most common physical operator to implement a Concatenation operation is the Concatenation operator, which even supports an unlimited number of inputs. However, one of the join operators (Merge Join) also implements a Concatenation operation.

Unlike “real” join operations, Concatenation does not have a join condition. All rows from both inputs are returned, without doing any kind of comparison.

The results from using Concatenation on our two sample sets simply returns all the eight rows: four from the left input and the remaining four from the right.

Union

The logical operation Union is not actually a true join type, because it doesn’t compare all rows from one input to all rows from the other input. The Union operation is equivalent to its T-SQL counterpart, using UNION (without ALL!) to combine the results of two query results and remove duplicates. Queries that use UNION or equivalent constructions usually result in an execution plan that uses a combination of other operators to produce the results, however there are situations where the optimizer will use a Merge Join or Hash Match operator to implement a Union operation.

Unlike “real” join operations, Union does not have a join condition. All rows from both inputs are returned; duplicates are removed based on the full collection of columns.

The results from using Union on our two sample sets is almost the same as for Concatenation. The only difference is that Mark’s Orange, which is included in both input sets, appears only once in the output.

Not visible from our sample data is that any duplicates that are within one of the sets will also be eliminated. I.e. if a second pear for Joey is added to Table_B, the results will not change.