The Beatles versus the Stones – the explanation
On December 31 of last year, I posted this brain teaser, promising to post the answer “in a few days”. Apparently, 15 is a few J.
In case you have forgotten what the puzzle was about and are too lazy to click the link above, the bottom line is that I created and populated two tables, with the same schema but different content. One held the first and last name of each of the Beatles, the other held first and last name of each of the Rolling Stones. I then queried both tables, using the same query in both cases. The query did not include an ORDER BY clause. It turned out that the results from the Beatles table were always returned in order of the clustered index, whereas the results from the Stones table was unordered. And that was the question I left you with – how can two tables that have the same schema show so different behavior?
The answer to this is that, of course, the two tables don’t display different behavior at all. In fact, they behave exactly the same – the results looked different but were not!
If you rerun the code to create the two tables and then issue the two SELECT statements again with the option to show the execution plan turned on, you will get this execution plan:
As you can see, both queries were executed with an identical plan: a scan of the nonclustered index that was created to check the UNIQUE constraint on the FirstName column. If you now change the original queries to include this column as well, you should not only see that the execution plan remains the same, but you should also see the explanation for the different behavior.
SELECT FirstName, LastName FROM Beatles;
SELECT FirstName, LastName FROM Stones;
As you see, both result sets are alphabetically ordered by first name. It is a shear coincidence that the alphabetic ordering of the Beatles by first name exactly matches the ordering by last name. So the results have never been returned in the order of the clustered index, but always in the order of the nonclustered index, for both tables. These orders just happen to be indistinguishable for one of the two tables.
Of course, this still leaves us with the question why the optimizer chooses to scan the nonclustered index for this query. That question is easily answered. You just have to consider which columns are included in each of the indexes.
A clustered index will always have the indexed columns in the root and intermediate pages, and all columns in the table in the leaf pages. A nonclustered index will also have the indexed columns in the root and intermediate pages, but its leaf pages contain only the indexed columns plus the columns of the clustered index (if any). For the Beatles and Stones tables, this means that the clustered indexes have LastName in the non-leaf pages, and all columns (both LastName and FirstName) in the leaf pages; the nonclustered indexes have FirstName in the non-leaf pages and add the clustered index key of LastName in the leaf pages.
Since the queries issued don’t have a WHERE clause, the optimizer can only consider plans using a scan operator. A scan will only touch the leaf pages of an index. In this particular case, all rows used in the query (just LastName) happen to be available in the leaf pages of both indexes. That means that the nonclustered index is a covering index for this query, and an extra lookup operator is not required to get the data. So the choice is between scanning either the clustered index or the nonclustered index.
In this case, both indexes have exactly the same amount of data in their leaf pages, so there is no reason to prefer one over the other. But that is very unusual – in 99% of all normal situations, a nonclustered index will have less data in its leaf pages than the clustered index. (Remember that a clustered index always includes ALL columns in the leaf pages!), and a nonclustered index will never have more data in its leaf pages. For that reason, the query optimizer will always favor an execution plan that uses a covering nonclustered index over a plan that uses a clustered index.
One interesting comment was made by Denis The SQL Menace: if you add WHERE LastName > ” to the queries, the order will match the clustered index for both queries. That makes sense. Remember that there is no constraint that disallows us from setting LastName equal to the empty string in one of the rows of the table, so there might be a row that doesn’t satisfy the WHERE clause. The nonclustered index scan used previously would still have to process that row; a clustered index seek would enable the engine to bypass that row. So I’m not surprised to see the execution plan change to a clustered index seek.
An even more intriguing observation was pointed out by Ahmed Charles. He added a computed column to the table and replaced the UNIQUE constraint with a UNIQUE INDEX, with the computed column as included column. Here’s how the Stones table looks after Ahmed’s modification:
CREATE TABLE Stones
(LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,
LastNameAgain AS CAST(LastName AS char(1597)));
CREATE UNIQUE INDEX IX_FirstName
Ahmed found that if the computed column is defined as char(1597) or more, the engine scans the clustered index, resulting in the last names being lasted alphabetically; as long as the computed column in char(1596) or less, the engine prefers to scan the nonclustered index. Why does the execution plan suddenly change if the computed column gets one additional character? Computed columns are not even supposed to be persisted, right?
Well, yes and no. Computed columns are indeed normally not stored, but computed when queried. But there are two things that can change that: adding the PERSISTED keyword to the column’s definition, or using the column in an index. In the first case, the value of the column is physically stored in the clustered index (or in the heap); in the latter case, the value is physically stored in each index that is defined on or includes the column. The latter is what happens here.
If you look at the data in the IX_FirstName index in a bit more details, we see that each leaf page entry will hold one char(1597) column (always 1597 bytes), two varchar(20) columns (number of bytes equal to actual length, plus 2 bytes each for length), plus 8 bytes of overhead per row (one byte for status flags, one byte for the NULL bitmap, two bytes for the total number of columns, two bytes for the number of variable-length columns, and two bytes for the row offset). So for each row, the amount of bytes used is equal to 1609 (1597 + (2 * 2) + 8) + the number of characters in the first and last name. That’s 1618 bytes for Bill Wyman, 1619 for Brian Jones, 1621 for Charlie Watts, 1622 for Keith Richards, and 1619 for Mick Jagger, making a grand total of 8099 bytes. Now each page can hold 8192 bytes, but 96 of them are reserved for the page header, so that leaves 8096 bytes for data. Three less than the number of bytes required to store the index data – so the index requires a second page. But if I reduce the length of the computed column by 1, the required amount per row drops 1 bytes; for all 5 rows combined, we need 5 bytes less – 8094 bytes, which does fit in a single page.
Just in case you want to check this for yourself without doing the calculations or running the DBCC PAGE commands, you can also execute
SELECT dpages FROM sysindexes WHERE name = ‘IX_FirstName’;
to see the number of leaf pages used by the index.
With this knowledge, I can new explain why the computed column changes the behavior of the query when the computed column exceeds 1596 bytes – an index scan will now require more page reads than a clustered index scan, so the optimizer will surely choose the latter.
With this knowledge, I can also predict some other interesting things:
· If I change the definition of the FirstName and LastName columns to char(20), I can reduce the computed column to 1574 bytes and I still get a clustered index scan;
· If I reduce the initial load of the table to just four of the five band members (for instance by omitting Brian Jones), I have to increase the length of the computed column to 2002 before I get a clustered index scan;
· But on the other hand, if I add all five band members to the table, then delete Brian Jones again, I still get the clustered index scan even with a computed column of 1597 bytes length. Once the second page has been allocated to the nonclustered index, it will stay allocated even if the data could in theory be compressed on a single page again. Reorganizing the index won’t change this, but rebuilding will.