The Beatles versus the Stones

23 Comments

Here’s a nice brain teaser, just before the end of the year. Despite the title, it is related to SQL Server, not to music!

 

A common misconception amongst SQL Server users is that a clustered index on a table will ensure that data is returned in the order implied by that index. I have lost count of the number of times I had to disprove this notion.

 

Of course, there are many cases where the rows returned by a query will be in the order of the clustered index. Here’s a quick illustration, using the lineup that The Beatles had during most of the 60s.

 

CREATE TABLE Beatles

     (LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,

      FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);

INSERT INTO Beatles (LastName, FirstName)

SELECT ‘Lennon’, ‘John’

UNION ALL

SELECT ‘McCartney’, ‘Paul’

UNION ALL

SELECT ‘Harrison’, ‘George’

UNION ALL

SELECT ‘Starr’, ‘Ringo’;

SELECT LastName FROM Beatles;

DROP TABLE Beatles;

go

 

LastName

——————–

Harrison

Lennon

McCartney

Starr

 

The results of this query are in alphabetical order of last name, the column used in the clustered index. Apparently, this is one of the very many cases where the order of the rows is implied by the clustered index, allowing the misconception that this is always the case to spread even further.

 

But an interesting thing happens if I use the exact same table definition to old and query the lineup of that other famous rock group of the 60s:

 

CREATE TABLE Stones

     (LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,

      FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);

INSERT INTO Stones (LastName, FirstName)

SELECT ‘Jagger’, ‘Mick’

UNION ALL

SELECT ‘Jones’, ‘Brian’

UNION ALL

SELECT ‘Richards’, ‘Keith’

UNION ALL

SELECT ‘Watts’, ‘Charlie’

UNION ALL

SELECT ‘Wyman’, ‘Bill’;

SELECT LastName FROM Stones;

DROP TABLE Stones;

go

 

LastName

——————–

Wyman

Jones

Watts

Richards

Jagger

 

In this case, the names are returned in random order. That makes this a great example to really disprove the notion of a clustered index guaranteeing any output order. (Dare I say that we now finally have solid proof that the Stones are better than the Beatles? Or will that make me subject to loads of flames?)

 

What’s intriguing in this case is the difference in behaviour for the two examples. Apart from the table name, the two code snippets are exactly the same – and even renaming the tables won’t change the results. So here’s the brain teaser that I’ll leave you to ponder over your glass of champagne: what is the reason that the Beatles are, but the Stones are not returned in clustered index order?

 

I’ll post the answer in a few days.

 

 

On a more personal note, I want to apologize for not posting any new stuff during the last two months. I still have some good ideas in my scratchpad, but I need some time to polish them up to blog quality – and time is the one thing I have been lacking for the past two months. The bad news is that I will probably be short on time for the next month as well, but things are looking more sunny after that.

 

And with this being my last post of the year, I’ll also grab this opportunity to wish all readers of sqlblog.com a very great 2007, with lots of love and luck in your personal lifes, and lots of interesting SQL challenges and enticing performance gains at work.

The table scan from hell
The Beatles versus the Stones – the explanation

Related Posts

No results found

23 Comments. Leave new

cinahcaM madA
December 31, 2006 18:31

Nice one, Hugo — that is a great example!  Took me a moment to figure out, but it makes perfect sense.  Have a great New Years!

Reply
Kalen Delaney
December 31, 2006 19:18

This is great! I’ll admit, I didn’t figure it out by just looking at the code. I had to run it, and start doing some of my usual query analysis, but I got it pretty quickly after that. I’d love to use this as an example in my classes, if that’s ok with you? It makes me wish I was still a college teacher, and I could actually use this as a pop quiz. 😉

Reply
Denis The SQl Menace
December 31, 2006 20:02

Oh yes, I remember this thread “insert into tbl1 select * from tbl2order by field1 doesnt work!”

from the microsoft.public.sqlserver.programming newsgroup(http://tinyurl.com/y8umhm)

There were 121 messages in total going back and forth until Hugo set the op straight

Denis

Denis

 —

Edited by admin to remove long link –  

Reply
AaronBertrand
December 31, 2006 23:49

The answer isn’t too obvious, unless you’ve been dealing with this misconception a lot.  🙂  It’s a very good, succinct example that I’m sure will be referenced in many such discussions in the future.

Reply
Dave Markle
January 1, 2007 16:56

Awesome.  3 minutes!  Let me pat myself on the back… There.  Done.  I feel a lot better.  

Honestly, I haven’t been fighting this misconception much.  I find myself fighting far too many "wtfs" — subqueries like "(SELECT TOP 100 PERCENT FROM foo ORDER BY bar)".  I am a bit shocked every time I see that one.  And the fact that it usually seems to "work".  And the fact that the SQL 2000 EM query designer generates these sorts of hideous abominations.

Reply
Hugo Kornelis
January 2, 2007 00:56

Kalen: Of course you can use this in your classes. The pop quiz idea is closer than you may think. I had intended to demonstrate these queries at the end of a session I delivered last month and give a prize to the first to correctly explain the behaviour, but I didn’t have enough time left 🙁  That’s when I decided to post it here instead.

Dave: Allow me to add my back-pattings as well. Figuring this out in just 3 minutes is really good! re "TOP 100 PERCENT … ORDER BY": you’ll probably stop complaining about EM’s view designer adding this once you find out that SSMS’ view designer does the same – even though the "trick" doesn’t even work anymore on SQL Server 2005!

Reply
Dave Markle
January 2, 2007 04:07

Wow.  Thanks Hugo!  A truly disturbing revelation indeed.  Since it’s so totally off-topic, I did some more investigation of it on my blog (http://executioniseverything.blogspot.com/2007/01/order-by-in-views-sql-server-2005.html).  If you play with the numbers a bit in the TOP clause, it really makes you wonder what the "proper" behavior should be!

Reply
Hugo Kornelis
January 2, 2007 16:05

Hi Dave,

The "proper" behaviour is to use the ORDER BY only to determine which rows are included in the TOP xxx subcollection. A view can’t be ordered, by definition, so SQL Server is free to return rows in any order it sees fit.

SSMS should not even allow you to attempt to define an order for a view. I have submitted a bug for this to Connect: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=249248.

Reply
Denis The SQL Menace
January 2, 2007 18:53

Yes it does make sense but I think (some) people will say

No problem I will just add

where lastname > ”;

and that fixes that ‘problem’

Reply
Ahmed Charles
January 3, 2007 22:10

Well, I thought I’d try to understand why the optimizer does what it does and came up with the following:

CREATE TABLE Stones

    (LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,

     FirstName varchar(20),

     LastNameAgain AS CAST(LastName AS char(1597)));

CREATE UNIQUE INDEX IX_FirstName ON Stones(FirstName) INCLUDE (LastNameAgain);

In SQL Server 2005 (9.00.3033.00), the adding the char(1597) column to the nonclustered index, but not the clustered index changes the behavior. A char(1596) column doesn’t. I guess this approach is another way to "solve the problem" without requiring that you change all your queries. 🙂

Reply
Hugo Kornelis
January 4, 2007 01:37

Denis, Ahmed,
For the record – there is no “problem”, so nothing needs to be fixed. The query has no ORDER BY, so any ordering in the returned data is correct. My post is just about the strange observation that the Beatles “happen to be” returned in clustered index order, while the Stones are not.
I’ll get back to the reasons why Denis’ where clause and Ahmed’s (very impressive!! – kudos for finding that) computed column change the behaviour of the Stones at a later time. I can’t get into that right now without giving away the answer, and I’ll postpone that a bit, so that new readers can scratch their heads a bit (and others can continue to tear out what’s left of their hair <evil grin>).

Reply
Ahmed Charles
January 4, 2007 02:52

I agree, there is no "problem" (unfortunately sarcasm is fairly hard to express in text). I also agree that people should scratch there heads for awhile, though I’m sure it’ll make sense in the end.

Reply
Chuck Boyce
January 5, 2007 01:22

Hugo, my friend,

There’s simply no comparison between the Beatles and the Stones.  I love the Stones, btw…"Jumping Jack Flash", "Happy", "Miss You".  The Stones rock, man.

But the Beatles???  Are you kidding me???????

"Dear Prudence"

"Rain"

"Baby, You’re a Rich man"

That awesome Rickenbacker John Lennon played?

The Stones are at the top of the 20th century and certainly one of the best bands of the 60’s and 70’s, but the Beatles are the most significant Pop group in a millennia.

Reply
Hugo Kornelis
January 5, 2007 02:21

Hi Chuck,

Heheh! I expected comments llike yours – in fact, I am surprised I didn’t get a big flame war about which band was better within hours of posting <g>.

I won’t get into this debate. In fact, I even carefully avoided any indication that either band might be better or worse, or even that either behaviour of any of the tables might be better or worse!

Reply

Results are being returned in order by the non clustered index.

Remove the non clustered index and the results are returned in the correct order.

Optimizer chooses to do an index scan when clustered index only exists and via non clustered  index when both indexes exist?

Very interesting.

Reply
Hugo Kornelis
January 16, 2007 22:21

Hi ShimMaster,

Congratulations! You figured out the correct answer.

But I must object to the usage of the words "correct order" when refering to a query that has no ORDER BY. All orders are correct – they are only different (or rather, they <b>appear</b> to be different).

I tried to publish a new blog post with a detailed explanations and some follow-up to the observations made by Denis and Ahmed, but experienced some technical difficulties. I’ll try again later today.

Reply

Very nice article ! Short and using a nice title !

Reply

Got it 1 minute, but don’t know the reason for behavior, why its considering FirstName at all?

Reply

This fixes the problem:

SELECT LastName FROM Stones WITH (Index(1));

Reply
How about #temp table?
April 5, 2016 21:30

if I already have Stones table and I copy data I need into temp table, will the data in temp table always sorted

CREATE TABLE Stones

    (LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,

     FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);

INSERT INTO Stones (LastName, FirstName)

SELECT ‘Jagger’, ‘Mick’

UNION ALL

SELECT ‘Jones’, ‘Brian’

UNION ALL

SELECT ‘Richards’, ‘Keith’

UNION ALL

SELECT ‘Watts’, ‘Charlie’

UNION ALL

SELECT ‘Wyman’, ‘Bill’;

go

SELECT LastName

INTO #MyTemp

FROM Stones Order by LastName

SELECT * from #MyTemp –will this data always be ordered?

DROP TABLE #MyTemp

Also, is the order guaranteed with SELECT LastName FROM Stones WITH (Index(1))"?

Thanks

Reply
Hugo Kornelis
April 10, 2016 15:13

Hi, "How about #temp table?"!

Your question:

> SELECT * from #MyTemp –will this data always be ordered?

has a simple answer: No. It might be, but there is no guarantee. If you want to be sure that data is returned in any specific order, then you will always have to add an ORDER BY clause. Nothing else will ever provide guarantees.

Reply

Great post.

Disappointing to see that the ability to create a supposedly ordered view is still present within the view designer in the latest version of SSMS.

I see the comments in your Connect item.

It doesn’t look like there is any appetite on Microsoft’s part to address this.

Reply

In the first query you don’t know if it is ordering ascending by first or last name because in both cases the result is the same. In the second query it is not random, it seems it is ordering for first name, the non clustered index. It seems in both cases it is ordering ascending for the non-clustered index, but why it is not using the clusted index?

Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Menu
%d bloggers like this:

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close