Here’s a nice brain teaser, just before the end of the year. Despite the title, it is related to SQL Server, not to music!
A common misconception amongst SQL Server users is that a clustered index on a table will ensure that data is returned in the order implied by that index. I have lost count of the number of times I had to disprove this notion.
Of course, there are many cases where the rows returned by a query will be in the order of the clustered index. Here’s a quick illustration, using the lineup that The Beatles had during most of the 60s.
CREATE TABLE Beatles
(LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,
FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);
INSERT INTO Beatles (LastName, FirstName)
SELECT ‘Lennon’, ‘John’
UNION ALL
SELECT ‘McCartney’, ‘Paul’
UNION ALL
SELECT ‘Harrison’, ‘George’
UNION ALL
SELECT ‘Starr’, ‘Ringo’;
SELECT LastName FROM Beatles;
DROP TABLE Beatles;
go
LastName
——————–
Harrison
Lennon
McCartney
Starr
The results of this query are in alphabetical order of last name, the column used in the clustered index. Apparently, this is one of the very many cases where the order of the rows is implied by the clustered index, allowing the misconception that this is always the case to spread even further.
But an interesting thing happens if I use the exact same table definition to old and query the lineup of that other famous rock group of the 60s:
CREATE TABLE Stones
(LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,
FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);
INSERT INTO Stones (LastName, FirstName)
SELECT ‘Jagger’, ‘Mick’
UNION ALL
SELECT ‘Jones’, ‘Brian’
UNION ALL
SELECT ‘Richards’, ‘Keith’
UNION ALL
SELECT ‘Watts’, ‘Charlie’
UNION ALL
SELECT ‘Wyman’, ‘Bill’;
SELECT LastName FROM Stones;
DROP TABLE Stones;
go
LastName
——————–
Wyman
Jones
Watts
Richards
Jagger
In this case, the names are returned in random order. That makes this a great example to really disprove the notion of a clustered index guaranteeing any output order. (Dare I say that we now finally have solid proof that the Stones are better than the Beatles? Or will that make me subject to loads of flames?)
What’s intriguing in this case is the difference in behaviour for the two examples. Apart from the table name, the two code snippets are exactly the same – and even renaming the tables won’t change the results. So here’s the brain teaser that I’ll leave you to ponder over your glass of champagne: what is the reason that the Beatles are, but the Stones are not returned in clustered index order?
I’ll post the answer in a few days.
On a more personal note, I want to apologize for not posting any new stuff during the last two months. I still have some good ideas in my scratchpad, but I need some time to polish them up to blog quality – and time is the one thing I have been lacking for the past two months. The bad news is that I will probably be short on time for the next month as well, but things are looking more sunny after that.
And with this being my last post of the year, I’ll also grab this opportunity to wish all readers of sqlblog.com a very great 2007, with lots of love and luck in your personal lifes, and lots of interesting SQL challenges and enticing performance gains at work.
23 Comments. Leave new
Nice one, Hugo — that is a great example! Took me a moment to figure out, but it makes perfect sense. Have a great New Years!
This is great! I’ll admit, I didn’t figure it out by just looking at the code. I had to run it, and start doing some of my usual query analysis, but I got it pretty quickly after that. I’d love to use this as an example in my classes, if that’s ok with you? It makes me wish I was still a college teacher, and I could actually use this as a pop quiz. 😉
Oh yes, I remember this thread “insert into tbl1 select * from tbl2order by field1 doesnt work!”
from the microsoft.public.sqlserver.programming newsgroup(http://tinyurl.com/y8umhm)
There were 121 messages in total going back and forth until Hugo set the op straight
Denis
Denis
—
Edited by admin to remove long link –
The answer isn’t too obvious, unless you’ve been dealing with this misconception a lot. 🙂 It’s a very good, succinct example that I’m sure will be referenced in many such discussions in the future.
Awesome. 3 minutes! Let me pat myself on the back… There. Done. I feel a lot better.
Honestly, I haven’t been fighting this misconception much. I find myself fighting far too many "wtfs" — subqueries like "(SELECT TOP 100 PERCENT FROM foo ORDER BY bar)". I am a bit shocked every time I see that one. And the fact that it usually seems to "work". And the fact that the SQL 2000 EM query designer generates these sorts of hideous abominations.
Kalen: Of course you can use this in your classes. The pop quiz idea is closer than you may think. I had intended to demonstrate these queries at the end of a session I delivered last month and give a prize to the first to correctly explain the behaviour, but I didn’t have enough time left 🙁 That’s when I decided to post it here instead.
Dave: Allow me to add my back-pattings as well. Figuring this out in just 3 minutes is really good! re "TOP 100 PERCENT … ORDER BY": you’ll probably stop complaining about EM’s view designer adding this once you find out that SSMS’ view designer does the same – even though the "trick" doesn’t even work anymore on SQL Server 2005!
Wow. Thanks Hugo! A truly disturbing revelation indeed. Since it’s so totally off-topic, I did some more investigation of it on my blog (http://executioniseverything.blogspot.com/2007/01/order-by-in-views-sql-server-2005.html). If you play with the numbers a bit in the TOP clause, it really makes you wonder what the "proper" behavior should be!
Hi Dave,
The "proper" behaviour is to use the ORDER BY only to determine which rows are included in the TOP xxx subcollection. A view can’t be ordered, by definition, so SQL Server is free to return rows in any order it sees fit.
SSMS should not even allow you to attempt to define an order for a view. I have submitted a bug for this to Connect: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=249248.
Yes it does make sense but I think (some) people will say
No problem I will just add
where lastname > ”;
and that fixes that ‘problem’
Well, I thought I’d try to understand why the optimizer does what it does and came up with the following:
CREATE TABLE Stones
(LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,
FirstName varchar(20),
LastNameAgain AS CAST(LastName AS char(1597)));
CREATE UNIQUE INDEX IX_FirstName ON Stones(FirstName) INCLUDE (LastNameAgain);
In SQL Server 2005 (9.00.3033.00), the adding the char(1597) column to the nonclustered index, but not the clustered index changes the behavior. A char(1596) column doesn’t. I guess this approach is another way to "solve the problem" without requiring that you change all your queries. 🙂
Denis, Ahmed,
For the record – there is no “problem”, so nothing needs to be fixed. The query has no ORDER BY, so any ordering in the returned data is correct. My post is just about the strange observation that the Beatles “happen to be” returned in clustered index order, while the Stones are not.
I’ll get back to the reasons why Denis’ where clause and Ahmed’s (very impressive!! – kudos for finding that) computed column change the behaviour of the Stones at a later time. I can’t get into that right now without giving away the answer, and I’ll postpone that a bit, so that new readers can scratch their heads a bit (and others can continue to tear out what’s left of their hair <evil grin>).
I agree, there is no "problem" (unfortunately sarcasm is fairly hard to express in text). I also agree that people should scratch there heads for awhile, though I’m sure it’ll make sense in the end.
Hugo, my friend,
There’s simply no comparison between the Beatles and the Stones. I love the Stones, btw…"Jumping Jack Flash", "Happy", "Miss You". The Stones rock, man.
But the Beatles??? Are you kidding me???????
"Dear Prudence"
"Rain"
"Baby, You’re a Rich man"
That awesome Rickenbacker John Lennon played?
The Stones are at the top of the 20th century and certainly one of the best bands of the 60’s and 70’s, but the Beatles are the most significant Pop group in a millennia.
Hi Chuck,
Heheh! I expected comments llike yours – in fact, I am surprised I didn’t get a big flame war about which band was better within hours of posting <g>.
I won’t get into this debate. In fact, I even carefully avoided any indication that either band might be better or worse, or even that either behaviour of any of the tables might be better or worse!
Results are being returned in order by the non clustered index.
Remove the non clustered index and the results are returned in the correct order.
Optimizer chooses to do an index scan when clustered index only exists and via non clustered index when both indexes exist?
Very interesting.
Hi ShimMaster,
Congratulations! You figured out the correct answer.
But I must object to the usage of the words "correct order" when refering to a query that has no ORDER BY. All orders are correct – they are only different (or rather, they <b>appear</b> to be different).
I tried to publish a new blog post with a detailed explanations and some follow-up to the observations made by Denis and Ahmed, but experienced some technical difficulties. I’ll try again later today.
Very nice article ! Short and using a nice title !
Got it 1 minute, but don’t know the reason for behavior, why its considering FirstName at all?
This fixes the problem:
SELECT LastName FROM Stones WITH (Index(1));
if I already have Stones table and I copy data I need into temp table, will the data in temp table always sorted
CREATE TABLE Stones
(LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,
FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);
INSERT INTO Stones (LastName, FirstName)
SELECT ‘Jagger’, ‘Mick’
UNION ALL
SELECT ‘Jones’, ‘Brian’
UNION ALL
SELECT ‘Richards’, ‘Keith’
UNION ALL
SELECT ‘Watts’, ‘Charlie’
UNION ALL
SELECT ‘Wyman’, ‘Bill’;
go
SELECT LastName
INTO #MyTemp
FROM Stones Order by LastName
SELECT * from #MyTemp –will this data always be ordered?
DROP TABLE #MyTemp
Also, is the order guaranteed with SELECT LastName FROM Stones WITH (Index(1))"?
Thanks
Hi, "How about #temp table?"!
Your question:
has a simple answer: No. It might be, but there is no guarantee. If you want to be sure that data is returned in any specific order, then you will always have to add an ORDER BY clause. Nothing else will ever provide guarantees.
Great post.
Disappointing to see that the ability to create a supposedly ordered view is still present within the view designer in the latest version of SSMS.
I see the comments in your Connect item.
It doesn’t look like there is any appetite on Microsoft’s part to address this.
In the first query you don’t know if it is ordering ascending by first or last name because in both cases the result is the same. In the second query it is not random, it seems it is ordering for first name, the non clustered index. It seems in both cases it is ordering ascending for the non-clustered index, but why it is not using the clusted index?