There are two ways to test how your queries behave on huge amounts of data. The simple option is to actually use them on huge amounts of data – but where do you get that if you have no access to the production database, and how do you store it if you happen not to have a multi-terabyte storage array sitting in your basement? So here’s the second best option: you cheat.
Luckily, SQL Server has a feature that allows you to do just that. I must warn you that this feature is undocumented and unsupported. You should never under any circumstances use this feature on a production server, unless explicitly directed to by qualified Microsoft staff. Using it, sparingly, on a test box is okay. But as an undocumented feature, there is no guarantee that it will always work, or that it will continue to work in future versions.
With that disclaimer out of the way, let’s take a look at the “cheat” option. As you probably know, the Query Optimizer relies on statistics for its decisions. If those statistics are wrong, the decisions will probably be wrong. Here is where we find the leverage to cheat: if we can force SQL Server to use statistics that are representative of the production database rather than the test database, we will get the execution plan it would generate on production. The actual performance will probably still be lots faster, but at least we can look at the plan and use our understanding of execution plans to check that we get scans where we want scans, seeks where we want seeks, and all the right types of joins.
The undocumented feature that we can use here actually has a very simple and easy-to-remember syntax. You just run an UPDATE STATISTICS command, adding the options “WITH ROWCOUNT=xxxx, PAGECOUNT=xxxx” to force SQL Server to store the numbers you mention as the rowcount and pagecount. Clear the plan cache, then test your queries and check how they behave with the simulated number of rows and pages. I could give a much longer description and examples of the usage, but others have already done so – see for instance this post from the horse’s mouth (aka the Query Optimization Team), this post from Benjamin Nevarez, or if you are able to read German this excellent post from Uwe Ricken.
… and how to get rid of them
But what all those blog posts fail to mention is what happens later. Of course, if you just create a small test table, slap in a few sample rows, fake big statistics, check the execution plan and then drop the table, that is not an issue. But what if you have a test database that contains a lot of carefully crafted test data, and that is used for other tests as well? You may not want to run all the tests with those artificially bumped rowcounts!
In my case, I ran into this in the AdventureWorks sample database, my loyal companion for presentations and demos. I will present a pre-conference seminar on Execution Plans in Exeter (March 21), in Copenhagen (March 28), and in Zagreb (June 2). For my preparations, I wanted to force a hash spill warning, so I artificially lowered the rowcount for the SalesOrderDetail table to just 200. That worked just fine – this nifty feature can be used to mimic every rowcount, not just the big numbers.
Because I use this sample database for other purposes as well, I wanted to immediately reset the statistics to their normal value. I figured rebuilding all the statistics on the table with the FULLSCAN option would do the trick. Right? Wrong! The UPDATE STATISTICS did run, it took a while to scan all rows in the table (as requested) – but then it still retained the rowcount and pagecount values that I had forced earlier! And there is no visible indication at all – executing the UDPATE STATISTICS … WITH FULLSCAN statement simply runs for a while, then reports success.
It was actually mere coincidence that I later did some other tests on the same table, and just happened to notice that the estimated rowcount for this table was still 200. It then took me at least an hour of searching the internet and trying many different options (including using zero or a blank value for the rowcount and pagecount parameters, combining different other options of the UPDATE STATISTICS command, clearing or not clearing the plan cache, and a few other things I don’t even remember. None of them worked. I could of course count the actual number of rows and pages and use that, and that would have worked because my AdventureWorks database never changes – but for a test database where the data can actually change over time, this options would not work.
In the end, I finally found one a method that works. But it is really a sledgehammer approach, and I prefer not heaving to do this on large tables on a regular basis: ALTER TABLE Sales.SalesOrderDetail REBUILD. After running that statement, I found that the statistics on the SalesOrderDetail table had finally reverted to their normal behavior. Until the next time I need to fake a different number.
On a development server, using the WITH ROWCOUNT and WITH PAGECOUNT options of the UPDATE STATISTICS is a fine method to simulate large numbers of rows, or to simulate the effect of wrong statistics. But unfortunately, the numbers given stick for far longer that I like; this is not documented anywhere, and not easy to undo. Rebuilding the table and indexes appears to be the only solution.
If you have found other ways to return from fake row- and pagecounts to the normal statistics behavior, post a comment and share your knowledge!