Did you know that SQL Server allows stored procedures to have up to 2100 parameters? And more important: do you care? Well, some people do care, and Joe Celko seems to be one of them.
If you are a regular reader of SQL Server newsgroups, you probably know Joe Celko from his always unfriendly and often incorrect replies. Here is a typical example, one that I have seen several times recently, in a paraphrased form:
Question: I want to send a list of values to my stored procedure, but WHERE ColumnName IN (@ValueList) does not work – how to solve this?
Answer: SQL Server can handle over 1000 parameters. You should use WHERE ColumnName IN (@Parm1, @Parm2, …, @ParmN).
Joe Celko is the only one I have ever seen giving this advice. Many people will then jump into the discussion, challenging Joe’s advice. To which Joe will always reply that he has received a smart stored procedure that will solve a Sudoku puzzle, and that takes 81 parameters (one for each cell in the puzzle) as its input – unfortunately, Joe has so far refused to actually publish his code to let other people verify his claims.
The test setup
I still wanted to see for myself how passing 81 parameters into a stored procedure compares to other methods of passing in the same input, so I wrote three simple test procedures. Each of these procedures takes a Sudoku puzzle as input, but in three different forms. Each of the three then uses the input to populate a temporary table (#Problem) with the puzzle, and then performs a pretty standard pivot query to output the puzzle in the usual form.
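The final pivot step is essentially the same in all three procedures. A minimal sketch of what it looks like (the actual code is in the attachment; the column names here are illustrative, and #Problem is assumed to hold one row per given cell, with columns Row, Col, and Value):

```sql
-- Sketch of the output pivot: one result row per puzzle row,
-- one column per puzzle column; empty cells come out as NULL.
SELECT MAX(CASE WHEN Col = 1 THEN Value END) AS C1,
       MAX(CASE WHEN Col = 2 THEN Value END) AS C2,
       MAX(CASE WHEN Col = 3 THEN Value END) AS C3,
       MAX(CASE WHEN Col = 4 THEN Value END) AS C4,
       MAX(CASE WHEN Col = 5 THEN Value END) AS C5,
       MAX(CASE WHEN Col = 6 THEN Value END) AS C6,
       MAX(CASE WHEN Col = 7 THEN Value END) AS C7,
       MAX(CASE WHEN Col = 8 THEN Value END) AS C8,
       MAX(CASE WHEN Col = 9 THEN Value END) AS C9
FROM   #Problem
GROUP BY Row
ORDER BY Row;
```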
After verifying that all of the procedures worked as expected, I commented out the pivot query to reduce the output for my performance tests. I then set up the tests. I selected two real Sudoku puzzles (an easy one, with 34 cells given, and a hard one with only 27 cells given) and added two nonsensical ones of my own (one with only 5 cells given, and one with 72 cells). For each combination of a puzzle and a procedure, I coded a loop that calls the procedure a thousand times and records the elapsed time in a table. These twelve loops were then enclosed in an endless loop. Once satisfied with the code, I hit the execute button, pushed my laptop out of the way and went on to other stuff.
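Each of the twelve timing loops followed roughly this shape (a sketch only; the procedure name SudokuChar81, the table TestResults, and the test label are illustrative, not the exact names used in the attachment):

```sql
-- Sketch of one timing loop: call the procedure a thousand times,
-- then log the total elapsed milliseconds for this batch of calls.
DECLARE @i int, @start datetime;
SET @i = 1;
SET @start = GETDATE();
WHILE @i <= 1000
BEGIN
    EXEC dbo.SudokuChar81 @Puzzle = ' 6 1 4 5  83 56      8 1 29  7  8   26 ';
    SET @i = @i + 1;
END;
INSERT INTO TestResults (TestName, ElapsedMs)
VALUES ('CHAR81 / hard puzzle', DATEDIFF(ms, @start, GETDATE()));
```

(The puzzle string above is just a stand-in; any valid input would do.)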
Some 24-odd hours later, I interrupted the test script. Each of the twelve “thousand calls” tests had been executed 400 times. I dusted off a query I originally wrote for testing the performance of code for the Spatial Data chapter in “Expert SQL Server 2005 Development” to calculate the average duration per single call, disregarding the fastest and slowest 10% of the measurements to exclude the influence of semi-random other activities on my laptop.
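The trimmed average can be computed with a query along these lines (a sketch, not my exact query; TestResults is an illustrative name, and each logged row covers a thousand calls, hence the division by 1000):

```sql
-- Sketch: average per-call duration, discarding the fastest and
-- slowest 10% of the measurements per test.
SELECT TestName,
       AVG(CAST(ElapsedMs AS float)) / 1000.0 AS AvgMsPerCall
FROM  (SELECT TestName, ElapsedMs,
              NTILE(10) OVER (PARTITION BY TestName
                              ORDER BY ElapsedMs) AS Decile
       FROM   TestResults) AS d
WHERE  Decile BETWEEN 2 AND 9       -- drop deciles 1 and 10
GROUP BY TestName;
```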
(Note that all code to create the stored procedures and run the tests is in the attachment to this post, so you can always repeat these tests on your machine.)
The first contender is of course the procedure with 81 parameters that Joe Celko is such an avid fan of. Creating this procedure involved a lot of copying and pasting, a lot of editing numbers in the copied and pasted code, and a lot of tedious debugging until I had finally found and corrected all locations where I had goofed up the copy and paste or where I had failed to edit a number after pasting. The resulting code is very long, tedious to read and maintain, and screams “Hey! You forgot to normalize these repeating groups into their own table” all the way. Manually typing the EXEC statements to call this procedure with test data was also very cumbersome and error-prone. In a real situation, the procedure would probably be called from some user-friendly front end code. I’m far from an expert in front end code, but I expect this code to be very long as well, since it has to check and optionally fill and pass 81 parameters.
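To give an impression of what all that copying and pasting produces, here is a heavily abbreviated sketch of the 81-parameter approach (the names @r1c1 through @r9c9 are my own; the full version repeats the same pattern 81 times):

```sql
-- Sketch of the 81-parameter contender; only the first few
-- parameters and INSERT blocks are shown, the rest follow the
-- exact same pattern through @r9c9.
CREATE PROCEDURE dbo.SudokuManyParms
    @r1c1 tinyint = NULL,
    @r1c2 tinyint = NULL,
    @r1c3 tinyint = NULL
    -- ... @r1c4 through @r9c9, declared the same way ...
AS
BEGIN
    CREATE TABLE #Problem (Row tinyint, Col tinyint, Value tinyint);
    INSERT INTO #Problem (Row, Col, Value)
    SELECT 1, 1, @r1c1 WHERE @r1c1 IS NOT NULL;
    INSERT INTO #Problem (Row, Col, Value)
    SELECT 1, 2, @r1c2 WHERE @r1c2 IS NOT NULL;
    INSERT INTO #Problem (Row, Col, Value)
    SELECT 1, 3, @r1c3 WHERE @r1c3 IS NOT NULL;
    -- ... and 78 more of these blocks ...
END;
```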
The second contender uses a pretty standard CSV string as input, with the additional requirement that each value in the CSV is three characters: row number, column number, value. The procedure uses a variation of one of the scripts found on Erland Sommarskog’s site to parse the CSV list into a tabular format. This code is lots shorter, and as a result easier on the eyes and easier to maintain. Typing in the EXEC statements for testing is still pretty cumbersome (though I found a way to cheat – simply copy the parameter list for the Celko version, do some global search and replace to remove various characters, and the result was exactly the string I needed to call this procedure). The front end code will probably be lots shorter, since it can use a simple loop to process the input and build the CSV parameter.
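In the spirit of Erland Sommarskog’s list-to-table techniques, the parsing can be sketched like this (illustrative only; dbo.Numbers is an assumed helper table of consecutive integers starting at 1):

```sql
-- Sketch: split a CSV of three-character triplets 'rcv,rcv,...'
-- (row digit, column digit, value digit) into #Problem.
CREATE PROCEDURE dbo.SudokuCsv
    @Puzzle varchar(400)
AS
BEGIN
    CREATE TABLE #Problem (Row tinyint, Col tinyint, Value tinyint);
    INSERT INTO #Problem (Row, Col, Value)
    SELECT CAST(SUBSTRING(@Puzzle, n.Number,     1) AS tinyint),
           CAST(SUBSTRING(@Puzzle, n.Number + 1, 1) AS tinyint),
           CAST(SUBSTRING(@Puzzle, n.Number + 2, 1) AS tinyint)
    FROM   dbo.Numbers AS n
    WHERE  n.Number <= LEN(@Puzzle)
      AND  n.Number % 4 = 1;   -- triplets start at positions 1, 5, 9, ...
END;
```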
The third and last contender takes a CHAR(81) parameter as input. The first 9 characters of this parameter describe the top row, using a space to depict an empty cell; the second set of 9 characters is for the second row, and so forth. Parsing this parameter turned out to be even easier than parsing the CSV parameter. Another observation I made is that it was much easier to manually enter the parameter for the tests – just read the puzzle left to right and top to bottom and type either a number or a space for each cell. This was absolutely painless, and I didn’t make a single mistake. Of course, this is irrelevant for the expected real situation where the parameter is built by the front end – the code to do this will probably be about as complex as that for the CSV parameter.
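Unpacking the fixed-position string is little more than arithmetic on the character position (again a sketch, assuming the same dbo.Numbers helper table of consecutive integers):

```sql
-- Sketch: unpack a CHAR(81) puzzle (row-major, space = empty cell)
-- into #Problem; row and column follow from the character position.
CREATE PROCEDURE dbo.SudokuChar81
    @Puzzle char(81)
AS
BEGIN
    CREATE TABLE #Problem (Row tinyint, Col tinyint, Value tinyint);
    INSERT INTO #Problem (Row, Col, Value)
    SELECT (n.Number - 1) / 9 + 1,                         -- row 1..9
           (n.Number - 1) % 9 + 1,                         -- column 1..9
           CAST(SUBSTRING(@Puzzle, n.Number, 1) AS tinyint)
    FROM   dbo.Numbers AS n
    WHERE  n.Number BETWEEN 1 AND 81
      AND  SUBSTRING(@Puzzle, n.Number, 1) <> ' ';
END;
```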
If you were eager to see the test results, you’ll probably have skipped straight past the previous section. No problem, just promise to go back and read it later, m’kay?
[Results table: average duration per call. Columns: Joe Celko’s 81 parameters, Single CSV parameter, Single CHAR(81) parameter. Rows: Almost empty (5 cells), Hard puzzle (27 cells), Easy puzzle (34 cells), Almost full (72 cells). The measured figures did not survive conversion of this post.]
As you see, using lots of parameters is faster than using a single CSV parameter only if you don’t actually pass values in these parameters. As soon as you use the parameters, performance of a procedure with lots of parameters deteriorates quickly.
You can also see that the CHAR(81) parameter wins in all cases.
My testing was all carried out on my laptop. The results are for the most part a measure of the time needed to process the input, not of network capacity. However, it is easy to see by just looking at the EXEC statements that the CHAR(81) version uses the least network resources, Celko’s version with 81 parameters uses the most, and the CSV version sits nicely in between.
You may have noted that I have not included a version with an XML input parameter in my tests. I probably should have done that, but I have to admit that I still have so much to learn on how to handle XML in a SQL Server database that I didn’t feel comfortable enough to sit down and write one myself. But your submissions are welcome – if you feel that you can write an efficient version of this procedure that accepts its input in XML format, do not hesitate to write me. As soon as I can spare the time to set up the laptop for another all-nighter of performance testing, I’ll rerun the script with your XML solution included and post the results back to this site.
While writing this post, I found a newsgroup posting by Joe Celko where he reveals a snippet of “his” Sudoku solver. And guess what? I was wrong when I thought that I could guess how his procedure looks. It turns out that he does not use defaults for his parameters; you always have to supply all of them, using 0 for an empty cell. I didn’t want to repeat all the tests at this time. I expect that this will reduce performance even more, though not by much – but it will also cause a huge increase in network usage!
I also saw that the parameters in Joe Celko’s version were declared as integer, so that each parameter will use 4 bytes instead of just 1. This will definitely affect both the performance of the procedure and the network pressure.
If you have to pass a long list of values to a stored procedure or function, you should not use a long collection of parameters. It makes the code harder to write and maintain, prone to subtle errors, and longer (which will affect parse and compile time, though I did not include this in my test); it uses far more network resources than any of the other alternatives (except, maybe, XML); and it gets terribly slow as more parameters are actually used.
Joe Celko will probably find that he too can shorten the amount of code in his Sudoku solver *and* increase performance by using a different strategy to pass the puzzle. Of course, in the case of solving a Sudoku, those 2 milliseconds of extra execution time won’t really matter, nor will the few hundred extra bytes travelling over the network. But if you ever encounter a similar multi-parameter problem in a procedure that will be called from a web page that gets hundreds of hits per second, those 2 milliseconds and those extra bytes on the network can suddenly become a huge bottleneck!
SQL Server may support up to 2100 parameters – but that does not imply that it is a good idea to actually use them!
File Attachment: Sudoku.zip
I just realised that I forgot to include the attachment with the scripts to create and test the procedures. This has now been corrected.
Tony Rogerson wrote something similar a while back (only not as nice in tone, I wonder why? <vbg>)
Actually 81 parameters is no big deal. Our business needs up to 100,000 parameters, and I handle it easily with an image, as described here:
Of course, my approach is based on Erland Sommarskog’s article too.
It would be interesting to benchmark it against Celko’s approach at its upper limit, let’s say 2000 parameters.
Thanks for the pointer to Tony’s article. I did read this at the time it was posted, but somehow forgot about it later. Sorry, Tony – I should have mentioned and linked to your post in my article.
Such benchmarking has already been done, for 1,000 parameters. Just follow the link Denis posted. I don’t think Tony used Erland’s version, though.
However, I guess it’s safe to assume that, given the performance penalty of the many-parameter approach that I have already seen at 81 parameters, you won’t find good performance figures when you scale it up to or over the 1,000 parameter mark.
Your 100,000 parameters encoded in an image also looks interesting – I just have no idea how to enter that lightbulb icon into my C# source code 🙂
Thanks for your comments, Alex and Denis!
Another great post Hugo – I often see arguments both ways on large parameter numbers – this spells things out fairly clearly.
I replaced the [i] lightbulb with [ i ], so it’s readable now. Thanks for noticing the problem. Tony in his benchmarks did not use fixed width numbers in binary representation as I do, he used comma separated numbers in character representation, which are slower.
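For readers who haven’t seen the article: the fixed-width binary idea can be sketched like this (illustrative only, not Alex’s actual code; dbo.Numbers is an assumed helper table of consecutive integers):

```sql
-- Sketch: pack each value into a fixed 4-byte slot of a single
-- binary parameter, then unpack with SUBSTRING arithmetic.
DECLARE @Blob varbinary(max);
SET @Blob = CAST(1 AS binary(4)) + CAST(2 AS binary(4)) + CAST(3 AS binary(4));

SELECT CAST(SUBSTRING(@Blob, (n.Number - 1) * 4 + 1, 4) AS int) AS Value
FROM   dbo.Numbers AS n
WHERE  n.Number <= DATALENGTH(@Blob) / 4;
```

No parsing of separators is needed, which is why the fixed-width binary form beats comma-separated character data.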
There’s another lightbulb in the second snippet. Now that I know it represents [i], I understand what you meant – but you might want to fix it for the benefit of future readers! <g>
Nice article. Joe Celko answered one of my forum questions in the EXACT manner you described... not really answering the question, just saying I don’t have a clue about how an RDBMS works and I am asking the wrong question.
He is one of those people I would love to call "theory architect". I worked with a few before..and they love to get in a room and argue the MOST abstract nonsensical crap, while OTHERS are actually doing work.
One thing I have learned thus far in my software dev. career is that "perfect design" doesn’t exist nor is it worth it. I have seen people make a TON of money not using the correct architecture, however they were first to market or idea. Joe Celko would be the type of person to rip you a new one having seen it and having a smirk on his face how he "set you straight"….while the company is 4 months behind on their deliverable and losing deals in the process.
Not quite true; it is not MY code to publish. It was written by Richard Romley, a retired SQL expert from Salomon Smith Barney. It was in response to an incomplete Sudoku query I had written. People who have read my columns and books over the decades might remember him as the guy who could cook my puzzles and solutions damn near every time. I do not yet have his permission to publish it on my website (it is about 18 pages long).
I got his permission to send the routine to Tony Rogerson, along with an offer to run his own tests remotely or on his own hardware. Tony bragged that he would pound out a better answer in five minutes, but it has not happened yet as far as I know 🙂
The procedure is in T-SQL dialect. You input a 9×9 Sudoku grid (81 parameters), and it prints out an ASCII picture of the grid. It then finds ALL solutions for that grid. This is part of why Sudoku can be a good math problem! You need at least 24 numbers in the initial grid to guarantee a unique answer.
If a grid has fewer than that, then it can have multiple solutions, and there are 6.671 * 10^21 possible valid grids. In fact, there is an article on Sudoku solving from IEEE SPECTRUM online about how it is NP-complete. Richard found one published puzzle, when he was pulling in samples to test his procedure, that had 39 answers (it finds all of them).
If you want to really freak out, there is an 81-table self-join in the code. And the response time is in milliseconds for any grid tested so far. The optimizer guys at MS are better than I would have guessed!
I did write that you "received" it. But it’s good that you point out once more that the actual Sudoku-solving code has to be attributed (if it ever gets published) to Richard Romley.
Anyway, you now know that you can shave off another ~0.5 milliseconds of the solver’s execution time by replacing the 81 parameters with a single char(81) parameter, which was the point of this post.
How do these solutions compare with Itzik’s solution?
Was this question directed at Joe, or at me?
In the former case, I’ll await Joe’s answer with bated breath.
In the latter case, the answer is that they don’t compare at all. I merely focused on how to pass a Sudoku puzzle to a stored proc and skipped the solving part completely 🙂
Not sure if this is applicable or not, but hasn’t Celko posted the code on his SQL puzzles site?
More specifically, here : http://www.celko.com/sudoku.txt
Thanks for the pointer. Yes, that must be the code Joe has previously been referring to without having the permission to publish it – apparently, Richard Romley changed his mind.
It’s a crying shame that Joe published his code in this form, claiming that this code "demonstrates that a long parameter list is not harmful" (granted, there won’t be any smoke in the server room if you run this, but I have proven that there are better [read: faster] alternatives and he doesn’t even mention this – I’m sorry, Joe, but pretending my counterarguments don’t exist doesn’t make you any less wrong).
It’s also a shame that the code looks awfully formatted in a browser window (hint: right-click the link, choose "Save as", then open the downloaded file in any text editor). And that he chose to use table names, column names, and variable names that are for the most part completely non-descript.
I am tempted to copy and paste one of Joe’s own newsgroup rants about using ISO-11179, about temp tables mimicking a scratch tape, and so on, but I’ll resist temptation. It seems that Joe is perfectly capable of making a fool of himself without me driving home the point.
That being said, I have to admit that Richard’s solution is pretty smart. I did spend an hour or two trying to come up with my own algorithm for solving a Sudoku, but I didn’t get near the performance and efficiency of Richard’s solution.
Yeah, I do believe Celko did post that already.