Collation change script

Normally when I blog, I spend a lot of time making sure that get as close to perfection as I can. This post is an exception. The problem I am talking about is too far out of my main specialization area that I want to spend more time on it than I already have, yet what I have so far appears to be valuable enough already to warrant posting. Even in it’s current, unfinished and not fully tested state.

The problem

One of my customers called me to look at a problem where they were unable to upgrade a third party application due to some mysterious “collation conflict” error messages during the upgrade process. Those error messages originated from the SQL Server database used as the backend for the third party application. And since I am supposed to be the SQL Server expert, they were all looking at me for a solution.

I won’t bother you with the long process we went through for troubleshooting. Let’s just say that with better service providers, we probably would have found the issue earlier. (In fact, with better service providers it probably would never have happened). So let me jump straight to the root cause.

About two years ago, they had moved the database backend for this application from their old physical hardware to hosted hardware (not true cloud, a virtual cloud from a hosting provider). For weird reasons they had not asked me to set up the SQL Server instance, but instead asked the hosting provider. Who apparently felt confident that just clicking next should be good enough. They installed SQL Server, restored a backup of the old database, and called it a day.

Problem is: the new instance was set up with a default collation that was different from the default collation of the old instance. And hence different from the database default collation. And when that resulted in errors, they responded by changing the default collation. Now everything ran and they were happy.

Fast forward two years. The database now has lots of “old” tables that use the original default collation for all string columns. But also lots of “new” tables that have their string columns using the new default collation. And the upgrade failed because somewhere in the new code is a comparison between two character columns that now have incompatible collation.

The fix

I contacted the vendor to find out what collations are recommended and/or supported. I rebuilt the master database to set the default instance collation to the recommended collation, then switched the database default collation to match it. But I had one problem remaining: thousands of columns defined with the wrong collation.

Finding the columns with incorrect collations is easy enough. Scripting the ALTER TABLE statements to change their collation is also not a really big problem. But to change the collation, one has to first drop all indexes that contain the column, and drop all foreign key constraints that reference the column. That’s where it became challenging.

So what does an IT person do when faced with a challenge? Yes: they turn to Google. I found several sites with various tips and scripts. The one that looked most promising was found at StackOverflow. The script posted by Philip C appeared to do everything I needed, and probably even more.

But we all know the golden rule of internet: never trust anything or anyone blindly. So obviously I did not just run this on production. I restored a copy of the database to a test server, so I could experiment with the script. And boy, was I glad I did. There were various issues with the script. I tweaked a bit here, made a small change there … and then realized there was a basic problem that actually required me to do a quite major overhaul. I still was happy to have found the script because it absolutely gave me a head start, but it did end up taking more time than I had hoped for.

The bugs

The first bug I fixed was not actually a real bug. It was a suboptimal choice. The script generates lots of SQL statements in a temporary table, then at the end sends them all to the SSMS window where you can select the output, copy and paste into a new query window, and then execute. When outputting the statements, it sorts them by statement type to ensure that objects are first dropped, then collations are changed, and then objects are recreated in an order that ensure that objects with dependencies are generated after the objects they depend on.

But I disagree with some of the ordering choices. For instance, when recreating objects the script first creates indexes, then unique constraints, and then primary key constraints. That is not wrong. But in my experience, the primary key is most often the clustered index. And when you build multiple indexes on the same table, it is smart to start with the clustered index – if you first build a nonclustered index, then that index has to be rebuilt when you later add a clustered index.

I did not try to add logic to look at which indexes are actually clustered and nonclustered to get the guaranteed best order. But I did switch the order. My version of the script now first generates primary key constraints, then unique constraints, and then the remaining indexes. So in most cases, this will build the clustered index first. Oh, and I did a similar change (using the reverse logic) in the part where constraints and indexes are dropped.

Another change I made quite quickly was to fix a small sloppiness. When recreating the foreign key constraints that originally were enabled and trusted, the code generated would force SQL Server to check the existing data twice. I removed the extra check.

I also did a small usability change. The script as posted returns two columns of output to SSMS. I removed the column with the sort order so that it outputs just the generated T-SQL statements. This makes it very easy to copy those statements and paste them in a new query window where I can review and then execute them.

After making those tweaks, I started testing the script. And that’s when the actual bugs popped up. I didn’t track all of them. One I remember is that wrong code was generated when a multi-column unique constraint overlaps with a multi-column primary key. After fixing that, I got errors where e.g. the same constraint was generated twice if it was a multi-column constraint (not sure if that’s an original bug or introduced by me when I fixed the first bug).

As I noticed more and more small issues, almost always related to multi-column constraints, I realized that the problem is that the script was originally completely column-based. For each column, it would check indexes, constraints, etc. That results either in duplication within the script (often, but not always, removed by using DISTINCT), or complex code to try to prevent the duplication.

So I decided that the best way to fix everything is to keep the outline of the script, but make the changed needed to do everything per table instead of per column, while still making sure not to do unneeded work (e.g. indexes on non-affected columns are never dropped and rebuilt).

After completing that work, I had a script that did what I needed: change the collation of all affected columns, including the dropping and recreating of objects that would prevent this collation change. I ran it on my test database and then used SQL Compare (I think this action alone already earned back the entire license fee) to verify that the two databases were now exactly the same except for the collations.

The script

Below you will find the final version of the script that I ended up using. There are some important points that I want to point out. Please take heed when you consider using the script:

  • The original version of the script had some weird exceptions that I didn’t really understand, and that were not always consistent. Probably due to the various sources that Philip C used when he assembled the script. I removed most of them.
  • The original version of the script also had logic to drop and recreate manually created statistics objects. I didn’t have any of those in the database I was working on, so to save some time I simply commented out that part of the code and didn’t try to figure out if there were any bugs there. Since I based some of the logic in the loop within the script, this part of the code almost certainly doesn’t work now. So if you need to use this script on a database that does have manually created statistics, you will have to uncomment these parts of the script and then make the necessary changes to get them to work.
  • I have tested this scrip on the database that I needed to fix. This database is quite complex and big, but I have no guarantee that all possible situations are covered. Please, do not ever run this script on important production databases without first testing it on a test copy of that database, and without having a good backup available.
  • The script is currently set to automatically find all columns that have a collation that is different from the database default collation. This is quite easy to change if your requirements are different.
  • The script posted below only generates the code to make the changes; it does not automatically execute it. The generated output is sent to SSMS as a single result set. I recommend setting SSMS to use “results to text” (Ctrl-T), and checking the query options to make sure that the maximum number of character per column is suffienctly large – SSMS defaults this to 256 which is insufficient; I always set this to the maximum allowed value which is 8192.
  • As mentioned above, I can not justify investing more time in this script now. The problem is fixed so I can’t improve it on customer time. And I need my own time to prepare my presentation for SQLBits next month.

The code

So without further ado, here is a script you can use, with the aforementioned precautions, to find all columns that do not have the database default collation and change them to that collation; including the dropping and recreating of objects as needed for this change.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this:

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.