Collation change script

5 Comments

Normally when I blog, I spend a lot of time making sure that get as close to perfection as I can. This post is an exception. The problem I am talking about is too far out of my main specialization area that I want to spend more time on it than I already have, yet what I have so far appears to be valuable enough already to warrant posting. Even in it’s current, unfinished and not fully tested state.

The problem

One of my customers called me to look at a problem where they were unable to upgrade a third party application due to some mysterious “collation conflict” error messages during the upgrade process. Those error messages originated from the SQL Server database used as the backend for the third party application. And since I am supposed to be the SQL Server expert, they were all looking at me for a solution.

I won’t bother you with the long process we went through for troubleshooting. Let’s just say that with better service providers, we probably would have found the issue earlier. (In fact, with better service providers it probably would never have happened). So let me jump straight to the root cause.

About two years ago, they had moved the database backend for this application from their old physical hardware to hosted hardware (not true cloud, a virtual cloud from a hosting provider). For weird reasons they had not asked me to set up the SQL Server instance, but instead asked the hosting provider. Who apparently felt confident that just clicking next should be good enough. They installed SQL Server, restored a backup of the old database, and called it a day.

Problem is: the new instance was set up with a default collation that was different from the default collation of the old instance. And hence different from the database default collation. And when that resulted in errors, they responded by changing the default collation. Now everything ran and they were happy.

Fast forward two years. The database now has lots of “old” tables that use the original default collation for all string columns. But also lots of “new” tables that have their string columns using the new default collation. And the upgrade failed because somewhere in the new code is a comparison between two character columns that now have incompatible collation.

The fix

I contacted the vendor to find out what collations are recommended and/or supported. I rebuilt the master database to set the default instance collation to the recommended collation, then switched the database default collation to match it. But I had one problem remaining: thousands of columns defined with the wrong collation.

Finding the columns with incorrect collations is easy enough. Scripting the ALTER TABLE statements to change their collation is also not a really big problem. But to change the collation, one has to first drop all indexes that contain the column, and drop all foreign key constraints that reference the column. That’s where it became challenging.

So what does an IT person do when faced with a challenge? Yes: they turn to Google. I found several sites with various tips and scripts. The one that looked most promising was found at StackOverflow. The script posted by Philip C appeared to do everything I needed, and probably even more.

But we all know the golden rule of internet: never trust anything or anyone blindly. So obviously I did not just run this on production. I restored a copy of the database to a test server, so I could experiment with the script. And boy, was I glad I did. There were various issues with the script. I tweaked a bit here, made a small change there … and then realized there was a basic problem that actually required me to do a quite major overhaul. I still was happy to have found the script because it absolutely gave me a head start, but it did end up taking more time than I had hoped for.

The bugs

The first bug I fixed was not actually a real bug. It was a suboptimal choice. The script generates lots of SQL statements in a temporary table, then at the end sends them all to the SSMS window where you can select the output, copy and paste into a new query window, and then execute. When outputting the statements, it sorts them by statement type to ensure that objects are first dropped, then collations are changed, and then objects are recreated in an order that ensure that objects with dependencies are generated after the objects they depend on.

But I disagree with some of the ordering choices. For instance, when recreating objects the script first creates indexes, then unique constraints, and then primary key constraints. That is not wrong. But in my experience, the primary key is most often the clustered index. And when you build multiple indexes on the same table, it is smart to start with the clustered index – if you first build a nonclustered index, then that index has to be rebuilt when you later add a clustered index.

I did not try to add logic to look at which indexes are actually clustered and nonclustered to get the guaranteed best order. But I did switch the order. My version of the script now first generates primary key constraints, then unique constraints, and then the remaining indexes. So in most cases, this will build the clustered index first. Oh, and I did a similar change (using the reverse logic) in the part where constraints and indexes are dropped.

Another change I made quite quickly was to fix a small sloppiness. When recreating the foreign key constraints that originally were enabled and trusted, the code generated would force SQL Server to check the existing data twice. I removed the extra check.

I also did a small usability change. The script as posted returns two columns of output to SSMS. I removed the column with the sort order so that it outputs just the generated T-SQL statements. This makes it very easy to copy those statements and paste them in a new query window where I can review and then execute them.

After making those tweaks, I started testing the script. And that’s when the actual bugs popped up. I didn’t track all of them. One I remember is that wrong code was generated when a multi-column unique constraint overlaps with a multi-column primary key. After fixing that, I got errors where e.g. the same constraint was generated twice if it was a multi-column constraint (not sure if that’s an original bug or introduced by me when I fixed the first bug).

As I noticed more and more small issues, almost always related to multi-column constraints, I realized that the problem is that the script was originally completely column-based. For each column, it would check indexes, constraints, etc. That results either in duplication within the script (often, but not always, removed by using DISTINCT), or complex code to try to prevent the duplication.

So I decided that the best way to fix everything is to keep the outline of the script, but make the changed needed to do everything per table instead of per column, while still making sure not to do unneeded work (e.g. indexes on non-affected columns are never dropped and rebuilt).

After completing that work, I had a script that did what I needed: change the collation of all affected columns, including the dropping and recreating of objects that would prevent this collation change. I ran it on my test database and then used SQL Compare (I think this action alone already earned back the entire license fee) to verify that the two databases were now exactly the same except for the collations.

The script

Below you will find the final version of the script that I ended up using. There are some important points that I want to point out. Please take heed when you consider using the script:

  • The original version of the script had some weird exceptions that I didn’t really understand, and that were not always consistent. Probably due to the various sources that Philip C used when he assembled the script. I removed most of them.
  • The original version of the script also had logic to drop and recreate manually created statistics objects. I didn’t have any of those in the database I was working on, so to save some time I simply commented out that part of the code and didn’t try to figure out if there were any bugs there. Since I based some of the logic in the loop within the script, this part of the code almost certainly doesn’t work now. So if you need to use this script on a database that does have manually created statistics, you will have to uncomment these parts of the script and then make the necessary changes to get them to work.
  • I have tested this scrip on the database that I needed to fix. This database is quite complex and big, but I have no guarantee that all possible situations are covered. Please, do not ever run this script on important production databases without first testing it on a test copy of that database, and without having a good backup available.
  • The script is currently set to automatically find all columns that have a collation that is different from the database default collation. This is quite easy to change if your requirements are different.
  • The script posted below only generates the code to make the changes; it does not automatically execute it. The generated output is sent to SSMS as a single result set. I recommend setting SSMS to use “results to text” (Ctrl-T), and checking the query options to make sure that the maximum number of character per column is suffienctly large – SSMS defaults this to 256 which is insufficient; I always set this to the maximum allowed value which is 8192.
  • As mentioned above, I can not justify investing more time in this script now. The problem is fixed so I can’t improve it on customer time. And I need my own time to prepare my presentation for SQLBits next month.

The code

So without further ado, here is a script you can use, with the aforementioned precautions, to find all columns that do not have the database default collation and change them to that collation; including the dropping and recreating of objects as needed for this change.

Off Topic: Charity and snooker
SQL injection

Related Posts

No results found

5 Comments. Leave new

  • Stefan Visagie
    July 3, 2019 15:04

    This is a great piece of work! Saved us trying to migrate databases into Azure SQL. Thank you very much.

    Reply
  • Great script! I’m adding it to the repertoire 🙂 I’m going to look at the statistics stuff to see if any changes needed there for my purposes.

    Reply
  • Walter Charrière
    August 12, 2019 18:26

    Hi

    I like to share a minor update, added drop and create column defaults alos required when changing collation (also put more GO separators in cases of large scripts and comment lines of what comes next after each GO.

    — SSMS results to text – max length 8192 (change from default!!!)

    /*******************************************************************************
    *
    * Created 2017-06-16 By Philip C
    *
    * This script will check individual columns collations and check it against the
    * database default collation, where they are different it will create the scripts
    * required to drop all the objects dependant on the column, change the collation
    * to the database default and then recreate the dependant objects.
    * Some of the code has been reused from stuff found online the majority from
    * Jayakumaur R who created scripts to drop and recreate constraints
    *
    *
    * Modified 2019-01-11 by Hugo Kornelis
    *
    * Removed double checking of recreated constraints
    * Modified the order of the output to minimize number of index rebuilds (assuming primary key is most often clsutered index)
    * Fixed a bug caused by unique constraints overlapping the primary key constraint
    * Fixed a bug in detection of foreign key constraints to drop and recreate
    * Fixed bugs with recreating multi-column UNIQUE constraints
    *
    * Modified 2019-08-12 by Walter Charrière
    *
    * Add drop and create default constraints
    *
    *********************************************************************************/

    SET NOCOUNT ON;
    SET ANSI_WARNINGS OFF;
    GO

    — Declare variables
    DECLARE @SchemaName sysname,
    @TableName sysname,
    @IndexName sysname,
    @ColumnName sysname,
    @index_id int,
    @is_unique nvarchar(100),
    @IndexTypeDesc nvarchar(100),
    @FileGroupName sysname,
    @is_disabled nvarchar(100),
    @IndexOptions nvarchar(MAX),
    @IndexColumnId int,
    @IsDescendingKey int,
    @IsIncludedColumn int,
    @TSQLScripCreationIndex nvarchar(MAX),
    @TSQLScripDisableIndex nvarchar(MAX),
    @object_id int,
    @has_key_constraint int,
    @has_index int,
    @has_foreign_key int,
    @has_stats int,
    @stats_id int,
    @FK_objectid int,
    @FK_name sysname,
    @UQ_objectid int,
    @UQ_name sysname,
    @UQ_index_id int,
    @has_unique_constraint int,
    @has_default_constraint int,
    @DatabaseCollation sysname,
    @IndexColumns nvarchar(MAX),
    @IncludedColumns nvarchar(MAX);

    — Temporary table for generated script
    CREATE TABLE #tempscriptstore
    (ScriptType varchar(20),
    script nvarchar(MAX));

    — Get dataabse default collation
    SELECT @DatabaseCollation = collation_name
    FROM sys.databases
    WHERE database_id = DB_ID();

    /************************************************************************************************************************************
    * Iterate over all the tables that have at least one colmun where collation doesn’t match the database default. *
    * Also checks for PRIMARY KEY, UNIQUE, and (referencing) FOREIGN KEY constraints, indexes, and manually created statistics. *
    * (Note that the counts are not accurate counts due to duplication, these should only be tested for zero or non-zero) *
    ************************************************************************************************************************************/
    DECLARE collationfix CURSOR LOCAL FORWARD_ONLY READ_ONLY STATIC FOR
    SELECT t.object_id,
    OBJECT_SCHEMA_NAME(t.object_id) AS schemaname,
    OBJECT_NAME(t.object_id) AS tablename,
    COUNT(kc.object_id) AS has_key_constraint,
    COUNT(ic.index_id) AS has_index,
    COUNT(fk.constraint_object_id) AS has_foreign_key,
    COUNT(st.stats_id) AS has_stats,
    COUNT(uq.object_id) AS has_unique_constraint,
    COUNT(dc.object_id) AS has_default_constraint
    FROM sys.tables AS t
    INNER JOIN sys.columns AS c
    ON c.object_id = t.object_id
    AND c.collation_name <> @DatabaseCollation — Table needs to have columns with “wrong” collation
    INNER JOIN sys.types AS ty
    ON ty.system_type_id = c.system_type_id
    AND ty.name <> N’sysname’ — Exclusion retained from Philip C’s original script
    LEFT JOIN sys.index_columns AS ic — Find indexes on any of the affected columns
    ON ic.object_id = c.object_id
    AND ic.column_id = c.column_id
    LEFT JOIN sys.key_constraints AS kc — Find primary key constraints related to an affected index
    ON kc.parent_object_id = c.object_id
    AND kc.unique_index_id = ic.index_id
    AND kc.type = ‘PK’
    LEFT JOIN sys.key_constraints AS uq — Find unique constraints related to an affected index
    ON uq.parent_object_id = c.object_id
    AND uq.unique_index_id = ic.index_id
    AND uq.type = ‘UQ’
    LEFT JOIN sys.foreign_key_columns AS fk — Find foreign key constraints on any of the affected columns
    ON fk.parent_object_id = c.object_id
    AND fk.parent_column_id = c.column_id
    LEFT JOIN sys.stats_columns AS st — Find statistics on any of the affected columns
    ON st.object_id = c.object_id
    AND st.column_id = c.column_id
    AND st.stats_column_id <> 1 — Retained from Philip C’s original script, no idea why this is in the query
    LEFT JOIN sys.default_constraints AS dc — Find default constraints on any of the affected columns
    ON dc.object_id = c.default_object_id
    AND dc.parent_column_id = c.column_id
    WHERE t.is_ms_shipped = 0 — Exclude Microsoft-shipped tables
    GROUP BY t.object_id;

    OPEN collationfix;
    FETCH NEXT FROM collationfix
    INTO @object_id,
    @SchemaName,
    @TableName,
    @has_key_constraint,
    @has_index,
    @has_foreign_key,
    @has_stats,
    @has_unique_constraint,
    @has_default_constraint;

    WHILE (@@FETCH_STATUS = 0)
    BEGIN;

    — DROP and CREATE statistics for manual statistics has been disabled for now
    /************************************************************************************************************************************
    * If the column has statistics that aren’t part of an index this creates the drop and recreate scripts *
    ************************************************************************************************************************************/
    IF @has_stats > 0
    AND @has_index = 0
    BEGIN
    DECLARE stats_cursor CURSOR FOR
    SELECT sc.stats_id
    FROM sys.stats_columns AS sc
    JOIN sys.stats AS s
    ON s.object_id = sc.object_id
    AND s.stats_id = sc.stats_id
    AND s.user_created = 1
    WHERE sc.object_id = @object_id
    AND sc.column_id = @Collation_columnid;
    OPEN stats_cursor;
    FETCH NEXT FROM stats_cursor
    INTO @stats_id;
    WHILE (@@FETCH_STATUS = 0)
    BEGIN
    –Create DROP Statistics Statement
    INSERT INTO #tempscriptstore
    (ScriptType,
    script)
    SELECT ‘DropStatistics’,
    ‘DROP STATISTICS ‘ + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + ‘.’
    + QUOTENAME(OBJECT_NAME(s.object_id)) + ‘.’ + QUOTENAME(s.name)
    FROM sys.stats AS s
    WHERE s.object_id = @object_id
    AND s.stats_id = @stats_id;

    */

    END;
    CLOSE collationfix;
    DEALLOCATE collationfix;

    /************************************************************************************************************************************
    * Returns all the created scripts in the correct order for running *
    * Also add GO in between some sections so the entire script can execute at once *
    ************************************************************************************************************************************/
    DECLARE @oddnum INT = 1;
    WITH ScriptInput
    AS (SELECT script,
    CASE WHEN ScriptType = ‘DropDefault’
    THEN 1
    WHEN ScriptType = ‘DropForeignKey’
    THEN 3
    WHEN ScriptType = ‘DropIndex’
    THEN 5
    WHEN ScriptType = ‘DropUniqueKey’
    THEN 7
    WHEN ScriptType = ‘DropPrimaryKey’
    THEN 9
    WHEN ScriptType = ‘DropStatistics’
    THEN 11
    WHEN ScriptType = ‘AlterCollation’
    THEN 13
    WHEN ScriptType = ‘AddPrimaryKey’
    THEN 15
    WHEN ScriptType = ‘AddUniqueKey’
    THEN 17
    WHEN ScriptType = ‘CreateIndex’
    THEN 19
    WHEN ScriptType = ‘AddStatistics’
    THEN 21
    WHEN ScriptType = ‘AddForeignKey’
    THEN 23
    WHEN ScriptType = ‘DisableForeignKey’
    THEN 25
    WHEN ScriptType = ‘CheckForeignKey’
    THEN 27
    WHEN ScriptType = ‘DisableIndex’
    THEN 29
    WHEN ScriptType = ‘CreateDefault’
    THEN 31
    ELSE 99
    END AS SortOrder
    FROM #tempscriptstore
    UNION ALL
    SELECT case when n.num > 0 then ‘GO — ‘ + descrip else descrip end,
    n.num
    FROM (VALUES (0, ‘Drop Default’),
    (2, ‘Drop Foreign Key’),
    (4, ‘Drop Index’),
    (6, ‘Drop Unique Key’),
    (8, ‘Drop Primary Key’),
    –(10, ‘Drop Statistics’),
    (12, ‘Alter Collation ‘),
    (14, ‘Add Primary Key ‘),
    (16, ‘Add Unique Key ‘),
    (18, ‘Create Index ‘),
    –(20, ‘Add Statistics ‘),
    (22, ‘Add Foreign Key ‘),
    (24, ‘Disable Foreign Key ‘),
    (26, ‘Check Foreign Key ‘),
    (28, ‘Disable Index ‘),
    (30, ‘Create Default ‘),
    (32, ‘End of script’)
    ) AS n (num, descrip) )
    SELECT ScriptInput.script
    FROM ScriptInput
    ORDER BY ScriptInput.SortOrder;

    DROP TABLE #tempscriptstore;

    Reply
  • Walter Charrière
    August 12, 2019 18:31

    Ups!! Sorry, hope this get the code correctly:
    Begin
    — SSMS results to text – max length 8192 (change from default!!!)

    /*******************************************************************************
    *
    * Created 2017-06-16 By Philip C
    *
    * This script will check individual columns collations and check it against the
    * database default collation, where they are different it will create the scripts
    * required to drop all the objects dependant on the column, change the collation
    * to the database default and then recreate the dependant objects.
    * Some of the code has been reused from stuff found online the majority from
    * Jayakumaur R who created scripts to drop and recreate constraints
    *
    *
    * Modified 2019-01-11 by Hugo Kornelis
    *
    * Removed double checking of recreated constraints
    * Modified the order of the output to minimize number of index rebuilds (assuming primary key is most often clsutered index)
    * Fixed a bug caused by unique constraints overlapping the primary key constraint
    * Fixed a bug in detection of foreign key constraints to drop and recreate
    * Fixed bugs with recreating multi-column UNIQUE constraints
    *
    * Modified 2019-08-12 by Walter Charrière
    *
    * Add drop and create default constraints
    *
    *********************************************************************************/

    SET NOCOUNT ON;
    SET ANSI_WARNINGS OFF;
    GO

    — Declare variables
    DECLARE @SchemaName sysname,
    @TableName sysname,
    @IndexName sysname,
    @ColumnName sysname,
    @index_id int,
    @is_unique nvarchar(100),
    @IndexTypeDesc nvarchar(100),
    @FileGroupName sysname,
    @is_disabled nvarchar(100),
    @IndexOptions nvarchar(MAX),
    @IndexColumnId int,
    @IsDescendingKey int,
    @IsIncludedColumn int,
    @TSQLScripCreationIndex nvarchar(MAX),
    @TSQLScripDisableIndex nvarchar(MAX),
    @object_id int,
    @has_key_constraint int,
    @has_index int,
    @has_foreign_key int,
    @has_stats int,
    @stats_id int,
    @FK_objectid int,
    @FK_name sysname,
    @UQ_objectid int,
    @UQ_name sysname,
    @UQ_index_id int,
    @has_unique_constraint int,
    @has_default_constraint int,
    @DatabaseCollation sysname,
    @IndexColumns nvarchar(MAX),
    @IncludedColumns nvarchar(MAX);

    — Temporary table for generated script
    CREATE TABLE #tempscriptstore
    (ScriptType varchar(20),
    script nvarchar(MAX));

    — Get dataabse default collation
    SELECT @DatabaseCollation = collation_name
    FROM sys.databases
    WHERE database_id = DB_ID();

    /************************************************************************************************************************************
    * Iterate over all the tables that have at least one colmun where collation doesn’t match the database default. *
    * Also checks for PRIMARY KEY, UNIQUE, and (referencing) FOREIGN KEY constraints, indexes, and manually created statistics. *
    * (Note that the counts are not accurate counts due to duplication, these should only be tested for zero or non-zero) *
    ************************************************************************************************************************************/
    DECLARE collationfix CURSOR LOCAL FORWARD_ONLY READ_ONLY STATIC FOR
    SELECT t.object_id,
    OBJECT_SCHEMA_NAME(t.object_id) AS schemaname,
    OBJECT_NAME(t.object_id) AS tablename,
    COUNT(kc.object_id) AS has_key_constraint,
    COUNT(ic.index_id) AS has_index,
    COUNT(fk.constraint_object_id) AS has_foreign_key,
    COUNT(st.stats_id) AS has_stats,
    COUNT(uq.object_id) AS has_unique_constraint,
    COUNT(dc.object_id) AS has_default_constraint
    FROM sys.tables AS t
    INNER JOIN sys.columns AS c
    ON c.object_id = t.object_id
    AND c.collation_name <> @DatabaseCollation — Table needs to have columns with “wrong” collation
    INNER JOIN sys.types AS ty
    ON ty.system_type_id = c.system_type_id
    AND ty.name <> N’sysname’ — Exclusion retained from Philip C’s original script
    LEFT JOIN sys.index_columns AS ic — Find indexes on any of the affected columns
    ON ic.object_id = c.object_id
    AND ic.column_id = c.column_id
    LEFT JOIN sys.key_constraints AS kc — Find primary key constraints related to an affected index
    ON kc.parent_object_id = c.object_id
    AND kc.unique_index_id = ic.index_id
    AND kc.type = ‘PK’
    LEFT JOIN sys.key_constraints AS uq — Find unique constraints related to an affected index
    ON uq.parent_object_id = c.object_id
    AND uq.unique_index_id = ic.index_id
    AND uq.type = ‘UQ’
    LEFT JOIN sys.foreign_key_columns AS fk — Find foreign key constraints on any of the affected columns
    ON fk.parent_object_id = c.object_id
    AND fk.parent_column_id = c.column_id
    LEFT JOIN sys.stats_columns AS st — Find statistics on any of the affected columns
    ON st.object_id = c.object_id
    AND st.column_id = c.column_id
    AND st.stats_column_id <> 1 — Retained from Philip C’s original script, no idea why this is in the query
    LEFT JOIN sys.default_constraints AS dc — Find default constraints on any of the affected columns
    ON dc.object_id = c.default_object_id
    AND dc.parent_column_id = c.column_id
    WHERE t.is_ms_shipped = 0 — Exclude Microsoft-shipped tables
    GROUP BY t.object_id;

    OPEN collationfix;
    FETCH NEXT FROM collationfix
    INTO @object_id,
    @SchemaName,
    @TableName,
    @has_key_constraint,
    @has_index,
    @has_foreign_key,
    @has_stats,
    @has_unique_constraint,
    @has_default_constraint;

    WHILE (@@FETCH_STATUS = 0)
    BEGIN;