Changing a database collation for new and existing objects in a few easy steps
Introduction
The other day a consultant from some company that is working for one of our clients approached me during the upgrade process of a software that I don't want to mention here and told me that they need a restore of the PRD database to the QA instance - and that they need the collation of the restored database changed to a case sensitive collation. There must have been some odd changes to the software from version X to version X+1 so that now they needed case sensitivity but: client is king. So I set down, did my research and found the following method to be applicable and I think also well understandable for beginners.
Now before I start...
... explaining the process of changing a DB's collation let's take a moment and think about what a collation is and what it does have an impact on. Let me mention at this point that this short paragraph does certainly not cover the topic of collation as a whole and in detail but the few most important aspects should be pointed out here. When we look at the standard collation that comes with SQL Server 2017 (SQL_Latin1_General_CP1_CI_AS) then the name of the collation already tells us quite a few things about it. But what is a collation in the first place?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types and character width.
This quote from is the best (and first :) ) I found. It basically tells us that the collation determines the sequence of the characters which is especially important when it comes to localized special characters like the German Ä (should it come after the A? Or at the end of the alphabet after the Z?). The collation also determines whether the used character set is case sensitive (CS; vs. CI = case insensitive) whether it is accent sensitive (AS vs. AI; important for i.e. French) and whether it is Kana sensitive (KS vs. KI; that's like... like AS vs, AI only from Asia or so...) aso. The two main aspects due to my experience are sorting and case sensitivity.
Server collation vs. user DB collation
We have to distinguish between the SQL Server Server-Collation which is set during the setup and is subsequently applied to the system databases (master, model, msdb and tempdb) and the user DB collation which is applied during the creation of the database. This blog covers the change of a user DB collation. The Server Collation can be changed with a setup option - however due to my experience if you have to change the server collation do a fresh install. Changing the collation of msdb is not funny even if the collation for master, model and tempdb has been changed. I have done that once and basically ended up fixing jobs with collation conflicts for 3 days. Collation conflicts can occur when 2 objects (i.e. 2 tables) with different collations are used in the same batch or the same statement. The experienced DBA solves this using the "collate" clause which allows us to change the collation of a specific value at runtime. But this is also off topic...
Carrying out the collation change
OK, so I had to change the collation of the database for all new and (!!!) all existing objects. I found the easiest way to do this is to actually carrying out these steps:
- Script out the "ALTER TABLE [table_name] ALTER COLUMN [column_name] datatype-and-size COLLATE new-collation-name (NOT) NULL" statements
- Script all indexes to a #temptable
- Delete all indexes generating the DROP INDEX statements from our #temptable
- Change the collation of the database
- Execute the scripts generated in step 1
- Recreate all indexes
Let's have a look into the procedure in detail...
Step 1 - generate scripts to change the tables' collation
I have found (and adapted) the following query that would script me all the tables that contain columns that are of interesst to me - these are tables that contain columns that have either of these datatypes: char, varchar, nvarchar, text
declare @TableName nvarchar(512)
declare @SQLText nvarchar(4000)
declare @ColumnName nvarchar(256)
declare @CharacterMaxLen nvarchar(100)
declare @datatype sysname
declare @IsNullable nvarchar(10)
declare @CollationName nvarchar(256) = N'Latin1_General_CS_AS'
declare MyTableCursor CURSOR FOR
select TABLE_NAME from INFORMATION_SCHEMA.TABLES
OPEN MyTableCursor
FETCH NEXT FROM MyTableCursor INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE MyColumnCursor Cursor
FOR
SELECT COLUMN_NAME,DATA_TYPE, CHARACTER_MAXIMUM_LENGTH,IS_NULLABLE from information_schema.columns
WHERE table_name = @TableName AND (Data_Type LIKE '%char%' OR Data_Type LIKE '%text%') AND COLLATION_NAME <> @CollationName
ORDER BY ordinal_position
Open MyColumnCursor
FETCH NEXT FROM MyColumnCursor INTO @ColumnName, @DataType, @CharacterMaxLen, @IsNullable
WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQLText = 'ALTER TABLE ' + @TableName + ' ALTER COLUMN [' + @ColumnName + '] ' + @DataType + '(' +
CASE WHEN @CharacterMaxLen = -1 THEN 'MAX' ELSE @CharacterMaxLen END + ') COLLATE '
+ @CollationName + ' ' + CASE WHEN @IsNullable = 'NO' THEN 'NOT NULL' ELSE 'NULL' END
PRINT @SQLText
FETCH NEXT FROM MyColumnCursor INTO @ColumnName, @DataType, @CharacterMaxLen, @IsNullable
END
CLOSE MyColumnCursor
DEALLOCATE MyColumnCursor
FETCH NEXT FROM MyTableCursor INTO @TableName
END
CLOSE MyTableCursor
DEALLOCATE MyTableCursor
This will generate us a number of statements that look kinda like this:
We save this script and continue with...
Step 2 - Script all indexes
There are several ways to do this (even via the SSMS GUI... you don't wanna do this via the SSMS GUI unless you need to get the girl/guy doing her/his internship of your back for... days... weeks... depending on the size of your DB - or the duration of her/his internship). I found the method suggested by Kendra Little to be very elegant altough it has some draw backs (i.e. compression and LOB compaction are not scripted but this can easily be corrected via dbatools.io or via Ola Hallengren's scripts ) - so I used the script in her article to script all indexes into a #temptable. Since the SELECT INTO statement is not part of her article I post the adapted script here. The name of my #temptable that will be referred to during this post is ##t
/*****************************************************************************
MIT License, http://www.opensource.org/licenses/mit-license.php
Contact: help@sqlworkbooks.com
Copyright (c) 2018 SQL Workbooks LLC
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
*****************************************************************************/
SELECT
DB_NAME() AS database_name,
sc.name + N'.' + t.name AS table_name,
(SELECT MAX(user_reads)
FROM (VALUES (last_user_seek), (last_user_scan), (last_user_lookup)) AS value(user_reads)) AS last_user_read,
last_user_update,
CASE si.index_id WHEN 0 THEN N'/* No create statement (Heap) */'
ELSE
CASE is_primary_key WHEN 1 THEN
N'ALTER TABLE ' + QUOTENAME(sc.name) + N'.' + QUOTENAME(t.name) + N' ADD CONSTRAINT ' + QUOTENAME(si.name) + N' PRIMARY KEY ' +
CASE WHEN si.index_id > 1 THEN N'NON' ELSE N'' END + N'CLUSTERED '
ELSE N'CREATE ' +
CASE WHEN si.is_unique = 1 then N'UNIQUE ' ELSE N'' END +
CASE WHEN si.index_id > 1 THEN N'NON' ELSE N'' END + N'CLUSTERED ' +
N'INDEX ' + QUOTENAME(si.name) + N' ON ' + QUOTENAME(sc.name) + N'.' + QUOTENAME(t.name) + N' '
END +
/* key def */ N'(' + key_definition + N')' +
/* includes */ CASE WHEN include_definition IS NOT NULL THEN
N' INCLUDE (' + include_definition + N')'
ELSE N''
END +
/* filters */ CASE WHEN filter_definition IS NOT NULL THEN
N' WHERE ' + filter_definition ELSE N''
END +
/* with clause - compression goes here */
CASE WHEN row_compression_partition_list IS NOT NULL OR page_compression_partition_list IS NOT NULL
THEN N' WITH (' +
CASE WHEN row_compression_partition_list IS NOT NULL THEN
N'DATA_COMPRESSION = ROW ' + CASE WHEN psc.name IS NULL THEN N'' ELSE + N' ON PARTITIONS (' + row_compression_partition_list + N')' END
ELSE N'' END +
CASE WHEN row_compression_partition_list IS NOT NULL AND page_compression_partition_list IS NOT NULL THEN N', ' ELSE N'' END +
CASE WHEN page_compression_partition_list IS NOT NULL THEN
N'DATA_COMPRESSION = PAGE ' + CASE WHEN psc.name IS NULL THEN N'' ELSE + N' ON PARTITIONS (' + page_compression_partition_list + N')' END
ELSE N'' END
+ N')'
ELSE N''
END +
/* ON where? filegroup? partition scheme? */
' ON ' + CASE WHEN psc.name is null
THEN ISNULL(QUOTENAME(fg.name),N'')
ELSE psc.name + N' (' + partitioning_column.column_name + N')'
END
+ N';'
END AS index_create_statement,
si.index_id,
si.name AS index_name,
partition_sums.reserved_in_row_GB,
partition_sums.reserved_LOB_GB,
partition_sums.row_count,
stat.user_seeks,
stat.user_scans,
stat.user_lookups,
user_updates AS queries_that_modified,
partition_sums.partition_count,
si.allow_page_locks,
si.allow_row_locks,
si.is_hypothetical,
si.has_filter,
si.fill_factor,
si.is_unique,
ISNULL(pf.name, '/* Not partitioned */') AS partition_function,
ISNULL(psc.name, fg.name) AS partition_scheme_or_filegroup,
t.create_date AS table_created_date,
t.modify_date AS table_modify_date into ##t
FROM sys.indexes AS si
JOIN sys.tables AS t ON si.object_id=t.object_id
JOIN sys.schemas AS sc ON t.schema_id=sc.schema_id
LEFT JOIN sys.dm_db_index_usage_stats AS stat ON
stat.database_id = DB_ID()
and si.object_id=stat.object_id
and si.index_id=stat.index_id
LEFT JOIN sys.partition_schemes AS psc ON si.data_space_id=psc.data_space_id
LEFT JOIN sys.partition_functions AS pf ON psc.function_id=pf.function_id
LEFT JOIN sys.filegroups AS fg ON si.data_space_id=fg.data_space_id
/* Key list */ OUTER APPLY ( SELECT STUFF (
(SELECT N', ' + QUOTENAME(c.name) +
CASE ic.is_descending_key WHEN 1 then N' DESC' ELSE N'' END
FROM sys.index_columns AS ic
JOIN sys.columns AS c ON
ic.column_id=c.column_id
and ic.object_id=c.object_id
WHERE ic.object_id = si.object_id
and ic.index_id=si.index_id
and ic.key_ordinal > 0
ORDER BY ic.key_ordinal FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),1,2,'')) AS keys ( key_definition )
/* Partitioning Ordinal */ OUTER APPLY (
SELECT MAX(QUOTENAME(c.name)) AS column_name
FROM sys.index_columns AS ic
JOIN sys.columns AS c ON
ic.column_id=c.column_id
and ic.object_id=c.object_id
WHERE ic.object_id = si.object_id
and ic.index_id=si.index_id
and ic.partition_ordinal = 1) AS partitioning_column
/* Include list */ OUTER APPLY ( SELECT STUFF (
(SELECT N', ' + QUOTENAME(c.name)
FROM sys.index_columns AS ic
JOIN sys.columns AS c ON
ic.column_id=c.column_id
and ic.object_id=c.object_id
WHERE ic.object_id = si.object_id
and ic.index_id=si.index_id
and ic.is_included_column = 1
ORDER BY c.name FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),1,2,'')) AS includes ( include_definition )
/* Partitions */ OUTER APPLY (
SELECT
COUNT(*) AS partition_count,
CAST(SUM(ps.in_row_reserved_page_count)*8./1024./1024. AS NUMERIC(32,1)) AS reserved_in_row_GB,
CAST(SUM(ps.lob_reserved_page_count)*8./1024./1024. AS NUMERIC(32,1)) AS reserved_LOB_GB,
SUM(ps.row_count) AS row_count
FROM sys.partitions AS p
JOIN sys.dm_db_partition_stats AS ps ON
p.partition_id=ps.partition_id
WHERE p.object_id = si.object_id
and p.index_id=si.index_id
) AS partition_sums
/* row compression list by partition */ OUTER APPLY ( SELECT STUFF (
(SELECT N', ' + CAST(p.partition_number AS VARCHAR(32))
FROM sys.partitions AS p
WHERE p.object_id = si.object_id
and p.index_id=si.index_id
and p.data_compression = 1
ORDER BY p.partition_number FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),1,2,'')) AS row_compression_clause ( row_compression_partition_list )
/* data compression list by partition */ OUTER APPLY ( SELECT STUFF (
(SELECT N', ' + CAST(p.partition_number AS VARCHAR(32))
FROM sys.partitions AS p
WHERE p.object_id = si.object_id
and p.index_id=si.index_id
and p.data_compression = 2
ORDER BY p.partition_number FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),1,2,'')) AS page_compression_clause ( page_compression_partition_list )
WHERE
si.type IN (0,1,2) /* heap, clustered, nonclustered */
ORDER BY table_name, si.index_id
OPTION (RECOMPILE);
GO
The "into ##t" is in Line 87
So now we have our Indexes scripted into ##t we carry out a "SELECT * FROM ##t" and get sth. like this:
Step 3 - generate the DROP INDEX scripts
Next thing we do is we create our "DROP INDEX" scripts for the NC indexes
select N'drop index [' + index_name +'] on ' + table_name from ##t where index_id > 1
And the "ALTER TABLE DROP CONSTRAINT" scripts for the PKs
select N'alter table ' + table_name +' drop constraint ' + index_name from ##t where index_id = 1
And we get sth. like
respectively
So now we have our scripts we can start with the actual change. There are 2 things you should do at this point.
For Bugs Bunny's sake - take a backup
Change the Recovery Model to SIMPLE in order to avoid full transaction logs
At this point you should also prepare the "CREATE INDEX" resp. the "ALTER TABLE ADD CONSTRAINT" statements from our temp table as the connections to the DB will be broken after changing the DB collation.
Execute the following query to script the PKs (just copy the contents of the create_index_statement column):
select * from ##t where index_id = 1
Execute the following query to scripts all NC idx (again: the create_index_statement column)
select * from ##t where index_id > 1
Step 4 - Change the collation of the database
alter database [mydb] set single_user with rollback immediate
go
USE [master]
GO
ALTER DATABASE [mydb] COLLATE Latin1_General_CS_AS
GO
GO
alter database [mydb] set multi_user
This will assure that all new objects are created with the new collation
Step 5 - Delete all indexes
Execute the scripts you generated in Step 3 to delete all indexes. Make sure that you fist delete the NC indexes and then the PKs.
Step 6 - Run the ALTER TABLE statements
Execute the scripts generated in Step 1
Step 7 - Recreate the indexes
First create the PKs. Then create the NC indexes.
Esp. for the the PKs it is a good idea to seperate the statements with the batch seperator (normally this is GO but SSMS lets you define your own - in case you want your fellow DBAs in your company to hate you go to Tools --> Options --> Query Execution --> SQL Server --> General and
Run the scripts generated in Step 3 to create the PKs
Then create the NC indexes.
and that's basically it. Use this manual at your own risk - and have fun.
Nice. A few minor things could be improved, however. For instance, due to the implementation by Microsoft, the cursor loop will sometimes script ALTER TABLE statements for objects which are in fact VIEWS. (Yep, MS moves in mysterious ways.) Can easily be remedied by changing the code to something like this: declare MyTableCursor CURSOR FOR select TABLE_NAME from INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE <> 'VIEW'; -- or maybe you want to change it to WHERE TABLE_TYPE = 'BASE TABLE';
Very good. When trying to convert a db to a utf8 collation, there are problems with columns of type text though... and it seems as if unique constraints can't be droped the way the script suggests. but yes, thats about 4-5 years after the publication of this really helpful artcile. thanks.