#2 - MS SQL Server bad practices (update)

#2 - MS SQL Server bad practices (update)

Hi. My name is Petr Starichenko. I'm Development DBA and it's my (i don't believe) second and I hope not the last article about SQL. I decided to start with the simplest theme - others people (of course not yours) bad code .

And our task for today is find the maximum from two(some) columns and rewrite it in our table.

CREATE TABLE TestTable(Date1 DATE,
                       Date2 DATE,
                       MaxDate DATE)

GO         

INSERT INTO TestTable(Date1, Date2, MaxDate)
VALUES ('19000101', '20120205', NULL),
       ('19000101', NULL, NULL),
       (NULL, '20120205', NULL),
       ('19000101', '20120205', '20120205'),
       ('20120205', '20120205', '20120205'),
       (NULL, NULL, NULL)  
GO 1000        

Today i saw such solution from "Alex":

UPDATE TestTable
SET MaxDate = CASE
                  WHEN Date1 > Date2 THEN Date1
                  ELSE Date2
              END        

What do you think about it? Maybe you have some propositions? Do you see any problems in this code (I specially added some examples)?

  1. First of all this code does not work as expected. You can run it and see that we have a 'bad' row: (1900-01-01, NULL, NULL) and after updated it stays the same. It's because Date2 is NULL and we have UNKNOWN (read FALSE) in comparison Date1 > Date2 because comparisons between NULL value and any other value, return UNKNOWN because the value of each NULL is UNKNOWN. So we go to 'ELSE' branch and take Date2 as a result (and Date 2 is NULL). So when you have nullable column you must be very careful in comparisons.
  2. We update every row in our table. In our example it's not so scary, but for large tables it is a serious problem. Because Table Scan (Clustered Index Scan) it's just a small part of the cost (for my production table 10%. Here we have 16%). And for every UPDATE SQL Server must add data to The Transaction Log (it costs time and space) and check(change) value in table and in all indexes that contain column which we updated.

EDIT 2022-09-24

I forgot about indexes (for every index with your column SQL Server check the index (should it move your row?) and storing additional data with Snapshot Isolation in tempdb (in table's database in 2019)

END EDIT

No alt text provided for this image

I wrote it in that way, where we update only the necessary rows:

UPDATE t
SET MaxDate = CASE
                  WHEN Date1 > ISNULL(Date2, '19000101') THEN Date1
                  ELSE Date2
              END
FROM TestTable tt
WHERE (Date1 IS NOT NULL
       OR Date2 IS NOT NULL)
  AND (tt.MaxDate IS NULL
       OR tt.MaxDate < Date1
       OR tt.MaxDate < Date2)

        
No alt text provided for this image

If we have a lot of columns to find the MAX(MIN) I usually use CROSS APPLY because "CASE WHEN" with 3 or more columns looks bulky and unreadable :

UPDATE tt
SET MaxDate = new.MaxDate
FROM TestTable tt
CROSS APPLY
(
	SELECT MAX(v) MaxDate
	FROM (VALUES (Date1), (Date2)) as t(v) 
) new
WHERE (Date1 IS NOT NULL OR Date2 IS NOT NULL) 
	AND (tt.MaxDate IS NULL OR tt.MaxDate < new.MaxDate)         

Maybe in the future I will add an article about NULL value and how to work with nullable columns. Some weak spots (NOT IN for example)

If you have any questions please feel free to contact me

#PetrStar #sql #mssqlserver #update #badpractices

To view or add a comment, sign in

More articles by Petr Starichenko

Others also viewed

Explore content categories