T-SQL Tuesday 191 – String parsing (and why I hate intelligent keys)

Welcome to the delayed 191st T-SQL Tuesday. Our host is, once more, Steve Jones. This time, he asks us to write about our favorite string parsing. Which is an odd thing to ask: nobody ever said “I like doing string manipulation in T-SQL”. So instead of doing what Steve asked, I will provide some background, and then show some tricks that might make the task a bit less unpleasant.

Why intelligent keys are anything but intelligent

I am currently best known for my performance tuning and execution plan work. But when I started working with databases, I actually came from a background of data modelling, database design, and normalization. And that has never fully left me. In fact, I have in the past two years created a whole series of YouTube videos about database design and normalization. And a much longer time ago, I recorded a Pluralsight course on this topic that is still available for viewing.

One of the very basics of schema design for a relational database is to store atomic values in every column. One column, one value, no more, no less. That automatically rules out all repeating groups. So a single column to list all my email addresses? Sorry. You are doing it wrong. And you will pay the price when you try to protect the integrity of your data. Or even just report on it.
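To make that concrete, here is a minimal sketch of what the normalized alternative to an email address list could look like. All table and column names here are just illustrative, not taken from any real schema.

CREATE TABLE dbo.CustomerEmailAddresses
    (CustomerID   int          NOT NULL,   -- references the customer
     EmailAddress varchar(320) NOT NULL,   -- one address per row, not a list
     CONSTRAINT PK_CustomerEmailAddresses
         PRIMARY KEY (CustomerID, EmailAddress));

One row per address means you can enforce uniqueness, join on it, and report on it, without ever parsing a delimited string.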

But this atomicity rule also rules out composite values in a column. If your application needs to use surname and initials individually, then you must use two columns, one for the surname and another for the initials. That automatically removes the necessity to write awkward string parsing in your code where you try to separate the two.

And that is why I have a thorough hatred of what is commonly known as “intelligent keys”. Granted, I can see why end users like the concept. For humans, identifying a subscription renewal by customer number, magazine code, and year is a lot easier than having to work with a randomly generated 8-digit number. Renewal “10451-PeMa-2025” is instantly recognizable as the 2025 subscription to People Magazine by customer 10451. Renewal “95023954” is just a meaningless number, and requires additional work to look up the customer, magazine, and year.

And just to be clear, I am fully okay with natural keys. In the example above, I would definitely declare a UNIQUE or PRIMARY KEY constraint on the columns for CustomerNumber, MagazineCode, and Year. But as three individual columns. And then I would, depending on the situation, decide whether or not to also add a generated and meaningless surrogate key – that should not be automatic for each table, but decided on a case-by-case basis. But even if I add it, I would still keep the business key, and constrain it to ensure no violations ever happen.
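In code, such a constrained business key could look like the sketch below. Actual names and data types depend on the schema, and whether you want the surrogate key at all is, as said, a case-by-case decision.

CREATE TABLE dbo.Renewals
    (RenewalID      int IDENTITY NOT NULL,   -- optional meaningless surrogate key
     CustomerNumber int          NOT NULL,
     MagazineCode   varchar(10)  NOT NULL,
     RenewalYear    smallint     NOT NULL,
     CONSTRAINT PK_Renewals PRIMARY KEY (RenewalID),
     CONSTRAINT UQ_Renewals_BusinessKey
         UNIQUE (CustomerNumber, MagazineCode, RenewalYear));

The UNIQUE constraint on the three business columns guarantees that no duplicate renewal can ever sneak in, even though the surrogate key is the primary key.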

The problems for the database start when people store this human-readable intelligent key as a single column in the database. And the problems then get worse when someone decides to save space in the database by now removing the CustomerID, MagazineCode, and Year columns from the Renewals table, “because they can be inferred from the RenewalKey”.

Dear developers. I get that you never had a course on database design. That is okay, you spent your time learning things I cannot do. But would you fly a plane without training, or perform open heart surgery without the proper qualifications? If you never learned how to design databases, then don’t. Leave it to the professionals. Or learn the ropes first.

Dear developers. I get that your users like this intelligent key, this single value on their screen that includes all relevant information. But that’s presentation. We can present data in every way you can imagine, and a few you had never thought of. Presentation is done in the presentation layer.

Dear developers. Databases should store atomic values. “10451-PeMa-2025” is not atomic, it is three atomic values smashed together in a single field. Nothing wrong with this in a report, on a screen, or on a shipping label. But everything goes to hell in a handbasket if you store this as such in the database.

Parsing tricks

But of course, you, the reader of this blog, are a smart person. You would never commit such design sins. All your databases are perfectly normalized, with no composite column anywhere to be seen. You just suffer from other people’s issues. Bad designs from other departments. Code smells left by your predecessor. Third party products with a database schema that does not even warrant the name “design”. And now you have to deal with this. Here are some tricks that you might be able to use.

PARSENAME

The PARSENAME function is intended for a very specific purpose: to get the individual parts of a fully qualified object name in SQL Server. However, who says we are not allowed to abuse this function for other purposes?

You will need to work a bit on this. For starters, “10451-PeMa-2025” uses dashes as the separator. PARSENAME expects periods. So we need to replace these, and we need to be aware that we can now get errors if there are periods in any of the components. But if there are not, then a REPLACE will do the job just fine.

The second thing you need to do is to count the number of components. If there are more than four, PARSENAME can’t be used. In this case, there are three, which means that PARSENAME will assume you specify a database, schema, and object name. The object name (the year) comes last, and corresponds to object piece 1. In the same way, the MagazineCode would be object piece 2, and object piece 3 is then the CustomerID.

Finally, you need to be aware that, since PARSENAME thinks it works on SQL Server object names, it will remove [] or "" around the components. And if they are in the components but not fully around them, you might get NULL results. So basically, just don’t use PARSENAME if the intelligent key can contain any of the characters [, ], or ".
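To illustrate that stripping behavior with a quick example (the result noted is what I would expect, based on how PARSENAME handles quoted identifiers):

SELECT PARSENAME ('[10451].[PeMa].[2025]', 3) AS CustomerID;  -- brackets stripped, returns 10451

If your data could legitimately contain brackets or double quotes, this silent stripping is exactly why PARSENAME is the wrong tool.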

However, very often you will find that none of these potential obstacles apply. In that case, you can in fact use code such as the below to find the components of this intelligent key.

SELECT PARSENAME (REPLACE ('10451-PeMa-2025', '-', '.'), 1) AS Year;
SELECT PARSENAME (REPLACE ('10451-PeMa-2025', '-', '.'), 2) AS MagazineCode;
SELECT PARSENAME (REPLACE ('10451-PeMa-2025', '-', '.'), 3) AS CustomerID;

Fixed length

Of course, if we know that the length of each component of the intelligent key is always the same, then we can very easily use the SUBSTRING function. And for the first or last component, we can even shorten this to use LEFT or RIGHT.

SELECT SUBSTRING ('10451-PeMa-2025', 1, 5) AS CustomerID;
SELECT SUBSTRING ('10451-PeMa-2025', 7, 4) AS MagazineCode;
SELECT SUBSTRING ('10451-PeMa-2025', 12, 4) AS Year;

SELECT LEFT('10451-PeMa-2025', 5) AS CustomerID;
SELECT RIGHT('10451-PeMa-2025', 4) AS Year;

Now if you need even more complex stuff, such as for instance creating a new intelligent key where the MagazineCode is removed, but the customer code and year are still there (though I honestly don’t know why anyone would do that!), then people often use the expressions above and concatenate them back together. But that is not needed. A much more intelligent solution is to use the STUFF function, which replaces part of a string (as indicated by start position and length) with a new replacement string … which can be empty!

SELECT STUFF ('10451-PeMa-2025', 7, 5, '') AS NewUnintelligentKey;  -- removes 'PeMa-', returns '10451-2025'

Much shorter than using a LEFT, a RIGHT, and then concatenating all of that together!
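For comparison, this is what that concatenation approach would look like:

SELECT LEFT ('10451-PeMa-2025', 5) + '-' + RIGHT ('10451-PeMa-2025', 4) AS NewUnintelligentKey;

Same result, but with three expressions instead of one, and with one more hardcoded position to get wrong.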

Varying length

But what if components can be of varying length? In our example, it is easy to imagine that the CustomerID might be just four positions for some customers, and the MagazineCode might be anything from three to seven characters. The year will of course always be four characters. But let’s for the sake of discussion assume that even that component can vary in length.

If you can use PARSENAME, then this does not matter. But there are restrictions to PARSENAME. So sometimes we need to use LEFT, RIGHT, SUBSTRING, and STUFF. But then, how do we find the correct starting position and/or length parameters to pass into these functions?

This is where CHARINDEX comes in. But CHARINDEX only returns the first position it finds. If we need the second, then we need a second CHARINDEX that starts searching after the first, by specifying a start position. But that start position then also requires a CHARINDEX call, and very soon, we get nightmarish constructions.

SELECT CHARINDEX ('-', '10451-PeMa-2025') AS FirstSeparator;
SELECT CHARINDEX ('-', '10451-PeMa-2025', CHARINDEX ('-', '10451-PeMa-2025') + 1) AS SecondSeparator;

Note that this is only to find the position of the first and second separator. To actually find the MagazineCode, we need to use these in a SUBSTRING expression such as this:

SELECT SUBSTRING ('10451-PeMa-2025',
                  CHARINDEX ('-', '10451-PeMa-2025') + 1,
                  CHARINDEX ('-',
                             '10451-PeMa-2025',
                             CHARINDEX ('-', '10451-PeMa-2025') + 1)
                - CHARINDEX ('-', '10451-PeMa-2025') - 1) AS MagazineCode;

And that is only for the second component! Imagine having an intelligent key that has seven components, and you need to extract the sixth!

Sometimes, you can simplify these expressions somewhat by using the REVERSE function. For instance, if you want just the Year, then these two expressions both give you that. I know which one I prefer!

SELECT RIGHT('10451-PeMa-2025', LEN ('10451-PeMa-2025')
                              - CHARINDEX ('-',
                                           '10451-PeMa-2025',
                                           CHARINDEX ('-', '10451-PeMa-2025') + 1)) AS Year;
SELECT RIGHT('10451-PeMa-2025', CHARINDEX ('-', REVERSE ('10451-PeMa-2025')) - 1) AS Year;

Regex?

Even with these tricks, string parsing tends to result in code that is ugly, long, hard to maintain, and error prone. So I guess it is a good thing that SQL Server 2025 will finally implement support for regular expressions.

Now, I really don’t like regex. Those expressions, too, are complex and hard to understand. And it will take me time to learn that syntax.

However, seeing the expressions above, which are even still relatively simple, makes the choice for me very easy. Don’t use intelligent keys, ever! But if that doesn’t work … then I will learn regex, just to avoid this code mess for future projects.
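For example, extracting the MagazineCode could then be as simple as the line below. This is a sketch only, based on the REGEXP_SUBSTR function as announced for SQL Server 2025 (string, pattern, start position, occurrence); I have not yet verified every parameter myself.

SELECT REGEXP_SUBSTR ('10451-PeMa-2025', '[^-]+', 1, 2) AS MagazineCode;  -- second run of non-dash characters

Compare that to the nested CHARINDEX and SUBSTRING construction above, and the appeal becomes obvious, even for a regex skeptic like me.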

3 Comments

  • I agree with the many issues with parsing intelligent keys. It becomes very painful when a join is done on the parsed value. However, I think it is acceptable to use an intelligent key provided that it is never parsed, which greatly limits the possible use cases. One that I am aware of is using an intelligent key for date (YYYYMMDD) and time (hhmmss) with calendar tables for date and time in a data warehouse. What do you think?

  • As usual, Hugo, I agree with you. OTOH I did have design training. I might even dream in normal form. 😉

    I am also an advocate of atomic pretty much everything. Learning how to differentiate presentation from data management is something that has to be learned by all developers, database or otherwise.

    And you know I tend to read with a proofreader’s eye. I think the following is missing an important period between “don’t” and “Leave” (paragraph 8):
    “If you never learned how to design databases, then don’t Leave it to the professionals.”

  • Thanks for this. I wish more people understood the value of database design!

    I’ve made use of PARSENAME a couple of times to break down strings, and it has proved to be effective.

    One option you make no mention of is STRING_SPLIT, which with the new ordinal value allows some neat tricks.

    select value as Year
    from (select value from string_split('10451-PeMa-2025', '-', 1) where ordinal = 3) split

