fbpx

TALLY TABLES IN T-SQL

วิธีการรับมือกับการเปลี่ยนแปลงในการงาน และ ชีวิต

TALLY TABLES IN T-SQL

The Tally (or Numbers) table is one of the most useful constructs in intermediate-level SQL.  Have you ever written a T-SQL WHILE loop?  How about a CURSOR?  If your answer to either of those questions is yes, you’ve probably been coding some really poor-performing SQL and you may not even realize it.  Tally tables are one way that you may be able to avoid using a loop or a CURSOR, instead creating a true set-based solution that is blazingly fast and reliable.

TALLY TABLES IN SQL 2000

The ability to create a Tally table has been around for a long time, going way back to SQL Server’s roots in Sybase.  One of the most common methods of creating a permanent table is:

SELECT TOP 1000000 N=IDENTITY(INT, 1, 1)
INTO dbo.Numbers
FROM sys.all_columns a CROSS JOIN sys.all_columns b;


ALTER TABLE dbo.Numbers ADD CONSTRAINT NBR_pk PRIMARY KEY(N);


SELECT TOP 5 N FROM dbo.Numbers;

Obviously, in order to run this you need access to the sys table (all_columns) that is referenced.  Most of the time this particular table has at least 4,000 rows in it so constructing a Tally table with 1,000,000 (or even 10,000,000) rows is no problem at all.  Our Tally table will start with 1 (results below are from the final SELECT).

N
1
2
3
4
5

If you create a permanent Tally table, it’s important to have a CLUSTERED INDEX (the PRIMARY KEY) on the single column that’s in the table for performance reasons.  We’ll show some examples of unique ways that you can use a Tally table shortly, but first it’s good to know how to create them.  The example above works in SQL Server 2000.

Here’s another Tally table that can generate N’s between 0 and 2047 and can be used in SQL Server 2000.

SELECT N=number
FROM master..spt_values
WHERE type = 'P';

There are those that might be asking why in the world would you use an undocumented system table like this?  Well, rest assured that this table is used internally by many of SQL’s internal stored procedures, so it is not going away any time soon.

TALLY TABLES IN SQL 2005

When SQL Server 2005 came along and introduced the Common Table Expression (CTE) and the ROW_NUMBER() window function, it opened up some new ways to create what are known as in-line Tally tables.  Let’s look at a couple of examples.

WITH Tally (N) AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
SELECT TOP 5 N
FROM Tally;


-- Or if you don't have access to the sys tables use an in-line
-- Tally table known as a "Ben-Gan" style Tally
WITH lv0 AS (SELECT 0 x UNION ALL SELECT 0)
    ,lv1 AS (SELECT 0 x FROM lv0 a CROSS JOIN lv0 b) -- 4
    ,lv2 AS (SELECT 0 x FROM lv1 a CROSS JOIN lv1 b) -- 16
    ,lv3 AS (SELECT 0 x FROM lv2 a CROSS JOIN lv2 b) -- 256
    ,lv4 AS (SELECT 0 x FROM lv3 a CROSS JOIN lv3 b) -- 65,536
    ,lv5 AS (SELECT 0 x FROM lv4 a CROSS JOIN lv4 b) -- 4,294,967,296
    ,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM lv5)
SELECT TOP (5) n
FROM Tally
ORDER BY n;

The second Tally table structure was first suggested by SQL MVP Itzik Ben-Gan.

Never mind how they work, just know that the first will generate somewhere around 16M+ rows and the other will create a little over 4 billion rows (!) if we did not apply the TOP 10.  The second one is a case of “stacked” or “cascaded” CTEs.  Both are extremely fast; the second one especially because it is done entirely in memory.

TALLY TABLES IN SQL 2008

SQL Server 2008 introduced something that’s known as a Table Value Constructor (TVC), which offers a very compact way to create a Tally table tailored specifically to the precise number of rows you need without using a TOP clause.  Here’s an example of that:

WITH Tally (n) AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
)
SELECT *
FROM Tally;

That Tally table generates exactly 1,000 rows because each VALUES TVC has exactly 10 zeroes in it.  We’ll now look at just a few of the remarkable ways that a Tally table can improve your SQL by helping you to write a set-based solution where otherwise you might need to resort to a loop or CURSOR.

USING A TALLY TABLE TO EXPAND A ROW SET

Let’s start with some sample data:

CREATE TABLE #Temp
(
    ID          INT IDENTITY PRIMARY KEY
    ,StartDT    DATETIME
);

INSERT INTO #Temp
SELECT '2014-02-18 09:20' UNION ALL SELECT '2014-02-19 05:35';


SELECT ID, StartDT
    ,TT=DATEPART(hour, DATEADD(hour, DATEDIFF(hour, 0, StartDT), 0))
    ,TD=DATEADD(day, DATEDIFF(day, 0, StartDT), 0)
FROM #Temp;

The results returned are this, where DATEADD(hour, DATEDIFF(hour, 0, StartDT), 0) is a nifty way to truncate the DATETIME value to the nearest hour (or use day instead to truncate to the current day).  Here are the results from the above query:

ID   StartDT                  TT   TD
1    2014-02-18 09:20:00.000  9    2014-02-18 00:00:00.000
2    2014-02-19 05:35:00.000  5    2014-02-19 00:00:00.000

Suppose what we want is 10 rows that correspond to ID=1, where each is incremented to the next hour of the day (starting at midnight).  Likewise, for ID=2 we want 6 rows.  We can use a small Tally table for this (remember there’s only 24 hours in a day):

WITH Tally (N) AS
(
    -- Tally table starting at 0
    SELECT 0 UNION ALL
    -- Now 24 more rows
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0)) c(n)
)
SELECT ID, StartDT
    ,TD=DATEADD(hour, N, DATEADD(day, DATEDIFF(day, 0, StartDT), 0))
FROM #Temp
CROSS JOIN Tally
WHERE N BETWEEN 0 AND DATEPART(hour, DATEADD(hour, DATEDIFF(hour, 0, StartDT), 0))
ORDER BY ID, TD;

Note that this Tally table has been modified to start at 0.  The results of this query are:

ID  StartDT   TD
1   2014-02-18 19:20:00.000   2014-02-18 00:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 01:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 02:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 03:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 04:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 05:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 06:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 07:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 08:00:00.000
1   2014-02-18 19:20:00.000   2014-02-18 09:00:00.000
2   2014-02-19 15:35:00.000   2014-02-19 00:00:00.000
2   2014-02-19 15:35:00.000   2014-02-19 01:00:00.000
2   2014-02-19 15:35:00.000   2014-02-19 02:00:00.000
2   2014-02-19 15:35:00.000   2014-02-19 03:00:00.000
2   2014-02-19 15:35:00.000   2014-02-19 04:00:00.000
2   2014-02-19 15:35:00.000   2014-02-19 05:00:00.000

Wasn’t that easy and really cool?

USING A TALLY TABLE TO REMOVE REPEATED CHARACTERS IN A STRING

There are a lot of situations in T-SQL where you need to parse a string.  Many of the built-in functions that T-SQL provides stop at the first occurrence of something.  But what if you need to perform the same operation on many characters within the string and a built-in function simply won’t do it for you?
This is where a Tally table can really save you.  We’ll start with one that has 8000 rows so we can parse each character out of VARCHAR(8000) length string.  Follow the comments in the code to see how it works.

DECLARE @S VARCHAR(8000) = 'Aarrrgggh!';


WITH Tally (N) AS
(
    -- 8000 rows (max length of the VARCHAR string)
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) e(n)
)
SELECT n, s
FROM
(
    -- Always choose the first element
    SELECT n=1, s=LEFT(@S, 1) UNION ALL
    -- Include each successive next element as long as it’s different than the prior
    SELECT N, CASE
               WHEN SUBSTRING(@S, N-1, 1) <> SUBSTRING(@S, N, 1)
               THEN SUBSTRING(@S, N, 1)
               -- Repeated characters are assigned NULL by the CASE
              END
    FROM Tally
    WHERE N BETWEEN 2 AND LEN(@S)
) a
-- Now we filter out the repeated elements
WHERE s IS NOT NULL;

The results displayed for this are below.  Note that it works just fine with strings of length 1 or 0 also.

n   s
1   A
3   r
6   g
9   h
10  !

It is also case-insensitive, so it removes the lowercase a (second character) because it is a repeat of the first character (capital A).  It could be made case sensitive if necessary.  Now all we need to do is combine the rows that remain back into our final string using a technique that is calledCreating a Comma-separated List by author is Microsoft Certified Master (MCM) Wayne Sheffield.  It just needs a slight modification to not include the comma separator and it can be applied to this case as follows.

WITH Tally (N) AS
(
    -- 8000 rows
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) e(n)
)
SELECT OriginalString=@S, NewString=
(
    SELECT '' + s
    FROM
    (
        -- Same base query as the prior example starts here
        SELECT n, s
        FROM
        (
            -- Always choose the first element
            SELECT n=1, s=LEFT(@S, 1) UNION ALL
            -- Include each successive next element as long as it's
            -- different than the prior
            SELECT N, CASE
                       WHEN SUBSTRING(@S, N-1, 1) <> SUBSTRING(@S, N, 1)
                       THEN SUBSTRING(@S, N, 1)
                       -- Repeated characters are assigned NULL by the CASE
                      END
            FROM Tally
            WHERE N BETWEEN 2 AND LEN(@S)
        ) a
        -- Now we filter out the repeated elements
        WHERE s IS NOT NULL
    ) a
    ORDER BY n
    FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(8000)');

Now our result is:

OriginalString  NewString
Aarrrgggh!      Argh!

That is very fast and way cool, so maybe we want to save this as a utility FUNCTION.  SQL has many types of User-defined FUNCTIONs, e.g., scalar, table-valued, etc.  Scalar-valued functions are known to be total performance nightmares.  By far the fastest type of FUNCTION if you can write it this way is to use what’s known as an inline Table Valued Function (iTVF).  We can do this for our new utility FUNCTION.

CREATE FUNCTION dbo.RemoveRepeatedChars
(
    @S VARCHAR(8000)
)
RETURNS TABLE WITH SCHEMABINDING
RETURN


WITH Tally (N) AS
(
    -- 8000 rows
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) e(n)
)
SELECT OriginalString=@S, NewString=
(
    SELECT '' + s
    FROM
    (
        SELECT n, s
        FROM
        (
            -- Always choose the first element
            SELECT n=1, s=LEFT(@S, 1) UNION ALL
            -- Include each successive next element as long as it's
            -- different than the prior
            SELECT N, CASE
                       WHEN SUBSTRING(@S, N-1, 1) <> SUBSTRING(@S, N, 1)
                       THEN SUBSTRING(@S, N, 1)
                       -- Repeated characters are assigned NULL by the CASE
                      END
            FROM Tally
            WHERE N BETWEEN 2 AND LEN(@S)
        ) a
        -- Now we filter out the repeated elements
        WHERE s IS NOT NULL
    ) a
    ORDER BY n
    FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(8000)');

An iTVF with SCHEMABINDING is the fastest possible FUNCTION we can write and it works for this case.  We can now call our function to parse as many strings as we need (like from a column in a table).

WITH SampleData (s) AS
(
    SELECT 'Aarrrgggh!'
    UNION ALL SELECT 'Noo repeaatting chharracterss wanntted.'
)
SELECT *
FROM SampleData
CROSS APPLY dbo.RemoveRepeatedChars(s);

CROSS APPLY is something we’ll cover in a future blog but we’ve provided some references at the end if you’d like to learn more now.  The results we get from the above script are:

OriginalString                           NewString
Aarrrgggh!                               Argh!
Noo repeaatting chharracterss wanntted.  No repeating characters wanted.

PARSING A COMMA SEPARATED LIST OF VALUES

A more common problem in SQL is the parsing of a comma-separated list of values.  We’ve saved the best for last, because in this case we can truly point you to a best-of-breed solution.

How many of you have created a SQL FUNCTION to split a delimited list of values?  Chances are you pulled down some tired old looping code from somewhere on the Internet that works but is dog slow.
You probably don’t even realize how bad it is.  Now is the time to locate it and replace it as what we’re about to tell you about will simply blow the doors off of any other SQL-based delimited string splitter!

SQL MVP Jeff Moden wrote the outstanding Tally OH! An Improved SQL 8K “CSV Splitter” Function (viewed more than 40,000 times as of this writing), which contains lots of outstanding information about Tally tables (another of his articles is referenced at the end), how they can be used to split a delimited string and the performance comparison with alternate solutions.  At the end of the article, you can download DelimitedSplit8K, optimized specifically for use with VARCHAR(8000) strings, so don’t try to modify it to VARCHAR(MAX).  I have provided a link to that article as over the years the FUNCTION has been revised and improved a few times, so if you go there you’ll be sure to get the most current version.

It also contains information on CLR-based splitters that can be even faster and handle the case of VARCHAR(MAX).

CONCLUSIONS

Let’s do a recap of all of the things we learned today:

  • The various ways to create a permanent or generate an in-line Tally table in SQL 2000, 2005 and 2008.
  • The formula for using DATEDIFF/DATEADD to truncate a date to just the day or to the hour.
  • How to expand a row set using a Tally table.
  • How to parse a string using a Tally table.
  • How to use the FOR XML PATH method to concatenate rows into a single character string.
  • How to create a high-performance, schema-bound, in-line Table Valued FUNCTION in SQL.
  • Where to find the fastest SQL-based delimited string splitter on the planet (DelimitedSplit8K).

We’ve also come away with two utility FUNCTIONs we can add to our SQL tool chest (see below for a third one):

  • RemoveRepeatedChars
  • DelimitedSplit8K

For further reading on CROSS APPLY (and its companion OUTER APPLY), here are two great articles by SQL MVP Paul White to get you started.

Further suggested reading: