Recursive SQL


SQL Tips

Add a Unique Constraint to a Column

CURSOR

DELETE

Dirty Reads and Phantom Reads

INSERT

Recursive SQL

ROW_NUMBER() - Automatic Paging

SELECT, FROM

SQL Default Date Format

UPDATE

WHERE

Sponsored Links

73058_New Scooba® 230 Floor Washing Robot + Free Shipping!

 

Recursive SQL

Here we look at how to make some Recursive SQL in SQL Server.

The issue I needed to solve was that I had a table of items. and each item could potentially have it's own parent. These item's were structured by parent and child into a hierarchy which could be many levels deep, and may change in the future. I needed something which could operate all a value, such as the item's id, during a select statement with multiple joins and other operations, and work out which item was it's top-most level parent as part of the overall select statement, with minimal coding and minimal performance hit. As I am predominantly a C# .NET developer, I wasn't too sure how to do this, but here I've put my journey on how I managed to achieve this outcome.

Common Table Expressions with Recursion

The first part of my adventure saw me finally learning hot to do CTEs, Common Table Expressions, as they have the ability to be recursive.

Common Table Expression (CTE)

Common Table Exressions were introduced in SQL 2005, and in their most basic form they act as a sort of temporary store of data, almost like a temporary table, though more like a table variable. You declare the Common Table Expression (CTE) using the 'WITH' keyword, provide a name for the CTE, and optionally the columns. With this you can then fill the CTE with the results of a select statement, such as here:


WITH myCTE ([ID]
    ,[TimeStamp]
    ,[Town]
    ,[City])

AS
(
    SELECT TOP 1000 [ID]
        ,[TimeStamp]
        ,[Town]
        ,[City]
        FROM [MyDatabase].[dbo].[MyTable]
)

SELECT * FROM myCTE

Recursive Common Table Expressions - Part 1

So that's a basic CTE, next in order to try to understand how recursion works in a CTE, I built a very basic version to help me start to understand what's happening.

To make a recursive Common Table Expression you need an 'anchor' point and a 'recursive' point (or member as they're known). So inside the 'WITH' statement, I have 2 Select statements, which are combined with a 'Union All', the first Select statement is the anchor member and the second select statement is the recursive member. In order to make it recurse around the CTE, you need to reference the CTE. I do this by selecting from the CTE in the recursive member.

Now the below example will actually throw an error, but we can see from the results that we are doing some sort of recursion:


with myCTE as
(
    select 'a' as 'letter'
    union all
    select letter from myCTE
)

select * from myCTE OPTION(MAXRECURSION 10)

The results of this query look like:


letter
a
a
a
a
a
a
a
a
a
a
a

OPTION (MAXRECURSION n)

Note the the use of the Query Hint 'MAXRECURSION' used on the Select statement when called on the CTE. This puts a limit on the amount of recursion, in this case 10, so the recursion will only happen 10 times, and once it is reached, the query will end and an error is returned. The default for MAXRECURSION is 100 (SQL Server 2008).

with myCTE as ( select 'a' as 'letter' union all select 'b' from myCTE) select * from myCTE OPTION(MAXRECURSION 10)

Recursive Common Table Expressions - Part 2

To delve into the recursion a little more, to see what is happening in a basic view, I've changed the query a little. The first select contains an 'a', and the second select contains a 'b'. Now we can see that the anchor appears once in our results, where as the recursive part is repeated in each recursion.


with myCTE as
(
    select 'a' as 'letter'
    union all
    select 'b' from myCTE)

select * from myCTE OPTION(MAXRECURSION 10)


letter
a
b
b
b
b
b
b
b
b
b
b

Recursive Common Table Expressions - Part 3

So now we can see how the recursion works, it's time to apply this method to some real table data.

Here we have two Select statements inside the CTE, joined by a 'UNION ALL'. The rows are part of a message structure. Messages can be sent, and replies are associated to the original message by use of a ParentID column. The first Select gets a set of rows where the the 'Parent' column 'IS NULL'. In this database this means that there is no parent to that message, therefore it must be a top level message. So basically we initially get a list of all the top level messages, or in otherwords, all the first messages to be sent, but none of the replies to those messages (as replies have a value for their ParentID).

Secondly after the 'UNION ALL', a second Select statement selects rows from the same database table and makes an 'INNER JOIN' onto the CTE where it's ParentID matches the MessageID of the first select statement. With that we are saying get a list of all the messages whose ParentID matches the MessageID from the messages which are all the initial messages in a conversation. These get joined into the results set in the CTE so eventually they will also be searched in the messages where their MessageID matches a ParentID from any other messages.

Also be careful on the 2nd Select, whether you are selecting columns from the CTE or from the table which it joins.


WITH myCTE ([Message_ID], [Subject], [Text], [First_Parent_Message_ID]) AS

(
    SELECT [Message_ID]
        ,[Subject]
        ,[Text]
        ,[First_Parent_Message_ID]
    FROM [MyDB].[dbo].[Messages] anchor
    WHERE First_Parent_Message_ID IS NULL

    UNION ALL

    SELECT recurse.[Message_ID]
        ,recurse.[Subject]
        ,recurse.[Text]
        ,recurse.[First_Parent_Message_ID]
    FROM [MyDB].[dbo].[Messages] recurse
    INNER JOIN myCTE cteAnchor
    ON recurse.First_Parent_Message_ID = cteAnchor.Message_ID
)

SELECT * FROM myCTE m
ORDER BY m.First_Parent_Message_ID desc, m.Subject desc, m.Message_ID

OPTION (MAXRECURSION 100)


Here is a sample of the results from this query:


19  2nd test  replying and replying   18
20  2nd test  hello  .............    18
21  2nd test  hello there  .......... 18
22  2nd test  hello   ............    18
23  2nd test  hello there  .......... 18
24  2nd test  reply now  ............ 18
25  2nd test  replied again  ........ 18

What this has done, is for each message, it has recursively worked it's way through the messages to calculate the MessageID of the most top level message related to that particular message. So if a conversation has about 10 messages, each one now shows the MessageID of the first message which started the conversation.

Now this finally achieved what I wanted, which was for every item, to work out it's highest level parent, which is done by looking at the parentID, and then if the message of that parent also has a ParentID, then getting that message and looking at it's ParentID,.... repeat, repeat until we get to the highest one, which doesn't have a ParentID. Result, however, there was a slight performance hit to do this. On about 50k rows, it was taking over 1 second every time, which is far too slow if you want to have a fast and responsive application. So after learning a little bit about recursive common table expressions, I moved on. Perhaps more experimentation with the WHERE clauses but time is an issue and I need an answer now.

Recursive Functions

Next I had a look at Recursive Functions in SQL Server, namely Table Valued functions.

This was alot easier to understand, basically we write a SQL Function, and to perform the recursion inside the function by calling the same function again whilst we are running in that function. You need to be careful not to produce an endlessly recursive function.


ALTER FUNCTION [dbo].[fnGetParentMessage]
(
    -- Add the parameters for the function here
    @MessageID int
)
RETURNS

@ParentMessageTable TABLE
(
    -- Add the column definitions for the TABLE variable here
    ParentID int
)

AS
BEGIN
-- Fill the table variable with the rows for your result set

    DECLARE @ParentID int
    SELECT @ParentID = [ParentID] FROM [MyDB].[dbo].[Messages] a WHERE a.MessageID = @MessageID

    IF (@ParentID IS NOT NULL)
    BEGIN
        SELECT @ParentID = ParentID FROM dbo.fnGetParentMessage(@ParentID)
    END
    ELSE
    BEGIN
        SELECT @ParentID = @MessageID
    END

    INSERT @ParentMessageTable
    SELECT @ParentID

RETURN
END

So this function returns the ParentID, by looking at the ParentID column, and then calling itself to get the ParentID of the ParentID it had just found. Calling itself repeats the same process again and again until it finds a message which doesn't have a ParentID, indicating that it has now found the top level parent.