Monday, July 10, 2006

SQL - Recursive Cursor with tree structure data


This article aims for the same result but keeps your table as simple as possible. It also allows us to do some pretty neat things with recursive stored procedures. My examples try to cover both posts and the hierarchical numbering scheme from the question. It should work for any generic "expanding hierarchy" problem.

We'll start with a table that looks like this: *

=====================================================================

CREATE TABLE [Posts] (
[PostID] [int] IDENTITY (1, 1) NOT NULL ,
[Subject] [char] (50) NULL ,
[ParentID] [int] NULL ,
[PostSortKey] [datetime] NOT NULL
)
GO

ALTER TABLE [Posts] WITH NOCHECK
ADD CONSTRAINT [DF_Posts_SortKey] DEFAULT (getdate()) FOR [PostSortKey]
GO *

=====================================================================


The PostID column is our primary key for this table. Subject is the subject of the post. For simplicity sake I removed all other post related fields that were not absolutely necessary such as author and the body of the post. ParentID is the PostID of the post for the parent. If this post is a top level post, then this field will be zero. PostSortKey is the sort order for the posts. This will usually be a datetimefield. In my example here, I used a default of GETDATE() to populate the field. In the specific example of this question it could be whatever the user wanted.

My example will use a dataset that looks like this: *

PostID Subject ParentID PostSortKey
1 First Post 0 2000-12-06 20:54:29.407
2 Second Post 0 2000-12-06 20:54:29.407
3 Child of First Post 1 2000-12-06 20:54:29.407
4 Child of Second Post 2 2000-12-06 20:54:29.407
5 Child of First Child 3 2000-12-06 20:54:29.407
6 Another FP Child 1 2000-12-06 21:04:49.217
7 Smallest Kid 5 2000-12-06 21:18:49.203
8 Another Munchkin 3 2000-12-06 21:28:22.040
*

Populating a table like this should be very easy. You just need to insert a record with its parent ID.

This example is composed of two pieces of code. The first is a SQL Script to create a temporary table and call the stored procedure. It looks like this: *

===========================================================


Create Table #NestedPosts (
SortID int IDENTITY (1,1), PostID int, PostKey varchar(200), PostLevel int
)*
*

exec getchildren 0, 1, ''

*

SELECT
SortID, P.PostID, ParentID,
PostLevel,
PostKey = LEFT(PostKey, 10),
PostSortKey = convert(varchar(19), PostSortKey, 120),
Subject = LEFT( SPACE( (PostLevel-1) * 2 ) + Subject , 40)
FROM
#NestedPosts N JOIN Posts P
ON N.PostID = P.PostID
Order by
SortID

*

DROP TABLE #NestedPosts

===========================================================


The temporary table is populated by the stored procedure. The final SELECT prints out the results. Next is the stored procedure. It looks like this: *

===========================================================

CREATE PROC GetChildren
(
@ParentID int, @PostLevel int, @ParentPostKey varchar(200)
)
AS

BEGIN

SET NOCOUNT ON
DECLARE @NextLevel int, @Counter int, @PostKey varchar(200)
SET @Counter = 1

-- Build a cursor to loop through all the kids of this post

DECLARE c1 CURSOR LOCAL
FOR SELECT PostID FROM Posts
WHERE ParentID = @ParentID Order by PostSortKey ASC

OPEN c1

FETCH NEXT FROM c1 INTO @ParentID

WHILE @@FETCH_STATUS = 0

BEGIN
-- Build a key up as we go
IF @PostLevel = 1
SET @PostKey = convert(varchar, @Counter)
ELSE
SET @PostKey = @ParentPostKey + '.' + convert(varchar, @Counter)

-- Put this record in the temp table
INSERT #NestedPosts (PostID, PostKey, PostLevel)
VALUES (@ParentID, @PostKey, @PostLevel)

SET @NextLevel = @PostLevel + 1

-- Process all the children for this post
EXEC GetChildren @ParentID, @NextLevel, @PostKey

SET @Counter = @Counter + 1

-- And get the next record at this level

FETCH NEXT FROM c1 INTO @ParentID
END

CLOSE c1
DEALLOCATE c1
SET NOCOUNT OFF
END

===========================================================

*
This stored procedure runs through all the child posts for a given parent post and puts them into the temp table. It also builds up the hierarchical numbering scheme that was asked about in the question. The really tricky part is that it calls itself in order to handle any child records of the current record it's processing. You can have up to 32 levels of nesting in your stored procedure.

That means this little sample is limited to 32 levels of posts. I also could have used a variable called @@NESTLEVEL which SQL Server uses to track how far down a recursion tree you are. I built and manually maintained a variable called @PostLevel for the task.

The output from the SELECT statement above looks like this: *

SortID PostID ParentID PostLevel PostKey PostSortKey Subject
1 1 0 1 1 2000-12-06 20:54:29 First Post
2 3 1 2 1.1 2000-12-06 20:54:29 Child of First Post
3 5 3 3 1.1.1 2000-12-06 20:54:29 Child of First Child
4 7 5 4 1.1.1.1 2000-12-06 21:18:49 Smallest Kid
5 8 3 3 1.1.2 2000-12-06 21:28:22 Another Munchkin
6 6 1 2 1.2 2000-12-06 21:04:49 Another FP Child
7 2 0 1 2 2000-12-06 20:54:29 Second Post
8 4 2 2 2.1 2000-12-06 20:54:29 Child of 2nd Post


* That's about all there is to this. Try it and let me know what you think.


No comments: