cancel
Showing results for 
Search instead for 
Did you mean: 

How do I build a result set of a tree structure in a depth-first search?

VolkerBarth
Contributor
9,258

Say, I have a classical self-referencing table to build a tree structure, such as the table "Employees" in the SQL Anywhere demo database. There is the typical ManagerID column that links the current employee to its superior.

The docs do contain a sample how to list all employees according to their hierarchie rank and their names. This is (of course) done with a recursive common table expression.

Breck has a similar sample in his blog (and book) - which I borrow from here as the number of employees is much smaller, and so it's easier to understand (confine the tree chart on his page):

CREATE TABLE employee ( 
   employee_id  INTEGER NOT NULL,
   manager_id   INTEGER NOT NULL REFERENCES employee ( employee_id ),
   name         VARCHAR ( 20 ) NOT NULL,
   salary       NUMERIC ( 20, 2 ) NOT NULL,
   PRIMARY KEY ( employee_id ) );

INSERT INTO employee VALUES ( 1,  1,  'Ainslie',  1000000.00 );
INSERT INTO employee VALUES ( 2,  1,  'Briana',    900000.00 );
INSERT INTO employee VALUES ( 3,  1,  'Calista',   900000.00 );
INSERT INTO employee VALUES ( 4,  1,  'Delmar',    900000.00 );
INSERT INTO employee VALUES ( 5,  2,  'Electra',   750000.00 );
INSERT INTO employee VALUES ( 6,  3,  'Fabriane',  800000.00 );
INSERT INTO employee VALUES ( 7,  3,  'Genevieve', 750000.00 );
INSERT INTO employee VALUES ( 8,  4,  'Hunter',    800000.00 );
INSERT INTO employee VALUES ( 9,  6,  'Inari',     500000.00 );
INSERT INTO employee VALUES ( 10, 6,  'Jordan',    100000.00 );
INSERT INTO employee VALUES ( 11, 8,  'Khalil',    100000.00 );
INSERT INTO employee VALUES ( 12, 8,  'Lisette',   100000.00 );
INSERT INTO employee VALUES ( 13, 10, 'Marlon',    100000.00 );
INSERT INTO employee VALUES ( 14, 10, 'Nissa',     100000.00 );

A query to list all employees in order of their hierarchive level (i.e. as a breadth-first search) can be written as such (as a simplified version of Breck's sample):

WITH RECURSIVE employee_hierarchie
   ( level,
     manager_id,
     employee_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
            e.manager_id     AS manager_id,
            e.employee_id    AS employee_id,
            e.name           AS name
       FROM employee e where e.employee_id = e.manager_id
     UNION ALL
     SELECT h.level + 1,
            e.manager_id,
            e.employee_id,
            e.name
       FROM employee_hierarchie h
            INNER JOIN employee e
                    ON h.employee_id = e.manager_id
      WHERE e.employee_id <> e.manager_id)
SELECT *
  FROM employee_hierarchie
  ORDER BY level, employee_id;

This gives the following result:

level,manager_id,employee_id,name
1,1,1,Ainslie
2,1,2,Briana
2,1,3,Calista
2,1,4,Delmar
3,2,5,Electra
3,3,6,Fabriane
3,3,7,Genevieve
3,4,8,Hunter
4,6,9,Inari
4,6,10,Jordan
4,8,11,Khalil
4,8,12,Lisette
5,10,13,Marlon
5,10,14,Nissa

QUESTION:

However, how do I write a query to return a result set that is ordered in a depth-first manner, i.e. each superior should be followed by its direct and indirect subordinates in recursive order?

I.e. the result set for this sample should look like:

level,manager_id,employee_id,name
1,1,1,Ainslie
2,1,2,Briana
3,2,5,Electra
2,1,3,Calista
3,3,6,Fabriane
4,6,9,Inari
4,6,10,Jordan
5,10,13,Marlon
5,10,14,Nissa
3,3,7,Genevieve
2,1,4,Delmar
3,4,8,Hunter
4,8,11,Khalil
4,8,12,Lisette

Accepted Solutions (0)

Answers (4)

Answers (4)

VolkerBarth
Contributor

This is an attempt to build such a query by means of a particular "sort_string" column. It does work, however I guess there should be better approaches.

Basically, for each level, I concatenate the "order" of the parent node and the relative order of the current child, by means of using ROW_NUMBER() over the according partition, separated by a dash:

WITH RECURSIVE employee_hierarchie
   ( level,
     sort_string,
     manager_id,
     employee_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
            CAST (ROW_NUMBER() OVER (ORDER BY e.employee_id) AS LONG VARCHAR) AS sort_string,
            e.manager_id     AS manager_id,
            e.employee_id    AS employee_id,
            e.name           AS name
       FROM employee e where e.employee_id = e.manager_id
     UNION ALL
     SELECT h.level + 1,
            h.sort_string || '-' || CAST (ROW_NUMBER() OVER (PARTITION BY e.manager_ID ORDER BY e.employee_id) AS LONG VARCHAR),
            e.manager_id,
            e.employee_id,
            e.name
       FROM employee_hierarchie h
            INNER JOIN employee e
                    ON h.employee_id = e.manager_id
      WHERE e.employee_id <> e.manager_id)
SELECT sort_string, level, manager_id, employee_id, name
  FROM employee_hierarchie
 ORDER BY sort_string;

This returns the following result (here including the "sort_string" for clarity:


Re How do I build a result set of a tree structure in a depthfirst search

I'd like to note that the ROW_NUMBER() approach makes it possible to use any kind of ordering within each level, i.e. it could easily be modified to order by name/salary/whatever.

CAVEAT: It should be noted that the sort_string is sorted lexically, so for more than 9 children, the row numbers need to be formatted in a way to make it numerically sortable, such as adding a fitting number of leading zeroes. This could be done by turning the basic queries including the ROW_NUMBER() in derived queries ones and then by using something like "REPEAT('0', <maxlevel> - LENGTH(row_nr)) || row_nr" to build a fixed-size row-number.


As stated above, this approach does work, but I hope you have better suggestions!

VolkerBarth
Contributor
0 Kudos

For a discussion on how to add "left-padded numbers", see this FAQ...

Former Member

Not a new approach and i fear there isn't a really better solution for depth first oder in SA, but i'm using a long binary instead of a long varchar column.

WITH RECURSIVE employee_hierarchie
( level, 
  sort_col, 
  manager_id, 
  employee_id,
  name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
        CAST (e.employee_id AS LONG BINARY) AS sort_col,
        e.manager_id     AS manager_id,
        e.employee_id    AS employee_id,
        e.name           AS name
   FROM employee e where e.employee_id = e.manager_id
 UNION ALL
 SELECT h.level + 1,
        h.sort_col + CAST (e.employee_id AS LONG BINARY),
        e.manager_id,
        e.employee_id,
        e.name
   FROM employee_hierarchie h
        INNER JOIN employee e
                ON h.employee_id = e.manager_id
  WHERE e.employee_id <> e.manager_id)
SELECT sort_col, level, manager_id, employee_id, name
FROM employee_hierarchie
ORDER BY sort_col;

There are only 4 bytes per level needed, you don't need padding and it's nevertheless readable. I couldn't get it working with row_number() btw. as i got some strange error messages, but that isn't needed in this example anyway.

Another idea would be storing the "node" order in the base table and ensuring it's correctness through insert/update/delete trigger. Could be worth it especially when you have a high read/write ratio.

VolkerBarth
Contributor
0 Kudos

Thanks, that's a reasonable solution, too, and it's easier to read.

The drawback might be the "fixed" ordering by the PK values (employee_id here), whereas the row_number() approach allows any kind of ordering within each level.

(FWIW, we have used the "concatenate PKs with a dash" sort_string in other applications as it makes it pretty easy to filter for particular PK values within the result set, such as "WHERE sort_string like '%-1234-%; something that's surely more difficult when PK values are concatenated without a delimiter...)

Former Member
0 Kudos

Yes you are right, it only makes sense to use numbers for the ordering when row_number isn't called. But as row_number should return numbers, it should be possible to use it with a cast, i just wasn't able to get it working in short time.

I don't see what prevents you from adding a delimiter btw. and byte_substr could also be used to search in a specific position.

I just copied your code from your answer and i get the same error: -921 Invalid recursion
Is it just me or is there something wrong with your ROW_NUMBER() usage? 12.0.1.3769 11.0.1.2837

VolkerBarth
Contributor
0 Kudos

Hm, strange thing, I had tested the code with 12.0.1.3726 and copied it unmodified from DBISQL. Now, running 12.0.1.3769, I get the same error as you, both in DBISQL and dbisqlc. - I'm gonna try with another box.

VolkerBarth
Contributor
0 Kudos

I justed tested again with 12.0.1.3726, and that does work, and varying the ORDER BY clause in the ROW_NUMBER() (as noted in my answer) does also work as expected - say, to replace "ORDER BY e.Employee_id" with "ORDER BY e.salary DESC"...


However, with 12.0.1.3769 it does not work but raises the -921 error. Now the question is: Is the new behaviour a bug or a bug fix?

Breck_Carter
Participant
0 Kudos

Sorting by employee_number is meaningless... it is just a fluke of the example that employees are numbered sequentially from left to right, top to bottom.

VolkerBarth
Contributor
0 Kudos

I asked this as just another question here:

http://sqlanywhere-forum.sap.com/questions/13725

Former Member
0 Kudos

I've seen this kind of recursive processing hammer database processing before. Look at it this way:

If most of the tree structure does not change, you are rebuilding the tree every time you access its nodes. In some processing that sucks up a lot of time.

I would love to see someone try this out: Manage the tree organization in an XML document, and work with the tree through that. The nodes in the XML document would have to map quickly to the rows in the tables. I'm not sure how efficiently that could be done. Could Sybase acquire a complete tree of information by "joining" XML nodes to SQL rows and do so faster than the tradition "rebuild the tree every time using recursion" approach?

I know that DB2 has full-on non-text XML DB structures built in. DB2 should be able to do something like this extremely fast. Tree based applications (as in semi-structured XML documents) would be a good reason for Sybase to introduce actual XML database structures. The world is becoming more oriented towards semi-structured data every day.

VolkerBarth
Contributor
0 Kudos

Well, in my case performance is not a problem, the according tables are not huge. And as there's not an implicit (i.e. static) ordering of child nodes, I guess a "store in desired order"-approach would not work at all. - Furthermore, the data is structured and does fit well in a RDMS. Using XML would not be of help here. Just my situation, apparently:)


Therefore, I'd suggest that you ask for a better XML integration as a separate question, tagged with "product-suggestion". AFAIK, product management does listen on this forum.

VolkerBarth
Contributor
0 Kudos

Not really unexpected: Breck has blogged in detail on these topics here:

Nevertheless, I'd personally prefer if the ordering of child nodes within its parent node could be done with the help of ROW_NUMBER() (or an adequate function) and would not require a particular user-defined function to do so, as in Breck's second blog article. However, as of 12.0.1.3769, the usage of ROW_NUMBER() as shown in my answer does not work anymore, as is discussed here. (Aside: I hope the discussion is still open...)