Hi,
I've developed a new algorithm to handle the hierarchy data. Hopefully the CMSMS team will consider this in the future release.
Basically I'm using a field named "counter" to keep track of the order of each node being visited using the pre-order traversal algorithm.The hierarchy number is generated in php script instead of fetching from database. After deleting or moving some nodes, the counter and rank are updated immediately. The counter update takes only 2 sqls and the rank updates will take O(level) sqls, where level is the maximum depth of the idhierarchy.
Currently CMSMS is using what I called an order path (0001.00002...) to keep track of each content node's order, using counter can avoid that as I discovered.
Using this method, we can easily apply paging effect to the content list, thus we can handle thousands and thousands of pages, which I think will lay the foundation for CMSMS to handle large website in the future.
A demo link is here: http://www.pamground.com/main/work/node ... /index.php
With my best,
Jim
A suggestion on hierarchy data structure in CMSMS
Re: A suggestion on hierarchy data structure in CMSMS
Jim,
Very interesting stuff. I'm actually looking to go in a totally different direction for 2.0. Have you seen the nested set model at all? http://dev.mysql.com/tech-resources/art ... -data.html for some examples.
Basically, the benefits that we get from that are the ability to grab a full "path" from trunk to branch in 1 query and it also allows us to know if a node has a child without any additional queries. I'm hoping that this will mean I have to preload less into memory on every page load and easy pull paths to nodes instead of loading all parents and parent's parents etc when you're 4 or 5 levels deep.
Either way is an improvement from the current system, that's for sure. You can obviously tell I winged it the first time.
Thanks!
Very interesting stuff. I'm actually looking to go in a totally different direction for 2.0. Have you seen the nested set model at all? http://dev.mysql.com/tech-resources/art ... -data.html for some examples.
Basically, the benefits that we get from that are the ability to grab a full "path" from trunk to branch in 1 query and it also allows us to know if a node has a child without any additional queries. I'm hoping that this will mean I have to preload less into memory on every page load and easy pull paths to nodes instead of loading all parents and parent's parents etc when you're 4 or 5 levels deep.
Either way is an improvement from the current system, that's for sure. You can obviously tell I winged it the first time.

Thanks!
Re: A suggestion on hierarchy data structure in CMSMS
Hi Ted,
Thanks for your compliment.
Yes, I've done a study on the nested set model. Personally, this article http://www.sitepoint.com/article/hierar ... a-database
I think is a better implementation of it. The one you mentioned use table alias, which might slow down the query time a bit. The sitepoint one use purely left and right property to do it, which I think is faster.
The reason why I develop my own one is, with purely left and right property, it seems that I can not easily compute the hierarchy number at eachh paging load, thus I use the id hierarchy alternative. What I think is, in the actual application, the idhierarchy will not be too long, 10 level is already considered so deep, most website or e-commerce system has around 5 levels. Let's say the average length of each id is 6 and we have 10 level idhierarchy, and we load 20 record per page, the memory consumption, in the worst case, will roughly be:
7*10*20 = 1400 (since we have the . separator, I add 1 on 6)
I think it's affordable.
But if I used associate array to hold the rank, I found that this will be much lower.
If we have an option to turn off the hierarchy number, we don't actually need to fetch the idhierarchy, whcih will significantly lower the memory consumption.
I'm on the way of thinking if there's a way to compute the hierarchy number easily with left,right and rank? Maybe you could give me some hints? The hierarchy number part is the key point, if this is solved, rest of them are solved.
With my best,
Jim
Thanks for your compliment.
Yes, I've done a study on the nested set model. Personally, this article http://www.sitepoint.com/article/hierar ... a-database
I think is a better implementation of it. The one you mentioned use table alias, which might slow down the query time a bit. The sitepoint one use purely left and right property to do it, which I think is faster.
The reason why I develop my own one is, with purely left and right property, it seems that I can not easily compute the hierarchy number at eachh paging load, thus I use the id hierarchy alternative. What I think is, in the actual application, the idhierarchy will not be too long, 10 level is already considered so deep, most website or e-commerce system has around 5 levels. Let's say the average length of each id is 6 and we have 10 level idhierarchy, and we load 20 record per page, the memory consumption, in the worst case, will roughly be:
7*10*20 = 1400 (since we have the . separator, I add 1 on 6)
I think it's affordable.
But if I used associate array to hold the rank, I found that this will be much lower.
If we have an option to turn off the hierarchy number, we don't actually need to fetch the idhierarchy, whcih will significantly lower the memory consumption.
I'm on the way of thinking if there's a way to compute the hierarchy number easily with left,right and rank? Maybe you could give me some hints? The hierarchy number part is the key point, if this is solved, rest of them are solved.

With my best,
Jim
Last edited by javathunderbird on Fri Jun 08, 2007 1:30 am, edited 1 time in total.
Re: A suggestion on hierarchy data structure in CMSMS
Hi Ted,
I've figured out the the way to compute the hierarchy number on each paging load using the nested set model, I had to admit that it's better than my previous algorithm since we don't need to have the idhierarchy stored in the database, just left,right,rank, 3 fields are enough. I'm going to write up another demo later on... nested set model is the one! Hope CMSMS can become a mature and competitive CMS in the future.
After I get my hands on the nested set model, I encountered many little details which sort of take back what I said.
I think a combination of my previous method and the nested set model (Temperarily called it Modified Nested Set Model, or I prefer calling it "Lazy Nested Set Model") should be used.
In this combination method:
1. left and right are purely used to fetch a specific node's path-to-node and family tree(the node itself and its subtree), thus the idhierarchy is not needed.
2. four fields --- left,right,counter,rank, are used to represent the tree and tree traversal.
3. when swapping two nodes, the left and right are not changed, instead, we change the rank and counter only, since the shape of the tree is not changed, only the traversal order is changed.
4. when moving a specific node to a new parent, left,right,counter,rank should be all updated since the shape of the tree is changed
5. when deleting a bunch of nodes scattered around different levels, left and right do not need to be updated, only rank and counter need to be. (That's why I called it lazy)
With this method, this is a protential cons, after doing a lot of (maybe 1,000,000 times) "change a node to a new parent" actions, the left and right might increase pretty soon, we can have a "tree compression" to rebuild left and right, it's mainly meant for saving memory in both php page load and space in database. I think for a website with less than 100,000 pages, it's ok not compressing the tree.
All being said, this method is only one of many forms of tree algorithm based on nested set model, a better solution could be made.
Hope this note can be considered during the future development of CMSMS and hope this note has its contribution to this process.
With my best,
Jim
I've figured out the the way to compute the hierarchy number on each paging load using the nested set model, I had to admit that it's better than my previous algorithm since we don't need to have the idhierarchy stored in the database, just left,right,rank, 3 fields are enough. I'm going to write up another demo later on... nested set model is the one! Hope CMSMS can become a mature and competitive CMS in the future.
After I get my hands on the nested set model, I encountered many little details which sort of take back what I said.
I think a combination of my previous method and the nested set model (Temperarily called it Modified Nested Set Model, or I prefer calling it "Lazy Nested Set Model") should be used.
In this combination method:
1. left and right are purely used to fetch a specific node's path-to-node and family tree(the node itself and its subtree), thus the idhierarchy is not needed.
2. four fields --- left,right,counter,rank, are used to represent the tree and tree traversal.
3. when swapping two nodes, the left and right are not changed, instead, we change the rank and counter only, since the shape of the tree is not changed, only the traversal order is changed.
4. when moving a specific node to a new parent, left,right,counter,rank should be all updated since the shape of the tree is changed
5. when deleting a bunch of nodes scattered around different levels, left and right do not need to be updated, only rank and counter need to be. (That's why I called it lazy)
With this method, this is a protential cons, after doing a lot of (maybe 1,000,000 times) "change a node to a new parent" actions, the left and right might increase pretty soon, we can have a "tree compression" to rebuild left and right, it's mainly meant for saving memory in both php page load and space in database. I think for a website with less than 100,000 pages, it's ok not compressing the tree.
All being said, this method is only one of many forms of tree algorithm based on nested set model, a better solution could be made.
Hope this note can be considered during the future development of CMSMS and hope this note has its contribution to this process.
With my best,
Jim
Last edited by javathunderbird on Sat Jun 09, 2007 8:56 am, edited 1 time in total.