Re: Creation of CMSMS Project Groups
Posted: Wed Oct 26, 2005 7:58 pm
by Hans
Isn't this the wrong topic for a discussion of PiSearch?
Hans
Re: Creation of CMSMS Project Groups
Posted: Thu Oct 27, 2005 12:10 am
by sjg
Hans wrote:
Isn't this the wrong topic for a discussion of PiSearch?
Probably.
I've been discussing the requirements for a robust search module.
My issue with the approach PiSearch takes is that it'll work fine on a site with a couple of pages and/or very light traffic, but will utterly destroy the performance of a server hosting a site that doesn't meet those criteria. Now, that may not matter to some users, so PiSearch is fine for sites that are small and lightly trafficked. Unfortunately, I have client sites with fairly heavy traffic, and some that are quite extensive (hundreds of pages). I'm currently handling search on those sites with an external program (Swish++), but that's not an option for everyone.
Also, as a side note, PiSearch's optimization to only load content of type "content" will not work with News, for example, or any other module-based content type that's not embedded in another page. Again, this won't matter to everyone. However, I'm 80% done with a Catalog module, which relies heavily on its own content types. Any search I would use on such a site would need to be able to handle *all* content types.
In any case, here's what I think the minimum acceptable requirements for a professional search module would be (these, of course, are only my opinion... however, my opinion has been shaped by long years of experience and painful lessons learned):
- Search must utilize an index generated asynchronously (e.g., don't generate your index every time you perform a search)
- Search should build the index by spidering the site, so page content is seen as the end user would see it.
- Search should not put a heavy load on the server, so the spidering should happen nightly and/or the index should be updated incrementally when content changes
- Search should support arbitrary content types without any custom configuration work.
- Search should not care about which underlying database is used by the CMS.
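The first and third requirements above (an index built ahead of time, and updated incrementally when a page changes) can be illustrated with a minimal inverted index. This is a sketch under stated assumptions, not how PiSearch, Swish++ or any CMSMS module works: pages are assumed to arrive as plain text keyed by URL, and `InvertedIndex`, `update_page`, `remove_page` and `search` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical illustration only: InvertedIndex is not an existing
# CMSMS module or API. Words are runs of lower-case letters/digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return the set of distinct words in it."""
    return set(WORD_RE.findall(text.lower()))


class InvertedIndex:
    """Maps each word to the set of page URLs that contain it."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # word -> set of URLs
        self.page_words = {}  # URL -> set of words last indexed for it

    def update_page(self, url: str, text: str) -> None:
        """(Re)index one page, e.g. from a nightly spider run or a save hook."""
        self.remove_page(url)
        words = tokenize(text)
        self.page_words[url] = words
        for word in words:
            self.postings[word].add(url)

    def remove_page(self, url: str) -> None:
        """Drop a page's previously indexed words, if it was indexed."""
        for word in self.page_words.pop(url, set()):
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]

    def search(self, query: str) -> set[str]:
        """Return the URLs containing every word in the query (AND search).

        Only dictionary lookups happen here; no page content is loaded.
        """
        words = tokenize(query)
        if not words:
            return set()
        return set.intersection(*(self.postings.get(w, set()) for w in words))


if __name__ == "__main__":
    index = InvertedIndex()
    index.update_page("/home", "Welcome to our lighting catalog")
    index.update_page("/lamps", "Desk lamps and light bulbs")
    print(sorted(index.search("light")))  # ['/lamps']
```

Keeping each page's word set (`page_words`) is what makes the incremental case cheap: re-saving one page touches only that page's entries, while a nightly job would simply call `update_page` for every spidered URL. Either way, the cost of a search no longer grows with the number of pages that have to be fetched and scanned.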
I've been meaning to write something like this, but it's not at the top of my priority queue right now. My current solution works for my clients, because I can influence where they host, and can do my own Unixy thing on those machines. As soon as I have a client where that is not the case but who still needs search, I'll be building a search module.
Re: Creation of CMSMS Project Groups
Posted: Thu Oct 27, 2005 3:21 pm
by Piratos
Look here
http://forum.cmsmadesimple.org/http://p ... 534#p10534
Here is the speed:
2005 pages searched:
Generated in 6.541597 seconds by CMS Made Simple 0.11-beta4 (cached) using 2024 SQL queries
205 pages searched:
Generated in 1.212142 seconds by CMS Made Simple 0.11-beta4 (cached) using 224 SQL queries
ADOdb is in use!
Pages are always fetched if their content contains a Smarty tag.
It is better to test something first and then write about it.
Re: Creation of CMSMS Project Groups
Posted: Thu Oct 27, 2005 6:21 pm
by sjg
Hey, if those numbers work for you, great!
As I said before, if the performance is good enough for your site, then the module is fine.
However, on some of the sites I have deployed, it's not acceptable for a search performed by a single user to take several seconds (consider how that scales when dozens of users are searching simultaneously). I'm not making these statements to be obnoxious. I'm merely stating my requirements for client sites. If it came across as a slam or attack, I apologize.
For what it's worth, the Swish++ approach yields the following timings for a single user's search of a site with roughly 1000 pages:
26 results found for "light"
real 0m0.007s
user 0m0.010s
sys 0m0.000s
(This is from the live server, with a fair number of people using it.)
$ netstat -t | grep -i http | wc -l
1352
Re: Creation of CMSMS Project Groups
Posted: Thu Oct 27, 2005 6:27 pm
by Hans
Maybe it's a dumb question, but is it possible to make a module out of Swish++?
I read the site, and it seems like a good start for the badly needed search module for CMSMS.
Hans
Re: Creation of CMSMS Project Groups
Posted: Thu Oct 27, 2005 6:45 pm
by sjg
It shouldn't be too hard to make a Swish++ module, but it would require users have shell access. It would also only work under Unix.
Re: Creation of CMSMS Project Groups
Posted: Thu Oct 27, 2005 7:19 pm
by Piratos
This is the speed of the CMS Made Simple home page:
Generated in 1.219569 seconds by CMS Made Simple 0.10.3 (cached) using 18 SQL queries
This is the speed when testing PiSearch (205 pages):
Generated in 1.051407 seconds by CMS Made Simple 0.11-beta4 (cached) using 20 SQL queries
The speed of the CMS and of search plugins must go hand in hand: no one here needs a search plugin that is faster than the CMS itself, and only a few people use this CMS with more than 50 pages.
Re: Creation of CMSMS Project Groups
Posted: Fri Oct 28, 2005 7:09 am
by Piratos
Another result: the content from a standard new installation, plus 5000 pages added with a content-generating script.
The time to present the results is 15.29580 seconds (worst case: the word is found in all 5000 pages).
But the search itself takes only 4.384 seconds; the difference is the time the CMS always needs.
The CMS needs 10.911801 seconds to present the home page without any searching (5005 records in total in the database)!
And every time you view another page (without searching), you have to wait that long.
Wishy's way of generating menu systems is wrong. That is the reason the CMS is so slow.
In my opinion, this is the first thing he has to fix; everything else is secondary.
Re: Creation of CMSMS Project Groups
Posted: Fri Oct 28, 2005 8:57 am
by petert
sjg wrote:
It shouldn't be too hard to make a Swish++ module, but it would require users have shell access. It would also only work under Unix.
I use phpdig, and it works great. It works like a search engine should: it 'reads' the webpages, not just the content from the database (that's lame).
Maybe somebody could join in so we can make it into a module?
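To illustrate the difference petert describes, here is a minimal sketch of indexing what a spider sees: the visible text of a rendered page, after the CMS has already expanded Smarty tags and module output, rather than the raw content stored in the database. It uses only Python's standard-library `html.parser`; it is not phpdig's implementation, and fetching pages and following links are deliberately left out. `VisibleTextExtractor` and `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> blocks.

    Hypothetical helper for illustration; not part of phpdig or CMSMS.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []  # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's text fragments joined by single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = "<h1>Lamps</h1><script>track();</script><p>Desk lamp &amp; bulbs</p>"
    print(visible_text(page))  # Lamps Desk lamp & bulbs
```

Text produced this way could then be fed to an index such as the one sketched earlier in the thread, so content generated by modules (News, Catalog, and so on) is searchable without the search code knowing anything about content types or the underlying database.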
Re: Creation of CMSMS Project Groups
Posted: Tue Nov 28, 2006 9:39 pm
by qbrix
That would be great. Is there any work being done to the searching mechanism? Either on the standard module or a new one?
Ideally I would like one that can search PDF files. Swish-E or PhpDig can do it, but I'm not sure I can find enough time to make it into a module.
Any updates on this?