Creation of CMSMS Project Groups

Project Announcements. This forum is read-only, i.e. not for problems, bugs, or feature requests.
Hans
Forum Members
Posts: 61
Joined: Sun Oct 09, 2005 10:49 am

Re: Creation of CMSMS Project Groups

Post by Hans

Isn't this the wrong topic for a discussion on PiSearch?
Hans
sjg
Power Poster
Posts: 310
Joined: Thu Jan 27, 2005 5:11 pm
Location: Los Angeles, CA

Re: Creation of CMSMS Project Groups

Post by sjg

Hans wrote: Isn't this the wrong topic for a discussion on PiSearch?
Probably :)

I've been discussing the requirements for a robust search module.

My issue with the approach PiSearch takes is that it will work fine on a site with a couple of pages and/or very light traffic, but it will utterly destroy the performance of a server hosting a larger or busier site. That may not matter to some users, so PiSearch is fine for sites that are small and lightly trafficked. Unfortunately, I have client sites with fairly heavy traffic, and some that are quite extensive (hundreds of pages). I'm currently handling search on those sites with an external program (Swish++), but that's not an option for everyone.

Also, as a side note, PiSearch's optimization of loading only content of type "content" will not work with News, for example, or with any other module-based content type that isn't embedded in another page. Again, this won't matter to everyone. However, I'm 80% done with a Catalog module, which relies heavily on its own content types. Any search I used on such a site would need to handle *all* content types.

In any case, here's what I think the minimum acceptable requirements for a professional search module would be (these, of course, are only my opinion... however, my opinion has been shaped by long years of experience and painful lessons learned):
  • Search must use an index generated asynchronously (i.e., don't regenerate the index every time you perform a search).
  • Search should build the index by spidering the site, so page content is indexed as the end user would see it.
  • Search should not put a heavy load on the server, so spidering should run nightly and/or the index should be updated incrementally when content changes.
  • Search should support arbitrary content types without any custom configuration work.
  • Search should not care about which underlying database is used by the CMS.
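A minimal sketch of the first and third requirements, in Python: an inverted index that is built ahead of time (by a nightly spider run or a hook fired when a page is saved) and queried without loading any page content at search time. This is not how PiSearch or Swish++ are implemented; `SearchIndex`, `update_page`, `remove_page`, and `search` are hypothetical names, and a real module would persist the postings in the database or on disk rather than in memory.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(WORD_RE.findall(text.lower()))


class SearchIndex:
    """Hypothetical in-memory inverted index: word -> set of page URLs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.page_words: dict[str, set[str]] = {}  # url -> words indexed for it

    def update_page(self, url: str, text: str) -> None:
        """(Re)index one page; cheap enough to call whenever content changes."""
        self.remove_page(url)
        words = tokenize(text)
        self.page_words[url] = words
        for word in words:
            self.postings[word].add(url)

    def remove_page(self, url: str) -> None:
        """Drop a page's words from the index, e.g. when the page is deleted."""
        for word in self.page_words.pop(url, set()):
            urls = self.postings.get(word)
            if urls is not None:
                urls.discard(url)
                if not urls:
                    del self.postings[word]

    def search(self, query: str) -> list[str]:
        """Return URLs containing every query word; no page content is loaded."""
        words = tokenize(query)
        if not words:
            return []
        # Intersect starting from the rarest word so the working set stays small.
        ordered = sorted(words, key=lambda w: len(self.postings.get(w, ())))
        result = set(self.postings.get(ordered[0], set()))
        for word in ordered[1:]:
            result &= self.postings.get(word, set())
        return sorted(result)


if __name__ == "__main__":
    index = SearchIndex()
    index.update_page("/news/1", "New light fixtures arrive")
    index.update_page("/catalog/lamp", "Desk lamp with warm light")
    print(index.search("light"))  # ['/catalog/lamp', '/news/1']
```

Because postings are keyed by page URL rather than by content type, pages produced by modules such as News or a Catalog are indexed the same way as ordinary content pages, as long as whatever feeds `update_page` can reach them.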
I've been meaning to write something like this, but it's not at the top of my priority queue right now.  My current solution works for my clients, because I can influence where they host, and can do my own Unixy thing on those machines. As soon as I have a client where that is not the case but who still needs search, I'll be building a search module.
Many modules available from the http://dev.cmsmadesimple.org
The CMS Made Simple Developer Cookbook is now available from Packt Publishers!
Piratos

Re: Creation of CMSMS Project Groups

Post by Piratos

Look here: http://forum.cmsmadesimple.org/http://p ... 534#p10534

Here is the speed:

2005 pages are searched
Generated in 6.541597 seconds by CMS Made Simple 0.11-beta4 (cached) using 2024 SQL queries
205 pages are searched
Generated in 1.212142 seconds by CMS Made Simple 0.11-beta4 (cached) using 224 SQL queries
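A back-of-the-envelope reading of the two data points above, assuming cost grows linearly with the number of pages searched (n = pages searched, q = SQL queries, t = generation time; the symbols are introduced here only for illustration):

```latex
q(n) \approx n + 19, \qquad 2005 + 19 = 2024, \quad 205 + 19 = 224

\frac{\Delta t}{\Delta n} = \frac{6.541597\ \text{s} - 1.212142\ \text{s}}{2005 - 205}
  = \frac{5.329455\ \text{s}}{1800} \approx 2.96\ \text{ms per page}
```

So each search issues roughly one SQL query per page searched, plus a fixed overhead of about 19 queries, and each extra page adds about 3 ms to the response.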

ADOdb is in use!

Pages are always fetched if their content contains a Smarty tag.

It is better to test something and then write about it.
sjg

Re: Creation of CMSMS Project Groups

Post by sjg

Hey, if those numbers work for you, great!

As I said before, if the performance is good enough for your site, then the module is fine.

However, on some of the sites I have deployed, it's not acceptable for a search performed by a single user to take several seconds (consider how that scales when dozens of users are searching simultaneously). I'm not making these statements to be obnoxious. I'm merely stating my requirements for client sites. If it came across as a slam or attack, I apologize.

For what it's worth, the Swish++ approach yields the following timings for a single user's search of a site with roughly 1000 pages:

26 results found for "light"
real    0m0.007s
user    0m0.010s
sys    0m0.000s

(this is from the live server, with a fair number of people using it)

$ netstat -t | grep -i http | wc -l
    1352
Hans

Re: Creation of CMSMS Project Groups

Post by Hans

Maybe it's a dumb question, but is it possible to make a module out of Swish++?
I read the site, and it seems like a good start for the badly needed search module for CMSMS.
sjg

Re: Creation of CMSMS Project Groups

Post by sjg

It shouldn't be too hard to make a Swish++ module, but it would require that users have shell access. It would also work only under Unix.
Piratos

Re: Creation of CMSMS Project Groups

Post by Piratos

This is the speed of the CMS Made Simple home page:
Generated in 1.219569 seconds by CMS Made Simple 0.10.3 (cached) using 18 SQL queries

This is the speed when testing PiSearch (205 pages):

Generated in 1.051407 seconds by CMS Made Simple 0.11-beta4 (cached) using 20 SQL queries

The speed of the CMS and of search plugins must go hand in hand: no one here needs a search plugin that is faster than the CMS itself, and only a few people use this CMS with more than 50 pages.
Piratos

Re: Creation of CMSMS Project Groups

Post by Piratos

Another result: the content of a standard new installation plus 5000 pages added with a content-generation script.

The time to present the results is 15.29580 seconds (worst case: the word is found in all 5000 pages).

But the search time is only 4.384 seconds; the difference is the time the CMS always needs.

The CMS needs 10.911801 seconds to present the home page without any searching (5005 records in the database in total)!
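Assuming the results page carries the same baseline rendering cost as the home page, the 5000-page figures above break down as follows (the t symbols are illustrative):

```latex
t_{\text{search}} \approx t_{\text{results}} - t_{\text{home}}
  = 15.29580\ \text{s} - 10.911801\ \text{s} \approx 4.384\ \text{s}

\frac{t_{\text{home}}}{t_{\text{results}}} = \frac{10.911801\ \text{s}}{15.29580\ \text{s}} \approx 0.71
```

In other words, roughly 71% of the time spent on the results page is the CMS's baseline cost for rendering any page, and only about 29% is the search itself.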

And every time you view another page (without searching), you have to wait that long.

Wishy's way of generating menu systems is wrong. That is why the CMS is so slow.

In my opinion, this is the first thing he has to fix; everything else is secondary.
petert
Power Poster
Posts: 282
Joined: Wed Feb 09, 2005 9:30 pm
Location: behind my desk

Re: Creation of CMSMS Project Groups

Post by petert

sjg wrote: It shouldn't be too hard to make a Swish++ module, but it would require users have shell access. It would also only work under Unix.
I use PhpDig, and it works great. It works the way a search engine should: it 'reads' the web pages, not just the content from the database (that's lame).
Maybe somebody could join me so we can make it into a module?
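To illustrate the difference between reading rendered web pages and reading database rows, here is a minimal Python sketch of a single-host spider that extracts each page's visible text. It is not PhpDig's implementation; `PageTextParser`, `crawl`, and the injected `fetch` callable are hypothetical names, and a real spider would also honor robots.txt, throttle its requests, and handle non-HTML documents such as PDFs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageTextParser(HTMLParser):
    """Collect the visible text and the <a href> links of one HTML page."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.text_parts.append(text)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl of a single host; returns {url: visible text}.

    `fetch` maps a URL to its HTML string, or None if the page could not be
    retrieved, so this function performs no network I/O of its own.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered at end of input
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            link = urljoin(url, href).split("#", 1)[0]  # drop #fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For a live site, `fetch` could be a small wrapper around `urllib.request.urlopen` that decodes the response body. Because the spider sees the same HTML a visitor does, output from Smarty tags and module content ends up in the text, which is then fed into whatever index the module maintains.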
Mambo sucks, that's why I am here.
Now they call it Joomla, but it still sucks!

CMSMS rules!
qbrix
Forum Members
Posts: 24
Joined: Fri Nov 10, 2006 7:56 pm

Re: Creation of CMSMS Project Groups

Post by qbrix

That would be great. Is any work being done on the search mechanism, either on the standard module or a new one?

Ideally, I would like one that can search PDF files. Swish-E or PhpDig can do it, but I'm not sure I can find enough time to make it into a module.

Any updates on this?
