I would like to add PDF files to the list of things I can search for on my CMSMS site. In other words, I'd like to be able to upload a bunch of PDFs, index them into the normal search index tables, and have them show up in search results, labeled as PDFs.
I figure the hardest part would simply be to get the PDFs indexed.
And the other hard part would be to keep the index current, to reflect new PDFs and deleted PDFs.
Where can I look in the existing Search plugin for examples of how it does the indexing part?
Does a similar module already exist that I can look at as an example?
Should I look at creating a new module to implement this feature, or would you recommend a different method?
Are cron jobs implemented as a part of CMSMS anywhere? (for keeping the index up to date)
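To make the "keep the index current" part concrete, here is a minimal sketch in Python (not CMSMS code) of how new, changed, and deleted PDFs could be detected. It assumes the index stores a path and modification time per PDF; `find_pdfs` and `plan_index_update` are hypothetical names, and the actual text extraction and the writes to the search tables are left out.

```python
import os


def find_pdfs(root):
    """Return {relative_path: mtime} for every .pdf file under root."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".pdf"):
                full = os.path.join(dirpath, name)
                found[os.path.relpath(full, root)] = os.path.getmtime(full)
    return found


def plan_index_update(on_disk, indexed):
    """Compare the PDFs on disk with what the index already knows.

    Both arguments map relative path -> modification time.
    Returns (to_add, to_reindex, to_remove) as sorted lists of paths.
    """
    # Files the index has never seen.
    to_add = sorted(p for p in on_disk if p not in indexed)
    # Files that are indexed but have been modified since.
    to_reindex = sorted(
        p for p in on_disk if p in indexed and on_disk[p] > indexed[p]
    )
    # Index entries whose file no longer exists.
    to_remove = sorted(p for p in indexed if p not in on_disk)
    return to_add, to_reindex, to_remove
```

A scheduled job could call `find_pdfs` on the upload directory, load the indexed paths from the database, and act on the three lists that `plan_index_update` returns.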
Indexing PDFs
Re: Indexing PDFs
There's no module for indexing PDFs or other documents. It would be nice if you could create one, or enhance the Search module.
davidlanier wrote: Are cron jobs implemented as a part of CMSMS anywhere? (for keeping the index up to date)
It's not a real cron job, but there is something you can use for it: the Event Manager (admin panel, Extensions > EventManager).
CMSMS has a set of registered events that modules can hook into. When an event fires (e.g. content is edited), you can run an action, which must be defined as a user-defined tag or by a module. The current Search module has such an action.
-
- Support Guru
- Posts: 8169
- Joined: Tue Oct 19, 2004 6:44 pm
- Location: Fernie British Columbia, Canada
Re: Indexing PDFs
1) Search will not index PDFs; some code would have to be written for this.
2) The Event Manager is not a cron mechanism.
3) It is possible to set up a cron job that uses wget on a CMS URL.
However, you either have to modify code to expose that action on the frontend (without any authentication), or figure out how to create a session with wget so that it can automatically log in to the admin section and trigger the action.
I haven't looked into this yet.
Follow me on twitter
Please post system information from "Extensions >> System Information" (there is a bbcode option) on all posts asking for assistance.
--------------------
If you can't bother explaining your problem well, you shouldn't expect much in the way of assistance.
-
- Support Guru
- Posts: 8169
- Joined: Tue Oct 19, 2004 6:44 pm
- Location: Fernie British Columbia, Canada
Re: Indexing PDFs
Actually, after reading the wget manual and FAQ, it should be possible to use multiple wget commands to log in to CMS Made Simple and then issue admin commands.
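With wget, that flow would typically be two calls sharing a cookie file (`--save-cookies` with `--keep-session-cookies` and `--post-data` for the login, then `--load-cookies` for the admin request). The same two-step idea, sketched in Python rather than wget: the URLs and the form field names `username` and `password` are assumptions, not taken from the CMSMS source, so check the real login form before relying on them.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Build an opener that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(login_url, username, password):
    """Build the POST request that submits the login form."""
    # Field names are hypothetical; inspect the actual admin login form.
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(login_url, data=body, method="POST")


def login_and_trigger(opener, login_url, action_url, username, password):
    """Step 1: log in (the session cookie is stored in the opener's jar).
    Step 2: request the admin action URL using that session."""
    request = build_login_request(login_url, username, password)
    with opener.open(request, timeout=30):
        pass
    with opener.open(action_url, timeout=30) as response:
        return response.status
```

Whichever tool is used, the credentials end up in a script or crontab, so a dedicated low-privilege admin account would be a sensible precaution.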
-
- New Member
- Posts: 5
- Joined: Mon Aug 13, 2007 11:01 pm
Re: Indexing PDFs
Great thoughts. Thanks for the replies. I don't know if I'll be able to dig that deep. But in case I do, where is the code in the search module that actually DOES the indexing? (so I can see a model to follow)
(This is probably more of a feature request.) If there is interest in enabling a cron mechanism, one way to implement it might be:
1- Create a cron.php script.
2- Execute it either from the command line ("/usr/bin/php cron.php") or via URL, using appropriate wget coolness.
3- Add some sort of API hooks, so that cron.php knows which things to execute.
This avoids having to add a separate cron entry for each task that needs to be done; CMSMS handles that.
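The hook idea in step 3 could look roughly like the following. This is only a sketch of the dispatch pattern, written in Python rather than PHP; `register_task` and `run_all` are hypothetical names, not an existing CMSMS API.

```python
# Registry of scheduled tasks: name -> callable taking no arguments.
_TASKS = {}


def register_task(name, func):
    """Hook for modules: add a task that the cron script should run."""
    _TASKS[name] = func


def run_all(tasks=None):
    """Run every registered task and report the outcome of each.

    A failing task is recorded but does not stop the remaining tasks.
    Returns {task_name: "ok" or "failed: <message>"}.
    """
    results = {}
    for name, func in (tasks if tasks is not None else _TASKS).items():
        try:
            func()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results
```

A single cron entry would then run the one script, which calls `run_all()`; each module registers its own work (for example, a PDF reindex) through `register_task` instead of needing its own crontab line.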
Re: Indexing PDFs
Has anyone made any progress on this? I need to do the exact same thing. Thanks!
Jeff T.
Mmmmm... Tasty.