front end file management and document indexing
Posted: Tue May 13, 2008 3:12 pm
by nwcon
I'm using Front End File Management to manage documents on my website, and I'm trying to find a way to let robots index these public documents. I don't want robots to index everything in the uploads directory, nor everything accessed via /index.php?mact.
There's one line in my robots.txt file that prevents indexing:
Disallow: /index.php?mact
If I remove that line, other content available via /index.php?mact gets indexed, which is not what I want.
Since there's no Allow statement in a standard robots.txt file, I'm left scratching my head over how to allow indexing of my documents.
Anyone have any suggestions or pointers?
Regards,
nwcon
Re: front end file management and document indexing
Posted: Wed May 14, 2008 3:39 pm
by Pierre M.
Have links to what you want indexed.
Pierre M.
Re: front end file management and document indexing
Posted: Wed May 14, 2008 4:39 pm
by nwcon
Pierre,
What I'm trying to do is to have a 'repository' where I can upload documents and have them dynamically listed and linked on a given page. I started working on an indexing php script to do just that, but then came across Front End File Management.
There are pros and cons with each:
php indexing script
pros: easy to implement; direct links to documents
cons: doesn't integrate with cmsms, so Search module doesn't see or index content in documents; sitemapmadesimple won't see these links, so sitemap will have to be manually edited.
Front End File Management
pros: integrated into cmsms; categorizes docs; can control access to categories based on user's access (feusers)
cons: links generated are not direct links, which makes it hard to let robots index the documents; no pagination
Since front end file management seems to fit my needs the best, I'm trying to work through the issues with it. Really, if there were an option within each category to use direct links, problem solved! Right now, the links generated look like this:
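For reference, the "indexing script" option in the list above amounts to listing a directory and emitting direct links. Here is a minimal sketch in Python rather than PHP, not the script from this thread; the function name, the directory, and the /uploads/howtos/ base URL are all made up for illustration.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def list_documents(directory, base_url):
    """Return an HTML list with one direct link per regular file in `directory`.

    `directory` and `base_url` are hypothetical; point them at wherever the
    documents live on disk and the URL prefix they are served from.
    """
    items = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        # Percent-encode the file name so spaces etc. survive in the URL.
        href = f"{base_url.rstrip('/')}/{quote(path.name)}"
        items.append(f'<li><a href="{escape(href)}">{escape(path.name)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Example call with assumed paths.
    print(list_documents(".", "/uploads/howtos/"))
```

Because the links are plain file paths, a crawler can follow them without touching any /index.php?mact URL, but nothing here checks permissions or tracks downloads.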
.../index.php?mact=Uploads,cntnt01,getfile,0&cntnt01showtemplate=false&cntnt01upload_id=34&cntnt01returnid=80
The problem with these links is that unless I keep the following line in my robots.txt file, robots can access much more than just my documents.
Disallow: /index.php?mact
And if I remove the above line from robots.txt and use lines like the ones below instead, I will need one line for each module accessible via mact:
Disallow: /index.php?mact=FrontEndUsers
Disallow: /index.php?mact=Print
I know google supports an Allow statement in robots.txt, but I don't want to rely on non-standard methods, not to mention this wouldn't help with other crawlers.
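To make the trade-off concrete: under the original robots.txt convention, a Disallow value is a plain prefix, and a URL is blocked when it starts with that prefix. The sketch below hand-rolls that prefix check (it is not a real crawler's parser) and compares the broad rule with the per-module rules. The FrontEndUsers login URL is a made-up example; the Uploads URL is the one shown above.

```python
def is_disallowed(url, disallow_prefixes):
    """Prefix matching as in the original robots.txt convention.

    A URL is blocked if it starts with any non-empty Disallow value.
    (An empty Disallow value means "allow everything", so it is skipped.)
    """
    return any(url.startswith(prefix) for prefix in disallow_prefixes if prefix)


# Download link generated by the Uploads module (taken from the post).
DOWNLOAD_URL = (
    "/index.php?mact=Uploads,cntnt01,getfile,0"
    "&cntnt01showtemplate=false&cntnt01upload_id=34&cntnt01returnid=80"
)
# Hypothetical URL for another module, for illustration only.
LOGIN_URL = "/index.php?mact=FrontEndUsers,cntnt01,login,0"

BROAD_RULE = ["/index.php?mact"]
PER_MODULE_RULES = ["/index.php?mact=FrontEndUsers", "/index.php?mact=Print"]

for label, rules in (("broad", BROAD_RULE), ("per-module", PER_MODULE_RULES)):
    for url in (DOWNLOAD_URL, LOGIN_URL):
        verdict = "blocked" if is_disallowed(url, rules) else "crawlable"
        print(f"{label:10s} {verdict:9s} {url[:45]}")
```

With the broad rule both URLs are blocked; with the per-module rules the document download becomes crawlable, but so does any module that has no line of its own.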
Thanks for the input. I'll keep brainstorming...
Hmm, it just occurred to me that if I used a php script to list and link files, that has nothing to do with the sitemapmadesimple problem. Since the howtos page is listed in the sitemap, links on that page will be found by robots anyway. Now it's just a matter of integrating the content of each document into the Search module.
Brainstorming....
Regards,
nwcon
Re: front end file management and document indexing
Posted: Wed May 14, 2008 4:53 pm
by calguy1000
This is a 'you can't have everything' problem.
Uploads provides indirect URLs so that it can do permissions checking, download tracking, etc.
If there were direct urls you couldn't do that.
I could potentially add pretty url support to Uploads, which would make the robots stuff easier to work with, but I just haven't gotten there yet.
Re: front end file management and document indexing
Posted: Wed May 14, 2008 5:07 pm
by nwcon
calguy1000 wrote:
This is a 'you can't have everything' problem.
And I'm accustomed to that, but that's why we love open source...freedom to modify!
calguy1000 wrote:
Uploads provides indirect URLs so that it can do permissions checking, download tracking, etc.
Understandable and I'm thankful!
I'm toying with my summary and details templates to see if I can provide direct links that way.
Regards,
nwcon
Re: front end file management and document indexing
Posted: Wed May 14, 2008 8:05 pm
by nwcon
Okay, I hacked the actions.default.php in the Uploads module directory to add support for a direct URL.
Now in my custom templates (summary and detailed) for Front End File Management, I can call {$entry->direct_url} to get the direct path to each file, e.g.
<a href={$entry->direct_url}>{$entry->name}</a>
Of course, if I have files that should be protected, I can use download_url instead.
I haven't tested this without pretty_urls yet, but it seems to work perfectly with pretty_urls and mod_rewrite.
Submitted a patch to the Feature Request tracker for the Uploads module in CMSMS Forge.
Regards,
nwcon