front end file management and document indexing

nwcon
Forum Members
Posts: 18
Joined: Wed Mar 19, 2008 12:28 am

Post by nwcon »

I'm using Front End File Management to manage documents on my website, and I'm trying to find a way to let robots index these public documents. I don't want robots to index everything in the uploads directory, nor everything accessed via /index.php?mact.

There's one line in my robots.txt file that prevents indexing:
Disallow: /index.php?mact

If I remove that line, other content available via /index.php?mact gets indexed, which is not what I want. 

Since there's no Allow statement in a standard robots.txt file, I'm left scratching my head over how to allow indexing of my documents.
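To make the effect of that one line concrete: in the original robots.txt convention, a Disallow value is treated as a plain prefix of the URL path and query. Below is a minimal sketch of that rule, assuming simple prefix comparison and ignoring wildcards and per-agent sections. The `is_disallowed` helper and both sample URLs are illustrative only.

```python
def is_disallowed(url_path, disallow_rules):
    """Return True if url_path starts with any non-empty Disallow prefix."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


rules = ["/index.php?mact"]

# A module-generated link (illustrative; any URL beginning with the prefix behaves the same).
download = "/index.php?mact=Uploads,cntnt01,getfile,0&cntnt01upload_id=34"
# An ordinary content page (hypothetical).
page = "/index.php?page=howtos"

print(is_disallowed(download, rules))  # True
print(is_disallowed(page, rules))      # False
```

Because the rule is a prefix, every module link that starts with /index.php?mact is blocked together, which is why the document links cannot be singled out with this line alone.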

Anyone have any suggestions or pointers?

Regards,

nwcon
Pierre M.

Re: front end file management and document indexing

Post by Pierre M. »

Have links to what you want indexed.

Pierre M.
nwcon

Re: front end file management and document indexing

Post by nwcon »

Pierre,

What I'm trying to do is have a 'repository' where I can upload documents and have them dynamically listed and linked on a given page. I started working on a PHP indexing script to do just that, but then came across Front End File Management.

There are pros and cons with each:

PHP indexing script
  pros: easy to implement; direct links to documents
  cons: doesn't integrate with CMSMS, so the Search module doesn't see or index the content of documents; sitemapmadesimple won't see these links, so the sitemap would have to be edited manually.

Front End File Management
  pros: integrated into CMSMS; categorizes docs; can control access to categories based on the user's access (feusers)
  cons: generated links are not direct links, which makes it hard to give robots access for indexing; no pagination
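For comparison, the "indexing script" option amounts to listing a directory and printing one direct link per file. A rough sketch of that idea follows, written in Python rather than PHP purely for illustration. The `list_documents` function, the `/uploads/howtos` base URL, and the file names are assumptions, not anything from an actual site.

```python
import html
import os
import tempfile
from urllib.parse import quote


def list_documents(directory, base_url):
    """Return an HTML list with one direct link per regular file in directory."""
    items = []
    for name in sorted(os.listdir(directory)):
        if os.path.isfile(os.path.join(directory, name)):
            # Percent-encode the file name so spaces etc. survive in the URL.
            href = f"{base_url.rstrip('/')}/{quote(name)}"
            items.append(
                f'<li><a href="{html.escape(href)}">{html.escape(name)}</a></li>'
            )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as docs:
    for name in ("setup guide.pdf", "faq.txt"):
        open(os.path.join(docs, name), "w").close()
    listing = list_documents(docs, "/uploads/howtos")

print(listing)
```

Links produced this way are ordinary static paths, so robots can follow them without touching any /index.php?mact URL, but nothing here knows about categories or user permissions.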


Since Front End File Management seems to fit my needs best, I'm trying to work through the issues with it. Really, if there were an option within each category to use direct links, problem solved! Right now, the generated links look like this:

.../index.php?mact=Uploads,cntnt01,getfile,0&cntnt01showtemplate=false&cntnt01upload_id=34&cntnt01returnid=80
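As a reading aid, a link of this shape can be taken apart with the standard library. The sketch below assumes the four comma-separated `mact` values are module name, id prefix, action, and an inline flag, and that module parameters carry the id prefix. That is an interpretation of the link, not something stated in the thread. The `example.com` host stands in for the elided part of the URL, and `parse_mact` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def parse_mact(url):
    """Split a module link into module, id prefix, action, inline flag and parameters."""
    query = parse_qs(urlsplit(url).query)
    module, prefix, action, inline = query["mact"][0].split(",")
    # Parameters for the module are prefixed with the id (here "cntnt01"); strip it.
    params = {
        key[len(prefix):]: values[0]
        for key, values in query.items()
        if key.startswith(prefix)
    }
    return {
        "module": module,
        "prefix": prefix,
        "action": action,
        "inline": inline,
        "params": params,
    }


link = (
    "http://example.com/index.php?mact=Uploads,cntnt01,getfile,0"
    "&cntnt01showtemplate=false&cntnt01upload_id=34&cntnt01returnid=80"
)
info = parse_mact(link)
print(info["module"], info["action"], info["params"]["upload_id"])  # Uploads getfile 34
```

The point for robots.txt is that the module name is the first thing after `mact=`, so it is the only part of the link a prefix rule can distinguish.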

The problem with these links is that if I don't use the following in my robots.txt file, I allow robots to access much more than just my documents.

Disallow: /index.php?mact

And if I remove the above line from robots.txt and use lines like those below, I will need a line for each module accessible via mact:

Disallow: /index.php?mact=FrontEndUsers
Disallow: /index.php?mact=Print
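If the per-module approach were used, the lines could be generated rather than maintained by hand. This is a small sketch under the assumption that the list of installed modules is known. `build_robots_txt` is a hypothetical helper, and `News` is a made-up extra module name added for the example.

```python
def build_robots_txt(modules, crawlable):
    """Emit one Disallow line per module that should stay hidden from robots."""
    lines = ["User-agent: *"]
    for name in modules:
        if name not in crawlable:
            lines.append(f"Disallow: /index.php?mact={name}")
    return "\n".join(lines) + "\n"


# FrontEndUsers, Print and Uploads come from the thread; News is hypothetical.
installed = ["FrontEndUsers", "Print", "News", "Uploads"]
robots_txt = build_robots_txt(installed, crawlable={"Uploads"})
print(robots_txt)
```

Two caveats with this approach: a new module stays crawlable until its line is added, and because matching is by prefix, a rule for `Print` would also cover any module whose name merely begins with `Print`.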

I know Google supports an Allow statement in robots.txt, but I don't want to rely on non-standard methods, not to mention that this wouldn't help with other crawlers.

Thanks for the input.  I'll keep brainstorming...

Hmm, it just occurred to me that if I used a php script to list and link files, that has nothing to do with the sitemapmadesimple problem.  Since the howtos page is listed in the sitemap, links on that page will be found by robots anyway.  Now it's just a matter of integrating the content of each document into the Search module.

Brainstorming....

Regards,

nwcon
calguy1000
Support Guru
Posts: 8169
Joined: Tue Oct 19, 2004 6:44 pm
Location: Fernie British Columbia, Canada

Re: front end file management and document indexing

Post by calguy1000 »

This is a 'you can't have everything' problem.

Uploads provides indirect URLs so that it can do permissions checking, download tracking, etc.

If there were direct URLs you couldn't do that.

I could potentially add pretty URL support to Uploads, which would make the robots stuff easier to work with, but I just haven't gotten there yet.
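The trade-off behind indirect URLs can be sketched in a few lines: the request passes through code that can refuse it or count it, which a static file path cannot do. This is not the Uploads module's code. The `Upload` record, the group names, the file paths, and `get_file` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    path: str
    allowed_groups: set  # an empty set means the file is public
    downloads: int = 0


def get_file(uploads, upload_id, user_groups):
    """Resolve an upload id to a file path, enforcing access and counting the hit."""
    upload = uploads[upload_id]
    if upload.allowed_groups and not (upload.allowed_groups & set(user_groups)):
        raise PermissionError(f"upload {upload_id} is restricted")
    # Tracking works only because every download goes through this function.
    upload.downloads += 1
    return upload.path


uploads = {
    34: Upload("docs/howto.pdf", allowed_groups=set()),
    35: Upload("docs/internal.pdf", allowed_groups={"staff"}),
}

print(get_file(uploads, 34, user_groups=[]))  # docs/howto.pdf
print(uploads[34].downloads)                  # 1
```

A direct link to docs/howto.pdf would bypass both the group check and the counter, which is the limitation described above.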
nwcon

Re: front end file management and document indexing

Post by nwcon »

calguy1000 wrote: This is a 'you can't have everything' problem.

And I'm accustomed to that, but that's why we love open source...freedom to modify!

calguy1000 wrote: Uploads provides indirect URLs so that it can do permissions checking, download tracking, etc.

Understandable, and I'm thankful!


I'm toying with my summary and details templates to see if I can provide direct links that way.

Regards,

nwcon
nwcon

Re: front end file management and document indexing

Post by nwcon »

Okay, I hacked the actions.default.php in the Uploads module directory to add support for a direct URL.

Now in my custom templates (summary and detailed) for Front End File Management, I can call {$entry->direct_url} to get the direct path to each file, e.g.
<a href="{$entry->direct_url}">{$entry->name}</a>

Of course, if I have files that should be protected, I can use download_url instead.
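The choice between the two URLs can be expressed as a small rule. The sketch below is only a model of that decision, not the patch itself. The dictionary keys, the `/uploads` base path, the category name, and `entry_link` are assumptions made for the example.

```python
from urllib.parse import quote


def entry_link(entry, uploads_base="/uploads"):
    """Pick the URL a template would print for one upload entry."""
    if entry["protected"]:
        # Keep the indirect module URL so access checks still run.
        return entry["download_url"]
    # Direct path: base directory + category folder + file name, percent-encoded.
    return f"{uploads_base}/{quote(entry['category'])}/{quote(entry['name'])}"


public_doc = {
    "name": "setup guide.pdf",
    "category": "howtos",
    "protected": False,
    "download_url": "/index.php?mact=Uploads,cntnt01,getfile,0&cntnt01upload_id=34",
}
private_doc = dict(public_doc, name="internal.pdf", protected=True)

print(entry_link(public_doc))   # /uploads/howtos/setup%20guide.pdf
print(entry_link(private_doc))  # /index.php?mact=Uploads,cntnt01,getfile,0&cntnt01upload_id=34
```

Public files then get crawlable static links, while protected ones stay behind the existing Disallow: /index.php?mact rule.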

I haven't tested this without pretty_urls yet, but it seems to work perfectly with pretty_urls and mod_rewrite.


Submitted a patch to the Feature Request tracker for the Uploads module in the CMSMS Forge.


Regards,

nwcon
Last edited by nwcon on Wed May 14, 2008 8:40 pm, edited 1 time in total.