
All times are UTC




 Post subject: Robots.txt
PostPosted: Thu Feb 24, 2005 9:17 am 
Would it be an idea to include a 'robots.txt' file in the default install?
Maybe for security reasons? Something like this.
Code:
User-agent:   *
Disallow:   /admin
Disallow:   /doc
Disallow:   /images
Disallow:   /install
Disallow:   /lib
Disallow:   /modules
Disallow:   /plugins
Disallow:   /tmp
Disallow:   /uploads
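
A rough way to see what rules like these do for a well-behaved crawler is to feed them to Python's standard urllib.robotparser. This is only a sketch: the host example.com is a placeholder, the rule list is trimmed to three of the entries above, and it says nothing about crawlers that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Three of the rules proposed in the post, kept short for the example.
PROPOSED_RULES = """\
User-agent: *
Disallow: /admin
Disallow: /install
Disallow: /uploads
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(PROPOSED_RULES)

# example.com is a placeholder host, not a real CMSMS install.
admin_allowed = parser.can_fetch("*", "http://example.com/admin/login.php")
index_allowed = parser.can_fetch("*", "http://example.com/index.php")

print(admin_allowed)  # False: the path starts with /admin
print(index_allowed)  # True: no rule matches
```

The rules are matched as plain path prefixes, so /admin also covers /admin/login.php.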


  
 
 Post subject: Re: Robots.txt
PostPosted: Thu Feb 24, 2005 6:21 pm 
Dev Team Member

Joined: Thu Jan 27, 2005 5:11 pm
Posts: 311
Location: Los Angeles, CA
spoetnik wrote:
Would it be an idea to include a 'robots.txt' file in the default install?
Maybe for security reasons? Something like this.
Code:
User-agent:   *
Disallow:   /admin
Disallow:   /doc
Disallow:   /images
Disallow:   /install
Disallow:   /lib
Disallow:   /modules
Disallow:   /plugins
Disallow:   /tmp
Disallow:   /uploads


I think if we're worried about search engines and hackers being able to get into those areas, it'd be better to use the filesystem's permissions.

My understanding is that the kids who want to deface web sites either attack by pre-written scripts (in which case this won't help), or use robots.txt as a starting point to find out where interesting stuff may be hidden.
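
To make the "starting point" concern concrete: robots.txt is a public file, so anyone can read the paths it lists. The sketch below pulls the Disallow entries out of a robots.txt body; the function name and the sample text are made up for illustration.

```python
def disclosed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical sample, loosely based on the list in the first post.
SAMPLE = """\
User-agent: *
Disallow: /admin   # back office
Disallow: /install
Disallow:
Disallow: /tmp
"""

print(disclosed_paths(SAMPLE))  # ['/admin', '/install', '/tmp']
```

Everything this prints is equally visible to a visitor who simply requests /robots.txt, which is why the file should not be treated as a security control.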

_________________
Many modules available from the http://dev.cmsmadesimple.org
The CMS Made Simple Developer Cookbook is now available from Packt Publishers!


 
 Post subject: Re: Robots.txt  [SOLVED]
PostPosted: Fri Sep 23, 2005 2:32 am 
Power Poster

Joined: Fri Jan 07, 2005 1:52 am
Posts: 449
Location: Sydney, OZ
Damn straight.

My organisation's site gets more bad requests for robots.txt (which I studiously avoid using) than the index page!

Kids today - I dunno. ;)

_________________
If at first you don't succeed, ask someone who knows how.


 
 Post subject: Re: Robots.txt
PostPosted: Fri Sep 30, 2005 12:48 am 
robots.txt has its uses, particularly when you have a CMS with a lot of links that carry query variables. The bots can get lost pretty quickly and go wandering around forever. I recently installed MediaWiki on my box and sent my search bot to index stuff. I came back 2 hours later (this is normally a 2-minute process) and it was still wandering around indexing the dumbest, most useless pages. I don't know how long he would have stayed there fooling around, but I stopped him, deleted his search database (which was massive because of all the dumb pages indexed), and set up a robots.txt file, which saved the day.

Edit: using a proper robots.txt file can help the Googlebot not get lost as well.
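
As a sketch of how that can work, assume a wiki that serves readable pages under /wiki/ and all query-variable links (history, edit, and so on) under /w/. That layout, the host, and the URLs below are assumptions for illustration, not the poster's actual setup. A single prefix rule then keeps a compliant bot out of the endless query-string pages:

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: articles under /wiki/, script URLs with query variables under /w/.
WIKI_RULES = """\
User-agent: *
Disallow: /w/
"""

CANDIDATES = [
    "http://example.com/wiki/Main_Page",
    "http://example.com/w/index.php?title=Main_Page&action=history",
    "http://example.com/w/index.php?title=Main_Page&action=edit",
]


def crawlable(rules: str, urls: list[str], agent: str = "*") -> list[str]:
    """Keep only the URLs a robots.txt-compliant crawler may fetch."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


allowed = crawlable(WIKI_RULES, CANDIDATES)
print(allowed)  # only the /wiki/Main_Page URL remains
```

The rule is a path prefix rather than a query-string pattern because the original robots.txt convention only defines prefix matching.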


  
 
 Post subject: Re: Robots.txt
PostPosted: Tue Nov 01, 2005 8:36 am 
For the reasons outlined above, robots.txt is a good idea. It is also possible to block direct access to the directories listed in robots.txt. For example, to stop scripts in the lib directory from being run directly, place a .htaccess file in that directory with the following directive (assuming an Apache web server):


Order Deny,Allow
Deny from all


This will stop direct execution (i.e. typing the full URL) of any scripts in the /lib directory, but will not stop inclusion of scripts via include/require statements.

Repeat for any directories where direct execution is not wanted.
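
If there are several such directories, the repetition can be scripted. The following is a minimal sketch, not part of CMSMS: the function name and directory names are hypothetical, and it writes the same two directives quoted above. Newer Apache releases (2.4 and later) express the same thing as "Require all denied", so check which syntax your server expects. Only use it on directories the browser never needs to reach directly; denying something like /uploads or /images would stop those files from being served.

```python
import tempfile
from pathlib import Path

# The directives from the post above (Apache 2.2-style syntax).
DENY_ALL = "Order Deny,Allow\nDeny from all\n"


def protect_directories(site_root: Path, directories: list[str]) -> list[Path]:
    """Write a deny-all .htaccess into each existing directory that lacks one."""
    written = []
    for name in directories:
        target = site_root / name
        if not target.is_dir():
            continue  # skip names that do not exist under the site root
        htaccess = target / ".htaccess"
        if htaccess.exists():
            continue  # never overwrite an existing configuration
        htaccess.write_text(DENY_ALL)
        written.append(htaccess)
    return written


# Demo against a throwaway directory tree rather than a real site.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib").mkdir()
    (root / "tmp").mkdir()
    written = protect_directories(root, ["lib", "tmp", "missing"])
    names = sorted(path.parent.name for path in written)
    content = (root / "lib" / ".htaccess").read_text()

print(names)  # ['lib', 'tmp']
```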


  
 
 Post subject: Re: Robots.txt
PostPosted: Tue Nov 01, 2005 10:40 am 
Administrator

Joined: Fri Jun 11, 2004 6:58 pm
Posts: 3338
Location: Fairless Hills, Pa USA
That's pretty nifty.  Can you do stuff like etc?

_________________
http://about.me/tedkulp


 
 Post subject: Re: Robots.txt
PostPosted: Tue Nov 01, 2005 11:00 pm 
Power Poster

Joined: Fri Jan 07, 2005 1:52 am
Posts: 449
Location: Sydney, OZ
It's a VERY simple protocol that hasn't been updated since 1994. What you're asking is beyond its scope, I think, but there are better methods for securing directories. This is only a basic instruction for benevolent robots. Others will simply ignore it, or use it to their advantage.

http://www.robotstxt.org/wc/exclusion-admin.html



 
 Post subject: Re: Robots.txt
PostPosted: Wed Nov 02, 2005 5:37 am 
I think wishy was talking about Apache configuration.


  
 
 Post subject: Re: Robots.txt
PostPosted: Wed Nov 02, 2005 5:39 am 
Power Poster

Joined: Fri Jan 07, 2005 1:52 am
Posts: 449
Location: Sydney, OZ
Ah, right you are. Yes. [cough][cough]

Yes, Apache config through htaccess is the way.



 
 Post subject: Re: Robots.txt
PostPosted: Tue Nov 29, 2005 8:36 pm 
Language Partners

Joined: Tue Nov 15, 2005 9:08 pm
Posts: 868
Hi,

I agree with the guy who said that script-kiddies would normally start by looking at robots.txt to find something sweet that should be normally ignored by the search bots. Thus I attach my own robots.txt that I currently use. Hope that helps someone.

[attachment deleted by admin]

_________________
unsigned double ZYV;

