Is it possible to stop a search bot from accessing a single page from within the admin panel?
I've created a page that notifies me via email when a visitor has accessed it (a download), but it seems that it's being visited by crawlers.
Thanks
exclude page from search engines
Re: exclude page from search engines
Search engines are identified by their user agent.
You could create a UDT that checks the user agent and, if it is a known search bot and the content property "searchable" is not set, returns an empty page with a 403 header.
This UDT can be placed in the pagedata or in the template.
You could also try a robots.txt.
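Roughly like this, as a minimal sketch (the bot names are only examples, and you would still need to wire in the "searchable" property check for your setup):
Code: Select all
// Refuse known bots with a 403 before the page renders.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$bots = array('Googlebot', 'Slurp', 'bingbot', 'msnbot'); // extend as needed
foreach ($bots as $bot) {
    if (stripos($ua, $bot) !== false) {
        header('HTTP/1.0 403 Forbidden');
        exit; // stop here so nothing else on the page (like an email script) runs
    }
}
Because the UDT exits before the page body is rendered, any scripts placed in the page never run for bots.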
Re: exclude page from search engines
uhm, add a meta noindex tag... add a nofollow attribute to the links to that page...
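For the links, that would look something like this (the page name is just an example):
Code: Select all
<a href="index.php?page=somepagename" rel="nofollow">Download</a>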
Re: exclude page from search engines
Code: Select all
<meta content="noindex, nofollow" name="robots"/>
Re: exclude page from search engines
But the metadata does not prevent a bot from accessing the page.
It just won't be indexed.
Anyway, the whole page has already been processed by the time the bot receives the noindex, nofollow stuff.
And that means the script that informs you via email may already have run, no matter what is in the metadata.
This is why I would do this using a plugin or UDT, to prevent the page from being rendered and avoid any scripts being processed if the user agent is just a bot.
By the way, how does the script work?
Is it event driven, or do you have a plugin/UDT placed in your page?
Maybe you can just change your email script to check the user agent and then decide whether an email should be sent.
Re: exclude page from search engines
Thanks for the replies-
I have a UDT that captures the IP and user agent and sends a formatted email, and an onload in the source that redirects to the download.
Strangely enough, the page was crawled twice (Google and Yahoo) almost immediately after it was created, but not since. I worried that I would catch a flood of non-human visitors, and still might once I actually submit the site to search engines, or when I add more links to the page.
I'll add the robots.txt as:
Code: Select all
User-agent: *
Disallow: /index.php?page=somepagename
If it gets out of control, maybe I'll attempt to set up a bot trap:
http://www.kloth.net/internet/bottrap.php
As I'm more of a muddler than a programmer, scripting the UDT to trap bots and return a 403 header might take me an inordinate amount of time.
Thanks again!
Re: exclude page from search engines
edented wrote:
I have a UDT that captures the IP and user agent and sends a formatted email, and an onload in the source that redirects to the download.
If your script already captures the user agent, you just need to check whether it matches any known bots. If it doesn't, send the email; otherwise, just do nothing.
You could create an array that contains all known bots, and after the script has captured the user agent, check it with something like this:
Code: Select all
$bots = array('Googlebot', 'Slurp', 'bingbot'); // extend as needed
$is_bot = false;
foreach ($bots as $bot) {
    if (stripos($useragent, $bot) !== false) { $is_bot = true; break; } // substring match, since real UAs are long strings
}
if (!$is_bot) {
    // send email...
}
You can do the same thing with bad IPs.
And whenever your bot trap (if you use one) identifies a new bot, you can adapt the array and add its user agent and/or IP address.
That way you won't get emails when bots crawl your site, and you don't need to hide the page from search indexes.
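For the IP part, a sketch along the same lines (the addresses are placeholders, and this assumes the check runs inside your UDT):
Code: Select all
// Skip the notification for known bad IPs as well (example addresses only).
$bad_ips = array('192.0.2.10', '198.51.100.7');
$ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
if (in_array($ip, $bad_ips)) {
    return; // a UDT can simply return early instead of sending the email
}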