Some URLs redirect to home page instead of 404
Posted: Sun Nov 05, 2006 3:06 pm
by Grantovich
A long time ago, my site used a different CMS, and its URLs looked like this:
http://site.com/?q=node/32
Now I have CMSMS, and I would like all those old URLs to go to a proper 404 error page so the search engines will remove them from their indexes. The problem is, CMSMS keeps redirecting them to the home page, causing the search bots to think they are still active, and my home page is continuously re-indexed under 10 or 20 different (meaningless) URLs. All other nonexistent pages go to a 404 message, but for some reason, URLs in the form "site.com/?q=anything" are always redirected to the home page.
Here's my .htaccess (it makes things come out as "site.com/parent/child"):
Code:
Options -Indexes
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^www\.(.*)$
RewriteRule ^(.*)$ http://%1/$1 [R,L]
RewriteCond %{REQUEST_FILENAME} !-f [NC]
RewriteCond %{REQUEST_FILENAME} !-d [NC]
RewriteRule ^(.+)$ index.php?page=$1 [QSA]
Anybody know how I can stop the mystery redirects?
Re: Some URLs redirect to home page instead of 404
Posted: Sun Nov 05, 2006 4:26 pm
by tsw
Grantovich wrote:
RewriteCond %{HTTP_HOST} ^www\.(.*)$
RewriteRule ^(.*)$ http://%1/$1 [R,L]
Not sure, but those rules look like they might be the cause...
Re: Some URLs redirect to home page instead of 404
Posted: Sun Nov 05, 2006 5:42 pm
by Grantovich
Hmm... what that is supposed to do is strip the "www." from the start of the address, if present. I commented out this section and tried again, but it still redirected "site.com/?q=anything" to the home page.
EDIT: I really should not be saying "redirect", because that's not what it is. If you enter a /?q= URL into the browser, it returns the home page, but doesn't actually redirect you to the home page's "true" address.
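For reference, a commented sketch of the www-stripping rule discussed above. The [NC] and [R=301] flags are assumptions added here, not part of the original file: a bare [R] sends a temporary 302 redirect, and a permanent 301 is the more common choice when search engines should settle on a single host name.

```apache
# Strip a leading "www." from the host name, if present.
# %1 is the rest of the host name captured by the RewriteCond;
# $1 is the requested path captured by the RewriteRule.
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
# R=301 (assumption, not in the original) makes the redirect permanent
# instead of the default 302.
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

The condition only fires for hosts that start with "www.", so this block would not be expected to affect "site.com/?q=anything", which agrees with the result of commenting it out.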
Re: Some URLs redirect to home page instead of 404
Posted: Sun Nov 05, 2006 9:38 pm
by swgreed
Add this to your .htaccess
Code:
ErrorDocument 404 http://www.domain.com/404.htm
Re: Some URLs redirect to home page instead of 404
Posted: Mon Nov 06, 2006 12:53 am
by Grantovich
Doesn't seem to change anything. "http://domain.com/?q=whatever" still returns the home page.
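A likely reason the directive changes nothing: ErrorDocument only takes effect when Apache itself produces a 404, and a request for "/?q=whatever" maps to "/", which exists, so no 404 is ever generated. Separately, when ErrorDocument is given a full http:// URL, Apache redirects the client to that URL, so the client no longer sees the original 404 status. A minimal sketch using a local path instead, assuming a file named 404.htm (name borrowed from the suggestion above) exists in the document root:

```apache
# A local path (starting with "/") lets Apache serve the error page itself
# and keep the 404 status code in the response.
# /404.htm is an assumed file in the document root.
ErrorDocument 404 /404.htm
```

This helps with genuinely missing pages, but on its own it would not be expected to change what the /?q= URLs return.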
Re: Some URLs redirect to home page instead of 404
Posted: Wed Nov 08, 2006 9:42 pm
by Grantovich
Well, I have some good news: It appears that the search engines are gradually realizing that all of the /?q= URLs return the same page, and they are removing these URLs as duplicate content. The process is glacially slow, of course.
I tried making my own RewriteRule that would return a 410 error if the URL contained "?q=" anywhere in it, but I must not have a very good understanding of the mechanism, because it didn't change anything. Then again, for all I know the problem could be in CMSMS and have nothing whatsoever to do with htaccess. Anybody have any ideas?
Re: Some URLs redirect to home page instead of 404
Posted: Thu Nov 09, 2006 9:07 pm
by Pierre M.
HTTP "Gone" is 410, not 404.
What about something like this:
RewriteRule ^?q=node/.* - [R=410,L]
I'm not quite sure, but see the Apache docs.
And avoid duplicated content (the same content available at multiple URLs): your site may be blacklisted as a spammer.
Hope it helps.
PM
Re: Some URLs redirect to home page instead of 404
Posted: Fri Nov 10, 2006 12:55 am
by Grantovich
Ow! Adding this line results in every page on the entire site giving a "500 Internal Server Error". Strange, because it looks legitimate given my extremely limited knowledge of htaccess.
By the way, I don't have any (intentionally) duplicated content. The problem is that the home page is returned, not redirected, for all of the old ?q= URLs. As far as the search engines are concerned, all of the old URLs are completely different pages, but with exactly the same content. There is no redirecting involved; the home page effectively exists at all of those locations. It's that particular attribute of the problem that's driving me batty.
Re: Some URLs redirect to home page instead of 404
Posted: Sat Nov 11, 2006 7:59 am
by Pierre M.
I said I was not quite sure.
Here is another try:
1) Set up a static error page for 410:
ErrorDocument 404 /sorry_not_found.html
ErrorDocument 410 /sorry_gone.html
2) Declare the old URLs gone with mod_alias:
Redirect gone /q=node
according to
http://httpd.apache.org/docs/2.2/mod/mo ... l#redirect
or use mod_rewrite (I previously forgot the '$'):
RewriteEngine On
RewriteBase /
RewriteRule ^?q=node/.*$ - [G,L]
...or maybe RewriteRule ^?q=node/.*$ /sorry_gone.html [G,L]
PM
Re: Some URLs redirect to home page instead of 404
Posted: Sat Nov 11, 2006 3:58 pm
by Grantovich
I tried every one of the following (turns out the "500 Internal Server Error" comes from not escaping the question mark):
Redirect gone /?q=
RedirectMatch gone /\?q=.*
RewriteRule ^\?q=.*$ - [G,L]
RewriteRule ^\?q=.*$ /gone.htm [G,L]
RewriteRule \?q=.* /gone.htm [G,L]
RewriteRule (.*)\?q=(.*) /gone.htm [G,L]
None of them had any effect whatsoever. It especially irritates me that not even "RedirectMatch" worked, since I'm quite positive I'm using it the right way, and the documentation agrees. Argh.
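One explanation consistent with these results: mod_rewrite matches a RewriteRule pattern against the URL path only, and mod_alias (Redirect, RedirectMatch) does the same, so the part after the "?" is never seen by any of the patterns above. The query string has to be tested separately with a RewriteCond on %{QUERY_STRING}. A minimal, untested sketch under that assumption, meant to sit after RewriteBase and before the other rules, and assuming no current page relies on a "q" parameter:

```apache
RewriteEngine on
RewriteBase /

# The old CMS URLs all looked like /?q=..., so test the query string,
# which RewriteRule patterns never see.
RewriteCond %{QUERY_STRING} (^|&)q=
# Match any path, leave it unchanged ("-"), and answer 410 Gone.
RewriteRule .* - [G,L]
```

With such a rule in place, a request like "wget -S http://site.com/?q=kiwi" would be expected to show "410 Gone" in the response headers instead of "200 OK". The earlier 500 error fits the same picture: in ^?q=node/.* the unescaped "?" is a quantifier with nothing to repeat, so the pattern most likely fails to compile.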
Re: Some URLs redirect to home page instead of 404
Posted: Sun Nov 12, 2006 7:04 pm
by Pierre M.
Argh as you say.
So, there are no more 500 errors? Good point about escaping the question mark. Well done, I had missed it.
What is your OS, and which Apache version are you running?
And what about the client side? Have you tried wget in verbose mode?
I agree RedirectMatch should be the best option. But tell us what is happening: does the browser/wget get a 404, 3xx, 5xx, or something else?
(Maybe you should clear the client-side cache each time you test.)
Sorry to ask you to check this, but with RewriteRule ^\?q=.*$ - [G,L], do you also have RewriteEngine On and a correct RewriteBase?
Please check the name and location of your .htaccess too.
And the file permissions (.htaccess, gone.htm, 404.html...).
PM
Re: Some URLs redirect to home page instead of 404
Posted: Mon Nov 13, 2006 11:58 pm
by Grantovich
I'm not sure what server information I'm looking for, so I have just attached the result of the phpinfo() function on my site. Hopefully it will be of some use. (You will need to change the extension to .htm to view it.)
http://www.seoconsultants.com/tools/headers.asp
I have used this site to test the return codes for pages that start with "/?q=". Whether or not I have the RewriteRule or RedirectMatch in the htaccess, it always returns the same thing: 200 OK. I try a different URL each time to avoid caching issues (like "/?q=watermelon", "/?q=pineapple", etc).
As you can see in the original htaccess at the top of the post, I have RewriteEngine on and RewriteBase /. The file permissions must be set correctly, because the original file works and does exactly what I want it to.
[deleted by administrator]
Re: Some URLs redirect to home page instead of 404
Posted: Tue Nov 14, 2006 12:15 pm
by Pierre M.
So, your IE answers 404 and somebody else answers 200. At least one of you is wrong. You must have good information on this to go on.
So I suggest again: have you tried with wget instead of IE?
wget -v -S -o log1 http://my.server.tld/known/working/static/url1.html
wget -v -S -o log2 http://my.server.tld/known/redirected/static/url2.html
wget -v -S -o log3 http://my.server.tld/?q=kiwi
Please report a matrix of your RedirectMatch/RewriteRule combinations against these 3 columns, with the status code (2xx-5xx), the content returned, and the log in each cell.
BTW, please check your db username/password in config.php too.
PM
Re: Some URLs redirect to home page instead of 404
Posted: Wed Nov 15, 2006 12:37 am
by Grantovich
Pierre M. wrote:
So, your IE answers 404 and somebody else answers 200. At least one of you is wrong. You must have good information on this to go on.
I don't understand... what do you mean by "my IE answers 404 and somebody else answers 200"? Anyway, I was about to say I can't use wget because I'm on a shared server with no shell access, but then I remembered cron jobs. So here we go.
Key: htaccess line -- code returned by site.com/static.html -- code returned by site.com/parent/child (remapped by htaccess to site.com/index.php?page=child) -- code returned by site.com/?q=something
No change from original (see first post) -- 200 -- 200 -- 200
RedirectMatch gone /\q=.* -- 200 -- 200 -- 200
RedirectMatch gone (.*)\?q=(.*) -- 200 -- 200 -- 200
RewriteRule ^\?q=.*$ - [G,L] -- 200 -- 200 -- 200
RewriteRule ^\?q=.*$ /gone.htm [G,L] -- 200 -- 200 -- 200
RewriteRule \?q=.* /gone.htm [G,L] -- 200 -- 200 -- 200
The logs for all the static page requests look like this:
Code:
--19:20:01-- http://site.com/static.html
=> `static.html'
Resolving site.com... x.x.x.x
Connecting to site.com|x.x.x.x|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 15 Nov 2006 00:20:01 GMT
Server: Apache/1.3.37 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2
mod_bwlimited/1.4 PHP/4.4.3 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.28
OpenSSL/0.9.7a
Last-Modified: Tue, 31 Oct 2006 01:03:45 GMT
ETag: "12bc028-0-4546a0f1"
Accept-Ranges: bytes
Content-Length: 0
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
Length: 0 [text/html]
The logs for all the rewritten page requests look like this:
Code:
--19:22:01-- http://site.com/parent/child
=> `child'
Resolving site.com... x.x.x.x
Connecting to site.com|x.x.x.x|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 15 Nov 2006 00:22:01 GMT
Server: Apache/1.3.37 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2
mod_bwlimited/1.4 PHP/4.4.3 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.28
OpenSSL/0.9.7a
X-Powered-By: PHP/4.4.3
Connection: close
Content-Type: text/html; charset=UTF-8
Length: unspecified [text/html]
The logs for all the ?q= requests look like this:
Code:
--19:24:01-- http://site.com/?q=kiwi
=> `index.html?q=kiwi'
Resolving site.com... x.x.x.x
Connecting to site.com|x.x.x.x|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 15 Nov 2006 00:24:02 GMT
Server: Apache/1.3.37 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2
mod_bwlimited/1.4 PHP/4.4.3 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.28
OpenSSL/0.9.7a
X-Powered-By: PHP/4.4.3
Connection: close
Content-Type: text/html; charset=UTF-8
Length: unspecified [text/html]
I get the exact same logs for all the htaccess lines listed above. In every case, the static page returns itself, the remapped page returns its correct location (site.com/parent/child is correctly remapped to site.com/index.php?page=child), and the ?q= URL returns the home page (exactly the same as going to just site.com). Please correct me if I've left out some information that is important.
Re: Some URLs redirect to home page instead of 404
Posted: Wed Nov 15, 2006 10:25 pm
by Pierre M.
If Apache always answers "200 OK", maybe your Internet Explorer (IE) is wrong when it reports 404?
PM