Pretty urls and duplicate content
Posted: Thu Nov 13, 2008 10:17 am
Hi
I have been using CMSMS for some time now and love it. Thank you very much to those who have been involved in developing it.
I can't program to save my life and so can't contribute that way. I do know a little about SEO though and am hoping that in finding a solution to this problem on my site, some other users may benefit.
I recently used pretty urls on an install for the first time, having previously used the internal version. My problem is the amount of duplicate content this produces. For example:
This page: http://www.domain.com/how-cmsms-works
can be accessed at the following urls:
http://www.domain.com/how-cmsms-works
http://www.domain.com/how-cmsms-works/
http://www.domain.com/index.php/how-cmsms-works
http://www.domain.com/index.php/how-cmsms-works/
http://www.domain.com/index.php?page=how-cmsms-works
http://www.domain.com/index.php?page=how-cmsms-works/
It is quite possible that the page is also accessible directly via its page id. I had assumed this was the case but can't get it to happen, so maybe not.
If we ignore the trailing slash issue, on the assumption that visitors won't notice or care and search engines might be smart enough to figure it out for themselves, there are still three separate urls at which the same content can be accessed.
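To make the overlap concrete, here is a rough Python sketch that collapses every variant listed above to a single canonical url. It is purely illustrative and is not CMSMS code: the function name `canonical_url` is made up, and treating the no-trailing-slash form as canonical is an assumption.

```python
from urllib.parse import urlsplit, parse_qs

def canonical_url(url, page_param="page"):
    """Collapse the known url variants of a page to one canonical form.

    Hypothetical helper: assumes the canonical form is
    scheme://host/alias with no trailing slash.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if page_param in query:
        # index.php?page=alias form
        alias = query[page_param][0]
    else:
        path = parts.path
        # index.php/alias form: drop the script name from the path
        if path == "/index.php" or path.startswith("/index.php/"):
            path = path[len("/index.php"):]
        alias = path
    # Ignore leading and trailing slashes
    alias = alias.strip("/")
    return f"{parts.scheme}://{parts.netloc}/{alias}"

variants = [
    "http://www.domain.com/how-cmsms-works",
    "http://www.domain.com/how-cmsms-works/",
    "http://www.domain.com/index.php/how-cmsms-works",
    "http://www.domain.com/index.php/how-cmsms-works/",
    "http://www.domain.com/index.php?page=how-cmsms-works",
    "http://www.domain.com/index.php?page=how-cmsms-works/",
]
distinct = {canonical_url(v) for v in variants}
```

All six urls end up in `distinct` as the same single entry, which is exactly how a search engine should see them but currently does not.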
In the documentation, it says something to the effect of, "if you are putting pretty urls into an existing site don't worry about your previously advertised urls because they will continue to work". This is exactly what I want to avoid.
I can understand why someone would have a difficulty with this and would want old style urls to still function for visitors but it strikes me as a really bad idea. From an SEO point of view the general principle would be that each page should have one and only one url. Systematically creating duplicate content is a bad idea.
Outside a CMS, if I had to change the url of a page and didn't want to just let the old page die naturally, the correct way to deal with it would be to 301 redirect from the old url to the new one. The reason for this is to avoid committing search engine suicide with duplicate content.
I am hoping someone can help me with this. Ideally I need to set this up so that it works the following way:
I choose one of:
http://www.domain.com/how-cmsms-works
http://www.domain.com/how-cmsms-works/
The other either returns a 404 or redirects to the canonical form.
These url forms:
http://www.domain.com/index.php/how-cmsms-works
http://www.domain.com/index.php/how-cmsms-works/
should both give 404s. This bit is probably the most important.
For those with existing urls, the ideal solution would be different and would involve 301 redirecting these url forms to the new url format.
I am guessing that the index.php?page= format has to stay. The index.php script has to be called somewhere (am I right?) but I can change "page" to something else in the config file. I am hoping that when I change "page" to "cccyy", old urls of the form www.domain.com/index.php?page=page-alias will stop working and give a 404 error.
Basically I need all of the old url forms to 404 so that the search engines remove them from their index and I am left without any duplicate content problems.
Does anyone have any idea how I would achieve this?
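For anyone who finds it easier to read as logic, the behaviour described above could be sketched roughly like this in Python. This is only an illustration of the desired responses, not something CMSMS provides: the function `decide` and its `legacy` switch are hypothetical, the no-trailing-slash form is assumed to be the canonical one, and it does not check whether the page actually exists.

```python
from urllib.parse import urlsplit, parse_qs

def decide(url, legacy="404", page_param="page"):
    """Return (status, redirect_target) for a requested url.

    legacy="404": old index.php forms are rejected (new site).
    legacy="301": old index.php forms redirect to the pretty url
                  (existing site with previously advertised urls).
    """
    parts = urlsplit(url)
    path = parts.path
    is_legacy = path == "/index.php" or path.startswith("/index.php/")
    if is_legacy:
        if legacy == "404":
            return 404, None
        query = parse_qs(parts.query)
        if page_param in query:
            alias = query[page_param][0]      # index.php?page=alias
        else:
            alias = path[len("/index.php"):]  # index.php/alias
        return 301, "/" + alias.strip("/")
    if path != "/" and path.endswith("/"):
        # Trailing-slash variant: redirect to the canonical form
        return 301, path.rstrip("/")
    # Canonical form: serve the page normally
    return 200, None
```

With `legacy="404"` the index.php forms drop out of the index over time; with `legacy="301"` an existing site keeps its old links working while still pointing search engines at one url per page.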
On a separate but related issue, /index.php?page followed by anything is showing the homepage, presumably with a 200 OK server response, which is another duplicate content issue.
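What I would expect instead is something along these lines, again as a rough Python sketch rather than real CMSMS code. The function `status_for_query` and the set of known aliases are made up for illustration; the idea is simply that a query string which does not name a real page gets a 404 instead of the homepage.

```python
from urllib.parse import parse_qs

def status_for_query(query_string, known_aliases, page_param="page"):
    """Return the status code /index.php should give for a query string."""
    if not query_string:
        # Plain /index.php with no query: the homepage
        return 200
    # keep_blank_values so that "?page" and "?pagefoo" are still seen
    params = parse_qs(query_string, keep_blank_values=True)
    aliases = params.get(page_param, [])
    if len(aliases) == 1 and aliases[0] in known_aliases:
        return 200
    # Anything else is not a real page and should not show the homepage
    return 404

known = {"how-cmsms-works"}
```

Here `status_for_query("page=how-cmsms-works", known)` gives 200, while junk such as `"pagefoo"` or `"page=no-such-page"` gives 404.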
I can't overstate how bad an idea it is to have a system that cheerfully churns out duplicate content and I am really hoping that this is a result of some kind of setup error on my part.
I would really, really appreciate any help anyone can give in solving this problem.