[SOLVED] Pretty URLs can result in duplicate content

nivekiam

[SOLVED] Pretty URLs can result in duplicate content

Post by nivekiam »

First, CMS Made Simple looks like a great product. I've never used a CMS package before, but I'm evaluating a few for one of my sites. WordPress and Drupal are OK, but both have some inflexibility I don't want to deal with.

Now, on to why I'm posting.  I'm speaking strictly in terms of SEO here.  Example:

http://www.cmsmadesimple.org/fdsa/downloads (which does not exist) serves the same page as http://www.cmsmadesimple.org/downloads

This is not good in the eyes of most search engines. There is no real reason a search engine or a person should ever hit those non-existent URLs, since nothing should link to them, but it's one more thing to take into account when managing a site.

I want my site organized like so:

www.example.com/
www.example.com/category/
www.example.com/category/product-blurb/
www.example.com/category/product-blurb/detail/

If someone doesn't reach the right URL, I want to send them a 404 error and be able to see that error in my web logs (or in my CMS package, if it has some reporting functionality). I don't want a bad link to land on the right page just because the last part of the URL happens to be correct. A search engine will start indexing that, see the same content at two URLs, and ding you for duplicate content.

BTW, I hope you have rel="nofollow" attached to all links; otherwise Google will see those two example links as duplicate content.
Last edited by nivekiam on Thu Jan 10, 2008 5:16 pm, edited 1 time in total.
Pierre M.

Re: Pretty URLs can result in duplicate content

Post by Pierre M. »

Hello,
nivekiam wrote: I want my site organized like so:

www.example.com/
www.example.com/category/
www.example.com/category/product-blurb/
www.example.com/category/product-blurb/detail/

If someone doesn't get to the right URL I want to send them a 404 error and I want to be able to see this error in my weblogs (or in my CMS package if it's got some reporting functionality).  I don't want people to accidentally get to the right page from a bad link just as long as the last part of the URL is correct.  A search engine will start indexing that and see the same content at two pages and ding you for duplicate content.
Don't worry. CMSms can do what you describe. Just turn ON hierarchy in config.php and enable the built-in 404 feature.

Pierre M.
nivekiam

Re: Pretty URLs can result in duplicate content

Post by nivekiam »

Pierre M. wrote: Don't worry. CMSms can do what you describe. Just turn ON hierarchy in config.php and enable the builtin 404 feature.
The hierarchy feature was already on. I turned on the 404 feature; it works if the last part of the URL is wrong, but if any other part of the URL (except the host name) is wrong, you still get the page, which creates duplicate content.

Yes, I tried clearing the cache and my local browser cache just to be sure.

I did search, but apparently not on the right terms.  I did find this thread http://forum.cmsmadesimple.org/index.ph ... 011.0.html and actually started digging into the code.

Example:

I have a page called news. It's located at www.example.com/products/news/, but I can still access that exact same page at www.example.com/news/ or even at http://www.example.com/fdsafdsa-f--f-d- ... news/, two URLs that should not exist at all.
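To make the behaviour above concrete, here is a minimal Python sketch (this is not CMSms code, which is PHP). It assumes a hypothetical in-memory page table, `PAGES`, mapping each alias to its canonical hierarchical path. `resolve_by_alias` mimics what is described in this post: only the last URL segment is used, so any prefix resolves to the page. `resolve_strict` is a hypothetical "URL enforcement" check that also compares the whole requested path against the page's canonical path and returns `None` (which a caller would turn into a 404) on a mismatch.

```python
from typing import Optional

# Hypothetical page table: alias -> canonical hierarchical path.
PAGES = {
    "products": "/products/",
    "news": "/products/news/",
}


def segments(path: str) -> list[str]:
    """Split a URL path into its non-empty segments."""
    return [s for s in path.split("/") if s]


def resolve_by_alias(path: str) -> Optional[str]:
    """Alias-only lookup: only the last segment matters, so any prefix is accepted."""
    parts = segments(path)
    if not parts:
        return None
    alias = parts[-1]
    return alias if alias in PAGES else None


def resolve_strict(path: str) -> Optional[str]:
    """Alias lookup plus a check that the full path matches the page's canonical path.

    Returns the alias on success, or None, which the caller would turn into a 404.
    """
    alias = resolve_by_alias(path)
    if alias is None:
        return None
    if segments(path) != segments(PAGES[alias]):
        return None
    return alias


if __name__ == "__main__":
    for url in ("/products/news/", "/news/", "/fdsafdsa/news/"):
        print(url, resolve_by_alias(url), resolve_strict(url))
```

Comparing segment lists rather than raw strings treats /products/news and /products/news/ as the same URL; whether a site should accept both or redirect one to the other is a separate decision.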
Pierre M.

Re: Pretty URLs can result in duplicate content

Post by Pierre M. »

nivekiam wrote: Example:
I have a page called news  It's located at www.example.com/products/news/  But I can still access that exact same page at www.example.com/news/ or even at http://www.example.com/fdsafdsa-f--f-d- ... news/  Two URLs that should be completely non-existent.
Thank you for this example.
If you want hierarchy and you have created /products/news/, link to it that way. Since /news/ is advertised nowhere, no bot should guess it and find duplicate content.
I agree that /somejunk/news/ should fire a 404 error. Again, don't worry: bots don't guess or try funky URLs, so there's no risk of duplicate content. But feel free to file a bug report about /somejunk/news/.

BTW, you may be interested in this:
http://forum.cmsmadesimple.org/index.ph ... l#msg60321

Have fun with CMSms

Pierre M.
nivekiam

Re: Pretty URLs can result in duplicate content

Post by nivekiam »

Thanks, Pierre.

I'll file a bug report. I'll still dig in to see if I can fix it. I think this and adding the ability to use the same "alias" for more than one page could be solved at the same time, with the same code.
giggler

Re: [SOLVED] Pretty URLs can result in duplicate content

Post by giggler »

This thread is marked solved; what was the solution?
nivekiam

Re: [SOLVED] Pretty URLs can result in duplicate content

Post by nivekiam »

Nothing definitive yet. I had marked this thread as "solved" because Pierre had answered my question and I considered it a dead thread. I've since realized it's really two different topics, but "URL enforcement" needs to be in place before multiple "page aliases" can be done, at least as far as I can picture it.

But so far, this is what I've worked up for URL Enforcement:

http://forum.cmsmadesimple.org/index.ph ... l#msg95342

This doesn't fix multiple aliases or get any closer to allowing them; that will take restructuring CMSms to know and care about the full path/URL instead of just the single page alias.

For instance, say you want to create a couple of pages called photos, one under the page /2008 and one under the page /2007, so you would end up with:
/2008/photos
/2007/photos

In my mind, CMSms should not care about, or even know of, a "page alias" called "photos", but should see those two "page aliases" like so:
/2008/photos
/2007/photos

or like so:
index.php?page=/2008/photos
index.php?page=/2007/photos

Not:
index.php?page=photos

So whether hierarchy was turned on or off, there would always be some sort of hierarchy. That would make the hierarchy setting in config.php obsolete, because CMSms would always use hierarchy.
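The path-based idea above can be sketched in Python as a lookup table keyed by the full path instead of a bare alias. The table `PAGES_BY_PATH`, its page contents, and the `normalize` helper are hypothetical illustrations of the proposal, not how CMSms currently works. Because the key is the whole path, two pages can share the last segment `photos`, and any path that isn't in the table is simply not found, so the strict 404 behaviour comes for free.

```python
from typing import Optional

# Hypothetical page table keyed by the full hierarchical path, not by a bare alias.
PAGES_BY_PATH = {
    "/2007": "2007 overview",
    "/2007/photos": "Photos from 2007",
    "/2008": "2008 overview",
    "/2008/photos": "Photos from 2008",
}


def normalize(path: str) -> str:
    """Reduce a requested path to a canonical key: one leading slash, no empty segments."""
    return "/" + "/".join(s for s in path.split("/") if s)


def lookup(page_param: str) -> Optional[str]:
    """Resolve a value such as the page parameter in index.php?page=/2008/photos.

    Returns the page content, or None when the path does not exist (a 404).
    """
    return PAGES_BY_PATH.get(normalize(page_param))


if __name__ == "__main__":
    for p in ("/2008/photos", "/2007/photos/", "photos", "/junk/2008/photos"):
        print(repr(p), "->", lookup(p))
```

Here a bare `photos` no longer resolves at all, which is the point of the proposal: the alias alone stops being an address.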

I still need to file a bug report or feature request for this. For the URL "enforcement", once I've polished some of it up, I'll do whatever I need to do to get it added to the core code, if the developers will accept it. Otherwise I'll post my mod to the wiki.

I'll finish up what I'm working on for URL Enforcement either tonight or tomorrow night and post it in the thread I linked above, as well as here, to wrap everything up.