Massive duplicate content

jobo9968
New Member
Posts: 5
Joined: Fri Jan 03, 2014 6:22 pm

Massive duplicate content

Post by jobo9968 »

Hello.
I'm very new to CMSMS.

I'm doing some SEO for a client whose website is built on your script.
The site has a huge problem with duplicate content.
The homepage is reachable at 5 different URLs:

domain.com/
domain.com/index.php
domain.com/en_US/home
domain.com/index.php/home
domain.com/index.php/en_US/home

and every other page has 4 URLs:

domain.com/something
domain.com/en_US/something
domain.com/index.php/en_US/something
domain.com/index.php/something

That's what I found.
There could be more of those.
I'm not sure if that's normal for the script, or whether you are aware of this issue. This is a huge problem for SEO, as the duplicate content penalty from Google is real.

I managed to redirect non-www to www in the .htaccess file; before that, the number of duplicate URLs was doubled (10 URLs for the homepage).
I tried some other redirect tricks (for instance redirecting index.php to the site root) but couldn't make them work.

Can you please help with any solution?
rgds

Jan
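
Editor's note: the variants listed in this post can be modelled as a small normalisation routine, which helps when auditing which addresses should end up as one. This is only a sketch under assumptions read off the post: `www.` is the preferred host, `en_US` is the only language prefix, and `home` is the alias of the default page. The function name `canonical_url` is made up for illustration; it is not part of CMSMS.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only, taken from the URLs listed in the post.
LANG_PREFIX = "en_US"
HOME_ALIAS = "home"


def canonical_url(url: str) -> str:
    """Collapse the URL variants from the post into one canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host  # non-www -> www

    # Drop the front controller, then the language prefix, then the home alias.
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] == "index.php":
        segments = segments[1:]
    if segments and segments[0] == LANG_PREFIX:
        segments = segments[1:]
    if segments == [HOME_ALIAS]:
        segments = []

    path = "/" + "/".join(segments)
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


# The 5 homepage paths from the post, on both the www and non-www host.
homepage_variants = [
    f"http://{host}{path}"
    for host in ("domain.com", "www.domain.com")
    for path in (
        "/",
        "/index.php",
        "/en_US/home",
        "/index.php/home",
        "/index.php/en_US/home",
    )
]

print(len(homepage_variants), "variants ->",
      {canonical_url(u) for u in homepage_variants})
```

Every address that this routine maps to a different string than itself is a candidate for a 301 redirect or a canonical link pointing at the collapsed form.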
calguy1000
Support Guru
Posts: 8169
Joined: Tue Oct 19, 2004 6:44 pm

Re: Massive duplicate content

Post by calguy1000 »

This has been discussed numerous times before.

This is why you NEED to have the canonical URLs set up and working properly.
Dr.CSS
Moderator
Posts: 12711
Joined: Thu Mar 09, 2006 5:32 am

Re: Massive duplicate content

Post by Dr.CSS »

I have a feeling these were URLs that you typed into the address bar of a browser. Yes, they will work, but no search bot is going to guess URLs to see if you have duplicate content. CMSMS doesn't care what you put between mysite.com/ and the alias of the page, e.g. mysite.com/stupid/guessed/url/pagealias.html...

Unless I'm wrong and you found these URLs in Google...

If you find duplicates of things like News, look at a default page template for the canonical code in the <head> that you can use in your page templates...

Re: Massive duplicate content

Post by jobo9968 »

I love this kind of reply on forums.
You know the forum inside out, just as I know my own field.
I tried so many different searches.
I spent nearly a whole day trying to find the solution.
Maybe I searched for the wrong things.
If you had just pointed me in the right direction (a URL), I would have been so thankful.

rgds

Re: Massive duplicate content

Post by jobo9968 »

Dr.CSS,
I'm not sure that you are right, at least in my situation.
When I click the link in my menu I go to domain.com/index.php/en_US/something

while Google has indexed it as domain.com/something

At the moment, all the pages indexed in Google are pages not related to CMSMS.
These are some WordPress add-ons.
Even the domain itself is not indexed.
JohnnyB
Dev Team Member
Posts: 731
Joined: Tue Nov 21, 2006 5:05 pm

Re: Massive duplicate content

Post by JohnnyB »

Patience Padawan ;)
Please review the following as it may help your situation.

Not all demo templates include the canonical link, so be sure you have this inside the <head>:

{if isset($canonical)}<link rel="canonical" href="{$canonical}" />{elseif isset($content_obj)}<link rel="canonical" href="{$content_obj->GetURL()}" />{/if}
Next, I highly recommend setting $config['root_url'] in the config.php file. New versions of CMSMS do not include it, but I find it necessary because otherwise relative URLs could be resolved against an alternate domain name.

For example, if you have my-site.com (as the primary domain) and my-cool-site.com (as an add-on domain) pointing at the same installation without proper redirects in place, the links could use both domains depending upon how the site was reached by the visitor.

Setting $config['root_url'] removes that uncertainty.

$config['root_url'] = 'http://www.my-site';
This also allows you to set the 'www' prefix (or not) depending upon how you want the site to be indexed by Google, et al.

Next, set URL rewriting in the config.php file to give yourself those pretty SEO URLs without 'index.php' in the path.

$config['url_rewriting'] = 'mod_rewrite';
See /docs/htaccess.txt for the Apache rewrite rules needed in your .htaccess file.

Finally, be sure a <base> tag is being printed in your HTML page. Without it, there is too much guessing. The template tag {metadata} will provide it for you; also be sure your config.php file does not contain $config['show_base'] = false;
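
Editor's note: the "guessing" that a <base> tag removes can be seen by resolving the same relative link against two different starting points. The sketch below uses Python's standard `urljoin`, which follows the usual relative-URL resolution rules; the host `www.example.com` is hypothetical and only the path shapes come from this thread.

```python
from urllib.parse import urljoin

# Hypothetical addresses; only the path shapes are taken from this thread.
page_url = "http://www.example.com/index.php/en_US/home"
base_href = "http://www.example.com/"

# Without a <base> tag, a relative link resolves against the page's own URL.
without_base = urljoin(page_url, "something")

# With <base href="...">, the same link resolves against the base instead.
with_base = urljoin(base_href, "something")

print(without_base)
print(with_base)
```

The same relative `href` yields two different absolute addresses depending on how the visitor reached the page, which is one way a single page ends up with several URLs.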
Jo Morg
Dev Team Member
Posts: 1974
Joined: Mon Jan 29, 2007 4:47 pm

Re: Massive duplicate content

Post by Jo Morg »

Also, a list of installed modules and your current system info would go a long way towards getting proper help. It seems like you have a multi-language type of module... but there is not much more one can add without knowing which modules and versions you have installed...

Re: Massive duplicate content

Post by jobo9968 »

The site was moved to another server just 2 days before Google deindexed the homepage.
Might there be some issues with mod_rewrite? (I don't know too much about it.)
As I said, I managed to 301 non-www to www.
This way I was able to cut the number of URLs by 50%.

Now I have added the canonical piece of code @JohnnyB posted here.
The root URL was actually already set up,
and I added the mod_rewrite code myself.

I do not see any changes in the URLs the site is generating.

the details

see also the site http://www.voltairediamonds.ie (I believe you can delete the url if you want)

CMS Version: 1.4.1

Installed Modules:

CMSMailer        1.73.14
FileManager      0.4.1
MenuManager      1.5.1
ModuleManager    1.2.1
nuSOAP           1.0.1
Printing         0.2.5
Search           1.5.1
ThemeManager     1.0.8
TinyMCE          2.4.8
FormBuilder      0.5.5
FormBrowser      0.2.3
FormSubmissions  0.2.3
CGExtensions     1.15.2
NMS              2.2
Cataloger        0.7
Blogs            0.3.3.1

Server API (server_api): cgi-fcgi
Server Database (server_db_type): MySQL (mysql)
Server Database Version (server_db_version): 5.5.34 Success
Server Software (server_software): Apache/2.2.26 (Unix) mod_ssl/2.2.26 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Server Operating System (server_os): Linux 2.6.18-371.1.2.el5xen On i686

Re: Massive duplicate content

Post by JohnnyB »

I don't see the canonical link being sent in the head of the home page or of the other pages I checked. This is important because if a page can be reached in 4 ways, the canonical link tells Google and others which of those is the dominant page URL.

(https://support.google.com/webmasters/a ... 9394?hl=en)
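
Editor's note: a check like the one described above can be scripted rather than done by viewing source. The following is a minimal sketch using only Python's standard `html.parser`; the class and function names are made up for illustration, and the sample markup is hypothetical, not taken from the site in question.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    """Return all canonical hrefs found in the given HTML source."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page sources, for illustration only.
with_tag = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/something" /></head><body></body></html>'
)
without_tag = "<html><head><title>No canonical</title></head></html>"

print(find_canonicals(with_tag))
print(find_canonicals(without_tag))
```

An empty list means the template is not emitting the tag; more than one entry would also be worth fixing, since a page should name a single canonical URL.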

The other strange thing I see is that the link structure in the menu and in the navigation links in the head seems to indicate that a multi-language module or custom code is generating the links. Your base href tag looks good, but everything else seems to think that the root_url is http://www.voltairediamonds.ie/index.php/en_US.

The next thing, which you may already realize, is that you are running a very old version of CMSMS (mid-2008), which no one will want to advise on. Now that I think about it, the canonical URL wasn't supported back then, and I once posted a somewhat flimsy way around it.
Just found it here: http://forum.cmsmadesimple.org/viewtopi ... =4&t=30923

But, with all that said, if you want to post your entire config.php file here, I'll take a quick look and see if there's anything you should correct.

BTW, the source of one of the pages shows a ton of code in comments that exposes your server name and path, which could be a security risk for you.

Re: Massive duplicate content

Post by Jo Morg »

jobo9968 wrote: "CMS Version 1.4.1"

You really need to update CMSMS, mainly for security reasons, but also because you'll only get support for the latest two versions of CMSMS.

jobo9968 wrote: "Now. I have added the canonical piece of code @JohnnyB posted here. the root url was actually set up and I added the mod_rewrite code myself. Do not see any changes in the urls the site is generating"

The canonical code is meant to let the search engines know which link is the proper one for accessing the content, so that they index only that link. What you have there seems to be a mix of problems:
- Pretty URLs don't seem to be properly configured: http://docs.cmsmadesimple.org/customizi ... timization ;
- There is a sample .htaccess in the docs dir of a standard CMSMS install, but you may get a copy from the latest CMSMS archive;
- The default page (the home of the site) seems to be inside an "en_US" Section Header, which there is no need for;
- Usually, IIRC, CMSMS doesn't use anything other than the base URL to link to the home page... something is very wrong there;
- Also, the Blogs module doesn't even seem to be used on the site; you may consider deactivating it, and eventually uninstalling it, if it is not in use.

PS: I didn't see JohnnyB's post, sorry... ;)

Re: Massive duplicate content

Post by jobo9968 »

The line

$config['url_rewriting'] = 'mod_rewrite';

has already been added to the config.php file. If it's not working properly, I have no clue why.

{metadata}

is in the template file too.

It all seems to be messed up.

The whole problem is fixed, though.
The site is back in the index.
After the site was moved to another host, the 301 from non-www to www stopped working. On top of that, Google noticed plenty of server errors.
The site has been moved back to the old hosting and I did some 301s (this way I lowered the number of homepage URLs), and the site ranks great now.
There's a lesson here in how important the canonical URL and a single www or non-www version of the website are for ranking.
There's a new site under development, so no CMSMS upgrade will be needed.
Thanks, guys, for all your help.