
Massive duplicate content

Posted: Fri Jan 03, 2014 6:50 pm
by jobo9968
Hello.
I'm very new to CMSMS.

I'm doing some SEO for a client with a website built on your script.
The site has a huge problem with duplicate content.
The homepage is reachable at 5 different URLs:

domain.com/
domain.com/index.php
domain.com/en_US/home
domain.com/index.php/home
domain.com/index.php/en_US/home

and every other page has 4 URLs:

domain.com/something
domain.com/en_US/something
domain.com/index.php/en_US/something
domain.com/index.php/something
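To make the list above concrete, here is a minimal Python sketch (not CMSMS code) that collapses every variant onto one canonical URL. It assumes www.domain.com is the preferred host, that `index.php` and `en_US` are the only segments that don't change the page, and that `home` is the default page's alias; `canonical_url` and the constants are hypothetical names for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical settings for this sketch only.
CANONICAL_HOST = "www.domain.com"        # preferred host (www version)
NOISE_SEGMENTS = {"index.php", "en_US"}  # segments that do not change the page
HOME_ALIAS = "home"                      # assumed alias of the default page


def canonical_url(url: str) -> str:
    """Map any of the duplicate URL variants onto a single canonical URL.

    The incoming host is ignored: the sketch assumes a single site.
    """
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to separate host and path
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s and s not in NOISE_SEGMENTS]
    if segments == [HOME_ALIAS]:
        segments = []  # the default page belongs at the site root
    return f"http://{CANONICAL_HOST}/" + "/".join(segments)


if __name__ == "__main__":
    for variant in ["domain.com/", "domain.com/index.php/en_US/home",
                    "domain.com/index.php/something"]:
        print(variant, "->", canonical_url(variant))
```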

That's what I found.
There could be more of those.
I'm not sure whether that's normal for the script, or whether you are aware of this issue. This is a huge problem for SEO, as the duplicate content penalty from Google is real.

I managed to redirect non-www to www in the .htaccess file; before that, the number of URLs for the same page was doubled (10 URLs for the homepage).
I tried some other redirect tricks (for instance redirecting index.php to the site root) but couldn't get them to work.
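The two redirects described above can be modelled as a small decision function. This is a Python sketch of the logic, not an .htaccess rule; it assumes www.domain.com is the preferred host, ignores query strings and ports, and `redirect_for` is a hypothetical name. The design choice worth noting is that it tests the path the browser originally requested (what Apache exposes as %{THE_REQUEST}) rather than the internally rewritten one, since testing the rewritten path is one common way an index.php-to-root redirect ends up looping.

```python
from typing import Optional, Tuple

# Assumption for this sketch: the www version is the one to keep.
PREFERRED_HOST = "www.domain.com"


def redirect_for(host: str, requested_path: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) if the request should be redirected, else None.

    requested_path must be the path the browser originally asked for
    (without the query string), not the path after an internal rewrite
    to index.php; otherwise every rewritten request would look like
    /index.php and be redirected again.
    """
    path = requested_path
    # Drop a leading /index.php or /index.php/... from the visible URL.
    if path == "/index.php" or path.startswith("/index.php/"):
        path = path[len("/index.php"):] or "/"
    if host.lower() != PREFERRED_HOST or path != requested_path:
        return 301, f"http://{PREFERRED_HOST}{path}"
    return None  # already canonical: serve it as-is
```

Note that this only removes `index.php`; the `en_US` segment comes from the site's multi-language setup and would need its own decision.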

Can you please help with any solution?
rgds

Jan

Re: Massive duplicate content

Posted: Fri Jan 03, 2014 6:53 pm
by calguy1000
This has been discussed numerous times before.

This is why you NEED to have canonical URLs set up and working properly.

Re: Massive duplicate content

Posted: Fri Jan 03, 2014 6:56 pm
by Dr.CSS
I have a feeling these are URLs that you typed into the address bar of a browser. Yes, they will work, but no search bot is going to guess URLs to see if you have duplicate content. CMSMS doesn't care what you put between mysite.com/ and the alias of the page: mysite.com/stupid/guessed/url/pagealias.html...

Unless I'm wrong and you found these URLs in Google...

If you find duplicates of things like News, look at a default page template for the canonical code in the <head> that you can use in your page templates...
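Dr.CSS's point about CMSMS ignoring everything before the alias can be illustrated like this. The sketch assumes the alias is simply the last path segment, with an optional `.html` suffix stripped; it models the behaviour described in the post, not CMSMS's actual routing code, and `page_alias` is a hypothetical name.

```python
def page_alias(path: str, suffix: str = ".html") -> str:
    """Return the page alias from a URL path, ignoring every earlier segment."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return ""  # site root: the default page is served
    alias = segments[-1]
    if suffix and alias.endswith(suffix):
        alias = alias[: -len(suffix)]
    return alias


if __name__ == "__main__":
    # Different paths, same alias, therefore the same content.
    for p in ["/pagealias.html", "/stupid/guessed/url/pagealias.html"]:
        print(p, "->", page_alias(p))
```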

Re: Massive duplicate content

Posted: Fri Jan 03, 2014 7:01 pm
by jobo9968
I love this kind of reply on forums.
You know the forum inside out, just as I know my own stuff.
I tried so many different searches.
I spent nearly a whole day trying to find the solution.
Maybe I searched for the wrong things.
If you had just pointed me in the right direction (a URL), I would have been very thankful.

rgds

Re: Massive duplicate content

Posted: Fri Jan 03, 2014 7:38 pm
by jobo9968
Dr. CSS,
I'm not sure that you are right, at least in my situation.
When I click a link in my menu, I go to domain.com/index.php/en_US/something,

while Google has indexed it as domain.com/something.

At the moment, all pages indexed in Google are pages not related to CMSMS;
they are some WordPress add-ons.
Not even the domain itself is indexed.

Re: Massive duplicate content

Posted: Fri Jan 03, 2014 7:45 pm
by JohnnyB
Patience Padawan ;)
Please review the following as it may help your situation.

Not all demo templates include the canonical link; be sure you have this inside the <head>:

{if isset($canonical)}<link rel="canonical" href="{$canonical}" />{elseif isset($content_obj)}<link rel="canonical" href="{$content_obj->GetURL()}" />{/if}
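For readers who don't use Smarty, the fallback in that tag reads as follows in Python: use `$canonical` when a module has set it, otherwise the current page's URL, otherwise output nothing. `canonical_link` is a hypothetical helper, `None` stands in for Smarty's `isset()` being false, and the HTML escaping is an addition of this sketch that the Smarty tag does not do.

```python
from html import escape
from typing import Optional


def canonical_link(canonical: Optional[str], page_url: Optional[str]) -> str:
    """Mirror the Smarty {if}/{elseif} fallback for the canonical <link> tag."""
    if canonical is not None:       # {if isset($canonical)}
        href = canonical
    elif page_url is not None:      # {elseif isset($content_obj)}
        href = page_url
    else:
        return ""                   # neither is set: output nothing
    # Escaping is added here for safety; the original template does not escape.
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'
```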
Next, I highly recommend setting $config['root_url'] in the config.php file. New versions of CMSMS do not have this setting, but I find it necessary because, otherwise, relative URLs could be parsed using an alternate domain name.

For example, if you have my-site.com (as the primary domain) and my-cool-site.com (as an add-on domain) pointing at the same installation without proper redirects in place, the links could use both domains depending upon how the site was reached by the visitor.

Setting $config['root_url'] leaves out the unknown.
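A rough model of the alternate-domain problem: when links are made absolute from whichever host the visitor used, the same relative link yields a different URL per domain, and a fixed root URL removes that variation. This is an illustrative Python sketch using `urllib.parse.urljoin`, not how CMSMS builds URLs internally; `absolute_link` is a hypothetical name and the domains are the examples from the post.

```python
from typing import Optional
from urllib.parse import urljoin


def absolute_link(relative: str, request_host: str,
                  root_url: Optional[str] = None) -> str:
    """Make a relative link absolute, from root_url if set, else the visitor's host."""
    base = root_url if root_url else f"http://{request_host}"
    return urljoin(base.rstrip("/") + "/", relative)


if __name__ == "__main__":
    for host in ["my-site.com", "my-cool-site.com"]:
        print(host, absolute_link("contact", host),
              absolute_link("contact", host, "http://www.my-site.com"))
```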

$config['root_url'] = 'http://www.my-site.com';
This also allows you to set the 'www' prefix (or not) depending upon how you want the site to be indexed by Google, et al.

Next, enable URL rewriting in the config.php file to get pretty SEO URLs without 'index.php' in the path.

$config['url_rewriting'] = 'mod_rewrite';
See /docs/htaccess.txt for the Apache rewrite rules needed in your .htaccess file.
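As a sketch of what a front-controller rewrite of this kind does, assuming the rule in docs/htaccess.txt hands any path that is not an existing file or directory to index.php as its `page` parameter and appends the original query string (check your own copy of the file; this is an assumption), the mapping looks like this in Python. The `existing` set stands in for Apache's file and directory checks, and `rewrite` is a hypothetical name.

```python
from typing import AbstractSet


def rewrite(path: str, existing: AbstractSet[str], query: str = "") -> str:
    """Internally map a pretty URL to index.php?page=..., leaving real files alone."""
    rel = path.lstrip("/")
    if not rel or rel in existing:
        return path  # site root, or an existing file/directory: served untouched
    target = f"/index.php?page={rel}"
    if query:
        target += f"&{query}"  # keep the original query string
    return target
```

Because this is an internal rewrite, the visitor still sees the pretty URL; the menu links themselves only lose 'index.php' once CMSMS is told to generate rewritten URLs.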

Finally, be sure there is a <base> tag being printed in your HTML page. Without it, there is too much guessing. The template tag {metadata} will provide that for you; also be sure your config.php file does not contain $config['show_base'] = false;
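Why the base tag matters, in Python terms: a relative link is resolved against the current page URL unless a base href overrides it, so a page reached via /index.php/en_US/... produces more /index.php/en_US/... links. This uses only `urllib.parse.urljoin`; `resolve` is a hypothetical helper and the domain is the placeholder from the first post.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve(page_url: str, href: str, base_href: Optional[str] = None) -> str:
    """Resolve href the way a browser does: against <base href> if present."""
    return urljoin(base_href or page_url, href)


if __name__ == "__main__":
    page = "http://www.domain.com/index.php/en_US/something"
    print(resolve(page, "contact"))                            # no base tag
    print(resolve(page, "contact", "http://www.domain.com/"))  # with base tag
```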

Re: Massive duplicate content

Posted: Fri Jan 03, 2014 8:48 pm
by Jo Morg
Also, a list of installed modules and your current system info would go a long way toward getting proper help. It seems like you have a multi-language type of module... but there is not much more one can add without knowing which modules and versions you have installed...

Re: Massive duplicate content

Posted: Sat Jan 04, 2014 5:20 pm
by jobo9968
The site was moved to another server just 2 days before Google deindexed the homepage.
Might there be some issues with mod_rewrite? (I don't know too much about it.)
As I said, I managed to 301 non-www to www.
That way I cut the number of URLs by 50%.

Now I have added the canonical piece of code @JohnnyB posted here.
The root URL was actually already set up,
and I added the mod_rewrite code myself.

I don't see any changes in the URLs the site is generating.

The details:

See also the site http://www.voltairediamonds.ie (I believe you can delete the URL if you want).

CMS Version: 1.4.1

Installed Modules:
CMSMailer 1.73.14
FileManager 0.4.1
MenuManager 1.5.1
ModuleManager 1.2.1
nuSOAP 1.0.1
Printing 0.2.5
Search 1.5.1
ThemeManager 1.0.8
TinyMCE 2.4.8
FormBuilder 0.5.5
FormBrowser 0.2.3
FormSubmissions 0.2.3
CGExtensions 1.15.2
NMS 2.2
Cataloger 0.7
Blogs 0.3.3.1

Server API (server_api): cgi-fcgi
Server Database (server_db_type): MySQL (mysql)
Server Database Version (server_db_version): 5.5.34 Success
Server Software (server_software): Apache/2.2.26 (Unix) mod_ssl/2.2.26 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Server Operating System (server_os): Linux 2.6.18-371.1.2.el5xen On i686

Re: Massive duplicate content

Posted: Sat Jan 04, 2014 9:00 pm
by JohnnyB
I don't see the canonical link being output in the head of the home page or of the other pages I checked. This is important because if there are 4 ways to reach a page, the canonical link tells Google and others which of those is the dominant page URL.

(https://support.google.com/webmasters/a ... 9394?hl=en)
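To check a page's source for the canonical link the way JohnnyB did by eye, a small parser from Python's standard library is enough. This is a sketch: `find_canonical` and `CanonicalFinder` are hypothetical names, it does not restrict the search to the <head>, and it returns only the first canonical link it sees.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing <link ... /> also arrives here via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rels and href:
            self.hrefs.append(href)


def find_canonical(page_source: str) -> Optional[str]:
    """Return the first canonical URL in the page source, or None if missing."""
    finder = CanonicalFinder()
    finder.feed(page_source)
    finder.close()
    return finder.hrefs[0] if finder.hrefs else None


if __name__ == "__main__":
    sample = ('<html><head><title>Home</title>'
              '<link rel="canonical" href="http://www.domain.com/" />'
              '</head><body></body></html>')
    print(find_canonical(sample))
```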

The other strange thing I see is that the link structure presented in the menu, and in the navigation links in the head, seems to indicate that there is a multi-language module or custom code generating links. Your base href tag looks good, but everything else seems to think that the root_url is http://www.voltairediamonds.ie/index.php/en_US.

The next thing, which you may already realize, is that you are running a very old version of CMSMS (from mid 2008) which no one will want to advise about. Now that I think about it, the canonical URL wasn't supported back then, and I posted a somewhat flimsy workaround for it.
I just found it here: http://forum.cmsmadesimple.org/viewtopi ... =4&t=30923

But, with all that said, if you want to post your entire config.php file here, I'll take a quick look and see if there's anything you should correct.

BTW, the source of one of the pages shows a ton of code in comments that exposes your server name and path, which could be a security risk for you.

Re: Massive duplicate content

Posted: Sat Jan 04, 2014 9:04 pm
by Jo Morg
jobo9968 wrote: CMS Version 1.4.1
You really need to update CMSMS, mainly for security reasons, but also because you'll only get support for the latest two versions of CMSMS.
jobo9968 wrote: Now I have added the canonical piece of code @JohnnyB posted here.
the root url was actually set up
and I added the mod_rewrite code myself

Do not see any changes in the urls the site is generating
The canonical code is meant to let search engines know which link is the proper one for the content, so that they index only that link. What you have there seems to be a mix of problems:
- Pretty URLs don't seem to be properly configured: http://docs.cmsmadesimple.org/customizi ... timization ;
- There is a sample .htaccess in the docs dir of a standard CMSMS install, but you may also get a copy from the latest CMSMS archive;
- The default page (the home page of the site) seems to be inside an "en_US" Section Header; there is no need for that;
- Usually, IIRC, CMSMS doesn't use anything other than the base URL to link to the home page... something is very wrong there;
- Also, the Blogs module doesn't even seem to be used on the site; you may consider deactivating it, and eventually uninstalling it, if it is not in use.

PS: Didn't see JohnnyB's post, sorry... ;)

Re: Massive duplicate content

Posted: Wed Jan 08, 2014 6:36 pm
by jobo9968
This line has already been added to the config.php file:

$config['url_rewriting'] = 'mod_rewrite';

If it's not working properly, I have no clue why.

This tag is within the template file too:

{metadata}

It all seems to be messed up.

The whole problem is fixed, though.
The site is back in the index.
After the site was moved to the other host, the 301 from non-www to www stopped working, plus Google noticed plenty of server errors.
The site has been moved back to the old host, I did some 301s (which reduced the number of homepage URLs), and the site ranks great now.
There's a lesson here about how important canonical URLs, and picking either the www or the non-www version of the website, are for ranking.
A new site is under development, so no CMSMS upgrade will be needed.
Thanks guys for all your help.