Massive duplicate content
Hello.
I'm very new to CMSMS.
I'm doing some SEO for a client whose website is built on your script.
The site has a huge problem with duplicate content.
The homepage has 5 different URLs:
domain.com/
domain.com/index.php
domain.com/en_US/home
domain.com/index.php/home
domain.com/index.php/en_US/home
and every page has 4 URLs:
domain.com/something
domain.com/en_US/something
domain.com/index.php/en_US/something
domain.com/index.php/something
That's what I found.
There could be more of those.
I'm not sure whether that's normal for the script, or whether you are aware of this issue. This is a huge problem for SEO, as the duplicate content penalty from Google is real.
I managed to redirect non-www to www in the .htaccess file; before that, the number of identical URLs was doubled (10 URLs for the homepage).
I tried some other redirect tricks (for instance, redirecting index.php to the site root) but couldn't make them work.
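The variants listed above differ only in an optional index.php segment, an optional en_US language segment, the home alias standing in for the site root, and (before the redirect) an optional www prefix. As a minimal sketch, assuming exactly those patterns, the Python below collapses each variant to one canonical URL and groups the duplicates. The names canonicalize and group_duplicates, and http://www.domain.com/ as the preferred form, are hypothetical and not part of CMSMS; the real fix still belongs in the canonical link and the server redirects, this only makes the mapping explicit.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumptions for this sketch (not CMSMS settings): the www host is preferred,
# "home" is the default page alias, and "en_US" is the only language segment.
CANONICAL_HOST = "www.domain.com"
HOME_ALIAS = "home"
NOISE_SEGMENTS = {"index.php", "en_US"}


def canonicalize(url: str) -> str:
    """Map one observed URL variant to the single URL it should resolve to."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Drop the front controller and the language prefix that create duplicates.
    segments = [s for s in segments if s not in NOISE_SEGMENTS]
    # The default page should live at the site root, not at /home.
    if segments == [HOME_ALIAS]:
        segments = []
    return "http://" + CANONICAL_HOST + "/" + "/".join(segments)


def group_duplicates(urls):
    """Group URL variants under the canonical URL they duplicate."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    observed = [
        "domain.com/", "domain.com/index.php", "domain.com/en_US/home",
        "domain.com/index.php/home", "domain.com/index.php/en_US/home",
        "domain.com/something", "domain.com/en_US/something",
        "domain.com/index.php/en_US/something", "domain.com/index.php/something",
    ]
    for canonical, variants in group_duplicates(observed).items():
        print(f"{canonical}: {len(variants)} variants")
```

Run on the URLs from this post, it prints http://www.domain.com/ with 5 variants and http://www.domain.com/something with 4, which is exactly the duplication described. Query strings are ignored in this sketch.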
Can you please help with any solution?
rgds
Jan
Re: Massive duplicate content
This has been discussed numerous times before.
This is why you NEED to have the canonical URLs set up and working properly.
Follow me on twitter
Please post system information from "Extensions >> System Information" (there is a bbcode option) on all posts asking for assistance.
--------------------
If you can't bother explaining your problem well, you shouldn't expect much in the way of assistance.
Re: Massive duplicate content
I have a feeling these were URLs that you typed into the address bar of a browser. Yes, they will work, but no search bot is going to guess URLs to see if you have duplicate content. CMSMS doesn't care what you put between mysite.com/ and the alias of the page, e.g. mysite.com/stupid/guessed/url/pagealias.html...
Unless I'm wrong and you found these URLs in Google...
If you find duplicates of things like News, look at a default page template for the canonical code in the <head>, which you can use in your page templates...
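The point about guessed URLs can be illustrated with a toy router that only looks at the last path segment to find the page alias: everything in front of it is ignored, so any prefix serves the same page. This is a hypothetical model of the behaviour described above, not CMSMS's actual routing code; resolve_alias is an invented name, and stripping a .html suffix is an assumption based on the example URL.

```python
from typing import Optional


def resolve_alias(path: str) -> Optional[str]:
    """Toy router: serve whatever page alias appears in the last path segment.

    Hypothetical illustration only; it is not how CMSMS is implemented.
    """
    segments = [s for s in path.split("/") if s]
    if not segments:
        return None  # bare site root: the default page is served
    alias = segments[-1]
    if alias.endswith(".html"):  # optional page extension, as in the example
        alias = alias[: -len(".html")]
    return alias


# Every prefix is ignored, so all of these serve the same page.
for path in ["/pagealias.html", "/stupid/guessed/url/pagealias.html", "/a/b/pagealias"]:
    print(path, "->", resolve_alias(path))
```

Because such URLs only exist if someone links to them, they usually matter only once they show up in a search index, which is where the canonical link takes over.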
Re: Massive duplicate content
I love these kinds of replies on forums.
You know the forum inside out, just as I know my own stuff.
I tried so many different searches.
I spent nearly a whole day trying to find the solution.
Maybe I searched for the wrong things.
If you had just pointed me in the right direction (a URL), I would have been very thankful.
rgds
Re: Massive duplicate content
Dr. CSS
I'm not sure that you are right, at least in my situation.
When I click a link in my menu, I go to domain.com/index.php/en_US/something,
while Google has indexed it as domain.com/something.
At the moment, all pages indexed in Google are pages not related to CMSMS;
they are some WordPress add-ons.
Even the domain itself is not indexed.
Re: Massive duplicate content
Patience Padawan
Please review the following, as it may help your situation.
Not all demo templates have the canonical link included; be sure you have this inside the <head>:
Code:
{if isset($canonical)}<link rel="canonical" href="{$canonical}" />{elseif isset($content_obj)}<link rel="canonical" href="{$content_obj->GetURL()}" />{/if}
Next, I highly recommend setting $config['root_url'] inside the config.php file. New versions of CMSMS do not include this setting, but I find it necessary because otherwise relative URLs could be resolved using an alternate domain name.
For example, if you have my-site.com (as the primary domain) and my-cool-site.com (as an add-on domain) pointing at the same installation without proper redirects in place, the links could use either domain depending on how the visitor reached the site.
Setting $config['root_url'] removes that uncertainty:
Code:
$config['root_url'] = 'http://www.my-site.com';
This also allows you to set the 'www' prefix (or not), depending on how you want the site to be indexed by Google et al.
Next, enable URL rewriting in the config.php file to get those pretty SEO URLs without 'index.php' in the path:
Code:
$config['url_rewriting'] = 'mod_rewrite';
Go to /docs/htaccess.txt to see the Apache rewrite rules needed in your .htaccess file.
Finally, be sure there is a <base> tag being printed in your HTML page. Without it, there is too much guessing. The template tag {metadata} will provide that for you; be sure your config.php file does not contain $config['show_base'] = false;
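Why the <base> tag and a fixed root_url matter can be seen with the standard URL resolution rules: a relative link is resolved against the <base href> if the page has one, otherwise against whatever address the visitor used. The sketch below uses Python's urllib.parse.urljoin, which applies the RFC 3986 rules and agrees with browsers for simple cases like these; resolve_link is a hypothetical helper, and the page URLs are example addresses built from the domains mentioned above.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str, base_href: Optional[str] = None) -> str:
    """Resolve a relative link against <base href> if present, else the page URL."""
    return urljoin(base_href or page_url, href)


href = "something"  # a relative link, e.g. from a menu template

# No <base> tag: the result depends on how the visitor reached the page.
print(resolve_link("http://www.my-site.com/index.php/en_US/home", href))
# -> http://www.my-site.com/index.php/en_US/something
print(resolve_link("http://my-cool-site.com/home", href))
# -> http://my-cool-site.com/something

# <base href="http://www.my-site.com/">: every visitor gets the same URL.
print(resolve_link("http://my-cool-site.com/home", href, "http://www.my-site.com/"))
# -> http://www.my-site.com/something
```

Without a base, one relative link already yields two different indexable URLs on two domains; with it, there is only one.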
"The art of life lies in a constant readjustment to our surroundings." -Okakura Kakuzo
--
LinkedIn profile
--
I only speak/write in English so I may not translate well on International posts.
--
Re: Massive duplicate content
Also, a list of installed modules and your current system info would go a long way in terms of getting proper help. It seems like you have a multi-lang type of module... but there is not much more one can add to help without knowing which modules and versions you have installed...
"There are 10 types of people in this world, those who understand binary... and those who don't."
* by the way: English is NOT my native language (sorry for any mistakes...).
Code of Conduct | CMSMS Docs | Help Support CMSMS
My developer Page on the Forge
GeekMoot 2015 in Ghent, Belgium: I was there!
GeekMoot 2016 in Leicester, UK: I was there!
DevMoot 2023 in Cynwyd, Wales: I was there!
Re: Massive duplicate content
The site was moved to another server just 2 days before Google deindexed the homepage.
Could there be some issue with mod_rewrite? (I don't know too much about it.)
As I said, I managed to 301-redirect non-www to www.
This way I was able to reduce the number of URLs by 50%.
Now I have added the canonical code @JohnnyB posted here.
The root URL was actually already set up,
and I added the mod_rewrite code myself.
I do not see any changes in the URLs the site is generating.
The details are below; see also the site http://www.voltairediamonds.ie (I believe you can delete the URL if you want).
CMS Version: 1.4.1
Installed Modules:
CMSMailer: 1.73.14
FileManager: 0.4.1
MenuManager: 1.5.1
ModuleManager: 1.2.1
nuSOAP: 1.0.1
Printing: 0.2.5
Search: 1.5.1
ThemeManager: 1.0.8
TinyMCE: 2.4.8
FormBuilder: 0.5.5
FormBrowser: 0.2.3
FormSubmissions: 0.2.3
CGExtensions: 1.15.2
NMS: 2.2
Cataloger: 0.7
Blogs: 0.3.3.1
Server API (server_api): cgi-fcgi
Server Database (server_db_type): MySQL (mysql)
Server Database Version (server_db_version): 5.5.34 Success
Server Software (server_software): Apache/2.2.26 (Unix) mod_ssl/2.2.26 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Server Operating System (server_os): Linux 2.6.18-371.1.2.el5xen On i686
Re: Massive duplicate content
I don't see the canonical link being sent to the <head> of the home page or the other pages I checked. This is important because if a page can be reached in 4 ways, the canonical link tells Google and other search engines which of those is the preferred page URL.
(https://support.google.com/webmasters/a ... 9394?hl=en)
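One way to check whether the canonical link actually reaches the <head> of each page is to parse the served HTML and look for <link rel="canonical">. A minimal sketch using only Python's standard html.parser module; CanonicalFinder and find_canonical are hypothetical names, and fetching the page is left to the caller (for example with urllib.request.urlopen(url).read().decode("utf-8", "replace")).

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")

    # Self-closing <link ... /> tags go through handle_startendtag, whose
    # default implementation calls handle_starttag, so they are covered too.


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


if __name__ == "__main__":
    sample = '<head><link rel="canonical" href="http://www.my-site.com/something" /></head>'
    print(find_canonical(sample))
```

Running it against each of the URL variants of one page should return the same canonical URL every time; a None result means the template is not printing the tag.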
The other strange thing I see is that the link structure presented in the menu and in the navigation links in the head seems to indicate that a multi-language module or custom code is generating links. Your <base> tag looks good, but everything else seems to think that the root_url is http://www.voltairediamonds.ie/index.php/en_US.
The next thing, which you may already realize, is that you are running a very old (mid-2008) version of CMSMS that no one will want to advise about. Now that I think about it, the canonical URL wasn't supported back then, and I posted a somewhat flimsy way around it.
Just found it here: http://forum.cmsmadesimple.org/viewtopi ... =4&t=30923
With all that said, if you want to post your entire config.php file here, I'll take a quick look and see if there's anything you should correct.
BTW, the source of one of the pages shows a ton of code in comments that exposes your server name and path, which could be a security risk for you.
Re: Massive duplicate content
jobo9968 wrote:
CMS Version
1.4.1
You really need to update CMSMS, mainly for security reasons, but also because you'll only get support for the latest two versions of CMSMS.
jobo9968 wrote:
Now. I have added the canonical piece of code @JohnnyB posted here.
the root url was actually set up
and I added the mod_rewrite code myself
Do not see any changes in the urls the site is generating
The canonical code is meant to let the search engines know which is the proper link to access the content, so that they index only that link. What you have there seems to be a mix of problems:
- Pretty URLs don't seem to be properly configured: http://docs.cmsmadesimple.org/customizi ... timization;
- There is a sample .htaccess in the docs directory of a standard CMSMS install, but you may get a copy from the latest CMSMS archive;
- The default page (the home page of the site) seems to be inside an "en_US" Section Header; there is no need for that;
- Usually, IIRC, CMSMS doesn't use anything other than the base URL to link to the home page... something is very wrong there;
- Also, the Blogs module doesn't even seem to be used on the site; you may consider deactivating it and, eventually, uninstalling it if it is not in use.
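For context on what the sample .htaccess does once URL rewriting is enabled: assuming it contains rules of the form shown in the docstring below (two RewriteCond lines skipping real files and directories, then RewriteRule ^(.+)$ index.php?page=$1 [QSA]), every other request is handed to index.php with the path as the page parameter. The Python below only simulates that behaviour; check the rules against the htaccess.txt shipped with your version, and note that rewrite and existing_paths are hypothetical names.

```python
from typing import AbstractSet


def rewrite(path: str, query: str, existing_paths: AbstractSet[str]) -> str:
    """Simulate the assumed rules:

        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^(.+)$ index.php?page=$1 [QSA]

    `existing_paths` stands in for the files and directories on disk.
    """
    relative = path.lstrip("/")
    # ^(.+)$ needs at least one character, and real files or directories
    # fail the !-f / !-d conditions, so both are served unchanged.
    if not relative or relative in existing_paths:
        return path + ("?" + query if query else "")
    target = "/index.php?page=" + relative
    # [QSA] appends the original query string to the rewritten one.
    if query:
        target += "&" + query
    return target


if __name__ == "__main__":
    on_disk = {"index.php", "uploads"}
    for p in ["/something", "/index.php", "/uploads", "/"]:
        print(p, "->", rewrite(p, "", on_disk))
```

Internally, /something becomes /index.php?page=something, which is why, once rewriting is on, the generated links no longer need index.php (or a language segment) in them.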
PS: Didn't see JohnnyB's post, sorry...
Re: Massive duplicate content
Code:
$config['url_rewriting'] = 'mod_rewrite';
has been added to the config.php file already. If it's not working properly, I have no clue why.
Code:
{metadata}
is within the template file too.
It seems to be all messed up.
The whole problem is fixed, though:
the site is back in the index.
After the site was moved to another host, the 301 redirect from non-www to www stopped working. Plus, Google noticed plenty of server errors.
The site has been moved back to the old host, I did some 301s (this way I reduced the number of homepage URLs), and the site ranks great now.
There's a lesson here about how important the canonical link, and committing to either the www or the non-www version of the website, is for ranking.
There's a new site under development, so no CMSMS upgrade will be needed.
Thanks guys for all your help