News Pages not in Google
Posted: Tue Apr 27, 2010 12:51 pm
by Schaboo
Hi there. I've given each of my news item pages (from the news module) a unique page title (roughly as described in
http://forum.cmsmadesimple.org/index.ph ... 475.0.html) and yet these news pages are not showing up in Google organic search. Anybody got any idea why this might be?
Many thanks in advance and all the best.
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 1:28 pm
by Ziggywigged
Install the CGFeedMaker module and submit the RSS feed it generates to Google as a sitemap.
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 2:00 pm
by JohnnyB
I use Sitemapmadesimple and then add in logic to grab articles from the news page. My full template is:
Code:
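{* modified sitemap template *}
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{* regular content pages supplied by the sitemap module *}
{foreach from=$output item='page'}
  <url>
    <loc>{$page->url}</loc>
    <lastmod>{$page->date|date_format:"%Y-%m-%d"}</lastmod>
    <priority>{$page->priority}</priority>
    <changefreq>{$page->frequency}</changefreq>
  </url>
{/foreach}
{* call the News module only so that $items is available; its rendered output is captured and discarded *}
{capture assign='junk'}{news number='1000'}{/capture}
{foreach from=$items item=entry}
  {* put the detail page ID (60 here) wherever the URL contains '//', then repair the 'http://' prefix that the first replace also changed *}
  {assign var=utmpNEWS value=$entry->moreurl|replace:'//':'/60/'|replace:'http:/60':'http:/'}
  <url>
    <loc>{$utmpNEWS}</loc>
    {if $entry->postdate}
    <lastmod>{$entry->postdate|date_format:"%Y-%m-%d"}</lastmod>
    {/if}
    {* $page is left over from the loop above, so news entries reuse the last page's priority and frequency *}
    <priority>{$page->priority}</priority>
    <changefreq>{$page->frequency}</changefreq>
  </url>
{/foreach}
</urlset>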
Note: Change 60 to your detail page ID number - you can find that in the URL when viewing a news article.
I can't remember where I found this - either in the forums or someone's personal site. But, you can apply this approach to any module that generates Detail pages.
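To make the two chained replace modifiers easier to follow, here is a minimal Python sketch of the same string manipulation. It is only an illustration, not part of the template: the function name and the example.com URL are made up, and it assumes the News URL contains an empty '//' segment where the detail page ID belongs.

```python
def add_detail_page_id(more_url: str, detail_page_id: int) -> str:
    """Mimic the template's two chained Smarty replace modifiers."""
    marker = f"/{detail_page_id}/"
    # Step 1: every '//' becomes '/<id>/'. This fills the empty path segment,
    # but it also rewrites the '//' that follows 'http:'.
    patched = more_url.replace("//", marker)
    # Step 2: undo the damage to the scheme by turning 'http:/<id>' back into 'http:/'.
    return patched.replace(f"http:/{detail_page_id}", "http:/")


# Hypothetical News URL with an empty segment where the page ID belongs.
example = "http://www.example.com/index.php/news/3//Some-Article.html"
print(add_detail_page_id(example, 60))
# prints http://www.example.com/index.php/news/3/60/Some-Article.html
```

Note that the second replace only repairs an 'http:' prefix, so a site served over https would need that step adjusted.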
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 2:32 pm
by Schaboo
Hi there. Thanks to you both. I used MWW's template to create a sitemap, as I couldn't get CGFeedMaker to work. I'll keep an eye on Google results and let you know if this has done the trick.
Just out of interest, why do you think Googlebot didn't find these pages? They were all well linked from other pages (and other sites) that were being indexed nicely. I just thought it would be a case of letting the crawlers do their work.
Thanks again.
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 2:48 pm
by JohnnyB
Generally they should be able to find them. Can we see a link to your site?
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 2:59 pm
by Schaboo
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 3:19 pm
by JohnnyB
Right now the News URLs in the sitemap file are wrong.
In the sitemap template, change the number 60 to the ID of your news detail page. For example, your news article URLs are
http://www.roxan.co.uk/index.php/news/3 ... Rings.html so you would change the 60 to 58 for the above example.
BUT, since different news categories use different detail pages for the full article, you'll need to set up the news part of the sitemap template category by category.
Find the news ID for each category by looking at an article detail URL from that category, and change '60' to that ID number. Here is an example template that targets one specific News category - you will need a copy of it for each category:
Code:
{* call the News module for one category; the rendered output is discarded and only $items is used *}
{capture assign='junk'}{news number='1000' category='Your-Category-Name-Here'}{/capture}
{foreach from=$items item=entry}
  {if $entry->category == 'Your-Category-Name-Here'}
    {* 60 is this category's detail page ID; the second replace restores the 'http://' prefix *}
    {assign var=utmpNEWS value=$entry->moreurl|replace:'//':'/60/'|replace:'http:/60':'http:/'}
    <url>
      <loc>{$utmpNEWS}</loc>
      {if $entry->postdate}
      <lastmod>{$entry->postdate|date_format:"%Y-%m-%d"}</lastmod>
      {/if}
      <priority>{$page->priority}</priority>
      <changefreq>{$page->frequency}</changefreq>
    </url>
  {/if}
{/foreach}
Change 'Your-Category-Name-Here' in both places to your category name. Use this block of code once for each category, and be sure to change the number 60 to the number in that category's news URL.
P.S. you might need to edit a regular content page to get the sitemap to regenerate the URLs after changing the template.
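Once the sitemap has regenerated, a quick scripted check can flag news URLs that still look wrong. The sketch below is only one possible approach: it assumes a "wrong" URL is one whose path still contains an empty '//' segment (the missing detail page ID), and the function name and sample sitemap are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

# Namespace used by the sitemap protocol's <urlset> element.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_suspect_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL whose path contains an empty '//' segment."""
    root = ET.fromstring(sitemap_xml.strip().encode("utf-8"))
    suspects = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Only the path is inspected, so the '//' after 'http:' is ignored.
        if "//" in urlparse(url).path:
            suspects.append(url)
    return suspects


# Hypothetical sitemap: the first URL is missing its detail page ID.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/index.php/news/3//First.html</loc></url>
  <url><loc>http://www.example.com/index.php/news/4/60/Second.html</loc></url>
</urlset>"""

for bad_url in find_suspect_urls(sample):
    print("Check this URL:", bad_url)
```

To run it against a real site, save the generated sitemap to a file and pass its contents to find_suspect_urls.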
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 3:22 pm
by JohnnyB
I think the next step is to look at your Google Webmaster account and see if there are any crawl errors.
My only other thought is to use SEO 'pretty URLs' for all the links on your site using mod_rewrite, in case Google is not crawling URLs like
http://www.roxan.co.uk/index.php?page=news-poultry properly. For example, the pretty URL for this is
http://www.roxan.co.uk/news-poultry.html
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 3:46 pm
by Schaboo
Hi mmw. Thanks again. I'll have a crack at sorting the sitemap - I was being lazy!
I just noticed "Disallow: /index.php?mact" in the robots.txt file. Could this have been stopping the new pages indexing (before we applied pretty URLs)?
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 4:02 pm
by JohnnyB
"Disallow: /index.php?mact"
Yes, that rule is blocking those links. If you try "site:roxan.co.uk" in the Google search box, you will see no indexed pages in the results that use index.php?. Your Google Webmasters account also lets you test your robots.txt file to see how Google will crawl your site.
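For a quick local check of how a rule like this behaves, Python's standard urllib.robotparser can evaluate a robots.txt against a URL. This is a rough sketch only: the example.com URL and the surrounding robots.txt lines are assumptions, and the standard-library parser does simple prefix matching, so it will not necessarily agree with Google's own tester in every case.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents, with and without the rule from this thread.
ROBOTS_WITH_RULE = """User-agent: *
Disallow: /index.php?mact
"""
ROBOTS_WITHOUT_RULE = """User-agent: *
Disallow:
"""

# Hypothetical module-style news URL (not taken from the real site).
NEWS_URL = "http://www.example.com/index.php?mact=News,cntnt01,detail,0&cntnt01articleid=3"

print("with rule:   ", is_crawlable(ROBOTS_WITH_RULE, NEWS_URL))
print("without rule:", is_crawlable(ROBOTS_WITHOUT_RULE, NEWS_URL))
```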
Re: News Pages not in Google
Posted: Tue Apr 27, 2010 4:11 pm
by Schaboo
Righto. I've changed the robots.txt file now and removed that line. We did not add that line, so I'm thinking it's a CMSMS default, in which case it's something for others to look out for. Thanks for all your help.