News Pages not in Google

Schaboo
Forum Members
Posts: 13
Joined: Thu Sep 10, 2009 8:21 pm

News Pages not in Google

Post by Schaboo »

Hi there. I've given each of my news item pages (from the news module) a unique page title (roughly as described in http://forum.cmsmadesimple.org/index.ph ... 475.0.html) and yet these news pages are not showing up in Google organic search.  Anybody got any idea why this might be?

Many thanks in advance and all the best.
Last edited by Schaboo on Tue Apr 27, 2010 12:54 pm, edited 1 time in total.
Ziggywigged
Power Poster
Posts: 424
Joined: Sat Feb 02, 2008 12:42 am

Re: News Pages not in Google

Post by Ziggywigged »

Implement the CGFeedMaker module and submit the RSS feed to Google as a sitemap.
Take a penny, leave a penny.
JohnnyB
Dev Team Member
Posts: 731
Joined: Tue Nov 21, 2006 5:05 pm

Re: News Pages not in Google

Post by JohnnyB »

I use Sitemapmadesimple and then add in logic to grab articles from the news page.  My full template is:

{* modified sitemap template *}
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

{foreach from=$output item='page'}
<url>
  <loc>{$page->url}</loc>
  <lastmod>{$page->date|date_format:"%Y-%m-%d"}</lastmod>
  <priority>{$page->priority}</priority>
  <changefreq>{$page->frequency}</changefreq>
</url>
{/foreach}

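{* Run the News module only to populate $items; its rendered output is captured and discarded *}
{capture assign='junk'}{news number='1000'}{/capture}
{foreach from=$items item=entry}
{assign var=utmpNEWS value=$entry->moreurl|replace:'//':'/60/'|replace:'http:/60':'http:/'}
<url>
  <loc>{$utmpNEWS}</loc>
{if $entry->postdate}
  <lastmod>{$entry->postdate|date_format:"%Y-%m-%d"}</lastmod>
{/if}
{* $page still holds the last page from the loop above, so news entries reuse its priority and frequency *}
  <priority>{$page->priority}</priority>
  <changefreq>{$page->frequency}</changefreq>
</url>
{/foreach}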

</urlset>
Note: Change 60 to your detail page ID number. You can find it in the URL when viewing a news article.

I can't remember where I found this - either in the forums or someone's personal site.  But, you can apply this approach to any module that generates Detail pages.
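
For anyone puzzled by the two chained replace modifiers, here is a minimal Python sketch of the same string substitution. It assumes the News module returns a moreurl with an empty page-ID segment (a '//' in the path) when it is called from the sitemap template. The example URL and the function name are made up for illustration.

```python
def fix_news_url(moreurl: str, detail_page_id: int = 60) -> str:
    """Mimic the two Smarty replace modifiers from the sitemap template."""
    marker = f"/{detail_page_id}/"
    # Step 1: every '//' becomes '/60/'. This fills the empty page-ID
    # segment, but it also mangles the '//' that follows 'http:'.
    step1 = moreurl.replace("//", marker)
    # Step 2: repair the scheme by turning 'http:/60' back into 'http:/'.
    return step1.replace(f"http:/{detail_page_id}", "http:/")


# Hypothetical URL with the page ID missing between article ID and title.
example = "http://www.example.com/index.php/news/3//Some-Title.html"
print(fix_news_url(example))
# prints http://www.example.com/index.php/news/3/60/Some-Title.html
```

Like the template, this only repairs http:// URLs. An https:// address would need the second replacement adjusted to match.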
"The art of life lies in a constant readjustment to our surroundings." -Okakura Kakuzo

--
LinkedIn profile
--
I only speak/write in English so I may not translate well on International posts.
--
Schaboo
Forum Members
Posts: 13
Joined: Thu Sep 10, 2009 8:21 pm

Re: News Pages not in Google

Post by Schaboo »

Hi there. Thanks to you both. I used MWW's template to create a sitemap, as I couldn't get CGFeedMaker to work. I'll keep an eye on the Google results and let you know if this has done the trick.
Just out of interest, why do you think Googlebot didn't find these pages? They were all well linked to from other pages (and other sites) that were indexing nicely. I just thought it would be a case of letting the crawlers do their work.

Thanks again.
JohnnyB
Dev Team Member
Posts: 731
Joined: Tue Nov 21, 2006 5:05 pm

Re: News Pages not in Google

Post by JohnnyB »

Generally they should be able to find them. Can we see a link to your site?
JohnnyB
Dev Team Member
Posts: 731
Joined: Tue Nov 21, 2006 5:05 pm

Re: News Pages not in Google

Post by JohnnyB »

Right now the News URLs in the sitemap file are wrong.

In the sitemap template, change the number 60 to the ID of your news detail page. For example, your news article URLs look like http://www.roxan.co.uk/index.php/news/3 ... Rings.html, so in that case you would change the 60 to 58.

BUT, since different news categories use different detail pages for the full article, you'll need to set up the news part of the sitemap template separately for each category:

Find the news ID for each category by looking at an article detail URL from each category.  And change '60' to that ID number.  Here is an example template to target specific News categories -  you will need this for each category:

{capture assign='junk'}{news number='1000' category='Your-Category-Name-Here'}{/capture}
{foreach from=$items item=entry}
	{if $entry->category == 'Your-Category-Name-Here'}
	{assign var=utmpNEWS value=$entry->moreurl|replace:'//':'/60/'|replace:'http:/60':'http:/'}
	<url>
	  <loc>{$utmpNEWS}</loc>
	{if $entry->postdate}
	  <lastmod>{$entry->postdate|date_format:"%Y-%m-%d"}</lastmod>
	{/if}
	  <priority>{$page->priority}</priority>
	  <changefreq>{$page->frequency}</changefreq>
	</url>
	{/if}
{/foreach}
Change 'Your-Category-Name-Here' in both places to your category. Use this block of code once for each category, and be sure to change the number 60 to the number in your news URL.
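
To double-check the result once the sitemap has regenerated, here is a rough Python sketch that lists news URLs whose detail page ID is missing or is not one you expect. It assumes the URLs have the shape /news/<article id>/<detail page ID>/<title>, which is only my reading of the replace trick in the template. The function name, sample URLs and IDs are made up.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed URL shape: .../news/<article id>/<detail page id>/<title>.html
NEWS_URL = re.compile(r"/news/(\d+)/(\d*)/")


def unexpected_news_urls(sitemap_xml, expected_page_ids):
    """Return news <loc> URLs whose detail page ID is missing or unexpected."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        match = NEWS_URL.search(url)
        # group(2) is the detail page ID; it is '' when the segment is empty.
        if match and match.group(2) not in expected_page_ids:
            flagged.append(url)
    return flagged


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/index.php/news/3/58/First.html</loc></url>
  <url><loc>http://www.example.com/index.php/news/4/60/Second.html</loc></url>
  <url><loc>http://www.example.com/index.php/news/5//Third.html</loc></url>
  <url><loc>http://www.example.com/about.html</loc></url>
</urlset>"""

# Only page ID 58 is expected here, so the second and third URLs are flagged.
print(unexpected_news_urls(sample, {"58"}))
```

Pass in the set of detail page IDs you actually use, one per category, and anything the function returns is worth a second look in the template.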

P.S. You might need to edit a regular content page to get the sitemap to regenerate the URLs after changing the template.
JohnnyB
Dev Team Member
Posts: 731
Joined: Tue Nov 21, 2006 5:05 pm

Re: News Pages not in Google

Post by JohnnyB »

I think the next step is to look at your Google Webmaster account and see if there are any crawl errors. 

My only other thought is to use SEO 'pretty URLs' for all the links on your site using mod_rewrite, in case Google is not crawling URLs like http://www.roxan.co.uk/index.php?page=news-poultry properly. For example, the pretty URL for this page is http://www.roxan.co.uk/news-poultry.html
Schaboo
Forum Members
Posts: 13
Joined: Thu Sep 10, 2009 8:21 pm

Re: News Pages not in Google

Post by Schaboo »

Hi mmw. Thanks again.  I'll have a crack at sorting the sitemap - I was being lazy!

I just noticed "Disallow: /index.php?mact" in the robots.txt file.  Could this have been stopping the new pages indexing (before we applied pretty URLs)?
JohnnyB
Dev Team Member
Posts: 731
Joined: Tue Nov 21, 2006 5:05 pm

Re: News Pages not in Google

Post by JohnnyB »

"Disallow: /index.php?mact"
Yes, that is preventing those links from being crawled. If you try "site:roxan.co.uk" in the Google search box, you will see that none of the indexed pages use index.php? in their URLs. A Google Webmasters account lets you test your robots.txt file to see how Google will crawl your site.
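
If you want a quick local check as well, here is a small Python sketch using the standard library's robots.txt parser. It only approximates how a crawler applies the rule, and Google's own matching may differ in edge cases, so treat the Webmasters test as the real answer. The example.com URLs are made up, and the mact query string is just an illustration of a non-pretty News link.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="Googlebot"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


RULES = ["User-agent: *", "Disallow: /index.php?mact"]
# Illustrative module-action URL of the kind the Disallow line targets.
MACT_URL = "http://www.example.com/index.php?mact=News,cntnt01,detail,0&cntnt01articleid=3"
# Illustrative pretty URL that does not start with /index.php?mact.
PRETTY_URL = "http://www.example.com/news/3/58/Some-Title.html"

print(is_allowed(RULES, MACT_URL))
print(is_allowed(RULES, PRETTY_URL))
```

The first check should come back as blocked and the second as allowed, which fits the pretty URLs being crawlable while the index.php?mact links are not.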
Schaboo
Forum Members
Posts: 13
Joined: Thu Sep 10, 2009 8:21 pm

Re: News Pages not in Google

Post by Schaboo »

Righto. I've changed the robots.txt file now and removed that line. We did not add that line, so I'm thinking it's a CMSMS default, in which case it's something for others to look out for. Thanks for all your help.