Archive for the ‘SEO News’ Category

Tips on using feeds and information on subscriber counts in Reader

Wednesday, March 7th, 2007

Posted by Nick Baum, Google Reader Product Manager

Does your site have a feed? A feed can connect you to your readers and keep them returning to your content. Most blogs have feeds, but increasingly, other types of sites with frequently changing content are making feeds available as well. Some examples of sites that offer feeds:

Find out how many readers are subscribed to your feed
If your site has a feed, you can now get information about the number of Google Reader and Google Personalized Homepage subscribers. If you use FeedBurner, you’ll start to see numbers from these subscriptions taken into account. You can also find this number in the crawling data in your logs. We crawl feeds with the user-agent Feedfetcher-Google, so simply look for this user-agent in your logs to find the subscriber number. If multiple URLs point to the same feed, we may crawl each separately, so in this case, just add up the subscriber numbers listed for each unique feed-id. An example of what you might see in your logs is below:

User-Agent: Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 4 subscribers; feed-id=1794595805790851116)
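As a rough illustration of the counting described above, here is a minimal Python sketch that scans log lines for the Feedfetcher-Google user-agent and adds up the subscriber numbers, counting each unique feed-id once. The function name count_subscribers is hypothetical, and the sketch assumes that the most recent line for a feed-id carries its current count and that a single subscriber may be logged as "1 subscriber"; neither detail is stated in this post.

```python
import re

# Matches the subscriber count and feed-id inside a Feedfetcher-Google
# user-agent string. Accepting the singular "subscriber" is an assumption.
FEEDFETCHER_RE = re.compile(
    r"Feedfetcher-Google;.*?(\d+) subscribers?;\s*feed-id=(\d+)"
)


def count_subscribers(log_lines):
    """Add up subscriber numbers, counting each unique feed-id once."""
    latest = {}
    for line in log_lines:
        match = FEEDFETCHER_RE.search(line)
        if match:
            subscribers, feed_id = int(match.group(1)), match.group(2)
            # Later lines overwrite earlier ones, so the newest count wins
            # (an assumption about how to treat repeated crawls).
            latest[feed_id] = subscribers
    return sum(latest.values())
```

Passing the lines of an access log (for example, an open file object) returns the total across all feed-ids found. Lines from other user-agents are ignored.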

Making your feed available to Google
You can submit your feed as a Sitemap in webmaster tools. This will let us know about the URLs listed in the feed so we can crawl and index them for web search. In addition, if you want to make sure your feed shows up in the list of available feeds for Google products, simply add a <link> tag with the feed URL to the <head> section of your page. For instance:

<link rel="alternate" type="application/atom+xml" title="Your Feed Title" href="http://www.example.com/atom.xml" />
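To check that a page actually exposes such a tag, something like the following Python sketch could be used. It relies only on the standard library's html.parser; the names FeedLinkFinder and find_feed_links are hypothetical, and accepting application/rss+xml alongside the Atom type shown above is an assumption rather than something this post states.

```python
from html.parser import HTMLParser

# Feed MIME types to look for. Only the Atom type appears in the post;
# the RSS type is an assumption added for illustration.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> feed tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        if "alternate" in rel and attr.get("type", "").lower() in FEED_TYPES:
            self.feeds.append(attr.get("href", ""))


def find_feed_links(html):
    """Return the feed URLs advertised by <link> tags in the given HTML."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

An empty result for a page that should advertise a feed suggests the tag is missing or uses a different type attribute.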

Remember that Feedfetcher-Google retrieves feeds only for use in Google Reader and Personalized Homepage. For the content to appear in web search results, Googlebot will have to crawl it as well.

Don’t yet have a feed?

If you use a content management system or blogging platform, feed functionality may be built right in. For instance, if you use Blogger, you can go to Settings > Site Feed and make sure that Publish Site Feed is set to Yes. You can also set the feed to either full or short and can add a footer. The URL listed here is what subscribers add to their feed readers. A link to this URL will appear on your blog.

More tips from the Google Reader team
In order to provide the best experience for your users, the Google Reader team has also put together some tips for feed publishers. This document covers feed best practices, common implementation pitfalls, and various ways to promote your feeds. Whether you’re creating your feeds from scratch or have been publishing them for a long time, we encourage you to take a look at our tips to make the most of your feeds. If you have any questions, please get in touch.

Using the robots meta tag

Wednesday, March 7th, 2007

Posted by Vanessa Fox

Recently, Danny Sullivan brought up good questions about how search engines handle meta tags. Here are some answers about how we handle these tags at Google.

Multiple content values
We recommend that you place all content values in one meta tag. This keeps the meta tags easy to read and reduces the chance for conflicts. For instance:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

If the page contains multiple meta tags of the same type, we will aggregate the content values. For instance, we will interpret

<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

the same way as:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

If content values conflict, we will use the most restrictive. So, if the page has these meta tags:

<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="INDEX">

we will obey the NOINDEX value.
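The aggregation and conflict rules above can be approximated in a short Python sketch. This is only an illustration of the behavior described in this post, not Googlebot's actual implementation; the names RobotsMetaCollector and effective_directives are hypothetical. It pools the content values of every robots meta tag on a page and lets the restrictive value win, treating NONE as NOINDEX, NOFOLLOW as described later in this post.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Pools the content values of all <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.values = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for value in attr.get("content", "").split(","):
            value = value.strip().upper()
            if value:
                self.values.add(value)


def effective_directives(html):
    """Return whether the page may be indexed and its links followed."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    values = collector.values
    # A restrictive value always beats INDEX or FOLLOW; with no tags at
    # all, both default to True.
    return {
        "index": not (values & {"NOINDEX", "NONE"}),
        "follow": not (values & {"NOFOLLOW", "NONE"}),
    }
```

For the conflicting pair of tags shown above, this reports that the page may not be indexed while its links may still be followed.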

Unnecessary content values
By default, Googlebot will index a page and follow the links on it. So there’s no need to tag pages with content values of INDEX or FOLLOW.

Directing a robots meta tag specifically at Googlebot
To provide instruction for all search engines, set the meta name to “ROBOTS”. To provide instruction for only Googlebot, set the meta name to “GOOGLEBOT”. If you want to provide different instructions for different search engines (for instance, if you want one search engine to index a page, but not another), it’s best to use a specific meta tag for each search engine rather than use a generic robots meta tag combined with a specific one. You can find a list of bots at robotstxt.org.

Casing and spacing
Googlebot understands any combination of lowercase and uppercase. So each of these meta tags is interpreted in exactly the same way:

<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">

If you have multiple content values, you must place a comma between them, but it doesn’t matter if you also include spaces. So the following meta tags are interpreted the same way:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
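If you process these tags in your own tools, one way to mirror this tolerance is to normalize the content attribute before comparing values. The helper below is a hypothetical sketch (the name parse_content_values is not from any library): it splits on commas, trims surrounding spaces, and uppercases each value.

```python
def parse_content_values(content):
    """Split a robots meta content attribute into a set of uppercase values."""
    return {part.strip().upper() for part in content.split(",") if part.strip()}
```

With this normalization, "NOINDEX, NOFOLLOW", "NOINDEX,NOFOLLOW", and "noindex , nofollow" all produce the same set.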

If you use both a robots.txt file and robots meta tags
If the robots.txt and meta tag instructions for a page conflict, Googlebot follows the most restrictive. More specifically:

  • If you block a page with robots.txt, Googlebot will never crawl the page and will never read any meta tags on the page.
  • If you allow a page with robots.txt but block it from being indexed using a meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it.
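The two cases above can be sketched in Python with the standard library's urllib.robotparser. This only illustrates the order of checks described here: Python's parser is not Googlebot's, and the function name page_outcome and its return strings are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def page_outcome(robots_txt, url, meta_values, user_agent="Googlebot"):
    """Describe what happens to a URL given robots.txt text and meta values."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # robots.txt is checked first: a blocked page is never fetched, so any
    # meta tags on it are never seen.
    if not parser.can_fetch(user_agent, url):
        return "not crawled; meta tags never read"
    # The page was fetched, so its robots meta values now apply.
    if "NOINDEX" in {value.upper() for value in meta_values}:
        return "crawled; not indexed"
    return "crawled; eligible for indexing"
```

One consequence is that a NOINDEX tag has no effect on a page that robots.txt blocks, because the tag is never read.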

Valid meta robots content values
Googlebot interprets the following robots meta tag values:

  • NOINDEX - prevents the page from being included in the index.
  • NOFOLLOW - prevents Googlebot from following any links on the page. (Note that this is different from the link-level NOFOLLOW attribute, which prevents Googlebot from following an individual link.)
  • NOARCHIVE - prevents a cached copy of this page from being available in the search results.
  • NOSNIPPET - prevents a description from appearing below the page in the search results, and also prevents caching of the page.
  • NOODP - blocks the Open Directory Project description of the page from being used in the description that appears below the page in the search results.
  • NONE - equivalent to “NOINDEX, NOFOLLOW”.

A word about content value “NONE”
As defined by robotstxt.org, the following direction means NOINDEX, NOFOLLOW.

<META NAME="ROBOTS" CONTENT="NONE">

However, some webmasters use this tag to indicate no robots restrictions and inadvertently block all search engines from their content.

All about robots

Wednesday, March 7th, 2007

Posted by Dennis

Search engine robots, including our very own Googlebot, are incredibly polite. They work hard to respect your every wish regarding what pages they should and should not crawl. How can they tell the difference? You have to tell them, and you have to speak their language, which is an industry standard called the Robots Exclusion Protocol.

Dan Crow has written about this on the Google Blog recently, including an introduction to setting up your own rules for robots and a description of some of the more advanced options. His first two posts in the series are:
Controlling how search engines access and index your website
The Robots Exclusion Protocol
Stay tuned for the next installment.

While we’re on the topic, I’d also like to point you to the robots section of our help center and our earlier posts on this topic:
Debugging Blocked URLs
All About Googlebot
Using a robots.txt File