TinyApps.Org
Small is beautiful


Getting Google to index your Blosxom blog correctly
I originally asked (and answered) this question on Doctype, and reproduce it here (with slight modifications) for other Blosxom users:

Blosxom creates multiple copies of the same entry in different directories. For example, a given entry would have:
  • its own unique URL
  • a place in the topic index
  • a place in the month index
  • a place in the year index
I wanted Google to prefer the first of these, the unique URL, since it makes finding content easier. I tried using robots.txt to exclude the topic and date indexes from crawling, but then Google ignored or could not find the unique URLs either.

I ended up adding

<meta name="robots" content="noindex, follow">

to the blog index pages via this one-liner

find . -name index.html -print0 | xargs -0 perl -pi -e 's/<head>/<head>\n<meta name="robots" content="noindex, follow">/g'

at the end of my static publishing routine.

That way, search engines can still follow the links on the index pages to reach the canonical URLs, but will leave the topic and date index pages themselves out of their results.

UPDATE 1: Just found someone with the same issue who also solved it with "noindex, follow".
UPDATE 2: After a few weeks Google was properly indexing my site, and continues to do so after more than a month.
UPDATE 3: Another report of using the robots meta tag to fix search engine indexing.

/blosxom | Aug 13, 2010


