Getting Google to index your Blosxom blog correctly

I originally asked (and answered) this question on Doctype, and reproduce it here (with slight modifications) for other Blosxom users:

Blosxom creates multiple copies of the same entry in different directories. For example, a given entry would have:

I wanted Google to prefer the first entry, since it makes finding content easier. I tried using robots.txt to exclude the topic and date indexes from crawling, but then Google ignored or could not find the unique entry URLs either, presumably because the blocked index pages were the only pages linking to them.

I ended up adding

<meta name="robots" content="noindex, follow">

to the blog index pages via this one-liner

find . -name index.html -print0 | xargs -0 perl -pi -e 's/<head>/<head>\n<meta name="robots" content="noindex, follow">/g'

at the end of my static publishing routine.

That way, search engines still follow the links on the index pages to reach the canonical entry URLs ("follow"), but keep the topic and date index pages themselves out of their results ("noindex").

UPDATE 1: Just found someone with the same issue who also solved it with "noindex, follow".

UPDATE 2: After a few weeks Google was properly indexing my site, and continues to do so after more than a month.

UPDATE 3: Another report of using the robots meta tag to fix search engine indexing.

/blosxom | Aug 13, 2010

