Getting Google to index your Blosxom blog correctly
I originally asked (and answered) this question on Doctype, and reproduce it here (with slight modifications) for other Blosxom users:
Blosxom publishes the same entry at several URLs in different directories. For example, a given entry would have:
- its own unique URL
- a place in the topic index
- a place in the month index
- a place in the year index
I wanted Google to prefer the first of these, the entry's own unique URL, since that makes finding content easier. I tried using robots.txt to exclude the topic and date indexes, but then Google ignored or could not find the unique URLs either, presumably because it could no longer crawl the index pages that link to them.
I ended up adding
<meta name="robots" content="noindex, follow">
to the blog index pages via this one-liner
find . -name index.html -print0 | xargs -0 perl -pi -e 's/<head>/<head>\n<meta name="robots" content="noindex, follow">/g'
at the end of my static publishing routine.
That way, search engines can still follow the links on the index pages to the canonical URLs, but will not index the topic and date index pages themselves.
UPDATE 1: Just found someone with the same issue who also solved it with "noindex, follow".
UPDATE 2: After a few weeks Google was properly indexing my site, and continues to do so after more than a month.
UPDATE 3: Another report of using the robots meta tag to fix search engine indexing.
/blosxom | Aug 13, 2010