Description

RecentChanges uses index,nofollow instead of index,follow.

Details

This Wiki.

Discussion

Currently, every page except FrontPage, FindPage, SiteNavigation, and TitleIndex gets a <meta name="robots" content="index,nofollow"> header. This reduces the machine load and bandwidth wasted by spiders.
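The current rule can be sketched as follows. This is a minimal illustration that assumes the four exempted pages get index,follow. The function name `robots_meta` and the `FOLLOW_PAGES` set are hypothetical and do not come from MoinMoin's actual code.

```python
# Hypothetical sketch of the current rule; not MoinMoin's actual code.
# Pages whose robots header allows link following (English names only).
FOLLOW_PAGES = {"FrontPage", "FindPage", "SiteNavigation", "TitleIndex"}


def robots_meta(pagename: str) -> str:
    """Return the <meta name="robots"> tag emitted for a page."""
    if pagename in FOLLOW_PAGES:
        content = "index,follow"
    else:
        # Every other page, RecentChanges included, is indexed, but its
        # links are not followed. This saves machine load and bandwidth.
        content = "index,nofollow"
    return f'<meta name="robots" content="{content}">'
```

With this rule a spider landing on RecentChanges indexes the page itself but does not follow its links to the recently changed pages. That is the behaviour this bug report is about.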

If a search bot could also reach the wiki's content via the RecentChanges page, re-indexing recently changed pages would be easier, and search engine results would stay more up to date. Search bots also tend to re-index frequently changing pages more often. RecentChanges changes on virtually every visit, whereas TitleIndex does not necessarily change even when most of the wiki's content has changed. This is why I believe RecentChanges is the most important page for indexing and should use index,follow, too.

This is a bug as far as good search engine support is concerned. If search engines cannot index a wiki, it will not be found. So I think we should fix the meta tag.

/!\ The check must use the name "RecentChanges" itself, not a translation of it.

RecentChanges is a very expensive page. Indexing and following every localized RecentChanges page is costly and adds no new information, because all of them list the same changes to the same pages.

TitleIndex, SiteNavigation and FindPage are not expensive, but all their translations usually point to the same links in the wiki.

The localized FrontPage is usually edited for the particular wiki and links to the wiki's important pages. The other translated front pages are usually template pages that say nothing about the wiki, so visiting them is pointless.

Suggested fix:

  1. Only the localized versions of RecentChanges, FrontPage, TitleIndex, FindPage and SiteNavigation will have index,follow.

  2. The localized version is the version in the wiki's default_lang.

For example, on a Hebrew wiki, פתיחה (FrontPage), שינויים אחרונים (RecentChanges), חיפוש (FindPage) and מפתח דפים (TitleIndex) will be indexed and followed.
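The suggested fix can be sketched roughly as below. This is a hypothetical illustration, not MoinMoin's real API. `TRANSLATIONS`, `follow_pages` and `robots_content` are assumed names. The translation table stands in for whatever i18n lookup the wiki actually uses.

The sketch looks up each page by its English key ("RecentChanges" and so on), as the /!\ note requires. It falls back to the English name when a page has no translation, which also covers English wikis.

```python
# Hypothetical sketch of the suggested fix; the names and the translation
# table below are illustrative assumptions, not MoinMoin's real API.
SPECIAL_PAGES = ["RecentChanges", "FrontPage", "TitleIndex", "FindPage", "SiteNavigation"]

# Assumed translation table: language code -> {English name: localized name}.
TRANSLATIONS = {
    "he": {
        "FrontPage": "פתיחה",
        "RecentChanges": "שינויים אחרונים",
        "FindPage": "חיפוש",
        "TitleIndex": "מפתח דפים",
    },
}


def follow_pages(default_lang: str) -> set[str]:
    """Names of the special pages as localized into the wiki's default_lang."""
    table = TRANSLATIONS.get(default_lang, {})
    # Look up by the English key; fall back to the English name when no
    # translation exists (this is also what an English wiki ends up with).
    return {table.get(name, name) for name in SPECIAL_PAGES}


def robots_content(pagename: str, default_lang: str) -> str:
    """index,follow only for the default-language special pages."""
    if pagename in follow_pages(default_lang):
        return "index,follow"
    return "index,nofollow"
```

On the Hebrew wiki from the example, `robots_content("שינויים אחרונים", "he")` yields index,follow. The English `RecentChanges` page on that same wiki stays index,nofollow, so spiders crawl only one RecentChanges.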

Maybe we should create a special page for indexing that is cheaper to generate than RecentChanges. It could contain just a list of links to the most recently edited pages, one link per page, covering a few weeks of changes, so that a robot can use this single page for indexing. Since the log history never changes, we can cache the contents of this page. On each save, we add the saved page and drop expired pages from the end of the list.
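The incrementally maintained list could look something like the sketch below. It is a sketch under stated assumptions. The class `IndexingPage`, the four-week default window and the `/PageName` link layout are all hypothetical. It also assumes `on_save` is called in chronological order.

```python
from collections import OrderedDict
from datetime import datetime, timedelta
from html import escape
from urllib.parse import quote


class IndexingPage:
    """Hypothetical cached list of recently edited pages for spiders.

    Keeps one entry per page, newest first, covering roughly `window`
    of history. It is updated on each save instead of being rebuilt
    from the edit log. Assumes saves arrive in chronological order.
    """

    def __init__(self, window: timedelta = timedelta(weeks=4)):
        self.window = window
        self._entries: "OrderedDict[str, datetime]" = OrderedDict()

    def on_save(self, pagename: str, when: datetime) -> None:
        # One link per page: a re-edited page moves to the front.
        self._entries[pagename] = when
        self._entries.move_to_end(pagename, last=False)
        # Drop expired pages from the end (oldest part) of the list.
        cutoff = when - self.window
        while self._entries:
            oldest = next(reversed(self._entries))
            if self._entries[oldest] >= cutoff:
                break
            self._entries.popitem(last=True)

    def pages(self) -> list[str]:
        """Page names, most recently edited first."""
        return list(self._entries)

    def render(self) -> str:
        """Plain link list; the '/PageName' URL layout is an assumption."""
        items = "".join(
            f'<li><a href="/{quote(name)}">{escape(name)}</a></li>'
            for name in self._entries
        )
        return f"<ul>{items}</ul>"
```

Each save costs a dictionary update plus removal of any expired entries. Serving the page is just `render()`, which could itself be cached between saves.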

The "special page" for search engines could simply be an action, and it could be cheaper to produce than RecentChanges. But this is only speculation: I don't know what makes RecentChanges expensive. Maybe we can make it much faster simply by caching, as we do for page statistics.
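One way to try the caching idea is to reuse the rendered output until the edit log changes. The sketch below is hypothetical and is not how MoinMoin caches page statistics. `LogKeyedCache`, `log_path` and the `render` callback are assumed names, and the real edit-log location is not specified here.

```python
import os
from typing import Callable, Optional


class LogKeyedCache:
    """Hypothetical cache for an expensive page such as RecentChanges.

    The rendered output is reused until the edit log's modification time
    changes, so repeated spider visits between saves cost one os.stat()
    instead of a full rebuild.
    """

    def __init__(self, log_path: str, render: Callable[[], str]):
        self.log_path = log_path
        self.render = render
        self._stamp: Optional[int] = None
        self._html: Optional[str] = None

    def get(self) -> str:
        # The edit log is appended on every save, so its mtime works as a
        # cheap "has anything changed?" key.
        stamp = os.stat(self.log_path).st_mtime_ns
        if self._html is None or stamp != self._stamp:
            self._html = self.render()
            self._stamp = stamp
        return self._html
```

One caveat: two saves within the file system's timestamp resolution could leave a stale page until the next save. Keying on the log's size as well, or on its last entry, would tighten this.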

Plan

Now: apply the suggested fix, using only the translated pages in the wiki's default language.

Future: check FeatureRequests/AlternativeSpiderControlFeatures etc.

MoinMoin 1.5 seems to behave differently here. When I look at the Google cache version of the front page of a German-only wiki (e.g. ÜberSicht), the link bar is in English (that is, RecentChanges, FindPage and so on). This lets the search bot reach the English-language RecentChanges (without nofollow) via the front page, even if the front page is customized and/or another language is the default. Am I missing something, or can we close this issue? -- MartinBayer 2006-01-22 18:58:15


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/RecentChangesUsesNofollow (last edited 2007-10-29 19:06:05 by localhost)