Description

While crawling my wiki (MoinMoin 1.9.4), the Bing bot causes a large number of page directories to be created on disk for MonthCalendar pages.

Steps to reproduce

  1. Look in the web server log, e.g. /var/log/apache2/access.log, for bingbot requests with action=edit on MonthCalendar subpages.

Example

Apache2 log:
65.52.104.87 - - [17/Jul/2012:18:13:54 +0200] "GET /wiki9/HelpOnMacros/MonthCalendar/2007-08-22?action=edit&template=MonthCalendarTemplate HTTP/1.1" 404 1938 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
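
To gauge how often the bot does this, the log can be filtered for bingbot requests that carry action=edit. The following is a minimal standalone sketch, assuming the Apache "combined" log format of the line above. The function name bingbot_edit_requests and the regular expression are illustrative and not part of MoinMoin.

```python
import re

# Matches the request and user-agent fields of an Apache "combined" log line.
# Assumption: the log uses the combined format, as in the example line above.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '  # "GET /path?query HTTP/1.1"
    r'(?P<status>\d{3}) \S+ '                     # status code and response size
    r'"[^"]*" "(?P<agent>[^"]*)"'                 # referer and user agent
)

def bingbot_edit_requests(lines):
    """Yield the URLs that bingbot requested with action=edit."""
    for line in lines:
        m = LOG_LINE.search(line)
        if m and "bingbot" in m.group("agent").lower() and "action=edit" in m.group("url"):
            yield m.group("url")

# The log line from this report, used as sample input.
SAMPLE = (
    '65.52.104.87 - - [17/Jul/2012:18:13:54 +0200] '
    '"GET /wiki9/HelpOnMacros/MonthCalendar/2007-08-22'
    '?action=edit&template=MonthCalendarTemplate HTTP/1.1" 404 1938 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
)

for url in bingbot_edit_requests([SAMPLE]):
    print(url)
```

Passing an open file object for the access log instead of the sample list scans the real log.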

Page directory created on disk as a result, e.g.: HelpOnMacros(2f)MonthCalendar(2f)1998(2d)02(2d)09
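
The name of the added page directory is the page name with special characters stored as hex byte values in parentheses, so (2f) stands for "/" and (2d) for "-". Below is a rough standalone sketch for reading such names by hand. decode_pagedir is a hypothetical helper, not MoinMoin's own function, and it assumes the bytes are UTF-8.

```python
import re

def decode_pagedir(dirname):
    """Turn '(2f)'-style hex groups back into the characters they encode."""
    def unhex(match):
        # Assumption: each parenthesised group holds complete UTF-8 byte sequences.
        return bytes.fromhex(match.group(1)).decode("utf-8")
    return re.sub(r"\(((?:[0-9a-fA-F]{2})+)\)", unhex, dirname)

print(decode_pagedir("HelpOnMacros(2f)MonthCalendar(2f)1998(2d)02(2d)09"))
# HelpOnMacros/MonthCalendar/1998-02-09
```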

Component selection

Details

MoinMoin Version

1.9.4

OS and Version

Ubuntu 10.04

Python Version

Server Setup

Server Details

Language you are using the wiki in (set in the browser/UserPreferences)

Workaround

Add "bingbot" to the ua_spiders pattern in MoinMoin/config/multiconfig.py:

    ('ua_spiders',
     ('archiver|bingbot|cfetch|charlotte|crawler|gigabot|googlebot|heritrix|holmes|htdig|httrack|httpunit|'
      'intelix|jeeves|larbin|leech|libwww-perl|linkbot|linkmap|linkwalk|litefinder|mercator|'
      'microsoft.url.control|mirror| mj12bot|msnbot|msrbot|neomo|nutbot|omniexplorer|puf|robot|scooter|seekbot|'
      'sherlock|slurp|sitecheck|snoopy|spider|teleport|twiceler|voilabot|voyager|webreaper|wget|yeti'),
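
Whether a given user agent is caught by the pattern can be checked outside the wiki. The following is a minimal sketch, assuming the pattern is applied as a case-insensitive regular-expression search against the User-Agent header. That is an assumption about how MoinMoin uses ua_spiders, not something stated in this report, and is_spider is a hypothetical helper.

```python
import re

# The pattern from the workaround above, including the added "bingbot".
UA_SPIDERS = (
    'archiver|bingbot|cfetch|charlotte|crawler|gigabot|googlebot|heritrix|holmes|htdig|httrack|httpunit|'
    'intelix|jeeves|larbin|leech|libwww-perl|linkbot|linkmap|linkwalk|litefinder|mercator|'
    'microsoft.url.control|mirror| mj12bot|msnbot|msrbot|neomo|nutbot|omniexplorer|puf|robot|scooter|seekbot|'
    'sherlock|slurp|sitecheck|snoopy|spider|teleport|twiceler|voilabot|voyager|webreaper|wget|yeti'
)

# The same pattern with "bingbot" taken out again, for comparison.
WITHOUT_BINGBOT = UA_SPIDERS.replace('bingbot|', '')

# User agent taken from the Apache log line in this report.
BING_UA = 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'

def is_spider(pattern, user_agent):
    """Return True if any alternative in the pattern occurs in the user agent."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None

print(is_spider(WITHOUT_BINGBOT, BING_UA))  # False
print(is_spider(UA_SPIDERS, BING_UA))       # True
```

None of the other alternatives (for example "robot" or "msnbot") occurs in the bingbot user agent string, which is why the explicit entry is needed.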

Discussion

This is a general issue, not specific to Bing: as soon as someone starts editing a page, a pagedir / edit-log gets created, even if the page is never saved.

Plan


CategoryMoinMoinBug

MoinMoin: MoinMoinBugs/BingBotGeneratesCalendarPages (last edited 2012-07-17 17:36:01 by ThomasWaldmann)