How to avoid wiki LinkSpam on all MoinMoin wikis globally.

See also AntiSpamFeatures, NetworkOfMoinMoinWikis

Install

For MoinMoin 1.6.0 and later, use:

from MoinMoin.security.antispam import SecurityPolicy

Otherwise:

from MoinMoin.util.antispam import SecurityPolicy
#...

If your MoinMoin version is older than that, please upgrade.
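For context, here is a sketch of where that import usually goes in wikiconfig.py. It assumes the 1.6+ layout (DefaultConfig in MoinMoin.config.multiconfig) and our understanding that the wiki looks for a SecurityPolicy attribute on your Config class, which is why the import sits inside the class body. The sitename line is only a placeholder for your existing settings. For 1.5.x, the module paths would be MoinMoin.multiconfig and MoinMoin.util.antispam instead.

```python
# wikiconfig.py: sketch for MoinMoin 1.6.0 and later (assumed layout)
from MoinMoin.config.multiconfig import DefaultConfig


class Config(DefaultConfig):
    # Importing inside the class body binds SecurityPolicy as a class
    # attribute of Config, so the wiki uses the antispam policy.
    from MoinMoin.security.antispam import SecurityPolicy

    # ... your other settings, for example:
    sitename = u'My Wiki'
```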

How it works

The extension fetches the BadContent page from MoinMaster. That page is kept in sync with MoinMaster automatically, so do not edit it: your edits will be overwritten.

Together with LocalBadContent (where you can add your own regular expressions), this forms your spam protection. Any save containing links that match one of these regular expressions is denied.
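As a rough illustration of that check, here is a minimal Python sketch: it compiles the patterns from both pages and refuses the save if any of them matches the new text. The names SaveDenied and check_save are hypothetical, and the case-insensitive matching and scanning of the whole text (rather than only the lines an edit adds) are our assumptions, not necessarily what the antispam module does.

```python
import re


class SaveDenied(Exception):
    """Raised when the new page text contains blacklisted content."""


def compile_patterns(patterns):
    # One regex per BadContent/LocalBadContent entry; case-insensitive (assumption).
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def check_save(newtext, bad_content, local_bad_content):
    """Deny the save if any pattern from either list matches newtext."""
    for regex in compile_patterns(bad_content + local_bad_content):
        match = regex.search(newtext)
        if match:
            raise SaveDenied('"%s" is not allowed in this wiki.' % match.group())
```

A page mentioning http://www.SPAMMER.com/ would be rejected by the entry spammer\.com, and the error message quotes the matched text so the editor can see what triggered it.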

The BadContent page carries the ACL line #acl All:read, so spammers can't edit it. In fact, only wiki admins on MoinMaster can change it.

Format of (Local)BadContent

#format plain
## Any line starting with a # will not be considered as regex,
## but any other line will! So do not put other text or wiki markup on this page
## or it will be considered as bad content, too - which might drive you crazy until
## you notice what went wrong.
spammer.com
anotherspammer.com
...

The code markup ({{{ and }}}) used to show the listing above MUST NOT be put on the BadContent page.
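A minimal sketch of reading such a page, assuming the format above: blank lines and lines starting with # are skipped, and every other line is one regular expression. Stripping surrounding whitespace and reporting invalid regexes separately are our own choices. Because each line is a regex, an entry like spammer.com also matches spammerXcom; write spammer\.com if you want a literal dot.

```python
import re


def parse_bad_content(page_text):
    """Return (valid_patterns, invalid_lines) from a (Local)BadContent page."""
    patterns, invalid = [], []
    for line in page_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (#format plain, ## ...).
        if not line or line.startswith("#"):
            continue
        try:
            re.compile(line)
        except re.error:
            invalid.append(line)  # would otherwise break the whole check
        else:
            patterns.append(line)
    return patterns, invalid
```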

Contribute!

If you want to contribute spam link patterns, use this page: BadContent

Discussion

This method does NOT use IP address based banning, but content (link) based banning. The blacklist on BadContent contains patterns that match spam links. Technically, it could also be used to censor "offensive words", but that won't happen through the MoinMaster list.

If you are looking for IP banning, try BlackList (see the discussion there).

This solution is not limited to MoinMoin. If you can process our regular expressions, feel free to use them. You can get them either via an HTTP request (using ?action=raw) or via xmlrpc2. But if you want to use them for another wiki engine, please mirror the page on that engine's site and direct people there.
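Fetching the list over HTTP can be as simple as the sketch below. The base URL is a placeholder for whichever wiki or mirror you pull from; we assume page names map directly onto URL paths (as on a typical MoinMoin setup) and that the raw text is UTF-8. The function names are hypothetical.

```python
import urllib.parse
import urllib.request


def raw_page_url(wiki_base, pagename):
    """Build the ?action=raw URL that returns a page's unrendered text."""
    return wiki_base.rstrip("/") + "/" + urllib.parse.quote(pagename) + "?action=raw"


def fetch_raw_page(wiki_base, pagename="BadContent", timeout=20):
    # Always pass a timeout so a hanging server cannot block the caller forever.
    url = raw_page_url(wiki_base, pagename)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, raw_page_url("http://wiki.example.org/", "BadContent") gives http://wiki.example.org/BadContent?action=raw; the returned text can then be split into one regex per line as described in the format section.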

Need some sort of 'Good Content' filter

For some wikis, particular items in the BadContent listing are inappropriate. I work on a MoinMoin wiki about poker at www.overcards.com; I routinely need to edit my local copy of BadContent because all links with "poker" in them are banned. I can see why others might consider all such links spam, but on a poker wiki, such a restriction is crippling!

I don't like the idea of pulling the word from the master list, though I'm frequently tempted, and I don't want to disconnect from the global solution and the daily updates. Is there any simple way out?

I can't tell from the discussion and code fragment below whether this is an accepted "White" pages patch or whether it is the subject of debate. Perhaps some refactoring is due? -- MentalNomad

I think that I have seen a patch which allows you to have a whitelist. Anyway, it would not be difficult to write. But in this case, you can fix BadContent on MoinMaster as well (see EditingOnMoinMaster).

Problems with this patch

    def save(self, editor, newtext, datestamp, **kw):
        BLACKLISTPAGES = ["BadContent", "LocalBadContent"]
        WHITELISTPAGES = ["GoodContent", "LocalGoodContent"]
        if editor.page_name not in BLACKLISTPAGES + WHITELISTPAGES:
            request = editor.request
            # Collect the regexes from the master list and the local list.
            blacklist = []
            for pn in BLACKLISTPAGES:
                do_update = (pn != "LocalBadContent")  # local page is never synced
                blacklist += getblacklist(request, pn, do_update)
            # Whitelist pages use the same one-regex-per-line format.
            whitelist = []
            for pn in WHITELISTPAGES:
                do_update = (pn != "LocalGoodContent")
                whitelist += getblacklist(request, pn, do_update)
            # Drop whitelisted entries. Calling blacklist.remove(page_re) would
            # raise ValueError for any whitelist entry missing from the blacklist.
            blacklist = [page_re for page_re in blacklist if page_re not in whitelist]
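The core of the whitelist idea, pulled out of the patch so it can be tried on its own: whitelisted entries are dropped from the combined blacklist before any matching happens. Note that this only cancels entries that are textually identical, so the poker case works only if GoodContent lists exactly the same regex as BadContent. The function names and the case-insensitive matching are assumptions for illustration.

```python
import re


def effective_blacklist(blacklist, whitelist):
    """Drop every blacklist entry that also appears verbatim in the whitelist."""
    allowed = set(whitelist)
    return [pattern for pattern in blacklist if pattern not in allowed]


def is_spam(text, blacklist, whitelist):
    """True if any non-whitelisted pattern matches text (case-insensitive)."""
    for pattern in effective_blacklist(blacklist, whitelist):
        if re.search(pattern, text, re.IGNORECASE):
            return True
    return False
```

With a blacklist of poker and spammer\.com and a whitelist of poker, a link to a poker site passes while spammer.com is still blocked.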

Thomas Waldmann responded:

/!\ If the MoinMaster wiki hangs, saving pages hangs, too. We need a sensible timeout! 20 seconds?
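One way to bound the damage, sketched under our own assumptions: give the fetch an explicit timeout (20 seconds, as suggested) and, if MoinMaster is slow or unreachable, keep using the last copy that was fetched successfully. The cache is a plain dict keyed by URL here; a real wiki would keep that copy in its own page or cache storage. The function name is hypothetical.

```python
import urllib.request

FETCH_TIMEOUT = 20  # seconds, the value suggested in the discussion


def fetch_with_fallback(url, cache, timeout=FETCH_TIMEOUT):
    """Return fresh page text, or the last cached copy if the fetch fails.

    Socket timeouts, refused connections and HTTP errors are all OSError
    subclasses, so one except clause covers them.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    except OSError:
        return cache.get(url, "")  # stale but usable; saving is not blocked
    cache[url] = text  # remember the last good copy
    return text
```

Note that the timeout applies to each blocking socket operation, not to the download as a whole, so a server that trickles data very slowly can still take longer than 20 seconds in total.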

More Ideas

Mark SPAM on revert

Offer a check box "Mark links as spam" in the revert action. If the user checks the box, the removed lines are searched for external links. If the user may write to LocalBadContent, the links are added to it (do RE quoting!). If the user is not allowed to edit LocalBadContent, the links are added to LocalBadContent/StagingArea, from where another user may move them to LocalBadContent later on.

Perhaps strip the path and use only the domain name.
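A sketch of the link-harvesting step of this idea: compare the spammed revision with the reverted text, collect external links from the lines the revert removed, optionally reduce them to the host name, and regex-quote them with re.escape so they can go straight onto LocalBadContent. The function name and the deliberately simple URL regex are hypothetical; this option does not exist in the revert action as described here.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple matcher for http(s) links in wiki markup.
URL_RE = re.compile(r"https?://[^\s\]|<>]+", re.IGNORECASE)


def spam_patterns_from_revert(spam_text, reverted_text, domain_only=True):
    """Return regex-quoted patterns for links on lines the revert removed."""
    kept = set(reverted_text.splitlines())
    removed = [line for line in spam_text.splitlines() if line not in kept]
    patterns = []
    for line in removed:
        for url in URL_RE.findall(line):
            # urlsplit().hostname is lower-cased and drops port and path.
            target = urlsplit(url).hostname if domain_only else url
            if not target:
                continue
            pattern = re.escape(target)  # "do RE quoting!"
            if pattern not in patterns:
                patterns.append(pattern)
    return patterns
```

For a removed line containing [[http://www.spammer.com/buy|cheap]], the result is the pattern www\.spammer\.com.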

Distributed mark on revert

Same as "Mark SPAM on revert", but the links are added to MoinMoin:LocalBadContent/StagingArea. This would make it easy for us to collect the spam seen across all wikis and add it to our list. The feature would only be enabled if the wiki uses AntiSpamGlobalSolution, to avoid too many duplicate hits.

chongqing

As spammers try to trick Google into ranking them higher by abusing our rank/reputation, we could use our rank to lower theirs.

See http://chongqed.org/chongqed.html

Experimental list export: http://distribute.chongqed.org/

A tab-separated text file with one entry per line; the fields are the URL, the key words, and a link to Chongqed.org.
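Reading such an export would only have needed the standard csv module. This sketch assumes exactly three tab-separated fields per line in the order given above, with no quoting; the dictionary keys are our own names. Since the list has since been discontinued, it is purely illustrative.

```python
import csv
import io


def parse_chongqed_export(text):
    """Parse tab-separated lines of: URL, key words, link to chongqed.org."""
    entries = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        url, keywords, chongqed_link = (field.strip() for field in row)
        entries.append({"url": url, "keywords": keywords, "chongqed": chongqed_link})
    return entries
```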

We could simply add these links to our pages automatically, perhaps invisibly (white on white) or hidden under a fixed image. We could show these links to search engine bots only.

Don't do that! http://www.google.com/webmasters/faq.html: "The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. ... To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings."

If you aren't into chongqing, you could also use http://blacklist.chongqed.org/ to block the spammers that are in the chongqed.org database.

We have discontinued the distribute list but are discussing another better method to accomplish the same thing. -- Joe(at)chongqed.org

LocalBadContent Changes Underlay

I wish changes to LocalBadContent were made to the UnderlayPages version.

My ~10-15 MoinMoin instances tend to get hit in series, and I don't like having to change 10-15 pages when a spammer decides to test the waters.

Right now, I just edit the underlay page directly. But I'd like to be able to do so remotely, without ssh'ing in.

-- LionKimbro 2005-02-08 08:40:28

The purpose of the current underlay directory is easy upgrades, not sharing local content. Shared content is a real need and will have to be solved outside the underlay directory.

As a solution, I would write a macro and an action that let you edit the contents of the LocalBadContent underlay page. The macro can get the raw text of the page, show it in a text area, and let you save the new text (possibly without a backup). Require admin rights to use that macro, and you have an easy remote way.
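The file-handling half of such an action could look like this sketch. Everything here is hypothetical: the path to the underlay copy of LocalBadContent depends on your installation, and the is_admin flag stands in for whatever permission check the real action would perform. Writing to a temporary file in the same directory and then calling os.replace() means an interrupted save never leaves a half-written page behind.

```python
import os
import tempfile


def save_underlay_text(path, new_text, is_admin):
    """Replace the underlay page file with new_text in one step (no backup)."""
    if not is_admin:
        raise PermissionError("only wiki admins may edit the shared underlay page")
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
        os.replace(tmp_path, path)  # atomic rename on POSIX systems
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file
        raise
```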

Blocking Chinese

Many wikis out there don't have legitimate Chinese content, but they do get lots of Chinese spam.

If you are sure your wiki has no legitimate Chinese content, you can use this on your LocalBadContent page:

/!\ Never ever put this on the master BadContent page. Of course, there are also quite a few wikis with legitimate Chinese content!

/!\ Also be aware that if a CJK user of your wiki has created a homepage with some content in his/her language, he/she may no longer be able to save that page if you use those regexes, so be careful!

# and
# or
# you (2nd means respect)
# we (2nd is Simplified Chinese)
我們
我们

An even more universal regex is to forbid all "CJK Unified Ideographs" (Chinese, Japanese, Korean), which are in U+4E00 - U+9FFF:

[\u4e00-\u9fff]
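To check what these patterns catch before putting them on LocalBadContent, you can try them in Python 3, whose re module understands \uXXXX escapes in patterns. The sample strings and the function name are made up for the example.

```python
import re

# The two word entries from the listing above, plus the character-class rule.
WORD_PATTERNS = ["我們", "我们"]
CJK_IDEOGRAPH = re.compile(r"[\u4e00-\u9fff]")


def blocked_by_cjk_rules(text):
    """True if text would be denied by either rule above."""
    if any(re.search(word, text) for word in WORD_PATTERNS):
        return True
    return CJK_IDEOGRAPH.search(text) is not None
```

Japanese kanji fall into the same block, so a page such as 日本語のページ is caught, while text written only in kana or Hangul lies outside U+4E00 - U+9FFF and passes.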


CategorySpam

MoinMoin: AntiSpamGlobalSolution (last edited 2011-09-15 01:47:50 by QingpingHou)