Description

If a page name starts with an accented character, the page appears at the end of the list after a search (e.g., a category search), rather than where it would belong if the sort order were alphabetical.

The same applies to page names that start with a lower-case letter. Although I agree that such names are not the norm, names that break the naming rules do appear now and then in my wiki.
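The ordering described above is what a plain code-point sort produces. A minimal sketch, assuming Python 3 and hypothetical page names (not taken from a real wiki):

```python
# Hypothetical page names; "\u00c4pfel" is "Äpfel" with a precomposed A-umlaut.
names = ["Zebra", "\u00c4pfel", "apple", "Banana"]

# A plain sort compares code points: 'B' (66) < 'Z' (90) < 'a' (97) < 'Ä' (196).
plain = sorted(names)
print(plain)  # ['Banana', 'Zebra', 'apple', 'Äpfel']
```

Upper-case ASCII names come first, lower-case names follow, and accented names end up last, which matches the behaviour reported here.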

Steps to reproduce

  1. Run a wiki in an environment where accented characters are common (Germany, Switzerland, …)
  2. Create a page whose name starts with an accented character and add it to some category
  3. List the pages in that category

Example

SearchResultSortedByAsciiNotAlphabet.png

Component selection

Details

MoinMoin Version

1.8.1

OS and Version

Windows XP SP2

Python Version

2.5.4

Server Setup

Apache 2.2 / mod_wsgi

Language you are using the wiki in (set in the browser/UserPreferences)

German (de)

Workaround

I've modified the file MoinMoin/search/results.py to use the system's default locale. Diff to the original:

--- search/results.orig.py      2009-01-06 23:22:35.218750000 +0100
+++ search/results.py   2009-01-06 23:24:13.796875000 +0100
@@ -15,6 +15,10 @@
 from MoinMoin import wikiutil
 from MoinMoin.Page import Page
 
+import locale
+locale.setlocale(locale.LC_ALL, "")
+localized_cmp=lambda p1, p2: locale.strcoll(p1[0], p2[0])
+
 ############################################################################
 ### Results
 ############################################################################
@@ -257,7 +261,7 @@
     def _sortByPagename(self):
         """ Sorts a list of found pages alphabetical by page name """
         tmp = [(hit.page_name, hit) for hit in self.hits]
-        tmp.sort()
+        tmp.sort(cmp=localized_cmp)
         self.hits = [item[1] for item in tmp]
 
     def stats(self, request, formatter, hitsFrom):

/!\ You need to run diff -u orig new.

(./) done, thanks
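For comparison, here is a sketch of the same idea outside MoinMoin, written for Python 3. The function name sort_by_locale and its locale_name parameter are made up for illustration. It switches only LC_COLLATE and restores the previous setting afterwards, on the assumption that a narrower change has fewer side effects than a module-level LC_ALL; this has not been tested against MoinMoin.

```python
import functools
import locale

def sort_by_locale(names, locale_name=""):
    """Return names sorted with the collation rules of locale_name.

    An empty locale_name means "the environment's default locale".
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            pass  # locale not installed: fall back to the current collation
        return sorted(names, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # undo the change

# Hypothetical page names; the resulting order depends on the active locale.
result = sort_by_locale(["Zebra", "\u00c4pfel", "apple", "Banana"])
```

Note that setlocale() changes process-wide state, so switching it per call is questionable in a multi-threaded server; treat this as an illustration of the technique rather than a drop-in fix.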

/!\ How is the locale of the system used for the server related to the content in the wiki?

That question is a good one… As I'm in a very confined intranet environment (and because it's the "Workaround" section ;-) ), it was of no concern for me: the server locale is ok for all clients.
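One way to sidestep the server-locale question is a sort key that does not consult the locale at all. The sketch below is a different technique from the workaround above, not part of MoinMoin: it assumes Python 3, the helper name fold_key is made up, and it only approximates alphabetical order by comparing base letters case-insensitively (it does not implement any national collation standard).

```python
import unicodedata

def fold_key(name):
    """Sort key: drop combining marks (accents), then ignore case."""
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

# Hypothetical page names; "\u00c4pfel" is "Äpfel".
names = ["Zebra", "\u00c4pfel", "apple", "Banana"]
by_base_letter = sorted(names, key=fold_key)
print(by_base_letter)  # ['Äpfel', 'apple', 'Banana', 'Zebra']
```

Names that differ only in accents or case get the same key and therefore keep their input order relative to each other.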

Suggestions:

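The above fix breaks Xapian searching for attachments of MIME type application/octet-stream. The filter for that type relies on string.letters being strictly ASCII, but in Python 2 string.letters is locale-dependent and is updated when locale.setlocale() is called. Another modification fixes this: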

--- filter/application_octet_stream.orig.py     2008-08-31 22:00:52.000000000 +0200
+++ filter/application_octet_stream.py  2009-02-16 22:29:58.127190800 +0100
@@ -36,7 +36,7 @@
 norm = string.maketrans('', '')
 
 # builds a list of all non-alphanumeric characters:
-non_alnum = string.translate(norm, norm, string.letters+string.digits)
+non_alnum = string.translate(norm, norm, string.ascii_letters+string.digits)
 
 # translate table that replaces all non-alphanumeric by blanks:
 trans_nontext = string.maketrans(non_alnum, ' '*len(non_alnum))

/!\ You need to run diff -u orig new.

(./) done, thanks
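To illustrate what the patched lines compute, here is a rough Python 3 equivalent of the translation table. This is a sketch only: the original filter is Python 2 code, and the function name extract_text is made up for this example.

```python
import string

# ASCII letters and digits as bytes; independent of any locale setting.
alnum = (string.ascii_letters + string.digits).encode("ascii")

# All byte values that are not ASCII alphanumerics.
non_alnum = bytes(b for b in range(256) if b not in alnum)

# Translation table that replaces every non-alphanumeric byte by a blank.
trans_nontext = bytes.maketrans(non_alnum, b" " * len(non_alnum))

def extract_text(data):
    """Blank out everything except ASCII letters and digits."""
    return data.translate(trans_nontext)

print(extract_text(b"ab\x00cd-12"))  # b'ab cd 12'
```

Because the table is built from string.ascii_letters, bytes such as 0xC4 are always blanked out, whatever locale the process has set.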

Discussion

Plan


CategoryMoinMoinBug

MoinMoin: MoinMoinBugs/SearchResultSortedByAsciiNotAlphabet (last edited 2009-02-17 16:55:58 by securemail3)