Description

Unzipping a .zip file fails when it contains a file with a non-ASCII name.

Task: find a good name for an encoding list, add a config attribute for it with a sane default, and extend the unzipping code to try decoding with each entry in that list (also look at other places in moin that deal with this sort of problem). Research the zip standard and its known extensions and suggest a better way of handling this (this might mean filing a bug / feature request for Python's stdlib).
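On the zip standard: the ZIP specification (APPNOTE) defines general purpose bit 11 (0x800) as a language encoding flag; when it is set, the file name and comment are UTF-8. When it is not set, the spec nominally means IBM code page 437, but many tools simply write the local system code page. The sketch below assumes Python 3's `zipfile`, which (as far as I know) decodes unflagged names as cp437, so the raw bytes can be recovered by re-encoding. `list_names` and `UTF8_NAME_FLAG` are hypothetical names, not moin or stdlib API.

```python
import io
import zipfile

# General purpose bit 11: file name and comment are UTF-8 (ZIP APPNOTE).
UTF8_NAME_FLAG = 0x800


def list_names(zip_bytes):
    """Return (name, is_utf8_flagged, raw_name_bytes) for each archive member.

    Assumption: Python 3 zipfile decoded unflagged names as cp437, so
    encoding them back with cp437 yields the bytes stored in the archive.
    """
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            flagged = bool(info.flag_bits & UTF8_NAME_FLAG)
            if flagged:
                raw = info.filename.encode("utf-8")
            else:
                raw = info.filename.encode("cp437")
            rows.append((info.filename, flagged, raw))
    return rows
```

Only names without the flag need any guessing; for those, the raw bytes are what a configurable encoding list would be tried against.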

Steps to reproduce

  1. create a file named "obrázky.jpg" (I did it under Czech Windows)
  2. zip the file
  3. upload the file
  4. try to unzip it

Happens in THIS wiki (currently moin 1.5.6+).

Details

traceback.html

In this particular case the filename must be decoded with the iso-8859-1 decoder.

The problem is that we may never know which encoding the user's system used for the filename (or is this stored somewhere in the zip header?). So we could try utf-8 and then iso-8859-1, and if everything fails, force decoding with ascii and throw away the invalid chars.
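A minimal sketch of that fallback chain, written for Python 3 (the moin code of that era ran on Python 2). `decode_filename` and its `encodings` default are hypothetical; in moin the list would presumably come from the new config attribute.

```python
def decode_filename(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw filename bytes by trying each candidate encoding in order.

    If none of the candidates succeeds, decode as ASCII and silently drop
    every byte that is not valid ASCII.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: throw away the invalid chars.
    return raw.decode("ascii", errors="ignore")
```

Order matters: iso-8859-1 accepts every byte sequence, so it has to come last in the list, and the ASCII fallback is only reached when the list contains no such catch-all encoding.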

Workaround

Use ASCII filenames in ZIPs.

Discussion

In the intranet case we usually know which encodings users can use, so can this be made configurable?

Just an idea for discussion: drop the characters that can't be decoded and let the user rename the file to the right name using the right charset. The patch below is not complete: the name change is not yet reported in the message, and a hint is probably needed too.

With something like this, the annoying characters that can also be entered in the upload form will be removed as well.

diff -r 57d85b82bc3e MoinMoin/action/AttachFile.py
--- a/MoinMoin/action/AttachFile.py     Tue Jun 10 09:02:51 2008 +0200
+++ b/MoinMoin/action/AttachFile.py     Tue Jun 10 21:23:17 2008 +0200
@@ -191,6 +191,13 @@
 
     # replace illegal chars
     target = wikiutil.taintfilename(target)
+
+    # replace chars which can't be decoded by config.charset
+    for x in target:
+        try:
+            x.decode(config.charset)
+        except UnicodeError:
+            target = target.replace(x, '_')
 
     # get directory, and possibly create it
     attach_dir = getAttachDir(request, pagename, create=1)
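One thing to watch with the per-byte loop: if config.charset is utf-8, every byte of a multi-byte character fails to decode on its own, so valid UTF-8 names would lose their non-ASCII characters too. A possible variant, sketched here for Python 3 and not part of the patch, decodes the whole name once and replaces only the undecodable parts. `sanitize_name` is a hypothetical helper and the `charset` argument stands in for config.charset.

```python
def sanitize_name(raw, charset="utf-8"):
    """Decode raw filename bytes, replacing undecodable bytes with '_'.

    Valid multi-byte sequences in `charset` are kept intact; only the
    parts that cannot be decoded are substituted.
    """
    # errors="replace" inserts U+FFFD for each undecodable sequence.
    text = raw.decode(charset, errors="replace")
    return text.replace("\ufffd", "_")
```

With this, the iso-8859-1 bytes of the file name from the report become "obr_zky.jpg", while a correctly UTF-8 encoded "obrázky.jpg" passes through unchanged.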

Character encoding auto-detection in Python

Plan


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/UnzipFilesWithNonAsciiName (last edited 2008-12-22 23:48:16 by ReimarBauer)