Description

Note: this was fixed in moin-2.0, see: https://bitbucket.org/thomaswaldmann/moin-2.0/issue/41/non-ascii-download-filenames-dont-work

Attachments with non-ASCII names are saved with the wrong name when downloaded.

Browser support:

Browser   | Correct name | Comments
IE 6.0    | No           |
Firefox   | Yes          |
Firefox 3 | Yes          |
IE 7.0    | No           | When a Word document is opened inside the browser, the tab shows the filename URL-encoded (%xy), but the window shows the correct name.
Safari    | No           |
Opera     | Yes          |

Steps to reproduce

  1. Upload any file with a non-ASCII filename.
  2. Download the file.

Example

Example attachments:

Downloading in IE7:

wiki_rus_files_bad.png

Downloading in Safari: hebrew.png

Downloading in Firefox:

wiki_rus_files_ok.png

Component selection

Details

This Wiki.

Workaround

Use Firefox.

And/or use ASCII filenames.

Discussion

MoinMoin sends an invalid Content-Disposition header containing non-ASCII characters:

Content-Disposition: inline; filename="test עברית.txt"

Firefox decodes the filename as UTF-8 (perhaps that is its default). Safari and IE decode it differently, which leads to a wrong name, but that is correct behavior: the standard does not allow non-ASCII characters in header parameters.
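To see why the names come out wrong, this minimal Python 3 sketch takes the raw UTF-8 bytes that end up inside filename="..." and decodes them two ways. Which charset a given browser actually falls back to is an assumption here; Windows-1252 (cp1252) is used only because it is a common default on Western Windows systems.

```python
name = "test עברית.txt"

# The bytes MoinMoin writes between the quotes of filename="..."
raw = name.encode("utf-8")

# Decoding as UTF-8 (what Firefox appears to do) recovers the original name.
as_utf8 = raw.decode("utf-8")

# Decoding with a legacy single-byte charset (assumed here: cp1252) yields mojibake:
# every Hebrew letter becomes two Latin-1/cp1252 symbols.
as_cp1252 = raw.decode("cp1252")
```

Here `as_cp1252` is `test ×¢×‘×¨×™×ª.txt`, which is the kind of garbled name seen in the screenshots.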

RFC 2231 describes how to use non-ASCII characters. The header should look like this:

Content-Disposition: inline; filename*=utf-8'en'test%20%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt

email.Utils.encode_rfc2231 can be used to create correct non-ASCII headers:

>>> from email.Utils import encode_rfc2231
>>> encode_rfc2231('עברית', 'utf-8', 'en')
"utf-8'en'%D7%A2%D7%91%D7%A8%D7%99%D7%AA"
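The session above is Python 2, where the argument is an already UTF-8-encoded byte string. A minimal Python 3 sketch that builds the complete header from the RFC 2231 example follows; the function name `rfc2231_content_disposition` is hypothetical, while `email.utils.encode_rfc2231(s, charset, language)` is the standard-library call.

```python
from email.utils import encode_rfc2231


def rfc2231_content_disposition(filename, disposition="inline", language="en"):
    """Build a Content-Disposition value that uses an RFC 2231 filename* parameter."""
    # encode_rfc2231 percent-encodes the UTF-8 bytes and prepends charset'language'
    encoded = encode_rfc2231(filename, "utf-8", language)
    return "%s; filename*=%s" % (disposition, encoded)


header = "Content-Disposition: " + rfc2231_content_disposition("test עברית.txt")
```

`header` then matches the RFC 2231 example above character for character, including `%20` for the space.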

Browser support for RFC 2231:

Browser   | Supports RFC 2231 | Comments
IE 6.0    | No                |
Firefox   | Yes               | Both the incorrect and the correct test cases work
IE 7.0    | No                | Tested by AlexanderAgibalov
Safari    | No                | Same for other WebKit-based browsers (OmniWeb, Shiira). See WebKit bug 15287. Neither the incorrect nor the correct test case works
Opera     | No                |
Opera 9.5 | Yes               | Tested by julian.reschke@gmx.de (as far as I recall, this also worked in earlier releases)

Here is a simple CGI script I used to test this.

download.py
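The original download.py is not reproduced on this page. As a hypothetical stand-in (a WSGI app rather than the original CGI script; all names are illustrative), this sketch serves one small file with either the invalid raw header MoinMoin sends or a valid RFC 2231 header, chosen by `?mode=raw` or `?mode=rfc2231`.

```python
from email.utils import encode_rfc2231
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical test file served by the app.
FILENAME = "test עברית.txt"
BODY = b"hello\n"


def content_disposition(mode):
    """Return the Content-Disposition value for the requested test mode."""
    if mode == "rfc2231":
        # Valid: RFC 2231 extended parameter.
        return "inline; filename*=" + encode_rfc2231(FILENAME, "utf-8", "en")
    # Default "raw" mode mimics MoinMoin: UTF-8 bytes inside filename="...".
    # WSGI (PEP 3333) header values are str, so the bytes are carried as latin-1.
    return 'inline; filename="%s"' % FILENAME.encode("utf-8").decode("latin-1")


def app(environ, start_response):
    query = parse_qs(environ.get("QUERY_STRING", ""))
    mode = query.get("mode", ["raw"])[0]
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(BODY))),
        ("Content-Disposition", content_disposition(mode)),
    ])
    return [BODY]


def serve(port=8000):
    """Serve the test app on localhost until interrupted."""
    with make_server("localhost", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and then opening http://localhost:8000/?mode=raw and http://localhost:8000/?mode=rfc2231 in each browser, followed by "Save as", reproduces both columns of the tables above.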

At this time, there is no Content-Disposition header, valid or invalid, that works in all browsers.

As a short-term solution, we can move the filename into the path:

http://example.com/pagename/%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt?action=AttachFile&do=get

The action should assume that the last URL component is the filename and the one before it is the page the attachment belongs to.
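A minimal sketch of building and splitting such a path in Python 3. The helper names are hypothetical, and it assumes page names may themselves contain "/" (MoinMoin sub-pages), so everything before the last component is taken as the page name. It splits before percent-decoding so that an escaped "/" (%2F) inside a filename is not mistaken for a separator.

```python
from urllib.parse import quote, unquote


def attachment_path(pagename, filename):
    """Build /<pagename>/<filename> with the filename as the last URL component."""
    # Keep "/" in page names (sub-pages) but escape it inside the filename.
    return "/%s/%s" % (quote(pagename, safe="/"), quote(filename, safe=""))


def split_attachment_path(path):
    """Split a still-percent-encoded path into (pagename, filename)."""
    # Split first, then unquote, so %2F in the filename stays part of the filename.
    pagename, sep, filename = path.lstrip("/").rpartition("/")
    if not sep or not pagename or not filename:
        raise ValueError("expected /<pagename>/<filename>, got %r" % path)
    return unquote(pagename), unquote(filename)
```

For example, `attachment_path("pagename", "עברית.txt")` gives the path of the URL above, and `split_attachment_path` turns it back into `("pagename", "עברית.txt")`.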

A long-term solution would be to treat attachments as pages, so that each attachment is accessible as a sub-page of its parent page. For example, the attachment "עברית.txt" would be accessible as:

http://example.com/pagename/%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt

An alternative long-term solution is to change the URL format to:

http://example.com/files/page/filename

The files action would expect "filename" to be an attachment of "page".

http://example.com/files could list all attachments, and http://example.com/files/page all attachments on that page. This scheme could also work for other actions.
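A minimal dispatch sketch for the proposed /files URL scheme, in Python 3. `route_files` and the view names are hypothetical. Note one ambiguity the proposal leaves open: with sub-pages, /files/Parent/Sub could mean "list Parent/Sub" or "get attachment Sub of Parent"; this sketch always reads two or more segments as page plus filename, so a real implementation would need to check which one exists.

```python
from urllib.parse import unquote


def route_files(path):
    """Map a /files URL to (view, pagename, filename); view names are hypothetical."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] != "files":
        raise ValueError("not a /files URL: %r" % path)
    # Percent-decode each segment after splitting.
    rest = [unquote(p) for p in parts[1:]]
    if not rest:
        return ("list_all", None, None)        # /files
    if len(rest) == 1:
        return ("list_page", rest[0], None)    # /files/page
    # /files/page/filename; earlier segments are joined as a sub-page path
    return ("get", "/".join(rest[:-1]), rest[-1])
```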

<!> I wonder whether anyone has addressed this problem since, or whether it has been fixed in v1.6. I'm getting more and more IE users in my wiki, so the issue is becoming more and more irritating :(

OK, now we need to know which browsers this "sub-item" method works in:

Please help with testing.

Just go there and do a "Save as" in the browser. Does it give the correct filename?

Link: http://example.com/pagename/%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt

Expected filename: test עברית.txt (don't be confused if it is rendered right-to-left)

Browser           | Filename taken from the sub-item name in the path?
FF2               | yes
FF3               | yes
IE6               | no
IE7               |
IE8               | yes
Opera             | yes
Konqueror (3.5.5) | yes
Safari            | yes
links             | no
lynx              | yes ('p' key)

Plan

TODO: test with Firefox 3 beta


Patch for this

I also ran into this problem with attachments that have Chinese filenames. It is obvious when direct serving of attachments is turned off in Moin 1.6.

:( (I am running a patched MoinMoin 1.8 with direct serving of attachments at http://www.ossxp.com, so I did not notice it until my client complained.)

I analysed the packets exchanged between various browsers and the web server, and finally noticed that the server's response contains an incorrect 'Content-Disposition:' header, which causes the trouble.

Below is my patch.

Downloaded files may be saved under a corrupted name in some web browsers if the filename is not encoded correctly in the Content-Disposition header.

diff -r 6278b366fb32 MoinMoin/Page.py
--- a/MoinMoin/Page.py	Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/Page.py	Wed Nov 19 10:25:28 2008 +0800
@@ -1047,6 +1047,7 @@
                 # TODO: fix the encoding here, plain 8 bit is not allowed according to the RFCs
                 # There is no solution that is compatible to IE except stripping non-ascii chars
                 filename_enc = "%s.txt" % self.page_name.encode(config.charset)
+                filename_enc = wikiutil.content_disposition_encode(filename_enc, request)
                 request.setHttpHeader('Content-Disposition: %s; filename="%s"' % (
                                       content_disposition, filename_enc))
         else:
diff -r 6278b366fb32 MoinMoin/action/AttachFile.py
--- a/MoinMoin/action/AttachFile.py	Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/action/AttachFile.py	Wed Nov 19 10:25:28 2008 +0800
@@ -872,7 +872,7 @@
             'Content-Type: %s' % content_type,
             'Last-Modified: %s' % timestamp,
             'Content-Length: %d' % os.path.getsize(fpath),
-            'Content-Disposition: %s; filename="%s"' % (content_dispo, filename_enc),
+            'Content-Disposition: %s; filename="%s"' % (content_dispo, wikiutil.content_disposition_encode(filename_enc, request)),
         ])
 
         # send data
diff -r 6278b366fb32 MoinMoin/action/backup.py
--- a/MoinMoin/action/backup.py	Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/action/backup.py	Wed Nov 19 10:25:28 2008 +0800
@@ -39,7 +39,7 @@
     filename = "%s-%s.tar.%s" % (request.cfg.siteid, dateStamp, request.cfg.backup_compression)
     request.emit_http_headers([
         'Content-Type: application/octet-stream',
-        'Content-Disposition: inline; filename="%s"' % filename, ])
+        'Content-Disposition: inline; filename="%s"' % wikiutil.content_disposition_encode(filename, request), ])
 
     tar = tarfile.open(fileobj=request, mode="w|%s" % request.cfg.backup_compression)
     # allow GNU tar's longer file/pathnames
diff -r 6278b366fb32 MoinMoin/action/cache.py
--- a/MoinMoin/action/cache.py	Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/action/cache.py	Wed Nov 19 10:25:28 2008 +0800
@@ -154,7 +154,7 @@
         # TODO: fix the encoding here, plain 8 bit is not allowed according to the RFCs
         # There is no solution that is compatible to IE except stripping non-ascii chars
         filename = filename.encode(config.charset)
-        headers.append('Content-Disposition: %s; filename="%s"' % (content_disposition, filename))
+        headers.append('Content-Disposition: %s; filename="%s"' % (content_disposition, wikiutil.content_disposition_encode(filename, request)))
 
     meta_cache = caching.CacheEntry(request, cache_arena, key+'.meta', cache_scope, do_locking=do_locking, use_pickle=True)
     meta_cache.update({
diff -r 6278b366fb32 MoinMoin/wikiutil.py
--- a/MoinMoin/wikiutil.py	Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/wikiutil.py	Wed Nov 19 10:25:28 2008 +0800
@@ -2624,3 +2624,46 @@
                           ( authtype == 'w' and user.may.write(pagename) ) ) ):
                 return "( " + _("Permission denied for macro: %s")% macro_name + " )";
     return None
+
+def content_disposition_encode(text, request=None):
+    """
+    UTF-8 filename in Content-Disposition:
+        IE: failed to download
+        Chrome: wrong filename
+        FF: works. (Firefox,Epiphany,Iceweasel,Iceape,Galeon)
+        Opera: works.
+        Safari: wrong filename
+    URL encode filename in Content-Disposition:
+        IE: works.
+        Chrome: works.
+        FF: wrong filename. (Firefox,Epiphany,Iceweasel,Iceape,Galeon)
+        Opera: wrong filename
+        Safari: wrong filename
+    """
+    if isinstance(text, unicode):
+        text = text.encode('utf-8')
+    do_url_encode = None
+    if request:
+        ua = request.http_user_agent
+        ## browsers that should url encode: MSIE, Chrome
+        for browser in ["MSIE",
+                        "Chrome"]:
+            if browser in ua:
+                do_url_encode = True
+        ## should NOT url encode: Firefox, Opera
+        if do_url_encode is None:
+            for browser in ["Opera",
+                            "Firefox",
+                            "Epiphany",
+                            "Iceweasel",
+                            "Iceape",
+                            "Galeon",]:
+                if browser in ua:
+                    do_url_encode = False
+        # should convert to OS's charset.
+        if do_url_encode is None and "Safari" in ua:
+            do_url_encode = False
+
+    if do_url_encode:
+        text = urllib.quote(text)
+    return text

30460_content_disposition_encode.patch

-- JiangXin 2008-10-16 14:40:30

Hi JiangXin,

thanks for your patch!

Could you please:

Another thing I am thinking about is whether the Content-Disposition encoding has to depend on the browser version.

Assuming that there is one correct way to do it, the other way must be wrong and thus a bug in those browsers. If they fix that bug some day, we have another problem: the code won't be able to handle that change, and the fixed browsers will then fail because of the code in moin.

Also, someone suggested above that the officially correct method is defined in RFC 2231. But your code does not use it at all, right?

-- ThomasWaldmann 2008-11-29 11:09:56

Notes for EasyToDo

The task for this EasyToDo is to fix the problem described above in an RFC-compliant way that is still compatible with existing browsers.
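One possible direction, sketched in Python 3 under stated assumptions: send a plain ASCII filename="..." for browsers without RFC 2231 support, followed by an RFC 2231 filename* that capable browsers are assumed to prefer. This pairing (with filename* placed after filename) is what RFC 6266 later recommended in 2011; whether each browser in the tables above picks the right parameter was not tested here. The function name and the "_" replacement character are hypothetical choices.

```python
from email.utils import encode_rfc2231


def content_disposition_header(filename, disposition="inline"):
    """Content-Disposition value with an ASCII filename plus an RFC 2231 filename*."""
    # Fallback for browsers without RFC 2231 support: replace anything outside
    # printable ASCII with "_" (lossy, but never mojibake).
    ascii_name = "".join(c if 0x20 <= ord(c) < 0x7F else "_" for c in filename)
    # Escape backslash and double quote so the fallback is a valid quoted-string.
    quoted = ascii_name.replace("\\", "\\\\").replace('"', '\\"')
    value = '%s; filename="%s"' % (disposition, quoted)
    if ascii_name != filename:
        # Only add filename* when the plain name lost information.
        value += "; filename*=" + encode_rfc2231(filename, "utf-8")
    return value
```

With this, IE 6/7 and Safari would show "test _____.txt" instead of a garbled name, while Firefox and Opera 9.5 could use the full name from filename*.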

Requires: access to Windows, Linux, Mac OS X machines where you can install and test lots of different browsers

Task includes:

Time estimates:

Note: the task could perhaps be split into multiple tasks on different platforms.


CategoryMoinMoinBug

MoinMoin: MoinMoinBugs/Non-ASCII attachment names corrupted on download (last edited 2011-07-30 12:37:34 by ThomasWaldmann)