Description

After an upgrade to 1.8.2, attaching a file (32KB .odt) the wiki returns an error:

BadZipfile
Bad magic number for central directory
 * args = ('Bad magic number for central directory',)

Attachments for that page can no longer be viewed. Checking the filesystem shows the attachment was uploaded successfully. Deleting the attachment via the filesystem restores the attachment view function.

Steps to reproduce

  1. Upload an attachment to Moin 1.8.2

Example

wiki-error.png

Component selection

Details

HTML traceback page attached: moinbugBadZipfile.tar.gz

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/MoinMoin/request/__init__.py", line 1311, in run
    handler(self.page.page_name, self)
  File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 513, in execute
    msg = handler(pagename, request)
  File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 601, in _do_upload
    upload_form(pagename, request, msg)
  File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 533, in upload_form
    send_uploadform(pagename, request)
  File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 493, in send_uploadform
    request.write(_get_filelist(request, pagename))
  File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 386, in _get_filelist
    return _build_filelist(request, pagename, 1, 0)
  File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 337, in _build_filelist
    is_package = packages.ZipPackage(request, fullpath).isPackage()
  File "/usr/lib/python2.4/site-packages/MoinMoin/packages.py", line 486, in __init__
    self.zipfile = zipfile.ZipFile(filename)
  File "zipfile.py", line 210, in __init__
    self._GetContents()
  File "zipfile.py", line 230, in _GetContents
    self._RealGetContents()
  File "zipfile.py", line 262, in _RealGetContents
    raise BadZipfile, "Bad magic number for central directory"
BadZipfile: Bad magic number for central directory

MoinMoin Version

1.8.2

OS and Version

Debian 4.0 etch

Python Version

2.4.4

Server Setup

debian etch, apache2, libapache2-mod-wsgi, moinmoin 1.8.2

Server Details

web server

Language you are using the wiki in (set in the browser/UserPreferences)

English

Workaround

Discussion

please can you tell what you get for

>:/etc > grep odt mime.types

it should be

application/vnd.oasis.opendocument.text odt

Likly no bug. Please check if you have a recent MoinMoin/wikiutil.py. You should have in line 841: '.odt': 'application/vnd.oasis.opendocument.text', This means we do already extend the mimetype mapping.

Do you have used the --force parameter for your update installation. Was the old py cache code *.pyc file removed?

service:/etc# grep odt mime.types 
application/vnd.oasis.opendocument.text                         odt
service:/etc#

wikiutil.py shows this:

service:~# grep -n odt /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.py
919: '.odt': 'application/vnd.oasis.opendocument.text',
service:~# ls -l /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.py*
-rw-r--r-- 1 root root 90056 2009-02-08 05:25 /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.py
-rw-r--r-- 1 root root 80796 2009-04-20 22:17 /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.pyc
service:~# 

The problem appears to be local to that page, in that on other pages I can upload attachments and they are displayed correctly in the attachment view without the BadZipfile error. Also the SystemAdmin macro attachment viewer has no issue. I've ensured the permissions and ownership of that page and subfolders is the same as any other Page (owned by www-data and rw). To clarify, the attachment is uploaded ok, despite the badzipfile error message. I can also refer to it with an attachment: link and download it from the attachment page. It is just the attachment list view that causes the error on this page.

I'll outline my install/upgrade procedure in case I've missed something:

moin 1.7.2# python -v setup.py install --record=/root/moininstall.log
moin 1.7.2# ../createinstance.sh /srv/var/moin/hcwiki
config moin
config apache
backup INSTANCE/data
mv INSTANCE INSTANCE.bak
moin 1.8.2# for i in `cat /root/moininstall.log` ; do rm $i ; done
moin 1.8.2# python -v setup.py install --record=/root/moininstall.log
moin 1.8.2# ../createinstance.sh /srv/var/moin/hcwiki
restore edit-log event-log data/pages data/users to $INSTANCE
# su - www-data
www-data$ moin --config-dir=/etc/moin/hcwiki --wiki-url=service.harmsconsulting.com/hcwiki migration data
www-data$ moin --config-dir=/etc/moin/hcwiki --wiki-url=service.harmsconsulting.com/hcwiki maint cleancache
config moin
config apache

Is that sufficient to replace the old installation? I can't find any reference to the --force parameter (presumably to setup.py) - I looked in BasicInstallation and grep force setup.py. Cheers and regards, -- HarmsCon 2009-04-23 00:55:57

Do you have stopped the wiki server process at the time you cleaned the cache?

Do you use custom macros for your attachment list. The standard macros work, see here;

There are 6 attachment(s) stored for this page.

  • [get | view] (2012-10-19 07:25:15, 12239.5 KB) [[attachment:badfile.ppt]]
  • [get | view] (2009-04-22 04:36:32, 21.6 KB) [[attachment:moinbugBadZipfile.tar.gz]]
  • [get | view] (2009-04-24 19:55:06, 40.6 KB) [[attachment:test.odt]]
  • [get | view] (2009-04-22 06:44:37, 8.6 KB) [[attachment:testdocument.odt]]
  • [get | view] (2009-04-22 04:34:21, 75.4 KB) [[attachment:wiki-error.png]]
  • [get | view] (2012-10-19 07:59:15, 0.2 KB) [[attachment:zf.py]]
 All files | Selected Files: delete move to page copy to page


/!\ Please check whether you can unpack that odt file triggering the problem with unzip command. Also check whether you can load that file with openoffice. If you succeed with both, please attach the file here so we can reproduce the problem.

Thomas, you are a legend! I never realised odt files were zipped, and trying to unzip it as suggested resulted in the start of central directory not found error message. I repaired the doc and re-uploaded it with a clean result. Thanks to you all for your time, josh. -- HarmsCon 2009-04-27 01:05:52

OK, good that we have found the problem being in the document. :)

But anyway: moin should not completely fall over such a problem, but catch it somehow. This is why we still need some defective zip file to reproduce the problem. I am leaving the bug open for fixing the exception handler to deal with such stuff nicely.


/me has got this problem too, with PPT (PowerPoint file). Reediting it, saving and uploading fixed the problem, but MoinMoin shouldn't choke on bad attachments: it renders the entire attachment management subpage to be inaccessible and that's bad. The following patch helps in my case (MoinMoin 1.9.5). -- rea[at]codelabs[dot].ru, 17.10.2012

Can you give us such a PPT file please.

Here it is: badfile.ppt It is a sanitized version of our PPT file, since it contains some material I don't really want to share with people, so it was filled with zeros from the beginning. For me, it happens with Python 2.7, whose zipfile.py has the following stanzas

def _EndRecData(fpin):
    """Return data from the "End of Central Directory" record, or None.

    The data is a list of the nine items in the ZIP "End of central dir"
    record followed by a tenth item, the file seek offset of this record."""

    # Determine file size
    fpin.seek(0, 2)
    filesize = fpin.tell()

    # Check to see if this is ZIP file with no archive comment (the
    # "end of central directory" structure should be the last item in the
    # file if this is the case).
    try:
        fpin.seek(-sizeEndCentDir, 2)
    except IOError:
        return None
    data = fpin.read()
    if data[0:4] == stringEndArchive and data[-2:] == "\000\000":
        # the signature is correct and there's no comment, unpack structure
        endrec = struct.unpack(structEndArchive, data)
        endrec=list(endrec)

        # Append a blank comment and record start offset
        endrec.append("")
        endrec.append(filesize - sizeEndCentDir)

        # Try to read the "Zip64 end of central directory" structure
        return _EndRecData64(fpin, -sizeEndCentDir, endrec)

    # Either this is not a ZIP file, or it is a ZIP file with an archive
    # comment.  Search the end of the file for the "end of central directory"
    # record signature. The comment is the last item in the ZIP file and may be
    # up to 64K long.  It is assumed that the "end of central directory" magic
    # number does not appear in the comment.
    maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
    fpin.seek(maxCommentStart, 0)
    data = fpin.read()
    start = data.rfind(stringEndArchive)
    if start >= 0:
        # found the magic number; attempt to unpack and interpret
        recData = data[start:start+sizeEndCentDir]
        endrec = list(struct.unpack(structEndArchive, recData))
        commentSize = endrec[_ECD_COMMENT_SIZE] #as claimed by the zip file
        comment = data[start+sizeEndCentDir:start+sizeEndCentDir+commentSize]
        endrec.append(comment)
        endrec.append(maxCommentStart + start)

        # Try to read the "Zip64 end of central directory" structure
        return _EndRecData64(fpin, maxCommentStart + start - filesize,
                             endrec)

    # Unable to find a valid end of central directory structure
    return

and what happens here is that the function tries to lookup the string "PK\005\006" in the last 64K bytes of the file in question. It finds that, because there is some ZIP structure within that file, but since it is not really the last record, it won't represent the correct end-of-central-directory record. However, _RealGetContets from the same package tries to interpret it and chokes on the bad signature (should be "PK\001\002"):

    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except IOError:
            raise BadZipfile("File is not a zip file")
        if not endrec:
            raise BadZipfile, "File is not a zip file"
        if self.debug > 1:
            print endrec
        size_cd = endrec[_ECD_SIZE]             # bytes in central directory
        offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
        self.comment = endrec[_ECD_COMMENT]     # archive comment

        # "concat" is zero, unless zip was concatenated to another file
        concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
        if endrec[_ECD_SIGNATURE] == stringEndArchive64:
            # If Zip64 extension structures are present, account for them
            concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)

        if self.debug > 2:
            inferred = concat + offset_cd
            print "given, inferred, offset", offset_cd, inferred, concat
        # self.start_dir:  Position of start of central directory
        self.start_dir = offset_cd + concat
        fp.seek(self.start_dir, 0)
        data = fp.read(size_cd)
        fp = cStringIO.StringIO(data)
        total = 0
        while total < size_cd:
            centdir = fp.read(sizeCentralDir)
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"

This problem seems to be specific to Python 2.7, since 2.5 and 2.5 give me no problems:

$ python2.5 zf.py badfile.ppt ib-specification-1.2-vol1.zip 
ib-specification-1.2-vol1.zip is a ZIP file!

$ python2.5 zf.py badfile.ppt ib-specification-1.2-vol1.zip 
ib-specification-1.2-vol1.zip is a ZIP file!

$ python2.7 zf.py badfile.ppt ib-specification-1.2-vol1.zip 
badfile.ppt is a ZIP file!
Got exception:
Bad magic number for central directory

Script I used for testing.

I had updated my patch to catch two more places where not every Zip-related exception was handled. (patch was applied to moin/1.9 repo, see: http://hg.moinmo.in/moin/1.9/rev/b9450db6c129 )

Please, note that this problem isn't ODT/PPT/WhateverFormat-related. It is just a kind of DoS for the given page where one can upload attachment that contains malformed file that will be errorneously recognized as ZIP file by the Python's zipfile module.

And it lies in the fact that zipfile.{is_zipfile,_check_zipfile} just checks for the result for _EndRecData() without any validation for the returned content, though it is kinda documented.

Plan


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/BadZipfile (last edited 2012-12-08 21:57:49 by ThomasWaldmann)