Description

Junk non-UTF-8 characters are inserted into the user's MoinEditorBackup page and cause Unicode errors on many later operations.

Steps to reproduce

  1. Create a wiki user.
  2. Create homepage for user.
  3. Create a test page.
  4. Delete the test page.
    • Unicode error during deletion
    • Unicode error on RecentChanges, etc. for all users.

Example

Details

MoinMoin Version

Release 1.3.0 [Revision patch-399]

OS and Version

Linux (SuSE 9.0)

Python Version

2.3

Server Setup

Apache 2.0.48

Server Details

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/MoinMoin/request.py", line 756, in run
    handler(page.page_name, self)
  File "/usr/lib/python2.3/site-packages/MoinMoin/wikiaction.py", line 588, in do_savepage
    comment=comment)
  File "/usr/lib/python2.3/site-packages/MoinMoin/PageEditor.py", line 864, in saveText
    backup_url = self._make_backup(newtext, **kw)
  File "/usr/lib/python2.3/site-packages/MoinMoin/PageEditor.py", line 730, in _make_backup
    backuppage._write_file(intro + newtext)
  File "/usr/lib/python2.3/site-packages/MoinMoin/PageEditor.py", line 774, in _write_file
    was_deprecated = self._get_pragmas(self.get_raw_body()).has_key("deprecated")
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 499, in get_raw_body
    text = file.read()
  File "/usr/lib/python2.3/codecs.py", line 380, in read
    return self.reader.read(size)
  File "/usr/lib/python2.3/codecs.py", line 253, in read
    return self.decode(self.stream.read(), self.errors)[0]
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 41: unexpected code byte

Workaround

None

Discussion

If the directory containing the page is deleted, the errors on RecentChanges etc. disappear; however, the user is then unable to edit any further pages without a Unicode error appearing. This is fixed by removing the user's MoinEditorBackup directory.

The obvious workaround in this situation is either not to delete pages, or for users not to have a WikiHomePage. I assume this is because if a user doesn't have a homepage the MoinEditorBackup is not created. --OriginalReporter

I can't reproduce this on either this wiki or my test wikis, all running the latest code, using my old account or a newly created account, following the steps above.

This smells like a wiki which was not migrated to utf-8 from iso-8859-1 or another non-utf-8 charset. In that case, you will get Unicode errors whenever you try to read a page.
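To illustrate the suspected failure mode, here is a minimal sketch in modern Python 3 (not the Python 2.3 used by this wiki). The sample text and the helper name try_decode_utf8 are made up for the example; it only shows that bytes written in iso-8859-1 are generally not valid utf-8.

```python
# Text with a non-ASCII character, as an unmigrated iso-8859-1 wiki might store it.
text = u"caf\u00e9"
legacy_bytes = text.encode("iso-8859-1")  # the e-acute becomes the single byte 0xe9


def try_decode_utf8(data):
    """Return (decoded text, None) on success, or (None, error message) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, str(err)


decoded, error = try_decode_utf8(legacy_bytes)
print(decoded, error)

# The same bytes decode cleanly when the original charset is used.
print(legacy_bytes.decode("iso-8859-1") == text)
```

A page file that fails like this on a freshly created wiki, as reported below, points away from a migration problem.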

Give more details about this wiki. Is it new, or upgraded, and if so from which version? Did you run all the migration scripts? Did you get any errors while running the mig scripts?

-- DeletePageTest 2004-12-10 15:29:48

I've created a new wiki especially for testing this problem. I did have a version 1.1 wiki which I updated to 1.2.4 (no problems) and then to 1.3 this week. The only errors I received in migration were (I think) during mig3 because my original had no cache files. As for the new wiki (url above) I've only just created it so that rules out migration problems (that was my first guess).

I've attached two files: 00000002 is the revision of a deleted page, and 00000000 is created as the MoinEditorBackup. Should these files not be empty?

  • (2004-12-10 15:57:45, 0.0 KB) [[attachment:00000000.conf]]
  • (2004-12-10 15:57:25, 0.0 KB) [[attachment:00000002.conf]]

I've just installed the latest tarball from the ArchRepository (reports Release 1.3.0 [Revision 1.3.0 release]) and I get the same problem. I've also run with the standalone server to rule out Apache and I get the same error.

Please try this test:

  1. install a new wiki instance
  2. copy the data and underlay dir from the distribution
  3. setup config, permission etc.
  4. create a new account
  5. create a page for yourself
  6. create another page
  7. delete the other page

Now make a tarball or zip of your wiki directory, including your config, data directory and server script, and attach it to this page so we can inspect all details and pages.

Also very important: add details about the language you use in your preferences, or in your browser, and which browser you use to edit the pages.

-- NirSoffer 2004-12-10 16:41:01

Done that. However I have also determined that the bug seems related to my Linux installation. It occurs on both of my SuSE 9.0 servers (python 2.3), but not on my SuSE 9.1 workstation (python 2.3.3).

I have English selected as my preferred language in my user preferences, and en-us then en in my browser, which is Firefox 0.9.3.

-- JonathanBrady 2004-12-10 18:47:52

I checked your wiki attachment. Everything is fine except the last line of rev 0 of TestUser/MoinEditorBackup, which contains a few junk characters. I have no idea where those characters came from. When they are there, any access to that page rev causes an expected UnicodeError. This is a situation that should never happen: there should be no way to insert data which is not in the wiki charset, other than by editing the file manually.

After I deleted those lines from rev 0, there was no problem with this wiki running on current code.

Next step:

The goal is to find the action that inserts that junk line. Check your editor backup page after each operation.

  1. Clean the junk characters out of your TestUser/MoinEditorBackup, or just remove that directory; it's just a backup of the user's last edited page.

  2. Create a new page and save it: enter the page name in the url box, press enter, select "create new page", and save the page without changing its content.
    • If you created the page in a different way before, describe that way and the template you chose.
  3. Check whether the junk is in TestUser/MoinEditorBackup again.
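Checking the backup page after each operation can be automated. The sketch below is written for modern Python 3 and assumes the on-disk layout mentioned in this report, a revisions directory holding files such as 00000000 under each page directory. The function names find_invalid_utf8 and scan_revisions are hypothetical helpers, not part of MoinMoin.

```python
import os


def find_invalid_utf8(path):
    """Return None if the file is valid UTF-8, else (byte offset, offending byte)."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.start + 1]
    return None


def scan_revisions(page_dir):
    """Check each file in <page_dir>/revisions; return {revision name: problem}."""
    revdir = os.path.join(page_dir, "revisions")
    problems = {}
    for name in sorted(os.listdir(revdir)):
        result = find_invalid_utf8(os.path.join(revdir, name))
        if result is not None:
            problems[name] = result
    return problems
```

Running scan_revisions on the TestUser/MoinEditorBackup page directory after each step (create, save, delete) should show which operation first writes the invalid bytes.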

-- NirSoffer 2004-12-10 19:36:46

Deleted TestUser/MoinEditorBackup directory, and created a new page called NewPage without modifying the content. Previously I changed the content to be the same as the page name. TestUser/MoinEditorBackup/revisions/00000001 contains the same as NewPage/revisions/00000001 except for the acl.

-- JonathanBrady 2004-12-10 20:17:38

With the deletion of NewPage (no reason for deletion specified) TestUser/MoinEditorBackup/revisions/00000000 now contains an acl followed by invalid data. I get the following (I'm now using the standalone server for these tests):

melon.home - - [10/Dec/2004 21:01:04] "GET /NewPage HTTP/1.1" 200 -
melon.home - - [10/Dec/2004 21:01:09] "GET /NewPage?action=DeletePage HTTP/1.1" 200 -
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 1171, in getPageLinks
    Page(request, self.page_name).send_page(request, content_only=1)
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 684, in send_page
    body = self.get_raw_body()
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 499, in get_raw_body
    text = file.read()
  File "/usr/lib/python2.3/codecs.py", line 380, in read
    return self.reader.read(size)
  File "/usr/lib/python2.3/codecs.py", line 253, in read
    return self.decode(self.stream.read(), self.errors)[0]
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-3: invalid data
melon.home - - [10/Dec/2004 21:01:14] "POST /NewPage?action=DeletePage HTTP/1.1" 200 -

I do not however see any visual indication in my browser. -- JonathanBrady 2004-12-10 20:38:02

Ok, now repeat the same steps, but before you delete the page, open the file MoinMoin/PageEditor.py, and add the raise... line at line no 704:

Save the file and restart moin.py! Now try to delete again. Here everything is fine at this step. You should get this output: "#acl TestUser:read,write,delete All: deleted: None"

Yes, this time I get a traceback in my browser with "#acl TestUser:read,write,delete All: deleted: None" at the end, MoinEditorBackup is unchanged from the page creation.

-- JonathanBrady 2004-12-10 21:09:20

You can remove that raise, the problem is not there anyway.

It looks like a problem with this specific system, as it works on every other system.

OK, the error does actually occur around this part of the code. It appears to be in _write_file.

If I change it to:

        # save to page file
        pagefile = os.path.join(revdir, revstr)
        f = codecs.open(pagefile, 'wb', config.charset)
        # Write the file using text/* mime type
        f.write(self.encodeTextMimeType(text+'\n'))
        f.close()

Then the contents of the MoinEditorBackup are no longer corrupted. It seems my version of Python has a problem which results in corruption if the file is not terminated with a newline.
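One way to test that hypothesis on a given interpreter is a round-trip check: write text through codecs.open, as the code above does, and compare the bytes on disk with the expected encoding. This is only a diagnostic sketch in modern Python 3; the helper roundtrip_ok and the sample text are invented for the example, and it leaves out MoinMoin's encodeTextMimeType step.

```python
import codecs
import os
import tempfile


def roundtrip_ok(text, charset="utf-8"):
    """Write text via codecs.open and compare the bytes on disk with text.encode()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        f = codecs.open(path, "wb", charset)
        try:
            f.write(text)
        finally:
            f.close()
        with open(path, "rb") as raw:
            return raw.read() == text.encode(charset)
    finally:
        os.remove(path)


# Illustrative sample only, loosely modelled on the backup content quoted above.
sample = u"#acl TestUser:read,write,delete All:\ndeleted: None"
print(roundtrip_ok(sample))          # no trailing newline
print(roundtrip_ok(sample + u"\n"))  # trailing newline, as in the change above
```

If the first check fails while the second passes, that would support the idea that the corruption comes from the interpreter or its codecs module rather than from MoinMoin.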

-- JonathanBrady 2004-12-10 22:28:50

Very nice, Jonathan! But that hack is not the correct fix; I will post a patch here soon. The problem in this case is that a useless backup is being made. Here the page body is empty, and we use a default "describe" line. It does not make sense to back up such page content.

Problem fixed in my branch; the fix will be available in our tla archive soon. In DeletePage, the page editor is now created with do_editor_backup=0, because it does not make sense to make an editor backup of generated page content containing "deleted". -- NirSoffer 2004-12-11 16:31:46
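The idea behind the fix can be sketched with a toy class. This is not MoinMoin's PageEditor: only the flag name do_editor_backup comes from the comment above, while ToyEditor, save_text and the two lists are hypothetical stand-ins used to show the pattern.

```python
class ToyEditor:
    """Hypothetical stand-in for a page editor; not MoinMoin's PageEditor."""

    def __init__(self, do_editor_backup=1):
        self.do_editor_backup = do_editor_backup
        self.saved = []    # texts written as page revisions
        self.backups = []  # texts written to the user's editor backup

    def save_text(self, text):
        # Only user-written edits are worth copying to the backup page.
        if self.do_editor_backup:
            self.backups.append(text)
        self.saved.append(text)


normal = ToyEditor()
normal.save_text(u"user-written content")

deleter = ToyEditor(do_editor_backup=0)
deleter.save_text(u"deleted")  # generated placeholder, so no backup is made
```

With the flag off, the delete path never touches the user's MoinEditorBackup, so the write that produced the junk bytes is skipped entirely.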

The fix will be available in moin-1.3.1 soon.

Plan


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/DeletePageUnicodeError (last edited 2007-10-29 19:17:45 by localhost)