Description

When text is copied from an MS Office Word document loaded in Open Office Writer and pasted into the GUI editor (FCKeditor), strange characters (anchor symbols and undefined Unicode character codes) are displayed in the editor window. When you then try to Preview or Save Changes, an exception is raised:

ConvertError: ExpatError: not well-formed (invalid token): line 376, column 1508 (see dump in /home/andydj/moin-1.8.5/wiki/data/expaterror.log)

This occurs on both the apache-cgi and stand-alone versions of MoinMoin release 1.8.5.

Steps to reproduce

  1. Load an MS Word document containing tables and "type here" fields into Open Office Writer.
  2. Select and copy some or all of the text.
  3. Edit a page in MoinMoin and select GUI Mode.
  4. Paste the text directly into the GUI editor window, or click the Paste Word icon and paste it into the dialogue box.
  5. The text appears in the editor window, but includes strange characters (boxes for the undefined Unicode characters U+0004 and U+0005) and also anchor symbols.
  6. Preview or Save Changes.
  7. Exception above is displayed.

Example

MoinMoinPasteError.png

Component selection

Details

The traceback:

Traceback (most recent call last):
  File "/home/andydj/moin-1.8.5/MoinMoin/request/__init__.py", line 1311, in run
    handler(self.page.page_name, self)
  File "/home/andydj/moin-1.8.5/MoinMoin/action/edit.py", line 97, in execute
    savetext = convert(request, pagename, savetext)
  File "/home/andydj/moin-1.8.5/MoinMoin/converter/text_html_text_moin_wiki.py", line 1441, in convert
    tree = parse(request, text)
  File "/home/andydj/moin-1.8.5/MoinMoin/converter/text_html_text_moin_wiki.py", line 1419, in parse
    raise ConvertError('ExpatError: %s (see dump in %s)' % (msg, logname))
ConvertError: ExpatError: not well-formed (invalid token): line 376, column 1512 (see dump in /home/andydj/moin-1.8.5/wiki/data/expaterror.log)
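
For anyone wanting to check the failure mode outside MoinMoin, here is a minimal sketch. It assumes that the pasted text contains raw control characters such as U+0004, which XML 1.0 does not allow, and that this is what expat rejects. The helper name try_parse is hypothetical and not part of MoinMoin; the `except ... as` syntax needs Python 2.6 or later.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def try_parse(text):
    """Return None if text parses as XML, else the expat error message."""
    try:
        xml.dom.minidom.parseString(text)
    except ExpatError as err:
        return str(err)
    return None


# A well-formed fragment parses without an error message.
print(try_parse("<p>ok</p>"))

# The same kind of fragment with an embedded U+0004 is rejected by expat.
print(try_parse("<p>a\x04b</p>"))
```

If the assumption holds, the second call should report a "not well-formed (invalid token)" message of the same kind as the one in the traceback.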

MoinMoin Version

1.8.5 (this wiki)

OS and Version

CentOS 5.2 and Ubuntu 9.04

Python Version

2.4.3 and 2.6.2 respectively

Server Setup

Apache-CGI and Standalone, respectively

Server Details

Language you are using the wiki in (set in the browser/UserPreferences)

en-uk

Workaround

When I looked at the log file in vim, it showed countless Ctrl-D and Ctrl-E characters (U+0004 and U+0005), and I guessed that expat might be choking on them. I therefore modified parse() in MoinMoin/converter/text_html_text_moin_wiki.py, adding a text.translate() call that deletes all control characters before the text is passed to xml.dom.minidom.parseString(text). It's a bit of a kludge, but it works.
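
The original patch is not reproduced on this page, so the following is only a sketch of the idea, not the actual change made to parse(). It assumes the text is a unicode string and that tab, line feed and carriage return should be kept, since XML 1.0 allows them; the names strip_control_chars and parse_cleaned are hypothetical.

```python
import xml.dom.minidom

# Assumption: remove every character below U+0020 except tab (U+0009),
# line feed (U+000A) and carriage return (U+000D), which XML 1.0 permits.
# Mapping an ordinal to None makes translate() delete that character.
_CONTROL_CHARS = dict.fromkeys(
    c for c in range(0x20) if c not in (0x09, 0x0A, 0x0D)
)


def strip_control_chars(text):
    """Return text with XML-illegal control characters removed."""
    return text.translate(_CONTROL_CHARS)


def parse_cleaned(text):
    """Hypothetical stand-in for the patched step: clean first, then parse."""
    return xml.dom.minidom.parseString(strip_control_chars(text))


# Usage: a fragment containing Ctrl-D and Ctrl-E now parses.
doc = parse_cleaned("<p>x\x04y\x05z</p>")
print(doc.documentElement.firstChild.data)
```

Deleting the characters silently loses whatever they stood for in the source document (here apparently field and anchor markers), which is acceptable for a workaround but is part of why it is a kludge.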

Thanks to Thomas Waldmann for some pointers as to where I might find the issue.

Discussion

Plan


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/PastingFromOpenOfficeCausesConvertError (last edited 2009-10-08 19:47:39 by AndyD'ArcyJewell)