Implementing the DOM->Moinwiki converter

The first version will not have an attributes controller or footnote support.

The DFS algorithm will use two stacks: one for opened nodes and one for their children.

Two types of actions:

  • When visiting a node for the first time: open_<namespace>_<name>(node)
  • When all children are visited: close_<namespace>_<name>(node)

Example:

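    class Moinwiki:
        # Markup constants for the moinwiki output
        emphasis = "''"

    class Converter:
        ...

        def open_moinpage_emphasis(self, node):
            # Called when the node is visited for the first time
            if not node.children:
                # No children: open and close the node at once
                return Moinwiki.emphasis + self.close_moinpage_emphasis(node)
            # Remember the node and its pending children on the two stacks
            self.children.append(list(node.children))
            self.opened_nodes.append(node)
            return Moinwiki.emphasis

        def close_moinpage_emphasis(self, node):
            # Called when all children of the node have been visited
            return Moinwiki.emphasis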
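A minimal sketch of the two-stack walk described above. It is not the converter itself: it assumes simplified tag names without namespaces, uses the standard library's xml.etree.ElementTree instead of the Moin tree classes, and replaces the open_<namespace>_<name> / close_<namespace>_<name> methods with a small lookup table (MARKUP is a hypothetical name).

```python
import xml.etree.ElementTree as ET


class Converter:
    """Iterative depth-first walk that keeps two parallel stacks."""

    MARKUP = {"emphasis": "''", "strong": "'''"}  # illustrative subset only

    def __init__(self):
        self.opened_nodes = []  # nodes that are opened but not yet closed
        self.children = []      # for each opened node, its unvisited children

    def open(self, node):
        # First visit: push the node and its children, emit the opening markup.
        self.opened_nodes.append(node)
        self.children.append(list(node))
        return self.MARKUP.get(node.tag, "") + (node.text or "")

    def close(self, node):
        # All children visited: emit the closing markup and any trailing text.
        return self.MARKUP.get(node.tag, "") + (node.tail or "")

    def convert(self, root):
        out = [self.open(root)]
        while self.opened_nodes:
            pending = self.children[-1]
            if pending:
                # Descend into the next unvisited child of the top node.
                out.append(self.open(pending.pop(0)))
            else:
                # Nothing left below the top node: pop both stacks and close it.
                self.children.pop()
                out.append(self.close(self.opened_nodes.pop()))
        return "".join(out)


tree = ET.XML("<body>a <emphasis>b <strong>c</strong></emphasis> d</body>")
print(Converter().convert(tree))  # a ''b '''c''''' d
```

Both stacks always have the same depth, so the top of self.children is the list of children still waiting for the top of self.opened_nodes.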

I think I can finish this tonight.

If you see any limitations in this approach, you are welcome to leave a message.

Sorry for just leaving a nitpicking note, but please remember PEP8. :)

Still working on DOM->Moinwiki

  • added conversion of list
    • TODO: definition list_type
    • TODO: list level, for correct shift ' '*level before item

  • 'lower-alpha'|'lower-roman' list_style_type not found in moinwiki_in

Next step:

  • conversion of table

DOM->Moinwiki converter: tables

Done:

  • conversion of tables with attributes
    • table class and style attributes <tableclass="..." tablestyle="..." ...> in first table's cell

    • row class and style attributes <rowclass="..." rowstyle="..." ...> in first row's cell

    • cell class and style attributes <class="..." style="..." ...>

    • colspan: || * number_columns_spanned, (alternative way is <colspan=%number_columns_spanned%>)

    • rowspan: <|%number_rows_spanned%>, (alternative way is <rowspan=%number_rows_spanned%>)

    • TODO: create separate class for conversion of tables (and for lists too)

Other:

  • nodes with only text children don't need "close" function
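A small sketch of how the span attributes above could turn into cell markup. It assumes the form seen in the table test output on this page (repeated || for a column span, <|N> for a row span); cell_markup and row_markup are hypothetical helper names, and class/style attributes are left out.

```python
def cell_markup(text, colspan=1, rowspan=1):
    """Return the moinwiki markup for one cell, without the closing '||'."""
    # A column span is written by repeating the cell separator.
    markup = "||" * colspan
    # A row span is written as an attribute in front of the cell text.
    if rowspan > 1:
        markup += "<|%d>" % rowspan
    return markup + text


def row_markup(cells):
    """cells is a list of (text,), (text, colspan) or (text, colspan, rowspan)."""
    return "".join(cell_markup(*cell) for cell in cells) + "||"


print(row_markup([("A",), ("B",), ("D", 1, 2)]))  # ||A||B||<|2>D||
print(row_markup([("C", 2)]))                     # ||||C||
```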

Next Steps:

  • new node types:
    • underline
    • superscript
    • subscript
    • smaller
    • larger
    • stroke
  • moin_page.object

Span and first test

Done with moin_page.span; it covers the following elements of moinwiki syntax:

  • underline
  • superscript
  • subscript
  • smaller
  • larger
  • stroke

But when I started testing, I found some errors, so I'm fixing them.

We have passed the first barrier

Hmm, the handling of '\n' may change later.

Text test

Input tree:

<page:page>
 <page:body>
  <page:h page:outline-level="3">Text:</page:h>
  \n
  <page:strong>strong</page:strong>
  \n
  <page:emphasis>emphasis</page:emphasis>
  \n
  <page:blockcode>blockcode</page:blockcode>
  \n
  <page:code>monospace</page:code>
 </page:body>
</page:page>

Output:

=== Text: ===\n'''strong'''\n''emphasis''\n{{{blockcode}}}\n`monospace`


Text:

strong emphasis blockcode monospace


Table test

Input tree:

<page:page>
 <page:body>
  <page:h page:outline-level="3">Table:</page:h>
  \n
  <page:table>
   <page:table-body>
    <page:table-row>
     <page:table-cell>A</page:table-cell>
     <page:table-cell>B</page:table-cell>
     <page:table-cell page:number-rows-spanned="2">D</page:table-cell>
    </page:table-row>
    <page:table-row>
     <page:table-cell page:number-columns-spanned="2">C</page:table-cell>
    </page:table-row>
   </page:table-body>
  </page:table>
 </page:body>
</page:page>

Output:

=== Table: ===\n||A||B||<|2>D||\n||||C||\n\n


Table:

+---+---+---+
| A | B |   |
+---+---+ D |
|   C   |   |
+---+---+---+


List test

Input tree:

<page:page>
 <page:body>
  <page:h page:outline-level="3">List:</page:h>
  \n
  <page:list page:item-label-generate="unordered">
   <page:list-item>
    <page:list-item-body>A</page:list-item-body>
   </page:list-item>
   <page:list-item>
    <page:list-item-body>B</page:list-item-body>
   </page:list-item>
  </page:list>
 </page:body>
</page:page>

Output:

=== List: ===\n * A\n * B\n


List:

  • A
  • B
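A rough sketch of list output with the ' '*level shift mentioned in the earlier TODO. It assumes tag and attribute names without the page: namespace and only two label types ("unordered" and "ordered"); list_to_moinwiki is a hypothetical name, not the converter's real method.

```python
import xml.etree.ElementTree as ET


def list_to_moinwiki(list_node, level=1):
    """Render a (possibly nested) list, indenting each item by its level."""
    ordered = list_node.get("item-label-generate") == "ordered"
    bullet = "1." if ordered else "*"
    out = []
    for item in list_node.findall("list-item"):
        body = item.find("list-item-body")
        text = (body.text or "").strip()
        out.append("%s%s %s\n" % (" " * level, bullet, text))
        # Nested lists are shifted one more space to the right.
        for nested in body.findall("list"):
            out.append(list_to_moinwiki(nested, level + 1))
    return "".join(out)


flat = ET.XML(
    '<list item-label-generate="unordered">'
    "<list-item><list-item-body>A</list-item-body></list-item>"
    "<list-item><list-item-body>B</list-item-body></list-item>"
    "</list>"
)
print(repr(list_to_moinwiki(flat)))  # ' * A\n * B\n'
```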


Sunday moinpage_span tests

Another working item: span

  • stroke

  • underline

  • larger

  • smaller

  • superscript

  • subscript

Span test

Input tree:

<page:page>
 <page:body>
  <page:h page:outline-level="3">Span:</page:h>
  \n
  <page:span page:text-decoration="line-through">stroke</page:span>
  \n
  <page:span page:text-decoration="underline">underline</page:span>
  \n
  <page:span page:font-size="120%">larger</page:span>
  \n
  <page:span page:font-size="85%">smaller</page:span>
  \n
  <page:span page:baseline-shift="super">super</page:span>script
  \n
  <page:span page:baseline-shift="sub">sub</page:span>script
  \n
 </page:body>
</page:page>

Output:

=== Span: ===\n--(stroke)--\n__underline__\n~+larger+~\n~-smaller-~\n^super^script\n,,sub,,script\n


Span:

stroke underline larger smaller superscript subscript
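The span test above amounts to a lookup from one attribute value to a pair of markers. A minimal sketch, assuming attribute names without the page: namespace and only the six values used in the test; SPAN_MARKUP and span_to_moinwiki are hypothetical names.

```python
# (attribute, value) -> (opening marker, closing marker)
SPAN_MARKUP = {
    ("text-decoration", "line-through"): ("--(", ")--"),
    ("text-decoration", "underline"): ("__", "__"),
    ("font-size", "120%"): ("~+", "+~"),
    ("font-size", "85%"): ("~-", "-~"),
    ("baseline-shift", "super"): ("^", "^"),
    ("baseline-shift", "sub"): (",,", ",,"),
}


def span_to_moinwiki(attrs, text):
    """Wrap text in the markers of the first recognised span attribute."""
    for key, value in attrs.items():
        markers = SPAN_MARKUP.get((key, value))
        if markers:
            return markers[0] + text + markers[1]
    # Unknown or missing attributes: emit the text unchanged.
    return text


print(span_to_moinwiki({"text-decoration": "line-through"}, "stroke"))  # --(stroke)--
print(span_to_moinwiki({"baseline-shift": "sub"}, "sub") + "script")    # ,,sub,,script
```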


Moinwiki->DOM->Moinwiki tests

Tests:

1. "=== Text: ===\n'''strong'''\n''emphasis''\n`monospace`\n"
2. "=== Table: ===\n||A||B||<|2>D||\n||||C||\n"
3. "=== List: ===\n * A\n  1. C\n  1. D\n"
4. "=== Span: ===\n--(stroke)--\n__underline__\n~+larger+~\n~-smaller-~\n^super^script\n,,sub,,script\n"
5. " * A\n * B\n * C\n * D\n * E\n * F\n"
6. " * A\n * B\n i. C\n i. D\n 1. E\n 1. F\n i. G\n 1. H\n"
7. "=== A ===\n dsfs:: dsf\n :: rdf\n :: sdfsdf\n :: dsfsf\n"
8. "=== A ===\n css::\n :: rdf\n :: sdfsdf\n :: dsfsf\n"
9. "=== A ===\n css:: \n :: rdf\n :: sdfsdf\n :: dsfsf\n"

Problem with {{{blockcode}}}: moinwiki_in converts it to moin_page:code, the same as `monospace`.

Test 8 fails: moinwiki_in does not recognise this input as a definition list.

In Test 9, the first list item is ' '. We need some changes in moinwiki_in.

For now, moinwiki_out supports definition lists, but only in one output format:

 this::
 :: A
 :: B
 not_this:: def
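A sketch of the single supported output format (term on its own line, each definition on a following " :: " line). The input shape, a list of (term, definitions) pairs, and the function name are assumptions made for illustration.

```python
def definition_list_to_moinwiki(entries):
    """entries: list of (term, [definition, ...]) pairs."""
    lines = []
    for term, definitions in entries:
        # The term always gets its own line ...
        lines.append(" %s::" % term)
        # ... and every definition follows on a separate line.
        lines.extend(" :: %s" % definition for definition in definitions)
    return "\n".join(lines) + "\n"


print(definition_list_to_moinwiki([("this", ["A", "B"])]), end="")
```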

Moinwiki->DOM->Moinwiki: problems

Problems:

With this input to the moinmoin_in converter:

[[http://static.moinmo.in/logos/moinmoin.png|{{attachment:samplegraphic.png}}]]

in the output tree, {{attachment:samplegraphic.png}} is text, not an object.

[[http://moinmo.in/|MoinMoin Wiki|class=green dotted,accesskey=1]]

the class=green dotted,accesskey=1 part is lost after moinwiki_in.

But in other cases it seems that Moinwiki->DOM->Moinwiki conversion of links and objects is working.

And I wasn't right :)

MoinMoin:MoinMoinWiki|MoinMoin Wiki|&action=diff,&rev1=1,&rev2=2 was not working; it works now.

Another problem in moinwiki_in: <page:separator> doesn't have any attributes, so horizontal rules of different lengths (----, -----, etc.) cannot be told apart.

"A::\n :: B\n :: C\n :: D\n": this definition list format doesn't work in moinwiki_in.

"A:: B\n :: C\n :: D\n": this one works.

All these problems come only from the moinwiki_in converter.

It's time to prepare for the last exam

I have a graduation exam next Friday, so you will not see any changes in my GSoC project for the next 3 days.

New logic for newline in moinpage_p, {{{#!wiki ... }}} support

New variables:

  • status = list of ['text','table','list']
  • last_closed - last closed DOM element.

  • In text <p> -> "\n" (if not at the beginning of the page)

  • In tables and lists <p> inside cells and list items -> <<BR>>
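One possible reading of these two rules as code. This is only a guess at the logic: it assumes status is used as a stack whose last entry is the current context, and that "beginning of the page" can be passed in as a flag; paragraph_separator is a hypothetical name.

```python
def paragraph_separator(status, at_page_start):
    """Return what to emit when a <p> is opened.

    status: stack of enclosing contexts, each 'text', 'table' or 'list'.
    at_page_start: True if nothing has been written to the page yet.
    """
    context = status[-1] if status else "text"
    if context in ("table", "list"):
        # A real newline would end the cell or list item, so use the macro.
        return "<<BR>>"
    # Plain text: separate paragraphs by a newline, except at the very start.
    return "" if at_page_start else "\n"


print(repr(paragraph_separator(["text"], at_page_start=False)))   # '\n'
print(repr(paragraph_separator(["text", "table"], False)))        # '<<BR>>'
```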

Found moinwiki_in bugs

Added support of {{{#!wiki ... }}} (moinpage_page inside moinpage_page).

Macros and more bugs in moinmoin_in

First quick-and-dirty implementation of <<SomeMacro(args)>>

and more bugs in moinwiki_in

Test day

Wrote a lot of tests for moinwiki_out, and fixed some bugs.

Last elements support

DOM->Moinwiki

Added support of <note> and <table-of-content>

Rewrote implementation of <part> (macros)

Merge with the main 2.0-dev repo

Small bugfixes after the merge, and a new implementation of the conversion of parsers (<page:part page:content-type="x-moin/format;name=XXX">...</page:part>).

48 different tests of moinwiki_out passed

Conversion of HelpOnMoinWikiSyntax and subpages|subblockcodes

First conversion of real page

HelpOnMoinWikiSyntaxTestOfConverter

Bugfixes and support of subpages|subblockcodes:

long long {{{{{{{{{{{{ and }}}}}}}}}}}} with nested blocks
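One way to pick the number of braces for a block that itself contains nested blocks is to use one more brace than the longest run found inside it. The sketch below illustrates that idea only; it is an assumption, not necessarily how moinwiki_out decides, and wrap_block is a hypothetical name.

```python
import re


def wrap_block(content):
    """Wrap content in a brace fence longer than any brace run inside it."""
    runs = re.findall(r"\{+|\}+", content)
    longest = max((len(run) for run in runs), default=0)
    # Never use fewer than the usual three braces.
    n = max(3, longest + 1)
    return "{" * n + "\n" + content + "\n" + "}" * n


inner = wrap_block("some code")
print(wrap_block(inner))
```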

reST converter

I've started working on reST converter.

Done with the first quick-and-dirty implementation: emphasis, strong, literals (monospace), blockcode, table, list

DOM->reStructuredText and problem with unicode in converter tests

Added conversion of <part> (macros), <note> (footnotes), <line-break>.

When I try to run some converter tests with unicode input:

>           self._parser.Parse(data, 0)
E           UnicodeEncodeError: 'ascii' codec can't encode characters in position 138-142: ordinal not in range(128)

MoinMoin/support/emeraldtree/tree.py:1146: UnicodeEncodeError
  • and the answer is --> ET.XML(i.encode("utf-8"))
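A small illustration of that fix, using the standard library's xml.etree.ElementTree as a stand-in for emeraldtree (an assumption; the two share the XML() entry point but are not the same module). The point is to hand the parser UTF-8 bytes explicitly instead of relying on an implicit ASCII encode of a unicode string, which is what failed under Python 2.

```python
import xml.etree.ElementTree as ET

# Unicode input containing non-ASCII characters.
source = u"<page><p>\u043f\u0440\u0438\u0432\u0435\u0442</p></page>"

# Encode explicitly before parsing, as in ET.XML(i.encode("utf-8")).
tree = ET.XML(source.encode("utf-8"))

# The parsed text comes back as unicode again.
text = tree.find("p").text
print(text == u"\u043f\u0440\u0438\u0432\u0435\u0442")  # True
```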

Exam

I've passed my first entrance exam to graduate school (PhD).

DOM->ReST: objects and links

Added basic support of objects and links to rst_out converter

ReStructuredText -> DOM

I've started to think about how to implement the rst_in converter. The Moin parser is based on docutils rst2html. With the docutils parser I can create a docutils.writers.Writer subclass that outputs a MoinDOM tree after the docutils parser has run, but maybe it would be easier to write a DocutilsDOM->MoinDOM converter.

ReStructuredText->DOM

Implemented basic structure of rst_in converter

Rst->DOM

I'm working on the conversion of the docutils DOM to the moin DOM. I need more time before hg push, because it has a lot of node types and I want to push a working version.

Rst->DOM

I've added an implementation for the basic nodes of the docutils tree, but rst_in still doesn't work as a converter.

I have to read documentation on basic functions in docutils.core.

ReST->DOM

Docutils part of converter works.

I need more tests to finish the docutils tree -> moin tree conversion.

ReStructuredText->DOM

Added Moin directives to docutils parser in converter.

Implemented basic tests.

Exam, ReStructuredText->DOM

I've passed my second entrance exam (English) to graduate school (PhD).

New nodes support in rst_in: table, link, footnote. (with tests)

Page test for ReStructuredText roundtrip test

Now I have it: DmitryAndreev/Diary/RstPrimerConversion

It's awful, but it will help me to fix errors.

ReStructuredText conversion

Fixed table_of_content, blockcode and shift in lists.

See updates of DmitryAndreev/Diary/RstPrimerConversion

ReStructuredText conversion

Fixed the problem with equal names of references in ReStructuredText output.

Added conversion of docinfo part to a table.

Added conversion of blockquote to a list.

RstPrimerConversion looks very good

TODO:

  • Create more unit tests for ReStructuredText->DOM and DOM->ReStructuredText

  • Fix pep8 in rst_in and rst_out converters
  • Copy the part of code related to shifts in lists to moinwiki_out converter.
  • Write a lot of docstrings for all my converters

DOM->ReStructuredText and DOM->Moinwiki: Fix indents in lists, now they are perfect

PEP8 fixes and merge with main 2.0 branch

no project work

I've passed my last entrance exam, no project progress this day.

ReStructuredText->DOM bugfixes

Added directive for moinwiki parsers

More tests and various bugfixes

DOM->ReStructuredText

Various bugfixes and more tests

DOM->ReStructuredText

More tests and various bugfixes

Coverage of the tests:

  • moinwiki_out: 89%
  • rst_in: 90%
  • rst_out: 94%

Docstrings for ReStructuredText converters

midterm evals

Added recursive version of DOM->Moinwiki converter.

Added recursive version of DOM->ReStructuredText converter
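For comparison with the stack-based walk, a recursive conversion can be sketched in a few lines. This assumes simplified tag names without namespaces, xml.etree.ElementTree in place of the Moin tree classes, and a tiny illustrative markup table; it is not the code that was committed.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of inline markup; tags not listed produce no markers.
MARKUP = {"emphasis": "''", "strong": "'''", "code": "`"}


def to_moinwiki(node):
    """Convert a node and, recursively, all of its children."""
    marker = MARKUP.get(node.tag, "")
    parts = [marker, node.text or ""]
    for child in node:
        parts.append(to_moinwiki(child))
        # Text that follows the child still belongs to this node.
        parts.append(child.tail or "")
    parts.append(marker)
    return "".join(parts)


tree = ET.XML("<body>a <emphasis>b <strong>c</strong></emphasis> d</body>")
print(to_moinwiki(tree))  # a ''b '''c''''' d
```

The call stack takes over the role of the explicit stacks of opened nodes and pending children, at the cost of being limited by the interpreter's recursion depth on very deep trees.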

Created the basic structure of the Mediawiki->DOM converter based on the parser from mwlib

Sorry moin, for the last two weeks I had a brain f*ck with my yearly science report and bureaucracy; most of the people who were supposed to help me with this just waved their hands and left for their vacations.

What I've done in this time:

I've deleted the Mediawiki->DOM converter based on mwlib: while testing, I found that the mwlib parser results don't correspond to the mwlib internal tree specification.

I've written a basic Mediawiki->DOM conversion using regexps, like in Moinwiki->DOM.

  • Conversion of tt/code/pre tags
  • Table attributes
  • Multiline text in table cell
  • Conversion of line_break
  • Conversion of external links, images
  • More tests of Mediawiki->DOM converter

  • fix conversion of images
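To give an idea of the regexp-based approach mentioned above, here is a very small sketch for three inline constructs of Mediawiki markup ('''bold''', ''italic'' and <tt>). The pattern, the element names and the function name are assumptions for illustration; the real converter handles far more syntax and builds Moin DOM nodes rather than plain ElementTree elements.

```python
import re
import xml.etree.ElementTree as ET

# One named group per construct; the bold pattern must come before italic.
INLINE = re.compile(
    r"'''(?P<strong>.+?)'''"
    r"|''(?P<emphasis>.+?)''"
    r"|<tt>(?P<code>.+?)</tt>"
)


def inline_to_dom(text):
    """Turn one line of inline markup into a <p> element with children."""
    p = ET.Element("p")
    p.text = ""
    last = None  # most recently added child, whose tail receives plain text
    pos = 0
    for match in INLINE.finditer(text):
        plain = text[pos:match.start()]
        if last is None:
            p.text += plain
        else:
            last.tail = plain
        # The name of the group that matched doubles as the element name.
        last = ET.SubElement(p, match.lastgroup)
        last.text = match.group(match.lastgroup)
        pos = match.end()
    if last is None:
        p.text += text[pos:]
    else:
        last.tail = text[pos:]
    return p


dom = inline_to_dom("a '''b''' and ''c''")
print(ET.tostring(dom, encoding="unicode"))
```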

pencils down date

MoinMoin: DmitryAndreev/GSoC2010/Diary/Summary (last edited 2010-07-15 08:41:07 by EugeneSyromyatnikov)