Some kinds of writing have to fit a specified length: scientific writing, proposals, and journalism all impose word limits. The WordCount macro can help with that.

[[WordCount]]   - word count of this page
[[WordCount()]] - same
[[WordCount(4000)]] - this page, against a target count of 4000
[[WordCount(FrontPage)]] - count FrontPage
[[WordCount(FrontPage,4000)]] - count FrontPage against a target count of 4000
[[WordCount(PageOne,PageTwo)]] - count some pages
[[WordCount(PageOne,PageTwo,4000)]] - count some pages against a target count of 4000

[[WordCount(FrontPage,115)]] prints the word count of FrontPage compared with a target length of 115, together with the difference between the two. If the page list includes the magic word subpages, the subpages of the current page are counted too.
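The argument forms listed above could be handled roughly as follows. This is only a sketch: the function names `parse_wordcount_args` and `report` and the output wording are assumptions made for illustration, not the macro's actual code. It assumes that a purely numeric argument is the target, that `subpages` is the magic word, and that everything else is a page name (an empty page list meaning the current page).

```python
def parse_wordcount_args(args):
    """Split a macro argument string into (pages, target, subpages)."""
    pages, target, subpages = [], None, False
    for raw in (args or "").split(","):
        item = raw.strip()
        if not item:
            continue
        if item.isdigit():
            target = int(item)      # a bare number is taken as the target count
        elif item == "subpages":
            subpages = True         # the magic word, not a page name
        else:
            pages.append(item)
    return pages, target, subpages


def report(count, target=None):
    """Format a count, adding the signed difference when a target is given."""
    if target is None:
        return "%d words" % count
    return "%d words (target %d, difference %+d)" % (count, target, count - target)
```

With these assumptions, `parse_wordcount_args("PageOne,PageTwo,4000")` yields the two page names and a target of 4000, and `report(120, 115)` reports a difference of +5.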

Discussion

Does this macro ignore wiki markup when counting words?

A: Not currently. Should it? If it should, is there a convenient regexp for wiki markup, or should I write my own? Let me see, what exactly is markup-that-should-not-be-counted?

In an ideal world, the word counting would happen right after the HTML generation. I could easily strip all the HTML taggery and count the words. But I don't know how to do that: MoinMoinGods out there, suggestions?
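If the generated HTML were available as a string, stripping the tags and counting could look something like this. It is a sketch in modern Python using the standard library `html.parser` module; how to get hold of the rendered HTML inside MoinMoin is exactly the open question above and is not addressed here. The names `TextCounter` and `count_words_in_html` are made up for the example.

```python
from html.parser import HTMLParser


class TextCounter(HTMLParser):
    """Collect only text nodes and count whitespace-separated words."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip = 0  # depth inside <script>/<style>, whose content is not prose

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())


def count_words_in_html(html):
    """Return the number of words in the text content of an HTML string."""
    counter = TextCounter()
    counter.feed(html)
    counter.close()
    return counter.words
```

One known weakness: a word split by inline tags, such as `wi<b>ki</b>`, is counted as two words because each text node is split on its own.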

Generally, the wiki parser looks for wiki markup and prints the text between the markup. The parser works like this:

for line in text:
    for markup in line:
        print text before markup
        replace markup
    print text after last markup

To get a correct word count, you would have to write a new parser that counts the words in the text it finds in this loop, and also counts the words in text inside markup. This is not easy, but otherwise your word count will not be correct. Maybe just print "About <wordcount> words" instead.
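A very rough version of that loop, which counts instead of printing, might look like the following. The `MARKUP` pattern is a hypothetical and deliberately incomplete stand-in for the real parser's rules (macros, quote runs for bold and italic, code block braces, heading markers, horizontal rules), and the sketch does not count text inside markup such as link labels, so it only gives an approximate number.

```python
import re

# Hypothetical, incomplete pattern for markup that should not be counted.
MARKUP = re.compile(r"\[\[.*?\]\]|'{2,3}|\{\{\{|\}\}\}|^=+\s|\s=+$|-{4,}")


def count_words(text):
    """Count words in the text between markup, line by line."""
    total = 0
    for line in text.splitlines():
        pos = 0
        for match in MARKUP.finditer(line):
            total += len(line[pos:match.start()].split())  # text before markup
            pos = match.end()                               # skip the markup itself
        total += len(line[pos:].split())                    # text after last markup
    return total
```

For example, a heading line `= Title =` followed by `Some '''bold''' text [[WordCount]]` comes out as four words under this pattern.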

Another idea: I think all text is supposed to be printed through formatter.text() calls. So maybe you can simply create a subclass of the text_html formatter that counts the number of words it prints. The problem is that the formatter prints directly to the client, so the number of words is known only after the whole page has been formatted and sent. You could redirect the page output into a buffer, insert a placeholder for the result of the word count, then substitute the result and send the page to the client.
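The buffer-and-placeholder idea can be sketched without MoinMoin at all. In the code below, `HtmlFormatter` is a stand-in written for this example, not MoinMoin's text_html formatter, and the `raw` method, the `PLACEHOLDER` string, and the `render` helper are all assumptions; only the use of a `text()` method for page text comes from the discussion above.

```python
import io

PLACEHOLDER = "@@WORDCOUNT@@"  # hypothetical marker the macro would emit


class HtmlFormatter:
    """Stand-in for a formatter that writes page output to a stream."""

    def __init__(self, out):
        self.out = out

    def text(self, s):
        self.out.write(s)   # page text

    def raw(self, s):
        self.out.write(s)   # markup, emitted without counting


class CountingFormatter(HtmlFormatter):
    """Counts the words passed through text() while still emitting them."""

    def __init__(self, out):
        super().__init__(out)
        self.words = 0

    def text(self, s):
        self.words += len(s.split())
        super().text(s)


def render(parts):
    """Render (kind, value) parts into a buffer, then fill in the word count."""
    buffer = io.StringIO()
    fmt = CountingFormatter(buffer)
    for kind, value in parts:
        if kind == "text":
            fmt.text(value)
        else:  # markup, or the placeholder itself
            fmt.raw(value)
    # The count is only final once the whole page has been formatted.
    return buffer.getvalue().replace(PLACEHOLDER, str(fmt.words))
```

Because the placeholder goes through `raw` rather than `text()`, it is not counted itself, and the substitution happens only after the last call to `text()`, which is the point of buffering the output.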

Interface

About the interface: I think it's confusing and has unneeded options. How about a simpler syntax:

[[WordCount]] - word count of current page

Will print:

Syntax:

[[WordCount(children)]] - word count of current page with all children

Will print:

-- NirSoffer 2005-02-25 16:15:42 Words in this page WordCount

MoinMoin: WordCount (last edited 2011-12-05 18:19:41 by 68-116-31-34)