MoinMoinChat/Logs/moin-dev/2006-02-12

2006-02-12T00:05:13  <xorAxAx> [PERIODIC ANNOUNCEMENT] Logs can be found on http://moinmoin.wikiwikiweb.de/MoinMoinChat/Logs/moin-dev
2006-02-12T00:19:49  <ThomasWaldmann> "some code assumes request.page" - i like such precise comments 8(
2006-02-12T00:22:17  <xorAxAx> ThomasWaldmann: where is that comment?
2006-02-12T00:22:43  <ThomasWaldmann> lupy.py
2006-02-12T00:23:56  <xorAxAx> yeah, that problem led to a 1-hour discussion on #moin a few minutes ago
2006-02-12T00:24:03  <xorAxAx> (request.page)
2006-02-12T00:24:23  <ThomasWaldmann> a lupy problem?
2006-02-12T00:26:32  <xorAxAx> no
2006-02-12T00:27:13  <ThomasWaldmann> timestamp please
2006-02-12T00:28:07  <xorAxAx> it's masterc. we finally resolved it on #python.de. short abstract - he was trying to use requestcgi and redirectoutput and got a traceback. it was solved by manually setting request.page
2006-02-12T00:34:03  <ThomasWaldmann> so what code uses request.page?
2006-02-12T00:34:46  <ThomasWaldmann> i only found some i18n stuff avoiding creating page objects if it is already there
2006-02-12T00:38:15  <xorAxAx> grep showed me ~ 10 files
2006-02-12T00:45:35  <ThomasWaldmann> but nothing related to lupy
2006-02-12T00:47:10  <xorAxAx> i didnt say that
2006-02-12T12:27:25  <ThomasWaldmann> moin fabi
2006-02-12T12:31:37  <Fabi> moin
2006-02-12T12:36:00  <ThomasWaldmann> there are some unanswered questions in lupy.py, time for answering?
2006-02-12T12:36:35  <xorAxAx> communication via source code comments? :)
2006-02-12T12:38:16  <ThomasWaldmann> no, via irc now as i edited the file and dont want edit conflicts
2006-02-12T12:40:43  <ThomasWaldmann> 1. is anything known already why optimizing destroys the index after a page update?
2006-02-12T12:42:03  <ThomasWaldmann> 2.                 # Some code assumes request.page
2006-02-12T12:42:03  <ThomasWaldmann>                 request.page = Page(request, pagename)
2006-02-12T12:42:03  <ThomasWaldmann>                 self._index_page(writer, request.page)
2006-02-12T12:42:52  <ThomasWaldmann> this is done at one place, but not at another place. a more precise comment than "some code" would be helpful
2006-02-12T12:44:39  <xorAxAx> some code are the ~ 5-10 files showing up using grep, of course
2006-02-12T12:44:52  <xorAxAx> (dont forget -r)
2006-02-12T12:47:20  <ThomasWaldmann> and how is that code related to the things done in lupy.py?
2006-02-12T12:47:32  <ThomasWaldmann> in _index_page?
2006-02-12T12:51:20  <xorAxAx> not at all?
2006-02-12T12:51:42  <xorAxAx> request.page needs to be set if the page is rendered
2006-02-12T12:51:50  <xorAxAx> the problem i have here - where does lupy render pages?
2006-02-12T12:52:07  <xorAxAx> ah, in order to check for page links maybe?
2006-02-12T12:52:13  <xorAxAx> but even then, rendering is separated
2006-02-12T12:52:19  <ThomasWaldmann> this is what I am wondering about. afaics, setting request.page is unnecessary
2006-02-12T12:52:21  <xorAxAx> (in the getpagelinks function)
2006-02-12T13:01:09  <Fabi> I found out that optimizing inserts pages into the index several times
2006-02-12T13:01:32  <Fabi> while the code assumes that for each term pages are returned only once
2006-02-12T13:05:39  <ThomasWaldmann> the problem after optimizing is that lupy crashes internally with IndexErrors
2006-02-12T13:05:52  <ThomasWaldmann> accessing some element 99 when there are only 3
2006-02-12T13:08:32  <Fabi> yes
2006-02-12T13:08:47  <Fabi> this is caused by the problem described above
2006-02-12T13:09:35  <ThomasWaldmann> so "the code" == "lupy code"
2006-02-12T13:12:16  <ThomasWaldmann> so if it is a lupy bug, why doesnt it happen when build and optimize are done in one go, without a page update in between?
2006-02-12T13:12:58  * ThomasWaldmann runs build/optimize with a deleted request.page
2006-02-12T13:14:27  <ThomasWaldmann> btw, that mergeFactor = 200... - at some places in the lucene wiki, they recommend rather low values
2006-02-12T13:14:46  <ThomasWaldmann> did 200 have some special reason (default being 20)?
2006-02-12T13:15:26  <ThomasWaldmann> they tell to open up to mergeFactor * 5 files iirc
2006-02-12T13:15:39  <ThomasWaldmann> (which sounds rather insane)
2006-02-12T13:24:13  <ThomasWaldmann> btw, running without request.page didnt trigger an exception
2006-02-12T13:33:43  <ThomasWaldmann> Fabi: why does "moi" give many more results than "moinmo" and why is that so different from "moinmoi" ?
2006-02-12T13:35:04  <ThomasWaldmann> if there are rules (and not just bugs) behind that, they should be documented
2006-02-12T13:49:30  <Fabi> mompl
2006-02-12T13:49:33  <ThomasWaldmann> http://moinmoin.wikiwikiweb.de/LupyIntegration see at bottom
2006-02-12T13:55:05  <Fabi> hmm, getting an idea what's going wrong...
2006-02-12T14:00:45  <Fabi> we are using the wrong kind of search term for title search
2006-02-12T14:00:59  <ThomasWaldmann> i should add that i changed the tokenizer a bit
2006-02-12T14:01:17  <Fabi> titles are split up into single words
2006-02-12T14:01:21  <ThomasWaldmann> it added the single words only if it was a CamelCase word
2006-02-12T14:01:28  <Fabi> and we use a Prefix search on these
2006-02-12T14:01:34  <Fabi> this can't work...
2006-02-12T14:01:39  <ThomasWaldmann> I added that it also adds the full word to the index
2006-02-12T14:02:41  <ThomasWaldmann> so there should be fewer problems. but it still behaves strangely.
2006-02-12T14:03:35  <Fabi> search.py line 355
2006-02-12T14:03:37  <Fabi> term = PrefixQuery(Term("title", pattern), 3)
2006-02-12T14:03:49  <Fabi> increase the 3 to a really large number
2006-02-12T14:04:02  <Fabi> and see if this reduces some of your strangeness
2006-02-12T14:06:15  <ThomasWaldmann> moin--main--1.5--patch-439
2006-02-12T14:08:28  <ThomasWaldmann> thanks, much better
2006-02-12T14:08:48  <ThomasWaldmann> 30 now
2006-02-12T14:09:14  <ThomasWaldmann> really large enough?
2006-02-12T14:09:20  <Fabi> it's the number of chars which are ignored behind the match
2006-02-12T14:09:34  <Fabi> may be set to infinity
2006-02-12T14:09:48  <ThomasWaldmann> how?
2006-02-12T14:09:49  <Fabi> 10^6
2006-02-12T14:09:54  <ThomasWaldmann> ok
2006-02-12T14:09:54  <Fabi> should work
2006-02-12T14:10:12  <Fabi> .oO(maybe we should allow None)
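
A minimal sketch of the limit Fabi describes for the second PrefixQuery argument, assuming it works as stated in the chat: a term matches if it starts with the pattern and has at most that many characters after it. The function name `limited_prefix_match` and the `max_extra` parameter are hypothetical and this is not lupy's actual implementation. `None` stands in for the "infinity" idea.

```python
def limited_prefix_match(term, prefix, max_extra=None):
    """Model of a length-limited prefix match (hypothetical helper, not lupy code).

    term      -- an indexed term, e.g. a word from a page title
    prefix    -- the search pattern
    max_extra -- how many characters may follow the prefix; None means no limit
    """
    if not term.startswith(prefix):
        return False
    if max_extra is None:
        return True
    return len(term) - len(prefix) <= max_extra


# With a limit of 3, "moi" cannot reach the 8-character term "moinmoin",
# while a very large limit (or None) lets it match.
print(limited_prefix_match("moinmoin", "moi", 3))
print(limited_prefix_match("moinmoin", "moi", 10**6))
print(limited_prefix_match("moinmoin", "moi"))
```

Under this model a short pattern with a small limit silently drops long terms, which would fit the kind of uneven result counts reported for "moi", "moinmo" and "moinmoi".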
2006-02-12T14:12:04  <Fabi> but even now it still won't work 100% correctly
2006-02-12T14:12:48  <Fabi> what about not only adding the full name but also the name from the beginning of a single word on
2006-02-12T14:13:17  <Fabi> so CamelCaseWord gets CamelCaseWord, CaseWord, Word
2006-02-12T14:13:55  <xorAxAx> how about Camel?
2006-02-12T14:14:25  <Fabi> camel gets found in CamelCaseWord as we use a prefix search
2006-02-12T14:14:31  <xorAxAx> ah
2006-02-12T14:15:25  <Fabi> we don't need to put Camel and Case into the index, I think
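
The indexing idea above can be sketched roughly as follows, assuming a word boundary is any place where an uppercase letter follows a lowercase one. The helper names `camel_suffixes` and `prefix_match` are hypothetical and not part of MoinMoin or lupy; the case-insensitive comparison is also an assumption.

```python
import re


def camel_suffixes(word):
    """Return the word plus every suffix that starts at a CamelCase boundary."""
    # Offsets where a lowercase letter is directly followed by an uppercase one.
    boundaries = [m.start() for m in re.finditer(r"(?<=[a-z])(?=[A-Z])", word)]
    return [word[i:] for i in [0] + boundaries]


def prefix_match(pattern, terms):
    """True if any indexed term starts with the pattern (case-insensitive)."""
    pattern = pattern.lower()
    return any(term.lower().startswith(pattern) for term in terms)


terms = camel_suffixes("CamelCaseWord")
print(terms)
print(prefix_match("camel", terms), prefix_match("case", terms))
```

With these suffixes in the index, a prefix search finds "Camel" through the full name and "Case" through "CaseWord", so the single words "Camel" and "Case" need no entries of their own.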
2006-02-12T14:15:43  <Fabi> but... mompl
2006-02-12T14:16:30  <Fabi> hmm, we use different queries for text and title search
2006-02-12T14:16:37  <Fabi> this might not be a good idea...
2006-02-12T14:17:04  <Fabi> .oO(yeppieeee, some more of these nice flip flop bugs)
2006-02-12T14:24:08  * ThomasWaldmann will work on attachment indexing / search later today as an exercise for next week's job
2006-02-12T14:30:14  <ThomasWaldmann> brb
2006-02-12T16:19:37  <ThomasWaldmann> http://science.slashdot.org/science/06/02/12/0738233.shtml <- LOL (see comments)
2006-02-12T23:58:31  <ThomasWaldmann> we have new plugins: filters
MoinMoin: MoinMoinChat/Logs/moin-dev/2006-02-12 (last edited 2007-10-29 19:13:49 by localhost)