2006-02-12T00:05:13 <xorAxAx> [PERIODIC ANNOUNCEMENT] Logs can be found on http://moinmoin.wikiwikiweb.de/MoinMoinChat/Logs/moin-dev
2006-02-12T00:19:49 <ThomasWaldmann> "some code assumes request.page" - i like such precise comments 8(
2006-02-12T00:22:17 <xorAxAx> ThomasWaldmann: where is that comment?
2006-02-12T00:22:43 <ThomasWaldmann> lupy.py
2006-02-12T00:23:56 <xorAxAx> yeah, that problem led to a 1-hour discussion on #moin a few minutes ago
2006-02-12T00:24:03 <xorAxAx> (request.page)
2006-02-12T00:24:23 <ThomasWaldmann> a lupy problem?
2006-02-12T00:26:32 <xorAxAx> no
2006-02-12T00:27:13 <ThomasWaldmann> timestamp please
2006-02-12T00:28:07 <xorAxAx> its masterc. we finally resolved it on #python.de. short abstract - he was trying to use requestcgi and redirectoutput and got a traceback. it was solved by manually setting request.page
2006-02-12T00:34:03 <ThomasWaldmann> so what code uses request.page?
2006-02-12T00:34:46 <ThomasWaldmann> i only found some i18n stuff avoiding creating page objects if it is already there
2006-02-12T00:38:15 <xorAxAx> grep showed me ~ 10 files
2006-02-12T00:45:35 <ThomasWaldmann> but nothing related to lupy
2006-02-12T00:47:10 <xorAxAx> i didnt say that
2006-02-12T12:27:25 <ThomasWaldmann> moin fabi
2006-02-12T12:31:37 <Fabi> moin
2006-02-12T12:36:00 <ThomasWaldmann> there are some unanswered questions in lupy.py, time for answering?
2006-02-12T12:36:35 <xorAxAx> communication via source code comments? :)
2006-02-12T12:38:16 <ThomasWaldmann> no, via irc now as i edited the file and dont want edit conflicts
2006-02-12T12:40:43 <ThomasWaldmann> 1. is anything known already why optimizing destroys the index after a page update?
2006-02-12T12:42:03 <ThomasWaldmann> 2. # Some code assumes request.page
2006-02-12T12:42:03 <ThomasWaldmann> request.page = Page(request, pagename)
2006-02-12T12:42:03 <ThomasWaldmann> self._index_page(writer, request.page)
2006-02-12T12:42:52 <ThomasWaldmann> this is done at one place, but not at another place. a more precise comment than "some code" would be helpful
2006-02-12T12:44:39 <xorAxAx> some code are the ~ 5-10 files showing up using grep, of course
2006-02-12T12:44:52 <xorAxAx> (dont forget -r)
2006-02-12T12:47:20 <ThomasWaldmann> and how is that code related to the things done in lupy.py?
2006-02-12T12:47:32 <ThomasWaldmann> in _index_page?
2006-02-12T12:51:20 <xorAxAx> not at all?
2006-02-12T12:51:42 <xorAxAx> request.page needs to be set if the page is rendered
2006-02-12T12:51:50 <xorAxAx> the problem i have here - where does lupy render pages?
2006-02-12T12:52:07 <xorAxAx> ah, in order to check for page links maybe?
2006-02-12T12:52:13 <xorAxAx> but even then, rendering is separated
2006-02-12T12:52:19 <ThomasWaldmann> this is what I am wondering about. afaics, setting request.page is unnecessary
2006-02-12T12:52:21 <xorAxAx> (in the getpagelinks function)
2006-02-12T13:01:09 <Fabi> I found out that optimizing inserts pages into the index several times
2006-02-12T13:01:32 <Fabi> while the code assumes that for each term pages are returned only once
2006-02-12T13:05:39 <ThomasWaldmann> the problem after optimizing is that lupy crashes internally with IndexErrors
2006-02-12T13:05:52 <ThomasWaldmann> accessing some element 99 when there are only 3
2006-02-12T13:08:32 <Fabi> yes
2006-02-12T13:08:47 <Fabi> this is caused by the problem described above
2006-02-12T13:09:35 <ThomasWaldmann> so "the code" == "lupy code"
2006-02-12T13:12:16 <ThomasWaldmann> so if it is a lupy bug, why doesnt it happen when build and optimize are done in one go, without a page update in between?
2006-02-12T13:12:58 * ThomasWaldmann runs build/optimize with a deleted request.page
2006-02-12T13:14:27 <ThomasWaldmann> btw, that mergeFactor = 200... - at some places in the lucene wiki, they recommend rather low values
2006-02-12T13:14:46 <ThomasWaldmann> did 200 have some special reason (default being 20)?
2006-02-12T13:15:26 <ThomasWaldmann> they tell to open up to mergeFactor * 5 files iirc
2006-02-12T13:15:39 <ThomasWaldmann> (which sounds rather insane)
2006-02-12T13:24:13 <ThomasWaldmann> btw, running without request.page didnt trigger an exception
2006-02-12T13:33:43 <ThomasWaldmann> Fabi: why does "moi" give much more results than "moinmo" and why is that so different from "moinmoi" ?
2006-02-12T13:35:04 <ThomasWaldmann> if there are rules (and not just bugs) behind that, they should be documented
2006-02-12T13:49:30 <Fabi> mompl
2006-02-12T13:49:33 <ThomasWaldmann> http://moinmoin.wikiwikiweb.de/LupyIntegration see at bottom
2006-02-12T13:55:05 <Fabi> hmm, getting an idea what's going wrong...
2006-02-12T14:00:45 <Fabi> we are using the wrong kind of search term for title search
2006-02-12T14:00:59 <ThomasWaldmann> i should add that i changed the tokenizer a bit
2006-02-12T14:01:17 <Fabi> titles are split up into single words
2006-02-12T14:01:21 <ThomasWaldmann> it added the single words only if it was a CamelCase word
2006-02-12T14:01:28 <Fabi> and we use a Prefix search on these
2006-02-12T14:01:34 <Fabi> this can't work...
2006-02-12T14:01:39 <ThomasWaldmann> I added that it also adds the full word to the index
2006-02-12T14:02:41 <ThomasWaldmann> so there should be some problems less. but it still behaves strange.
2006-02-12T14:03:35 <Fabi> search.py line 355
2006-02-12T14:03:37 <Fabi> term = PrefixQuery(Term("title", pattern), 3)
2006-02-12T14:03:49 <Fabi> increase the 3 to a really large number
2006-02-12T14:04:02 <Fabi> and see if this reduces some of your strangeness
2006-02-12T14:06:15 <ThomasWaldmann> moin--main--1.5--patch-439
2006-02-12T14:08:28 <ThomasWaldmann> thanks, much better
2006-02-12T14:08:48 <ThomasWaldmann> 30 now
2006-02-12T14:09:14 <ThomasWaldmann> really large enough?
2006-02-12T14:09:20 <Fabi> it's the number of chars which are ignored behind the match
2006-02-12T14:09:34 <Fabi> may be set to infinity
2006-02-12T14:09:48 <ThomasWaldmann> how?
2006-02-12T14:09:49 <Fabi> 10^6
2006-02-12T14:09:54 <ThomasWaldmann> ok
2006-02-12T14:09:54 <Fabi> should work
2006-02-12T14:10:12 <Fabi> .oO(maybe we should allow None)
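The prefix-query limit discussed above (the second argument, changed from 3 to 30) can be illustrated with a small sketch. This is not lupy's implementation; it only models the semantics as Fabi describes them ("the number of chars which are ignored behind the match"). The function name `prefix_matches`, the parameter `max_extra`, and the sample terms are hypothetical, and treating `None` as "no limit" follows Fabi's aside rather than any existing API.

```python
def prefix_matches(prefix, term, max_extra=None):
    """Return True if term starts with prefix and has at most max_extra
    characters after it. max_extra=None means no limit (assumption)."""
    if not term.startswith(prefix):
        return False
    if max_extra is None:
        return True
    return len(term) - len(prefix) <= max_extra


# Hypothetical index terms, for illustration only.
terms = ["moin", "moinmoin", "moinmoinwiki"]

# With a limit of 3, longer terms drop out even though they share the prefix.
small_limit = [t for t in terms if prefix_matches("moi", t, 3)]

# With a very large limit (or None), every term sharing the prefix matches.
large_limit = [t for t in terms if prefix_matches("moi", t, 10**6)]
no_limit = [t for t in terms if prefix_matches("moi", t)]
```

Under these assumptions a small limit silently hides long terms, which would be one way to get results that change unexpectedly as the search string grows.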
2006-02-12T14:12:04 <Fabi> but now it still should not work 100% correctly
2006-02-12T14:12:48 <Fabi> what about not only adding the full name but also the name from the beginning of a single word on
2006-02-12T14:13:17 <Fabi> so CamelCaseWord gets CamelCaseWord, CaseWord, Word
2006-02-12T14:13:55 <xorAxAx> how about Camel?
2006-02-12T14:14:25 <Fabi> camel gets found in CamelCaseWord as we use a prefix search
2006-02-12T14:14:31 <xorAxAx> ah
2006-02-12T14:15:25 <Fabi> we don't need to put Camel and Case into the index, I think
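Fabi's proposal above (index CamelCaseWord as CamelCaseWord, CaseWord, Word, and rely on prefix search to find Camel and Case) might look roughly like the sketch below. It is not the MoinMoin tokenizer: the names `camelcase_suffixes` and `title_matches` are hypothetical, the word-boundary rule (an uppercase letter following a lowercase letter or digit) is a simplification, and case-insensitive matching is an assumption.

```python
import re


def camelcase_suffixes(word):
    """Return the full word plus every suffix that starts at an inner
    CamelCase boundary, e.g. CamelCaseWord -> CamelCaseWord, CaseWord, Word."""
    # Zero-width matches at positions where a capital follows a lowercase
    # letter or digit (simplified boundary rule, an assumption).
    starts = [0] + [m.start() for m in re.finditer(r"(?<=[a-z0-9])(?=[A-Z])", word)]
    return [word[i:] for i in starts]


def title_matches(query, indexed_terms):
    """Prefix search over the indexed terms, ignoring case (assumption)."""
    q = query.lower()
    return any(term.lower().startswith(q) for term in indexed_terms)


indexed = camelcase_suffixes("CamelCaseWord")
# indexed == ["CamelCaseWord", "CaseWord", "Word"]
```

With these terms, "camel", "case" and "word" are each a prefix of one indexed term, so Camel and Case do not need their own index entries; a query such as "amel" matches nothing.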
2006-02-12T14:15:43 <Fabi> but... mompl
2006-02-12T14:16:30 <Fabi> hmm, we use different queries for text and title search
2006-02-12T14:16:37 <Fabi> this might not be a good idea...
2006-02-12T14:17:04 <Fabi> .oO(yeppieeee, some more of these nice flip flop bugs)
2006-02-12T14:24:08 * ThomasWaldmann will work on attachment indexing / search later today as an exercise for next week's job
2006-02-12T14:30:14 <ThomasWaldmann> brb
2006-02-12T16:19:37 <ThomasWaldmann> http://science.slashdot.org/science/06/02/12/0738233.shtml <- LOL (see comments)
2006-02-12T23:58:31 <ThomasWaldmann> we have new plugins: filters
MoinMoin: MoinMoinChat/Logs/moin-dev/2006-02-12 (last edited 2007-10-29 19:13:49 by localhost)