1 2011-09-15T00:10:43 *** jek
2 2011-09-15T00:10:43 *** jek
3 2011-09-15T00:16:30 * ThomasWaldmann plays with zlib
4 2011-09-15T01:25:22 * ThomasWaldmann pushes sqlite storage with optional zlib compression
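The pushed sqlite storage code is not shown in this log. As a rough illustration of "sqlite storage with optional zlib compression" only, here is a minimal key/value store on Python's sqlite3 module that zlib-compresses values when a non-zero level is configured. The class name `SQLiteStore`, the table layout and the `compression_level` parameter are assumptions for this sketch, not the storage-ng code.

```python
import sqlite3
import zlib


class SQLiteStore:
    """Hypothetical key/value store: one sqlite table, values optionally zlib-compressed."""

    def __init__(self, path=":memory:", compression_level=0):
        # 0 means "store values as-is"; 1..9 are zlib compression levels
        self.compression_level = compression_level
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS store (key TEXT PRIMARY KEY, value BLOB)")

    def __setitem__(self, key, value):
        if self.compression_level:
            value = zlib.compress(value, self.compression_level)
        self.conn.execute(
            "INSERT OR REPLACE INTO store (key, value) VALUES (?, ?)", (key, value))
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM store WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        value = row[0]
        if self.compression_level:
            value = zlib.decompress(value)
        return value
```

Because this sketch does not record the compression setting next to the rows, a store must be reopened with the same zero/non-zero setting; storing a per-row or per-database flag would remove that pitfall.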
5 2011-09-15T01:48:22 *** MattMaker
6 2011-09-15T01:52:45 *** MattMaker
7 2011-09-15T01:59:31 *** xjjk
8 2011-09-15T02:12:08 *** brunomartin
9 2011-09-15T02:41:15 * ThomasWaldmann pushes memcached "storage"
10 2011-09-15T03:38:25 *** xjjk
11 2011-09-15T04:19:21 *** MattMaker
12 2011-09-15T04:21:16 *** MattMaker
13 2011-09-15T04:34:37 *** MattMaker
14 2011-09-15T04:36:14 *** MattMaker
15 2011-09-15T05:28:22 *** Marchael
16 2011-09-15T06:39:38 *** MattMaker
17 2011-09-15T06:51:20 *** MattMaker
18 2011-09-15T07:24:34 <ronny> ThomasWaldmann: btw, how do we index on gae?
19 2011-09-15T07:28:33 *** raignarok
20 2011-09-15T07:29:01 <ronny> ThomasWaldmann: whoosh on gae is scary :P
21 2011-09-15T07:40:19 *** raignarok
22 2011-09-15T08:22:21 <dreimark> moin
23 2011-09-15T10:35:23 *** Marchael
24 2011-09-15T10:40:00 <ThomasWaldmann> ronny: did you try it? what's scary?
25 2011-09-15T10:41:15 <ronny> ThomasWaldmann: it stores index files as whole in the blobstore
26 2011-09-15T10:41:30 <ronny> (read as OMG FUCKING SCARY)
27 2011-09-15T10:42:53 <ThomasWaldmann> can one improve this?
28 2011-09-15T10:44:33 <ronny> ThomasWaldmann: not easily, whoosh has its own file format, i don't see how to map it to something that's efficient on gae
29 2011-09-15T10:52:24 <ThomasWaldmann> so it requests the whole index file at once from blobstore?
31 2011-09-15T10:53:21 <ThomasWaldmann> if so, does it keep the result in memory then?
32 2011-09-15T10:56:25 <waldi> ThomasWaldmann: i'm looking at the storage stuff now
33 2011-09-15T10:59:26 <ThomasWaldmann> yesterday i wondered whether we maybe want to have a "cache" package in addition to "storage"
34 2011-09-15T10:59:38 <waldi> something i carry since some time: we need to partition the metadata keyspace.
35 2011-09-15T11:00:22 <ThomasWaldmann> but otoh, disks die also...
36 2011-09-15T11:00:59 <ThomasWaldmann> waldi: because of ACLs for them?
37 2011-09-15T11:01:34 <ThomasWaldmann> i still have that idea on my braindump wiki page
38 2011-09-15T11:01:58 <waldi> ThomasWaldmann: and to separate generated from mutable entries
39 2011-09-15T11:02:50 <waldi> where is that dump? i have to add my own thoughts to it
40 2011-09-15T11:08:33 <ThomasWaldmann> http://moinmo.in/ThomasWaldmann/Moin2BrainDump
41 2011-09-15T11:08:48 <ThomasWaldmann> might need an update due to the last days' work
42 2011-09-15T11:12:32 <ronny> ThomasWaldmann: why metadata prefixes? wouldn't it make sense to just have a nested mapping?
43 2011-09-15T11:14:46 <ronny> ThomasWaldmann: on a sidenote, do we really want *rpc, REST seems to be a better fit (imho)
44 2011-09-15T11:18:17 <ThomasWaldmann> ronny: yeah, could be done also
45 2011-09-15T11:22:21 <ThomasWaldmann> ronny: and about the rpc/other stuff: we'll do that after storage is done
46 2011-09-15T11:24:58 *** greg_f
47 2011-09-15T11:40:39 <waldi> ThomasWaldmann: the backend is responsible for assigning ids? is there a reason why the backend is not implemented with __getitem__, __delitem__ for the id access?
48 2011-09-15T11:44:26 <waldi> TrackingFileWrapper could be a bit more robust, i.e. check pre- and postconditions (position of the stream, access to the hash)
49 2011-09-15T11:46:46 *** xjjk
50 2011-09-15T11:48:01 *** xjjk
51 2011-09-15T11:49:20 <waldi> ThomasWaldmann: is there a reason for splitting the router in a ro and a rw part? i can't use a rw router in front of a ro storage?
52 2011-09-15T11:49:42 <ThomasWaldmann> you mean assert tell() == 0 and raise some error if one reads hash without reading the stream fully?
53 2011-09-15T11:51:49 <waldi> something like that. the user must get EOF at least once before it may read the hash. also further calls to the read function should error out
54 2011-09-15T11:52:42 <ThomasWaldmann> ok
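The checks waldi asks for (stream at offset 0 at the start, hash readable only after the reader has seen EOF, further reads after EOF rejected) could look roughly like this. It is a hypothetical wrapper written for illustration; the real TrackingFileWrapper in storage-ng may differ, and the `hash_method` parameter and the exception types are assumptions.

```python
import hashlib


class TrackingFileWrapper:
    """Hypothetical stricter variant: hashes what is read and enforces pre/postconditions."""

    def __init__(self, realfile, hash_method="sha1"):
        # precondition: hashing only makes sense if we start at the beginning
        if realfile.tell() != 0:
            raise ValueError("stream must be at offset 0")
        self._realfile = realfile
        self._hash = hashlib.new(hash_method)
        self._eof = False

    def read(self, size=-1):
        if self._eof:
            # the reader already got EOF once; further reads are treated as a bug
            raise IOError("read after EOF")
        if size == 0:
            return b""  # a zero-length read is not EOF
        data = self._realfile.read(size)
        if data:
            self._hash.update(data)
        else:
            self._eof = True
        return data

    @property
    def hash(self):
        # postcondition: the hash is only valid after the whole stream was read
        if not self._eof:
            raise IOError("hash requested before EOF was reached")
        return self._hash.hexdigest()
```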
55 2011-09-15T11:53:29 <ThomasWaldmann> about __*item__ backend: i thought of that also, but ronny was somehow uncomfortable with that.
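For the `__getitem__`/`__delitem__` question, a minimal sketch of a backend where `store()` assigns the id and the mapping protocol is used for access by id. `DictBackend` and its methods are hypothetical; leaving out `__setitem__` is a deliberate choice in this sketch, since callers are not supposed to pick ids themselves.

```python
import uuid


class DictBackend:
    """Hypothetical in-memory backend: ids are assigned by store(), accessed via [] and del."""

    def __init__(self):
        self._metas = {}

    def store(self, meta):
        # the backend chooses the id; callers never pick one
        metaid = uuid.uuid4().hex
        self._metas[metaid] = dict(meta)
        return metaid

    def __getitem__(self, metaid):
        return self._metas[metaid]  # unknown id -> KeyError, like any mapping

    def __delitem__(self, metaid):
        del self._metas[metaid]

    def __iter__(self):
        return iter(self._metas)

    def __len__(self):
        return len(self._metas)
```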
56 2011-09-15T11:54:40 <waldi> does every request get its own backend objects or when is open/close called?
57 2011-09-15T11:55:38 <ThomasWaldmann> good question
58 2011-09-15T11:56:31 <ThomasWaldmann> it could be open for a longer time, as long as we make sure the acl stuff sees the correct user
59 2011-09-15T11:56:43 <waldi> this looks more like it could use a factory
60 2011-09-15T11:59:02 <waldi> or merge open into __init__
61 2011-09-15T12:00:19 <ThomasWaldmann> i didn't want open within init, because we may want to just declare some storage setup, but not really open it yet
62 2011-09-15T12:00:47 <ThomasWaldmann> somehow that stuff must be configurable
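One way to reconcile "declare now, open later" with waldi's factory/`__init__` remark: `__init__` only records configuration, `open()`/`close()` acquire and release resources, and a context manager pairs them. `DeclaredStore` and the `STORAGES` dict are hypothetical names used only for illustration.

```python
import sqlite3


class DeclaredStore:
    """Hypothetical storage: __init__ only declares, open()/close() manage the connection."""

    def __init__(self, path):
        self.path = path  # configuration only; nothing is opened here
        self.conn = None

    def open(self):
        self.conn = sqlite3.connect(self.path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")

    def close(self):
        if self.conn is not None:
            self.conn.close()
            self.conn = None

    def __enter__(self):
        self.open()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


# a config can declare storages without opening anything yet
STORAGES = {"content": DeclaredStore(":memory:")}
```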
63 2011-09-15T12:04:14 <ThomasWaldmann> waldi: are you just looking or hacking on stuff you propose?
64 2011-09-15T12:04:46 <waldi> just looking right now
65 2011-09-15T12:05:02 <ThomasWaldmann> right :)
66 2011-09-15T12:05:17 <waldi> first comes the review, then the code
67 2011-09-15T12:05:27 <waldi> and i broke too much today already
68 2011-09-15T12:06:05 <ThomasWaldmann> btw, iirc i enabled issue tracking in that repo on bb
69 2011-09-15T12:10:32 <ThomasWaldmann> "No issues found. Try another search."
70 2011-09-15T12:11:03 <ThomasWaldmann> you can try as many searches there as you like, as there are 0 issues total, it won't find any. :D
71 2011-09-15T12:12:06 <waldi> i'm just changing this
72 2011-09-15T12:16:35 <ThomasWaldmann> parent stuff is still missing, yes
73 2011-09-15T12:19:24 *** brunomartin
74 2011-09-15T12:23:06 <ThomasWaldmann> you're filing one for the TrackingFileWrapper also?
75 2011-09-15T12:27:38 <waldi> done
76 2011-09-15T12:30:23 *** Marchael
77 2011-09-15T12:48:30 <waldi> ThomasWaldmann: why do you call it "metaid"? isn't it supposed to be unique?
78 2011-09-15T12:49:14 <ThomasWaldmann> "id" is a builtin and also way too generic
79 2011-09-15T12:49:26 <waldi> uuid?
80 2011-09-15T12:49:35 <ThomasWaldmann> i was a bit undecided between metaid and revid, though
81 2011-09-15T12:50:15 <waldi> oops, for the filesystem backend it is different but still unique. okay
82 2011-09-15T12:50:23 <ThomasWaldmann> it is important not to get confused by all those ids, so naming them all the same is maybe not that helpful
83 2011-09-15T12:51:13 <waldi> yeah
84 2011-09-15T12:51:32 <ThomasWaldmann> so metaid/revid points to the meta, meta has itemid as "coupling" and dataid to point to data
85 2011-09-15T12:52:05 <ThomasWaldmann> and soon maybe also parentids to point to parent(s)
86 2011-09-15T12:52:57 <ThomasWaldmann> also, it would give a name clash within the meta dict, because we have them all there
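A minimal sketch of the id relationships described above: metaid (a.k.a. revid) points to a meta dict, which carries `itemid` as the coupling between revisions of one item, `dataid` pointing to the data, and the possible future `parentids`. The in-memory dicts and function names are assumptions, and uuid4 hex strings are used for all ids purely for simplicity.

```python
import uuid

metas = {}  # metaid (a.k.a. revid) -> meta dict
datas = {}  # dataid -> revision data


def store_revision(itemid, data, parentids=()):
    """Hypothetical: store data and meta, return the new metaid."""
    dataid = uuid.uuid4().hex
    datas[dataid] = data
    metaid = uuid.uuid4().hex
    # distinct key names avoid clashes, since all ids live in the same meta dict
    metas[metaid] = {
        "itemid": itemid,              # couples all revisions of one item
        "dataid": dataid,              # points to this revision's data
        "parentids": list(parentids),  # possible later addition: parent revision(s)
    }
    return metaid


def load_revision(metaid):
    meta = metas[metaid]
    return meta, datas[meta["dataid"]]
```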
87 2011-09-15T12:53:26 <ThomasWaldmann> brb
88 2011-09-15T13:09:16 <ThomasWaldmann> re
89 2011-09-15T14:03:52 <ronny> re
90 2011-09-15T14:58:50 <ronny> ThomasWaldmann: i suppose we should switch indexing grouping to the best group
91 2011-09-15T15:01:11 <ThomasWaldmann> you mean for latest revs determination?
92 2011-09-15T15:01:23 <ThomasWaldmann> did you try that brand new whoosh code?
93 2011-09-15T15:02:47 <ThomasWaldmann> ronny:
94 2011-09-15T15:09:42 <ronny> ThomasWaldmann: currently looking at it
95 2011-09-15T15:10:08 <ronny> hmm, best does not 100% fit it seems
96 2011-09-15T15:12:32 <ThomasWaldmann> https://bitbucket.org/thomaswaldmann/storage-ng/issue/5/router-split-between-ro-and-rw-part#comment-654308
97 2011-09-15T15:21:38 <ThomasWaldmann> https://bitbucket.org/thomaswaldmann/storage-ng/issue/3/serialization-missing#comment-654321
98 2011-09-15T15:45:02 *** Marchael
99 2011-09-15T16:03:26 <brunomartin> hi ThomasWaldmann!
100 2011-09-15T16:28:19 <ThomasWaldmann> hi brunomartin
101 2011-09-15T16:36:31 <ThomasWaldmann> brunomartin: https://bitbucket.org/thomaswaldmann/storage-ng/issues?status=new&status=open
102 2011-09-15T16:37:07 <brunomartin> yeah, saw that!
103 2011-09-15T16:37:53 <brunomartin> ThomasWaldmann: btw, we are doing a tree navigation for moin items here... using the +index view as base...
105 2011-09-15T16:38:43 <brunomartin> using jstree
106 2011-09-15T16:41:43 <brunomartin> ThomasWaldmann: we are very busy here until next week... but after that, I think we can contribute on storage-ng....
107 2011-09-15T16:54:27 <ThomasWaldmann> ok
108 2011-09-15T17:10:20 <ThomasWaldmann> ronny: "it makes sense to have a serialization order mixed, ..." wut?
109 2011-09-15T17:11:18 <ronny> ThomasWaldmann: each data item right after the first meta item referring to it
110 2011-09-15T17:11:26 <ronny> so we can read it from the stream
111 2011-09-15T17:11:54 <ronny> meta we need to deserialize, data we want to stream
112 2011-09-15T17:16:11 <ThomasWaldmann> hmm, in fact the data should be first in the tar, so it is already in storage when the meta needs it right afterwards
113 2011-09-15T17:18:41 <ronny> ThomasWaldmann: that kills router level insertion
114 2011-09-15T17:18:53 <ThomasWaldmann> ah, right. no name.
115 2011-09-15T17:19:14 <ronny> ok, tarfile handles limited substreams
116 2011-09-15T17:22:18 <ThomasWaldmann> can we stream out a tarfile stream as a http response?
117 2011-09-15T17:22:30 <ThomasWaldmann> without building the file completely first
118 2011-09-15T17:23:30 <ronny> ThomasWaldmann: needs greenlets
119 2011-09-15T17:23:56 <ThomasWaldmann> why that?
120 2011-09-15T17:24:10 <ronny> you can't generate file chunks to send out
121 2011-09-15T17:24:31 <ronny> so you need to pass in a magic file object that gives you an iterator for wsgi
122 2011-09-15T17:25:28 <ronny> alternative would be a thread + a queue
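ronny's "thread + a queue" alternative can be sketched with only the standard library: tarfile writes in stream mode (`"w|"`) into a file-like object whose `write()` pushes bytes onto a bounded queue, while the generator that WSGI iterates pulls from the other end. `QueueWriter` and `tar_stream` are hypothetical names; the bounded queue gives back-pressure, so the producer cannot run far ahead of the client.

```python
import io
import queue
import tarfile
import threading

_DONE = object()  # sentinel marking the end of the stream


class QueueWriter:
    """File-like object: every write() hands the bytes to the consumer via a queue."""

    def __init__(self, q):
        self.q = q

    def write(self, data):
        self.q.put(bytes(data))  # blocks while the bounded queue is full
        return len(data)


def tar_stream(files, maxsize=16):
    """Yield a tar archive chunk by chunk; files is a list of (name, bytes) pairs."""
    q = queue.Queue(maxsize)

    def produce():
        try:
            with tarfile.open(fileobj=QueueWriter(q), mode="w|") as tar:
                for name, payload in files:
                    info = tarfile.TarInfo(name)
                    info.size = len(payload)
                    tar.addfile(info, io.BytesIO(payload))
        finally:
            q.put(_DONE)  # in this sketch, errors just end the stream early

    threading.Thread(target=produce, daemon=True).start()
    while True:
        chunk = q.get()
        if chunk is _DONE:
            break
        yield chunk
```

If the client disconnects before the generator is exhausted, the producer thread stays blocked on `put()`; it is a daemon thread here, but a production version would need a way to cancel it.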
123 2011-09-15T17:26:14 <ronny> ThomasWaldmann: we might want to go for custom chunking after all
124 2011-09-15T17:27:22 *** Marchael
125 2011-09-15T17:30:22 <ThomasWaldmann> if that streaming doesn't work with tarfile without special stuff, we could also use self-made xml
126 2011-09-15T17:30:42 <ThomasWaldmann> then we can implement as we need it
127 2011-09-15T17:30:49 <ronny> ThomasWaldmann: must it be xml?
128 2011-09-15T17:31:23 <ThomasWaldmann> json just has other issues :)
129 2011-09-15T17:31:41 <ronny> no, i'd prefer to go for a simple binary format
130 2011-09-15T17:31:45 <ThomasWaldmann> just check how to make it stream
131 2011-09-15T17:32:04 <ronny> else binary items become a huge mess
132 2011-09-15T17:32:19 <ThomasWaldmann> b64encode is not that difficult
133 2011-09-15T17:32:37 <ronny> they bloat up and need chunking, that's not too nice
134 2011-09-15T17:33:03 <ThomasWaldmann> it worked for current moin2 code :)
135 2011-09-15T17:33:39 <ronny> that doesn't change the undesirable properties
136 2011-09-15T17:34:37 <ThomasWaldmann> ok, propose some format that is easy to generate and read by 3rd parties
137 2011-09-15T17:39:40 <ronny> ThomasWaldmann: store_revision might need some extra support for storing metadata-only changes
138 2011-09-15T17:40:32 <ronny> also metaid can't be set
139 2011-09-15T17:40:51 <ronny> but now i have a very simple and easy-to-generate data format in mind
140 2011-09-15T17:41:27 <ronny> basically its a chain of frames
141 2011-09-15T17:41:56 <ThomasWaldmann> hmm tarfile has mode "w|"
142 2011-09-15T17:42:21 <ronny> ThomasWaldmann: needs a file
143 2011-09-15T17:42:38 <ronny> it starts with a meta frame, that is followed by a data frame if the metaid is not known yet (we might also want to use hash here)
144 2011-09-15T17:42:48 <ThomasWaldmann> if we give that a special file that just buffers writes into StringIO, we could yield whatever it has written after our write returns
145 2011-09-15T17:43:21 <ronny> ThomasWaldmann: write a 1000 MB data item, watch it blow up on small boxes
146 2011-09-15T17:43:33 <ThomasWaldmann> we would write blockwise
147 2011-09-15T17:44:19 <ThomasWaldmann> i think it could work
148 2011-09-15T17:44:26 <ThomasWaldmann> without special stuff
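Thomas's "write blockwise" idea can also be done without buffering a whole member, by skipping `TarFile.addfile` and emitting the tar format directly: `TarInfo.tobuf()` produces a member header, the data is copied in chunks and padded to a 512-byte boundary, and two zero blocks end the archive. This swaps in a different technique from the StringIO-buffer idea discussed here; `tar_chunks` and its `(name, size, fileobj)` member tuples are assumptions, and the size of every member must be known before its data is sent.

```python
import tarfile

BLOCKSIZE = tarfile.BLOCKSIZE  # 512 bytes


def tar_chunks(members, chunksize=64 * 1024):
    """Generate a tar stream; members are (name, size, fileobj) tuples, data copied blockwise."""
    for name, size, fileobj in members:
        info = tarfile.TarInfo(name)
        info.size = size
        yield info.tobuf()  # header block(s) for this member
        remaining = size
        while remaining:
            chunk = fileobj.read(min(chunksize, remaining))
            if not chunk:
                raise ValueError("%s is shorter than its declared size" % name)
            remaining -= len(chunk)
            yield chunk
        padding = -size % BLOCKSIZE
        if padding:
            yield b"\0" * padding  # pad member data to a 512-byte boundary
    yield b"\0" * (2 * BLOCKSIZE)  # end-of-archive marker: two zero blocks
```

The final record padding that `tarfile` itself adds on close (to a multiple of 10240 bytes) is omitted here; Python's `tarfile` reads the result without it.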
149 2011-09-15T17:44:26 <ronny> if we use uber-messy tar, the mangling isn't worth the effort compared to creating a generator that yields stuff we can pass to wsgi/fp.writeall
150 2011-09-15T17:44:44 <ronny> and it would actually be harder to create for 3rd parties
151 2011-09-15T17:45:10 <ThomasWaldmann> tar was your suggestion :P
152 2011-09-15T17:45:27 <ronny> that was before we also wanted direct streaming
153 2011-09-15T17:45:41 <ThomasWaldmann> ah :)
154 2011-09-15T17:46:29 <ThomasWaldmann> ok, so we output a bunch of json and then tell "and now you read just as <size> meta tells"?
155 2011-09-15T17:46:34 <ronny> basically, framed chunks in the order meta, then data (only if that dataid is not known yet), are very efficient for streaming into a wiki
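A minimal sketch of the framed format ronny describes: a chain of frames where each revision contributes a meta frame, followed by a data frame only if its dataid has not been sent yet (or is already known to the receiver). The frame layout (a 1-byte type `M`/`D`, an 8-byte big-endian length, then the payload) and the JSON-encoded meta are assumptions for illustration; ronny's actual format is not shown in this log.

```python
import json
import struct

# hypothetical frame header: 1-byte frame type, 8-byte big-endian payload length
HEADER = struct.Struct(">cQ")


def serialize(revisions, known_dataids=()):
    """Yield bytes: a meta frame per revision, plus a data frame only for unseen dataids."""
    seen = set(known_dataids)
    for meta, data in revisions:
        payload = json.dumps(meta, sort_keys=True).encode("utf-8")
        yield HEADER.pack(b"M", len(payload))
        yield payload
        if meta["dataid"] not in seen:
            seen.add(meta["dataid"])
            yield HEADER.pack(b"D", len(data))
            yield data  # a real implementation would yield the data in blocks


def deserialize(stream):
    """Parse frames from a binary stream into ("meta", dict) / ("data", bytes) events."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) != HEADER.size:
            raise ValueError("truncated frame header")
        kind, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated frame payload")
        if kind == b"M":
            yield "meta", json.loads(payload.decode("utf-8"))
        elif kind == b"D":
            yield "data", payload
        else:
            raise ValueError("unknown frame type %r" % kind)
```

Meta frames are small and get fully deserialized, while a data frame's length is known up front, so a receiver can stream exactly that many bytes into storage.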
156 2011-09-15T18:05:57 <ThomasWaldmann> you're implementing it? :)
157 2011-09-15T18:09:01 <ThomasWaldmann> ronny:
158 2011-09-15T18:18:44 <ronny> ThomasWaldmann: will wire up a basic implementation in about an hour
159 2011-09-15T18:37:05 *** raignarok_
160 2011-09-15T18:39:23 *** raignarok_
161 2011-09-15T18:42:46 <ThomasWaldmann> ronny: http://pastebin.com/T5sNy53i < rough idea, incomplete still, gtg now
162 2011-09-15T18:48:14 *** raignarok
163 2011-09-15T18:59:09 <ronny> ThomasWaldmann: looks not good
164 2011-09-15T19:19:30 <ronny> back in ~ 40 min :(
165 2011-09-15T19:34:47 *** Marchael
166 2011-09-15T19:39:59 *** Marchael
167 2011-09-15T19:49:10 *** greg_f
168 2011-09-15T19:49:21 *** Marchael
169 2011-09-15T20:06:37 *** raignarok
170 2011-09-15T20:10:39 *** Marchael
171 2011-09-15T20:44:26 <ronny> re
172 2011-09-15T20:44:43 <ronny> hmm, never forget the double worst estimate rule
173 2011-09-15T20:52:24 <ThomasWaldmann> heh
174 2011-09-15T20:58:21 *** raignarok_
175 2011-09-15T20:58:44 <ronny> ThomasWaldmann: the result of the serialize function should be usable as wsgi iterable
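Since a serializer written as a generator of bytes is already an iterable, it can be returned directly from a WSGI application, which is what "usable as wsgi iterable" amounts to. `make_export_app` and `serialize_all` are hypothetical names, and the content type is an assumption.

```python
def make_export_app(serialize_all):
    """Hypothetical WSGI app factory; serialize_all() returns an iterable of bytes chunks."""

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        # returning the generator lets the server send each chunk as it is produced
        return serialize_all()

    return app
```

PEP 3333 requires servers not to delay transmission of the blocks yielded by the application's iterable, so the export never has to be built completely in memory first. For a quick local try, `wsgiref.simple_server.make_server("localhost", 8080, app).serve_forever()` serves such an app.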
176 2011-09-15T20:59:51 *** raignarok__
177 2011-09-15T21:01:31 *** raignarok
178 2011-09-15T21:02:09 *** raignarok__
179 2011-09-15T21:03:31 *** raignarok_
180 2011-09-15T21:04:36 <ronny> ThomasWaldmann: memcached tests fail?
181 2011-09-15T21:31:47 <ronny> ThomasWaldmann: meh, right now i wish we had classlevel generate_tests in pytest
182 2011-09-15T21:42:30 <ThomasWaldmann> ronny: well, i have some yields there...
183 2011-09-15T21:42:41 <ThomasWaldmann> you need memcached installed on localhost
184 2011-09-15T21:46:41 <ronny> k
185 2011-09-15T21:47:13 <ronny> ThomasWaldmann: i'll have fun wiring up tests, so serialization in routing setups can be tested
186 2011-09-15T22:38:54 * ronny <3 werkzeug.wsgi.LimitedStream
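`werkzeug.wsgi.LimitedStream` wraps a stream and refuses to read past a given limit, which is what a deserializer needs to hand out one frame's data without touching the next frame. The stand-in below shows the same idea with plain Python; `LimitedReader` and its `exhaust()` helper are hypothetical and not werkzeug's implementation.

```python
class LimitedReader:
    """Minimal stand-in for a length-limited stream: never reads past `limit` bytes."""

    def __init__(self, stream, limit):
        self.stream = stream
        self.remaining = limit

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""  # limit reached: behave like EOF
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data

    def exhaust(self, chunksize=64 * 1024):
        """Skip whatever is left, so the underlying stream sits at the next frame."""
        while self.read(chunksize):
            pass
```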
187 2011-09-15T22:39:48 <ronny> meh
188 2011-09-15T22:39:56 <ronny> serialize/deserialize shows me some indexing failure
189 2011-09-15T22:57:27 <dreimark> re
190 2011-09-15T22:57:44 <ronny> ok, i got basic serialize/deserialize
191 2011-09-15T22:59:23 <ronny> ThomasWaldmann: see my push, i'm sure there is something missing though
192 2011-09-15T23:06:23 <ThomasWaldmann> re
193 2011-09-15T23:06:57 * ThomasWaldmann looks
194 2011-09-15T23:14:24 <ThomasWaldmann> 62 item = source['name']
195 2011-09-15T23:14:29 <ThomasWaldmann> why '...'
196 2011-09-15T23:17:41 <ThomasWaldmann> ronny: ^^
197 2011-09-15T23:33:32 *** raignarok
198 2011-09-15T23:50:43 *** dreimark
199 2011-09-15T23:50:43 *** dreimark
200 2011-09-15T23:52:38 * ThomasWaldmann looks at kyoto cabinet
201