| 14:23.08 | *** join/#infobot TimRiker (n=timr@70.1.153.92) |
| 14:23.08 | *** mode/#infobot [+o TimRiker] by ChanServ |
| 15:33.48 | CIA-26 | infobot: 03timriker * r1635 10/trunk/ (README infobot src/Factoids/Core.pl src/Modules/Units.pl): more bloot refs |
| 15:34.56 | troubled | TimRiker: at least it wasnt blootbot svn this time ;) |
| 15:38.15 | TimRiker | heh. yeah. my kid's at a chess tournament, so I'm hacking on the bot. planning to move over to infobot today. |
| 15:47.02 | TimRiker | what happened to textstats_main ? |
| 15:53.26 | troubled | TimRiker: textstats_main? |
| 15:53.39 | troubled | doesnt ring any bells |
| 15:57.01 | troubled | btw, there is still a 3600s timeout problem. havent been able to track it down yet though. i spent a bit looking, but its going to require a deep search possibly as to why. |
| 15:57.33 | *** join/#infobot abot (n=timr@pdpc/supporter/active/TimRiker/bot/apt) |
| 15:57.33 | *** mode/#infobot [+o abot] by ChanServ |
| 15:57.45 | TimRiker | abot: version |
| 15:57.47 | troubled | seems though that the ping/pong just stopped. could have been related to the scheduler changes I made, but I just cant see missing that one. tbh, I think it only worked before because the scheduler could be random |
| 15:58.27 | TimRiker | so does the bot disconnect now? or what happens? |
| 15:58.33 | TimRiker | abot: owner |
| 15:58.38 | troubled | so when I fixed it and it became 3600 exactly, it could have started. its hard to notice though since I think it only happens when it doesnt get any channel events |
| 15:58.41 | TimRiker | abot: are you alive? |
| 15:59.02 | TimRiker | abot: bzflist |
| 15:59.04 | abot | s=1 |
| 15:59.08 | troubled | ya, just times out, and reconnects the alt nick. but the second one never times out |
| 15:59.48 | TimRiker | hmm. |
| 15:59.48 | TimRiker | what happens with multiple nicks then? |
| 15:59.48 | troubled | unless my perl libs got messed up some how |
| 16:00.07 | TimRiker | infobot: bzflist |
| 16:00.15 | TimRiker | abot: bzflist |
| 16:00.17 | abot | s=1 |
| 16:00.22 | TimRiker | something is broken. |
| 16:00.27 | troubled | it doesnt try to recover back to the first nick. i havent tried using 2 or 3 simultaneous nicks though. didnt know you could til I ran into you in the channel that day |
| 16:02.12 | troubled | its odd though since it doesnt affect ravenbird. im suspecting I only noticed cause my bots in idle channels, his isnt |
| 16:02.59 | CIA-26 | infobot: 03timriker * r1636 10/trunk/files/sample/infobot.chan: +#infobot |
| 16:03.23 | troubled | cheater :p |
| 16:03.29 | TimRiker | :) |
| 16:06.28 | troubled | i recall spending an hour or so looking into it, but I think the ping/pongs were all fine |
| 16:07.13 | troubled | err, what I mean is that the code for the ping pong looked ok, but for some reason it just wasnt happened. so i get the feeling ravenbird is only staying online because of the channel spam |
| 16:07.15 | *** join/#infobot |bot (n=timr@pdpc/supporter/active/TimRiker/bot/apt) |
| 16:07.16 | *** mode/#infobot [+o |bot] by ChanServ |
| 16:07.34 | TimRiker | |bot: foo |
| 16:07.35 | |bot | [foo] bar |
| 16:07.38 | troubled | im hanging around the keyboard though, so feel free to hilight me as needed |
| 16:07.50 | TimRiker | kk |
| 16:08.04 | TimRiker | ~bzflist |
| 16:08.06 | |bot | s=1 |
| 16:09.03 | *** join/#infobot |bot (n=timr@pdpc/supporter/active/TimRiker/bot/apt) |
| 16:09.03 | *** mode/#infobot [+o |bot] by ChanServ |
| 16:09.08 | TimRiker | ~bzflist |
| 16:09.13 | |bot | s=263 PLAYERS(263) BZFS0026(233) bzflagr.net:5154(32) BZFS0058(30) bzflag.planet-mofo.com:4200(21) 06.bztank.net:5159(19) bzexcess.com:25155(15) bzexcess.com:25154(15) 01.bztank.net:5154(12) badgerking.fiedlers.us:5667(9) fairserve.bzflag.org:5220(8) bzflag2.norang.ca:4418(8) bzf.bzflag.org:5155(8) 81.171.22.238(7) dkdk.homelinux.com:5154(6) 05.bztank.net:5158(6) viper2.pimpi.org:5158(5) dub.bzflag.org:5150(5) bzflag2.norang.ca:4409(4) |
| 16:09.39 | TimRiker | it helps if I remove my broken proxy setting from the config. |
| 16:19.16 | TimRiker | |bot: chanset autojoin |
| 16:19.57 | TimRiker | |bot: chanset autojoin |
| 16:19.58 | |bot | Showing autojoin values on all channels... |
| 16:19.58 | |bot | <PROTECTED> |
| 16:19.58 | |bot | <PROTECTED> |
| 16:19.59 | |bot | <PROTECTED> |
| 16:19.59 | |bot | End of list. |
| 16:20.16 | TimRiker | |bot: chanstats |
| 16:20.17 | |bot | I'm on 4 channels: #botpark/21, #debian-bots/14, #infobot/10 |
| 16:20.17 | |bot | i've cached 45 users, 36 unique users, distributed over 4 channels. |
| 16:20.46 | TimRiker | ~version |
| 16:20.46 | infobot | it has been said that version is Kernel: uname -r. Debian: cat /etc/debian_version. and: dpkg -l libc6 | tail -1. Package: dpkg -l _pkgname_. Miscellaneous: try --version on the command. check /etc/apt/sources.list; run apt-cache policy <packagename>, or lsb_release -d |
| 16:21.38 | TimRiker | ~status |
| 16:21.38 | infobot | Since Sun Mar 9 03:58:25 2008, there have been 58 modifications, 748 questions, 0 dunnos, 0 morons and 456 commands. I have been awake for 6d 12h 23m 13s this session, and currently reference 114513 factoids. I'm using about 18824 kB of memory. With 0 active forks. Process time user/system 13316.16/535.33 child 0.02/0.07 |
| 16:21.39 | |bot | Since Sat Mar 15 16:08:48 2008, there have been 0 modifications, 0 questions, 0 dunnos, 0 morons and 1 command. I have been awake for 12m 50s this session, and currently reference 1 factoids. I'm using about 16660 kB of memory. With 0 active forks. Process time user/system 0.79/0.04 child 0/0 |
| 16:25.38 | TimRiker | ~convert 1 inch to microlightyears |
| 16:25.40 | |bot | 1 inch cannot be converted to microlightyears: |
| 16:25.54 | TimRiker | who broke units.pl and why? |
| 16:27.41 | TimRiker | ah. converted to use the perl one. don't think that's a Good Thing actually. |
| 16:27.51 | TimRiker | ~convert 1 inch to microlightyears |
| 16:27.53 | |bot | 1 inch cannot be converted to microlightyears: |
| 16:28.02 | TimRiker | nice brokenness. |
| 16:29.02 | TimRiker | my $response = readline ($units); huh? |
| 16:30.58 | troubled | I think we went over the change to the packages unit convertion with dondel a while back |
| 16:31.30 | TimRiker | ~convert 1 inch to microlightyears |
| 16:31.31 | troubled | no idea how readline worked its way in there |
| 16:31.32 | |bot | 1 inch is approximately 2.68478e-12 microlightyears |
| 16:31.44 | TimRiker | ah... it's NOT the perl units, its the command line "units" |
| 16:31.50 | troubled | ya |
| 16:32.03 | TimRiker | and no useful message when it's broken it appears. |
| 16:32.19 | troubled | nor mention of the dependency in the readme :) |
| 16:32.34 | troubled | least i dont think i added one for it |
| 16:32.37 | TimRiker | !WARN! ircCheck: ident(|bot) != param{ircNick}(abot,|bot). |
| 16:33.02 | TimRiker | that's broken. though it might have been broken before too. |
| 16:33.36 | troubled | i seem recall knowing about that. cant remember exactly what it was atm though |
| 16:34.03 | troubled | i think the irccheck code has to be moved into the scheduler.pl or something |
| 16:34.17 | troubled | something to do with variable scope iirc |
| 16:34.45 | troubled | its all a little hazzy for me though, might be thinking of something else |
| 16:38.06 | TimRiker | ~convert 12 pence to sixpence |
| 16:38.08 | |bot | 12 pence cannot be converted to sixpence: Unknown unit 'sixpence' |
| 16:38.22 | TimRiker | ~convert 12 inches to barleycorns |
| 16:38.24 | |bot | 12 inches is approximately 36.0001 barleycorns |
| 16:39.44 | TimRiker | ~convert 12 louis to livres |
| 16:39.46 | |bot | 12 louis cannot be converted to livres: Unknown unit 'louis' |
| 16:40.57 | TimRiker | so generally we threw away he working full featured one and added a smaller limited version. why did we do that again? |
| 16:41.11 | TimRiker | s/ he/the/ |
| 16:41.12 | |bot | TimRiker meant: so generally we threw awaythe working full featured one and added a smaller limited version. why did we do that again? |
| 16:41.16 | troubled | checked my logs. havent seen that ircCheck: ident" error you got. guess i never ran into cause i never used it |
| 16:41.56 | TimRiker | yeah, it would only happen with multiple nicks. |
| 16:42.02 | troubled | i think it was to do with not having to maintain the convertions and remove the file from the src wasnt it? |
| 16:42.06 | TimRiker | just not sure if it used to happen or not. |
| 16:42.13 | troubled | I can pull up the conversation if you like |
| 16:42.19 | TimRiker | sure. |
| 16:42.31 | troubled | give me a few minutes |
| 16:42.57 | TimRiker | was there a lot of work in maintaining the conversions? /me does not recall that there was. |
| 16:44.48 | troubled | i thought it was because there was inaccuracies, but I might be thinking of currency convertions |
| 16:46.54 | TimRiker | dunno |
| 16:47.29 | troubled | waiting to get a rough date of the conversation. takes a bit :) |
| 16:48.43 | TimRiker | ~convert 1 `ls` to `cat` |
| 16:48.45 | |bot | 1 `ls` cannot be converted to `cat`: Unknown unit '`ls`' |
| 16:49.40 | TimRiker | ~convert 1 `ls>foo` to `cat` |
| 16:49.42 | |bot | 1 `ls>foo` cannot be converted to `cat`: Unknown unit '`ls>foo`' |
| 16:51.03 | troubled | < dondelelcaro> troubled: NewUnits.pl calls units which uses the full /usr/share/misc/units.dat file; our Units.pl sucks in comparison |
| 16:51.29 | troubled | oct. 19th 2007 is the day we are after |
| 16:52.23 | TimRiker | hmm. k |
| 16:52.57 | troubled | i think it may have been afected by your changes of the bot and don's as well though |
| 16:55.05 | TimRiker | ~convert 1 yotaainch to lightyears |
| 16:55.07 | |bot | 1 yotaainch cannot be converted to lightyears: Unknown unit 'yotaainch' |
| 16:55.52 | TimRiker | ~convert 1 yottainch to lightyears |
| 16:55.54 | |bot | 1 yottainch is approximately 2.68478e+06 lightyears |
| 16:56.16 | TimRiker | ~convert 1 yottameter to metres |
| 16:56.18 | |bot | 1 yottameter is approximately 1e+24 metres |
| 16:57.25 | TimRiker | ~convert 1 zettameter to metres |
| 16:57.28 | |bot | 1 zettameter is approximately 1e+21 metres |
| 17:00.15 | TimRiker | ~convert 1 zettainch to metres |
| 17:00.16 | |bot | 1 zettainch is approximately 2.54e+19 metres |
| 17:01.14 | TimRiker | I think the new one is right actually. there was a debate back when yotta was added. unsure outcome. |
| 17:02.13 | troubled | well, i got the conversation with don about it in front of me. just cheap arse sql program I used only exports as cvs :/ |
| 17:04.46 | troubled | nor does it escape the delimeter....grrr |
| 17:05.33 | TimRiker | heh. s'ok. I'll go with the flow and keep the new one. |
| 17:06.07 | TimRiker | wish it would do us temp conversion |
| 17:07.19 | troubled | sorry, normally im in windows where Im already setup to access the logs. just put ubuntu on though yesterday so i havent had a chance to do much yet but config stuff |
| 17:07.52 | CIA-26 | infobot: 03timriker * r1637 10/trunk/src/Modules/Units.pl: might as well strict |
| 17:15.22 | troubled | TimRiker: http://pastebin.ca/raw/943836 (you got 15 minutes) |
| 17:15.37 | troubled | look around 6am - 8am when we talk about it |
| 17:15.56 | troubled | sorry for the delay, firefox was being braindead until i went through the proxy |
| 17:17.37 | troubled | it seems that mebi units were the big concern since #debian users practically live off of that converstion unit |
| 17:18.52 | troubled | seems I renamed his NewUnits.pm to Units and went with that |
| 17:24.07 | TimRiker | http://ibot.rikers.org/%23infobot/20071019.html.gz |
| 17:24.53 | troubled | i was going to mention the bot logs, but figured you would have already after I gave the date :) |
| 17:25.17 | troubled | I poked don in #-bots, but no idea if he's around atm |
| 17:25.40 | troubled | hes been afk for like 14 hours though |
| 17:29.49 | TimRiker | k. I'm in.. just missed the conversation back when it happened. |
| 17:30.40 | troubled | i think we (you and I) talked about it as well at some point iirc, but probably +/- a day or two |
| 17:37.42 | TimRiker | yeah, I vaguely recall something. I'm just reviewing changes prior to moving my bots over. |
| 17:38.18 | TimRiker | would be nice if tcp version shows the subversion rev. |
| 17:39.51 | troubled | the bot version issue never was finalised and I didnt get a chance to actually decide on a naming/numbering scheme |
| 17:40.46 | troubled | could just go with a 1.$rev or something for now and just put the $Id$ tag in the version file and alter the parsing a tad |
| 17:41.37 | troubled | my problem with that was that the file way wouldnt work unless it changed manually each rev |
| 17:42.18 | TimRiker | there's an entry like that in debian bugs, but it's not filled in. q$Rev: 22 $ should get filled in, no? |
| 17:43.40 | TimRiker | what's the point in putting the version in a separate file? just as easy to touch the actual file, no? or is it that core.pl does not reload with "reload" ? |
| 17:43.53 | TimRiker | ~reload |
| 17:43.53 | infobot | reloading... |
| 17:43.53 | infobot | reloaded: |
| 17:43.54 | |bot | reloading... |
| 17:43.54 | |bot | reloaded: |
| 17:44.28 | TimRiker | |bot: die |
| 17:44.58 | TimRiker | !WARN! PERL: DBD::SQLite2::st execute failed: columns nick, channel are not unique(1) at dbdimp.c line 419 at ./src/dbi.pl line 440. |
| 17:46.10 | TimRiker | might be trying to update seen once per connection. |
| 17:46.44 | troubled | hmmm, i didnt put much into sqlite tests tbh. im a mysql man |
| 17:47.19 | troubled | sqlite is nice though, but between HEAD, your db and don's, we got too much legacy to preserve |
| 17:53.33 | CIA-26 | infobot: 03timriker * r1638 10/trunk/src/ (IRC/Schedulers.pl Modules/Debian.pl modules.pl): ws |
| 17:54.08 | troubled | ws? |
| 17:55.05 | troubled | TimRiker: did you update your schema yet to the current one? |
| 17:55.32 | troubled | might want to csv dump it or something and reorganize a bit just in case |
| 17:55.45 | *** join/#infobot Ttech (n=ttech@fullcirclemagazine/developer/ttech) |
| 18:01.24 | TimRiker | ~ws |
| 18:01.24 | infobot | hmm... ws is short for workstation. White Space, or the country code for Western Samoa |
| 18:01.33 | TimRiker | white space in this case. |
| 18:01.52 | TimRiker | troubled: nope. what's needed? |
| 18:02.51 | troubled | not sure what your schema diff is, might want to dump schema and compare to HEAD sqlite schema though in case that was causing the problem |
| 18:03.17 | troubled | while I dont use it much, I do recall not having any problems with a fresh bot using sqlite |
| 18:03.20 | TimRiker | are we tracking "seen" in multiple channels now per nick? |
| 18:03.38 | troubled | not sure |
| 18:03.42 | TimRiker | ~seen ravenbird |
| 18:03.45 | infobot | ravenbird is currently on #botpark (1d 17h 51m 33s) #debian (1d 17h 51m 33s) #infobot (1d 17h 51m 33s) #debianppc (1d 17h 51m 33s) #debian-bots (1d 17h 51m 33s). Has said a total of 1 messages. Is idling for 1d 4h 5m 35s, last said: 'Welcome back xk, o lonely traveller amongst the TCP/IP packets of chalk.'. |
| 18:03.46 | ravenbird | You are moron #1, infobot |
| 18:03.55 | troubled | heh |
| 18:04.02 | TimRiker | hehehe |
| 18:04.03 | troubled | aparently |
| 18:04.18 | TimRiker | ~seen infobot |
| 18:04.19 | infobot | infobot is currently on #utos (1d 23m) #maemo (1d 23m) #fredlug (1d 23m) #opensimpad.org (1d 23m) #android (1d 23m) #asterisk-bugs (1d 23m) #udbug (1d 23m) # (1d 23m) #wowhead (1d 23m) #infobot (1d 23m) #lugot (1d 23m) #mseide (1d 23m) #wowwiki (1d 23m), last said: 'TimRiker meant: fixing it now to not msg if ... |
| 18:04.25 | TimRiker | ^seen infobot |
| 18:04.28 | ravenbird | TimRiker: i haven't seen 'infobot' |
| 18:04.34 | TimRiker | aw |
| 18:05.21 | troubled | not sure if he runs +seen in here or not |
| 18:05.54 | troubled | im assuming no from the reply, but who knows when it comes to bugs :) |
| 18:17.42 | *** join/#infobot |bot (n=timr@pdpc/supporter/active/TimRiker/bot/apt) |
| 18:17.42 | *** mode/#infobot [+o |bot] by ChanServ |
| 18:38.16 | simonrvn | troubled: hi |
| 18:44.21 | troubled | simonrvn: did you have +seen in here? |
| 18:46.10 | simonrvn | *shrug* |
| 18:53.13 | simonrvn | ^reload |
| 18:53.13 | ravenbird | reloading... |
| 18:53.14 | ravenbird | reloaded: |
| 18:53.22 | simonrvn | ^die |
| 18:56.42 | *** join/#infobot ravenbird (i=ravenbir@unaffiliated/simonrvn/bot/ravenbird) |
| 19:18.17 | TimRiker | why do we index seen as nick,channel ? |
| 19:18.58 | TimRiker | should be unique on nick from what I can see. |
| 19:20.14 | TimRiker | I'm also a non-fan of having 2 separate sets of sql create scripts. there are differences there that should not be there. In fact, I don't know of any differences that _should_ be there. |
| 19:20.20 | simonrvn | indeed. i think i edited my table to be that way |
| 19:20.25 | TimRiker | er make that 4 separate sets. |
| 19:21.08 | TimRiker | anyone recall who split them up and why? |
| 19:22.31 | simonrvn | troubled's in trouble :P |
| 19:23.33 | troubled | TimRiker: i split em due to differences in syntax |
| 19:23.55 | troubled | sqlite lite's syntax is limited compared to say pgsql |
| 19:24.51 | troubled | as for sqlite and sqlite2 schemas, i wasnt sure about them back then and was thinking ahead in case they needed to be different |
| 19:25.07 | troubled | sqlite and sqlite2's schemas can probably be merged to 1 |
| 19:25.16 | simonrvn | sqlite3 has better real sql syntax than 2 |
| 19:25.53 | troubled | i know mysql and pgsql _must_ be different though. unless we are going to go with some sort of odbc interface to it all |
| 19:26.22 | simonrvn | that would be better, long term |
| 19:26.46 | TimRiker | troubled: so what is actually different? is there not one syntax that works for all? then only one place to maintain it? |
| 19:27.03 | TimRiker | what _must_ be different? |
| 19:27.06 | troubled | no, pgsql and mysql syntax is completely diff |
| 19:27.17 | simonrvn | because then you abstract data access, and simplify code |
| 19:27.18 | troubled | mainly quotes syntax and field types though |
| 19:28.03 | troubled | diff one of files in the schemas and you will see what I mean |
| 19:28.08 | TimRiker | there's a revoke in the pg, but that's not needed, it's just there. |
| 19:28.23 | troubled | which file? |
| 19:28.36 | simonrvn | that's why things like mysql2pgsql exist ;) |
| 19:28.36 | TimRiker | simonrvn: no, in this case, you hide bugs by not testing all of the different dbs. |
| 19:28.48 | TimRiker | they are out of sync now. ie: bug. |
| 19:29.24 | TimRiker | simonrvn: so infact, all you have done is hidden interfaces, complicated the testing, and introduced bugs. |
| 19:29.27 | simonrvn | hm, good point |
| 19:29.35 | troubled | as for the index on nick,channel, I believe it may be an efficiency thing because of the way the sql select is done. i would have to look at it closer to be sure if its needed though |
| 19:30.21 | TimRiker | the question is... should there be multiple entries for a nick in there or not. I say not, but having nick,channel means that there can be. |
| 19:31.57 | troubled | no, there shouldnt, but we have to keep it for legacy. tbh, the schemas we have are so terribly designed, it would be better to start from scratch, but then I piss the hell out of dondel |
| 19:32.20 | troubled | there is absolutely _0_ normalization, but we cant just change it without breaking all your dbs |
| 19:33.00 | troubled | so yes, there should be a nick table, referenced by id instead of 40000 nick dupes. but unless you guys are going to change your dbs, you will loose everything |
| 19:33.57 | troubled | i figure I could take don's db to 40% of its current size just from normalizing a bit |
| 19:34.00 | TimRiker | no, I'm not a fan of a separate nick table. nick is passed in as text, it should be stored that way imho. |
| 19:34.16 | *** join/#infobot cheese (n=Rambo111@adsl-10-128-163.mia.bellsouth.net) |
| 19:34.23 | troubled | you should take a look at my db logger schema to see what i mean |
| 19:34.29 | cheese | hey, I was hoping to get infobot working, but I'm having issues |
| 19:35.08 | TimRiker | troubled: imho it's silly to log to a db, so you get what you asked for. :) |
| 19:35.11 | cheese | it won't seem to work both with my mysql database, and also isn't working when I give it the root login/passwd |
| 19:35.22 | troubled | TimRiker: for proper normalization and db size savings, the nick, ident _and_ hostname must be seperate |
| 19:35.47 | cheese | ohhh... wait |
| 19:35.48 | TimRiker | troubled: so that silly log tables are smaller? log to text. it's a _log_ |
| 19:35.49 | troubled | but the space savings would be minimal compared to my setup |
| 19:36.04 | cheese | blootbot isn't infobot is it? |
| 19:36.08 | troubled | TimRiker: you seriously underestimate my db :) |
| 19:36.19 | troubled | its very efficient |
| 19:36.19 | TimRiker | cheese: blootbot is now the new infobot, yes. |
| 19:36.50 | TimRiker | troubled: my 9 years of text logs are pretty efficient too. |
| 19:37.04 | troubled | not like this |
| 19:37.18 | TimRiker | you underestimate bzip2 |
| 19:37.21 | troubled | log files lack full information for the queries I do |
| 19:38.07 | TimRiker | like? |
| 19:38.46 | troubled | irc clients (and thus the db) store the full user info for each msg. logs only store nicks typically |
| 19:38.58 | *** part/#infobot cheese (n=Rambo111@adsl-10-128-163.mia.bellsouth.net) |
| 19:39.16 | troubled | so parsing logs, its impossible at the forensic level to know for sure that a nicks host hasnt changed until they quit while you are logging |
| 19:39.39 | troubled | then you run into problems like cut off logs which are hard to detect by script |
| 19:41.10 | troubled | and theres the fact that the db is fully index'd. so you finding something in a sequential search could take minutes versus a 5-10 second search in the db for the same thing |
| 19:42.39 | troubled | my db design is at least efficient enough that it compressed so well that rzip only saved 2mb of size over bzip2. and rzip is design to highly compress large files (900mb uncompressed or so iirc) with poor normalization |
| 19:42.53 | TimRiker | the logs record nick changes. so it'd doable. |
| 19:43.04 | troubled | if the avg(length(nick)) is < sizeof(int4), I save space |
| 19:43.21 | troubled | thats the kind of logic and planning that went into my design and still holds true. |
| 19:43.33 | TimRiker | you presume that I often want to search 9 years of logs, which I don't. |
| 19:43.57 | troubled | its doable with a degree of inaccuracy. ive been over it with someone with years of logs running a test import script already |
| 19:44.09 | TimRiker | what I _do_ want are daily logs that google can index and folks can download. |
| 19:44.19 | TimRiker | db logs don't give me either of those. |
| 19:44.26 | troubled | true, but there is a problem with accuracy in standard irssi logs |
| 19:44.38 | simonrvn | unless you parse them out to flat file via script |
| 19:44.46 | TimRiker | hence the reason I don't use so-called-standard irssi logs. |
| 19:45.21 | TimRiker | it's the bot's job to log, not the client. that's what bots are for. :) |
| 19:45.32 | troubled | if you are just looking up nicks or "words" then flat file is fine if the goal is just for google to index them for you i guess |
| 19:46.39 | troubled | i just find it more convienient to log via client. then me and the bot dont have to both be in 50 channels, doubling bandwidth for when I want to talk in a channel |
| 19:46.40 | TimRiker | I've got 88M of sql data. I've got over a gig of logs just on this version of the bot. |
| 19:46.59 | TimRiker | I don't want a anywhere near a gig of db to backup. |
| 19:47.14 | troubled | that 88mb of sql could probably be stripped down to like 50 with some heavy normalization. maybe even more |
| 19:47.55 | troubled | but the point of normalization isnt as much about space saving as it is faster lookups |
| 19:48.20 | TimRiker | I seriously doubt it. Most of that is factoid info or seen "last said" info. normalization won't help either of those at all. |
| 19:49.07 | troubled | the reason I can pull out every record for any nick, ident or hostname out of a million records in ~100-200ms is because the table is small, and has a fixed size (thus calculated offsets) |
| 19:49.14 | TimRiker | normalization == slower lookups for simple queries. and that's about all the bot ever does. |
| 19:50.10 | troubled | its faster to normalize |
| 19:50.18 | simonrvn | i don't log in irssi |
| 19:50.42 | TimRiker | so I guess the design would be different if I wanted to add a command to the bot that would let users search all 9 years of logs. that sounds like a Really Bad Idea right from the start. |
| 19:51.28 | troubled | lets take a look at your db and ill explain why its slow and why normalization would give you faster speed if done across all tables |
| 19:51.45 | troubled | take a look at your row couunt for the seen table. how many you got? |
| 19:51.49 | troubled | and whats the avg size? |
| 19:52.50 | troubled | mysql> select avg(length(nick)) from nicks; |
| 19:52.53 | troubled | +-------------------+ |
| 19:52.55 | troubled | | avg(length(nick)) | |
| 19:52.58 | troubled | +-------------------+ |
| 19:53.00 | troubled | | 7.6225 | |
| 19:53.03 | troubled | +-------------------+ |
| 19:53.05 | troubled | 1 row in set (4.42 sec) |
| 19:53.07 | troubled | mysql> select count(*) from nicks; |
| 19:53.10 | troubled | +----------+ |
| 19:53.12 | troubled | | count(*) | |
| 19:53.15 | troubled | +----------+ |
| 19:53.17 | troubled | | 286546 | |
| 19:53.19 | troubled | +----------+ |
| 19:53.22 | troubled | 1 row in set (2.36 sec) |
| 19:54.04 | troubled | so, on freenode, the avg nick size is 7.6225 out of a distinct 286k nicks. Since my pkey is only 4 bytes, I am saving 3.6225 * 286546 bytes just on nicks |
| 19:54.16 | troubled | the same thing applies to quit messages: |
| 19:54.21 | TimRiker | 229590 entries in seen. some of which are probably bad due to the bad index setting. |
| 19:54.29 | troubled | mysql> select count(*) from quit_messages; |
| 19:54.32 | troubled | +----------+ |
| 19:54.34 | troubled | | count(*) | |
| 19:54.36 | troubled | +----------+ |
| 19:54.39 | troubled | | 32714 | |
| 19:54.41 | troubled | +----------+ |
| 19:54.44 | troubled | 1 row in set (0.42 sec) |
| 19:55.37 | troubled | so for months and months, instead of having thousands and thousands of quit messages logged, I only have 32k distinct ones, but the pkey id is only 4bytes, and the avg length of quit messages is: |
| 19:55.41 | troubled | +----------------------+ |
| 19:55.43 | troubled | | avg(length(message)) | |
| 19:55.46 | troubled | +----------------------+ |
| 19:55.48 | troubled | | 32.4172 | |
| 19:55.50 | troubled | +----------------------+ |
| 19:55.52 | troubled | 1 row in set (0.62 sec) |
| 19:55.54 | TimRiker | troubled: actually you're not saving that. you have 4 bytes in 2 tables plus the 7.6225 nicks. |
| 19:55.59 | troubled | now, _.62s_ is damn fast for 8 months of logs |
| 19:56.13 | troubled | i only have 1 nicks table |
| 19:56.20 | TimRiker | so you've added 286k*8 that you are not counting anywhere. |
| 19:56.44 | TimRiker | increase that 8 by 4 more for each table where you use it. |
| 19:56.51 | TimRiker | and eventually you break even. |
| 19:57.27 | troubled | any time I break even, I make up for by smaller table size which equates to faster indexes and faster lookups |
| 19:57.29 | TimRiker | but then add on top of that that every seen update is now a join on 2 tables, and you have much more sql overhead. |
| 19:57.43 | troubled | and its negated by the savings in tables like quit messages |
| 19:57.48 | TimRiker | no. |
| 19:58.21 | TimRiker | when you break even is exactly when you break even. you do NOT have smaller table sizes then. you have _even_ table sizes. |
| 19:58.46 | TimRiker | course, I'm not counting index overhead for the new table, but it factors in at similar values. |
| 19:59.09 | troubled | my user id tables which is a join of nickid, identid and hostid causes the break even in that area |
| 19:59.49 | troubled | but i get accuracy of messages and 100 fold increase in lookup speed vs. sequential searches |
| 20:00.23 | TimRiker | so now your simple "update seen record" have morphed in to a "check for and potentially add these 3 entries in each of 3 tables, and then add one record in this table" instead of "update this table" |
| 20:00.41 | troubled | the normalization comes into play best when i do a dump more so than on disk since index size sucks hd space (which is dirt cheap) |
| 20:00.46 | TimRiker | I don't call that breaking even. |
| 20:01.37 | TimRiker | so to save the "dirt cheap" disk space, you've added much mroe overhead for the typical insert/update. |
| 20:02.10 | TimRiker | you've successfully optimized the thing you rarely do at the expense of the thing you do all the time. |
| 20:02.44 | TimRiker | this is a step backward, not forward. |
| 20:02.44 | troubled | sure, you add a bit more overhead for faster lookups |
| 20:02.44 | troubled | you cant get both at the same time heh |
| 20:02.44 | troubled | speed is the goal, not size |
| 20:02.44 | troubled | or cpu |
| 20:03.00 | TimRiker | exactly. so proper database design says you optimize for the usage, not optimize blindly. |
| 20:03.23 | TimRiker | and my point is that you have increase cpu load significantly. |
| 20:03.28 | troubled | sure its lower overhead to just slap everything in a table without any normalizing, but then your lookup times exponential slow down |
| 20:03.36 | TimRiker | ie: speed is worse, and size is better. |
| 20:03.53 | troubled | we need speed though, not size |
| 20:04.10 | TimRiker | you're missing the point. we don't do fancy lookups. we do brain dead lookups. and those are fast. |
| 20:04.13 | troubled | normalizing gives you that through smaller and fixed tables |
| 20:04.29 | TimRiker | exactly! we need speed, not size. and you have size and not speed. |
| 20:04.40 | troubled | you are forgetting the terrible loss in performance you get in mysql with a non fixed table size due to the factoids being varchars |
| 20:04.54 | troubled | thats why you need the nuh's in a seperate table |
| 20:05.01 | TimRiker | heh, actually I'm not as I use sqlite which does not have that same issue. |
| 20:05.16 | troubled | wheithere its better to split that up into 3 tables depends on the amount of data savings you would get |
| 20:05.48 | TimRiker | nor does it have the network overheard or task switching that a mysql install requires. |
| 20:06.06 | troubled | but if you had at least the nick!user@host in its own table, it could be fixed table size and would be better the reference it |
| 20:06.11 | TimRiker | course it _is_ perl, so that hardly registers. :) |
| 20:06.28 | troubled | but the size savings only come into play with lots of duplicates, so it depends on your exact db's contents |
| 20:06.59 | troubled | well, I agree sqlite is nice in that respect. mysql/pgsql do add overhead, but that also give you www page potential |
| 20:07.28 | TimRiker | sqlite can be used by more than one process, so there's nothing stopping a web interface. |
| 20:07.45 | troubled | sqlite3 with the daemon can |
| 20:07.52 | troubled | or do you mean file locks |
| 20:07.57 | TimRiker | even without a daemon. |
| 20:08.23 | troubled | i dont like deadlock issues that come with sqlite though |
| 20:08.35 | TimRiker | sqlite uses record locking normally to allow multiple processes to access the same db. that's how the daemon version works in the first place. |
| 20:08.44 | troubled | at least mysql/pgsql will "self correct" with corruption and such |
| 20:09.17 | troubled | yes, but in the daemon version, its the gatekeeper doing all the sharing. so its really just 1 process accessing it |
| 20:09.56 | TimRiker | agreed. if I had a web interface up, I most likely would not design it using sqlite shared file access. but I don't, so sqlite is the lowest overhead, and the fastest performance. ie: exactly what I need. in addition it's the simplest to get running. |
| 20:10.43 | TimRiker | troubled: re daemon, not exactly and the daemon allows multiple forks .. or threads? I guess I don't recall. |
| 20:10.46 | troubled | we could just go sqlite only, i wouldnt mind. we can always put a web interface into the bot later to deal with people that want that sorta thing |
| 20:11.37 | troubled | re daemon: ya, but it does have deadlocking and timeout code I would assume. still not the same as if infobot choked and blocked, then the daemon wouldnt know how to deal with it |
| 20:11.52 | TimRiker | I think it's a good thing that we support multiple sql dbs. some folks will want to use phpmyadmin etc to get other views into the data. that's all good. I just don't happen to want that enough to deal with the little overhead it adds. |
| 20:12.42 | troubled | which comes back to the schema differences between the engines. most are index related I believe? |
| 20:13.46 | troubled | i dont mind if say mysql and pgsql have diff indexing and specific speed tweaks over sqlite resulting in different files so long as the schema are the same for all and dont alter the bots access to the data |
| 20:14.12 | TimRiker | default values or lack thereof is what I was noticing. |
| 20:14.48 | troubled | defaults werent really needed since the bot doesnt miss anything on inserts |
| 20:14.49 | TimRiker | in mysql seen: `nick` varchar(20) NOT NULL default '', |
| 20:14.53 | TimRiker | that's just wrong. |
| 20:15.12 | TimRiker | the point in having NOT NULL is that you can't add one without a nick. adding a default kills that. |
| 20:15.23 | troubled | shouldnt even be in there really. but when it was converted as was |
| 20:15.34 | TimRiker | exactly. read: bugs |
| 20:15.40 | troubled | nod |
| 20:16.07 | TimRiker | if someone adds code that does not set nick, then mysql will happily allow that bug to exist. |
| 20:16.25 | TimRiker | and as we have 4 sets of sql setup scripts, someone won't notice. |
| 20:16.27 | troubled | i dont like "default" myself either |
| 20:17.04 | TimRiker | I'd strongly prefer that we find one set of sql setup scripts that works on all dbs. |
| 20:17.04 | troubled | when i have to edit, i generally once over all the diff engines. I think I missed a change once that simon caught, but its usually not a problem and hard to miss |
| 20:17.22 | simonrvn | TimRiker: and just specify which on the CL |
| 20:17.29 | troubled | I think the end goal was to put all the bootstrap stuff into that setup script |
| 20:17.56 | TimRiker | you seem to be saying that we should never end up where we are. while that's a nice thought, we are here with the "default" lines that should not be there. |
| 20:18.15 | troubled | i dont like the idea that the user needs to know anything when they setup the bot, but I just havent got around to it |
| 20:18.28 | troubled | then fix em :) |
| 20:18.31 | TimRiker | my point is that having one set of scripts would prohibit us from ending up here again. the dbs would have to be in sync, not just chance to be once in a while. |
| 20:18.45 | simonrvn | well, they should know *something* |
| 20:19.15 | troubled | like i was trying to say, when i made them, they were converted temp "as is" with only cosmetic changes, so as to not conflict with don's and other existing db's until we decided on specific changes |
| 20:19.24 | troubled | changes like this "default" issue |
| 20:19.30 | TimRiker | there used to be only one set till simonrvn split them up, if I'm reading the logs correctly. |
| 20:19.40 | troubled | i would have totaly change the schemas long time ago otherwise ;) |
| 20:20.14 | troubled | i think that was my patch that simonrvn applied for me since i didnt have svn commit yet |
| 20:20.20 | simonrvn | yeh |
| 20:20.29 | TimRiker | could be. |
| 20:20.39 | simonrvn | i still have that patch |
| 20:20.40 | troubled | should be like "troubled is da man" or something heh |
| 20:20.55 | TimRiker | well in order to test merging them, i guess I need all 4 back ends setup. ugh. |
| 20:21.05 | troubled | then he fubar'd and forget to create some files etc. needless to say a few revs later it was finalised heh |
| 20:21.12 | TimRiker | hehe |
| 20:21.19 | simonrvn | *grin* |
| 20:22.02 | troubled | TimRiker: i think the sqlite vs. sqlite2 was because of the way the old setup script locates the path (based on the conf var for db type) |
| 20:22.30 | troubled | we could merge them, but then the setup script needs an edit so it treats sqlite and sqlite2 as the same path to schema |
| 20:22.44 | troubled | cause i dont think they actually differ |
| 20:24.03 | TimRiker | troubled: and do the others _need_ to differ? |
| 20:24.18 | TimRiker | I realise they do, but do they _need_ to. |
| 20:24.30 | troubled | ya, pgsql and mysql arent compatible |
| 20:24.37 | TimRiker | like the sqlite init does not use quotes, but it could, it just doesn't now. |
| 20:25.09 | troubled | ' vs ` was a problem, as were the field types |
| 20:25.23 | troubled | varchar vs. character varying, kinda stuff |
| 20:25.30 | TimRiker | troubled: you're telling me that with our simple brain dead schema that we can't come up with a version both mysql and pgsql like? |
| 20:25.34 | simonrvn | yes, i hate that ' ` crap :/ |
| 20:26.03 | troubled | TimRiker: not that I am aware of. unless they both have some sql compliant versions of each hidden somewhere |
| 20:27.13 | troubled | tbh, it wasnt an issue with only mysql and sqlite. its when i had to add pgsql that it got screwy. and the method I used to take $DBTYPE for path to schema meant i needed to split everything. but a (sqlite|sqlite2) check could possibly let us merge the 2 sqlites |
| 20:28.40 | TimRiker | troubled: I'd hope to at least merge all but pysql |
| 20:28.49 | troubled | pgsql! |
| 20:29.00 | TimRiker | s/pysql/pgsql/ |
| 20:29.09 | troubled | better... ;p |
| 20:30.50 | troubled | really though, seperate files shouldnt matter so long as _we_ know enough to keep them all compatible. far as a user is concerned, he should only have to run the setup file and it should just do the right thing (tm) |
| 20:32.23 | troubled | kinda coming back to the regression tests it would seem |
| 20:34.15 | TimRiker | well, having fewer sets generally means less regression testing. |
| 20:34.38 | TimRiker | still need some to catch things like mysql case insensitive indexes. |
| 20:35.09 | troubled | just make the table a ci type |
| 20:35.13 | troubled | not cs |
| 20:35.22 | troubled | but then sqlite would choke on it |
| 20:35.38 | troubled | collation type iirc |
| 20:35.47 | TimRiker | never tried case insensitive in sqlite. might work. |
| 20:36.14 | troubled | sqlite supports collations? |
| 20:36.19 | troubled | setting i mean |
| 20:36.38 | TimRiker | but my point was the reverse. mysql is case insensive by default as the tables don't specify that. and sqlite is case sensitive by default. |
| 20:38.17 | TimRiker | hmm. do I have that backwards? that's the way I recall the bugs. It's been a while. |
| 20:38.45 | troubled | <PROTECTED> |
| 20:39.36 | troubled | mysql goes by the collation of the field. so ci for insensitive, and cs for sensitive. the index will do the right thing i believe |
| 20:40.01 | troubled | so right there, sqlite and mysql syntax would be incompatible |
| 20:40.14 | *** part/#infobot Ttech (n=ttech@fullcirclemagazine/developer/ttech) |
| 20:40.19 | simonrvn | TimRiker: so did i |
| 20:40.28 | TimRiker | utf8, man use utf8. |
| 20:40.57 | troubled | tbh, im not sure if it was possible with utf8. i would have to check |
| 20:41.03 | troubled | dont see why not though |
| 20:41.04 | simonrvn | i meant "ughs" ;) |
| 20:41.07 | TimRiker | my.cnf:default-character-set = utf8 |
| 20:41.08 | TimRiker | my.cnf:default-collation = utf8_unicode_ci |
| 20:41.25 | simonrvn | i set mine that way also |
| 20:41.40 | TimRiker | the 'one true way' |
| 20:41.48 | troubled | might be a good idea to force it during table creation though for people that have it set conflicting |
| 20:42.14 | simonrvn | a) it was putting it as latin1_swedish_ci (WTF?) b) UTF8 r0x my s0x |
| 20:42.22 | TimRiker | well, we currently only index ascii (barring any odd tables troubled is playing with) |
| 20:42.23 | troubled | i think its still coming down to us needing to have seperate schemas though |
| 20:42.52 | TimRiker | simonrvn: mysql hq is in sweden. hence the default. |
| 20:42.52 | troubled | i dont know enough about latin1 vs. utf8/16 tbh |
| 20:43.01 | TimRiker | clearly. :) |
| 20:43.08 | simonrvn | oh, that explains it |
| 20:43.29 | simonrvn | ascii first 127 are the same as utf8's |
| 20:43.41 | troubled | <PROTECTED> |
| 20:43.44 | simonrvn | that was done on purpose in utf8 |
| 20:43.45 | TimRiker | latin1 can only express latin1. no jp, zh, kr, ru, gr, etc. |
| 20:43.59 | troubled | ahh |
| 20:44.08 | TimRiker | utf8 can handle them all. one codeset to rule them all and in the darkness bind them. |
| 20:44.16 | troubled | generally i use utf8 though. just some leftovers around in my setup |
| 20:44.22 | troubled | lol |
| 20:44.25 | simonrvn | and extendable too |
| 20:44.51 | TimRiker | 0-127 are the same for almost all of the codesets including utf8 |
| 20:45.09 | TimRiker | it's 128-xxxxxxxxxx that differ |
| 20:45.56 | TimRiker | latin1 is always one byte. but limited in what it can display. swedish uses it, but sorts if differently than english. |
| 20:45.58 | simonrvn | yup |
| 20:46.19 | TimRiker | utf8 starts with one byte but can be up to 6 bytes per character. |
| 20:47.00 | simonrvn | non-spacing marks front and back, and a 4 byte char |
| 20:47.20 | simonrvn | or a combo |
| 20:48.45 | TimRiker | normal set only include up to 4 byte, and the 4 byte is hardly used, so really utf-8 is often just 1-3 byte. |
| 20:49.09 | TimRiker | bbiab |
| 20:49.26 | troubled | do we have a sqlite->csv/mysql script? |
| 20:55.45 | TimRiker | hmm. I have one around someplace... isn't there a dump script in scripts/ that still works? |
| 20:56.02 | troubled | i see a few, havent tried em though |
| 20:56.26 | TimRiker | sqlite <table> .dump will get you an sql backup. chances are just the creates would need to change to get it into mysql. |
| 20:56.27 | troubled | was gonna see if I could grab a copy of your db so I can put into mysql and propose some changes |
| 20:58.30 | TimRiker | 12M here: http://rikers.org/tmp/20080315.bz2 |
| 20:58.38 | troubled | ill hack up a quick sqlite cli based bash script to stress test some stuff and do a before and after comparison. see how changes go |
| 20:58.57 | troubled | give me a few to see how well the convert scripts work :) |
| 20:59.24 | troubled | well, after the download finishes heh |
| 20:59.29 | TimRiker | there's not much difference between dumps. |
| 20:59.46 | troubled | what do you mean? |
| 20:59.51 | TimRiker | just set your bot to sqlite and run, to get a blank db. |
| 20:59.56 | troubled | between the sqlite dump and the perl one in infobot? |
| 21:00.04 | TimRiker | well, I use sqlite2, but either should work. |
| 21:00.39 | troubled | do you have access to mysql? |
| 21:00.42 | TimRiker | no, I mean my dumps over time. |
| 21:01.05 | troubled | ya, i figure the changes are gradually slowing down |
| 21:01.13 | TimRiker | yeah, mysql5 was on ibot's server. don't recall if I shut it down or not, but it's trivial to get 4 or 5 running. |
| 21:02.02 | troubled | k. ill see if it converts ok to mysql/csv and put teh original and the altered one and send back the altered so we can bench some changes |
| 21:02.31 | troubled | id sooner do sqlite tests though. much each to bench, but more manual to edit the schema compared to a gui |
| 21:03.17 | TimRiker | I'm generally solely interested in run time overhead for the bot. size is irrelevant. and I don't plan on adding any log queries, so logging to db is not at all interesting to me. |
| 21:03.41 | troubled | im not after size either, i want lookup speed only |
| 21:03.49 | TimRiker | lookup for what? |
| 21:03.59 | troubled | anything in the db |
| 21:04.28 | TimRiker | k. well, "anything" is not what the bot normally does. I'm interested in what the bot normally does. |
| 21:04.34 | troubled | reads are more important than writes, so more specifically, im interested in any design changes that can give speed increases for reads |
| 21:04.55 | TimRiker | note: this does NOT include the speed for listkeys for example. I don't consider that a "normal" bot task. |
| 21:05.06 | TimRiker | botmail, seen, factoid looksup, all normal. |
| 21:05.54 | TimRiker | again, what kind of reads? all the normal queries are simple unjoined lookups. |
| 21:05.58 | troubled | the dump's schema is the same as the one in the svn? |
| 21:06.28 | TimRiker | no. it's blootbot. so there are a few trivial differences. it will run with the svn bot though. |
| 21:06.48 | troubled | ok, let me try import into sqlite as a start |
| 21:08.03 | TimRiker | k |
| 21:08.09 | troubled | btw, is sqlite3 backward compatible with sqlite2? or do I need both? |
| 21:08.18 | simonrvn | go for 3 |
| 21:08.38 | TimRiker | db file format has changed. |
| 21:08.48 | TimRiker | dump/import should be the same. |
| 21:09.10 | TimRiker | so you need to use the command line tool that matches your installed perl modules. |
| 21:09.34 | TimRiker | I normally run the bot to get an empty table, then run sqlite against it. it'll complain if they differ. |
| 21:11.59 | troubled | ya, i just tried the sqlite2 schemas and it didnt like the import |
| 21:12.07 | troubled | SQL error: factoids.created_time may not be NULL |
| 21:12.10 | troubled | SQL error: factoids.requested_by may not be NULL |
| 21:12.46 | troubled | about 500 flew by, so there could be more problems too |
| 21:13.20 | troubled | so your db needs at least a few changes to be compliant with our HEAD copy of sqlite2 schema |
| 21:14.16 | troubled | well, let me fudge the schema a bit so I can at least get it imported, then go from there shall we :) |
| 21:15.57 | troubled | news and uptime.sql for sqlite2 dont import proper either, even though we dont use em yet |
| 21:16.44 | troubled | sqlite doesnt seem to like #'s in the schema for comments *shrug* |
| 21:17.02 | troubled | although i think the bot was change to parse them out as a kludge |
| 21:17.06 | TimRiker | yep. need to reload the table myself to cleanup issues like that. |
| 21:17.57 | TimRiker | #'s in sql are not standard. they are a mysql extension. |
| 21:18.06 | TimRiker | proper sql is --- if you can believe that. |
| 21:18.23 | simonrvn | isn't it -- ? |
| 21:19.13 | troubled | i didnt realize your dump had the schema in it though, but its good we caught that i guess |
| 21:19.28 | troubled | ya -- iirc |
| 21:20.29 | troubled | although technically -- + anything would be a comment |
| 21:20.42 | troubled | bah, let me do this on my wks :) |
| 21:20.52 | simonrvn | yeh -- |
| 21:20.57 | TimRiker | oh, yeah. -- |
| 21:24.31 | troubled | TimRiker: might want to take care of all your null fields though before you convert |
| 21:32.52 | TimRiker | troubled: yep, planning on it. |
| 21:34.28 | TimRiker | though I may change the schema as well. force the code to set what should be set, rather than forcing it in the schema. have not decided. |
| 21:35.16 | TimRiker | requested_by VARCHAR(64) NOT NULL DEFAULT 'nobody', makes sense, but better to insert nobody when creating the factoid imho. thoughts? |
| 21:35.35 | troubled | i think thats already done for some reason |
| 21:35.42 | troubled | maybe not everywhere though |
| 21:35.52 | TimRiker | exactly. |
| 21:36.08 | TimRiker | removing the default exposes the "not everywhere" cases. |
| 21:36.32 | troubled | lets agree to remove all "default" from all schemas as a start then? |
| 21:36.34 | TimRiker | or if we are going to keep the default, then we should remove the ones that do specify nobody. etc. |
| 21:37.11 | TimRiker | I tend towards the no-defaults in the schema as it makes it easier to force the code to be consistant. |
| 21:37.23 | troubled | works for me |
| 21:37.38 | TimRiker | however it also means more places to touch if we decide that "nobody" should be a valid nic now and want something else there. |
| 21:37.43 | troubled | but it may break trunk for a bit til we test each engine |
| 21:38.25 | TimRiker | need to walk softly where db changes are concerned. |
| 21:39.02 | troubled | i say we just make all the changes we want done and be done with it tbh. all the db's are already out of whack (yours, and dons namely) |
| 21:39.31 | *** join/#infobot simonrvn (i=simon@unaffiliated/simonrvn) |
| 21:39.31 | *** mode/#infobot [+v simonrvn] by ChanServ |
| 21:39.34 | troubled | not sure who else has has db's though that could be bitten |
| 21:43.42 | troubled | sqlite> select avg(length(locked_by)) from factoids; |
| 21:43.44 | troubled | 35.5281690140845 |
| 21:43.47 | troubled | sqlite> select avg(length(created_by)) from factoids; |
| 21:43.49 | troubled | 40.0837810684489 |
| 21:45.10 | troubled | thats tons bigger than say a 4bit pkey id. not to say that i would go as far as splitting it up into 3 seperate fields, but at least moving the line as a whole might have a bit of an effect. let me see if I can get a group by for count or something in sqlite |
| 21:48.54 | troubled | sqlite> select created_by,count(created_by) from factoids group by created_by order by count(created_by) desc limit 5; |
| 21:48.57 | troubled | |242 |
| 21:48.59 | troubled | hoxu!hoxu@dsl-jklgw3i1c.dial.inet.fi|226 |
| 21:49.02 | troubled | greycat!~wooledg@imadev.eeg.ccf.org|197 |
| 21:49.04 | troubled | localhost!root@a1-3c045.neo.rr.com|145 |
| 21:49.07 | troubled | greycat!wooledg@imadev.eeg.ccf.org|126 |
| 21:49.09 | troubled | 242 nulls :/ |
| 21:50.27 | troubled | so out of the ~114k, there are at least a few thousand that are dupes. multipled by avg length times the 2 or 3 fields that have potentitial host info, that could drop table size a enough to give faster lookups. and some index's on them would really kick it into high gear |
| 21:51.03 | troubled | lots of 50 ish dupes |
| 21:51.15 | troubled | greycat!wooledg@imadev.eeg.ccf.org|126 |
| 21:51.17 | troubled | greycat!~wooledg@192.35.79.70|99 |
| 21:51.56 | troubled | and splitting the field further into seperate nick, ident and hosts would save a tad for nick dupe where the ip changes (which is most common change) |
| 21:52.12 | troubled | but a slight cost of over head for joins on 3 tables instead of 1 |
| 21:52.38 | troubled | im leaning towards a simple 1 table though. less code change and probably the best increase in efficiency we can hope for |
| 21:54.32 | troubled | maybe just a simple "users" or "nuh" table for storage of nick!user@host for the entire set of tables to cut down on things a bit. ill see if I can manage to manually test that change on your db set and see what kinda savings and speed difference we are looking at. hyptohetical only gets you so far. but 4byte times a few spots is better than 64bytes times the same number of spots |
| 21:54.52 | troubled | which in turn cuts down on the...aparently non existant rofl....indexes |
| 21:58.49 | troubled | sqlite doesnt support views does it? |
| 21:59.32 | troubled | would be the simplest was to avoid changing any code for db changes while increasing efficiency |
| 22:03.32 | TimRiker | fixing a bunch of nulls. I put up another. you can nuke the old one. |
| 22:03.59 | troubled | update factoids set created_by = "default" where created_by = null;? |
| 22:04.05 | troubled | that kinda thing? |
| 22:04.22 | troubled | quicker to just match the sql probably :) |
| 22:04.52 | troubled | same url though? probably wont take long to download again |
| 22:05.37 | troubled | ill try import it with the svn schema for sqlite2 |
| 22:06.27 | TimRiker | yeah, over different tables. |
| 22:06.31 | troubled | 18:06:17 ERROR 404: Not Found. |
| 22:06.38 | TimRiker | not there yet. |
| 22:06.42 | troubled | ah |
| 22:08.16 | TimRiker | k. there now |
| 22:08.27 | troubled | <PROTECTED> |
| 22:08.31 | troubled | still 404 |
| 22:08.58 | TimRiker | hmm. didn't get em all.. |
| 22:09.31 | TimRiker | http://rikers.org/tmp/20080315.bz2 ? |
| 22:09.32 | troubled | this machine is using the proxy though, so could be that causing a problem. first download was from the gateway |
| 22:10.40 | troubled | there we go, --no-proxy fixed it *shrug* |
| 22:11.00 | troubled | probably cache the 404 |
| 22:11.04 | troubled | cached* |
| 22:16.03 | troubled | sqlite> select count(*) from factoids where created_by is null; |
| 22:16.05 | troubled | 91621 |
| 22:16.40 | troubled | that cant be right. created_by = null returned 0 :/ |
| 22:17.10 | troubled | sqlite> select count(*) from factoids; |
| 22:17.27 | troubled | 114514 |
| 22:17.27 | troubled | sqlite> select count(distinct(created_by)) from factoids; |
| 22:17.27 | troubled | 9762 |
| 22:19.05 | TimRiker | k. well still working on it. |
| 22:19.08 | troubled | well, we could save 100k rows worth in the table for created_by alone by using a fkey'd users table |
| 22:19.45 | troubled | not so much space, but since we dont have index atm, thats a sequential lookup. and 10k is faster than 100k. although not by much on a fast machine |
| 22:20.45 | TimRiker | created_by can be null in the sqlite svn sql |
| 22:21.01 | troubled | so proposed would be an index'd 10k user table vs. a sequential 114k table currently. should give a few ms of speed |
| 22:21.24 | troubled | sure it can be, but the code ensures that its always set im pretty sure |
| 22:21.42 | TimRiker | you're forgetting the extra join, which will likely kill any speed saves. |
| 22:21.45 | troubled | the null allowed was only cause of older db's like yours with them before that code was introduced |
| 22:22.05 | TimRiker | yeah, I'll likely set em all to nobody |
| 22:22.39 | troubled | well, nulls are fine tbh. smaller data |
| 22:23.01 | troubled | and if you are going to set it to a word, length("null") < length("nobody") :p |
| 22:23.28 | troubled | besides, I think the bot code already knows how to handle nulls doesnt it? |
| 22:24.32 | troubled | infobot: literal working fine? |
| 22:24.37 | troubled | infobot: factinfo working fine? |
| 22:24.37 | infobot | there's no such factoid as working fine?, troubled |
| 22:24.37 | TimRiker | I forget how we handle factoid locks. unlocked should have 0 time? null locker? both? either? |
| 22:24.56 | TimRiker | infobot: literal slashdot |
| 22:24.56 | infobot | "slashdot" is "http://slashdot.org, or they're so lame they had to interview unknown_lamer for more content" |
| 22:25.26 | troubled | infobot: factinfo bitkeeper |
| 22:25.26 | infobot | bitkeeper -- it has been requested 92 times, last by prpplague at Tue Apr 4 12:56:47 2006. |
| 22:25.35 | troubled | infobot: factinfo debian |
| 22:25.35 | infobot | debian -- created by atoponce <n=aaron@ubuntu/member/pdpc.supporter.bronze.atoponce> at Fri Sep 7 01:57:55 2007 (190 days); last modified by nobody; it has been requested 13 times, last by gutz|work, 3h 31m 54s ago. |
| 22:25.42 | troubled | ya, nulls are handled |
| 22:26.03 | troubled | probably best to keep em that way. saves on space too. and smaller the table the better |
| 22:27.37 | troubled | ya, seems non 0 locktime means locked |
| 22:30.04 | TimRiker | well, we ought to be able to handle request_by is NULL too then, no? |
| 22:30.06 | troubled | sweet, sqlite supports views :) |
| 22:30.23 | troubled | we can technically change the db at least without code tweaks |
| 22:30.48 | troubled | hmm, wait, im better it cant do inserts. usually they can provided only 1 foreign table though |
| 22:31.09 | TimRiker | what happens if you insert into a view where nothing exists? does it make new records in all the linked tables? |
| 22:31.09 | troubled | i think NULL's are handled for everything |
| 22:31.22 | troubled | best way to check is to look up a few null records and check them here |
| 22:32.18 | troubled | well, the problem with a view is that unless you have triggers, views can only basiclaly insert to the same table. which limits the view to itself with save a "where not visible" type selects |
| 22:33.05 | troubled | so we would have to update the bot code to deal with them. not a big problem though, just take some time to parse out where all the sql is and fix it to do the inserts different |
| 22:33.44 | troubled | the way I layed it out in my db is the data goes to the normal table, but selects all go from a "view_$NAME" table |
| 22:34.38 | troubled | now, for speed, we could cache id's of stuff we already looked up, but the sqlite lib should handle that for us and it wouldnt break simultaneous updates. |
| 22:35.02 | troubled | if we cache id's and the db id changes by something else, the bot would trash the db |
| 22:35.37 | troubled | and since we dont have any kind of referencial integrity in the db for sqlite (that im aware of), its all on us to ensure in the bot |
| 22:36.53 | TimRiker | ~forget ! did i just "hear" you say the the xfbdev |
| 22:36.53 | infobot | TimRiker: i forgot ! did i just "hear" you say the the xfbdev |
| 22:37.30 | TimRiker | ~bitkeeper |
| 22:37.31 | infobot | it has been said that bitkeeper is at http://www.bitkeeper.com/ or at http://www.bitmover.com/ or hosted trees at http://www.bkbits.net/ or like cvs and subversion, but its non-opensource.. but its really good, even linus uses it to develop the kernel! |
| 22:37.45 | TimRiker | ~"bitkeeper |
| 22:37.45 | infobot | it has been said that "bitkeeper is written by kernel developers, for kernel developers" hmm. |
| 22:38.12 | TimRiker | ~factinfo "cpu" |
| 22:38.12 | infobot | TimRiker: no extra info on "cpu" |
| 22:38.28 | TimRiker | ~forget "cpu" |
| 22:38.28 | infobot | TimRiker: i forgot "cpu" |
| 22:38.32 | troubled | also, one technique I use for my logger repo is to provide sql incremental updates and a full schema. would require some documentation, but at least we can get a db schema we like, then carry it over time. no more excuses for out of date dbs. might even be able to setup an automated update system at some point |
| 22:40.31 | CIA-26 | infobot: 03timriker * r1639 10/trunk/setup/sqlite2/ (factoids.sql seen.sql): sqlite2 changes |
| 22:41.09 | TimRiker | yep. need to do that at some point. |
| 22:41.47 | troubled | also, 64 is too small for nick!user@host. should be like "16!8@63" = 89, or for up to 32 char nick = 105 |
| 22:42.29 | troubled | 16!8@63 is freenode though, and 89 (or even 90 or 100) should be more than enough |
| 22:43.17 | troubled | also, we get more speed at the cost of extra storage by using fixed fields instead of varying |
| 22:43.51 | troubled | would negate the space saved a tad from the normalizing, but the speed from it all should be worth it. what do you think? |
| 22:44.28 | troubled | what I mean is a forgeign table wuth fixed sized fields would be well worth the extra storage |
| 22:44.40 | troubled | toss and index on top and we should be smokin |
| 22:46.26 | CIA-26 | infobot: 03timriker * r1640 10/trunk/setup/sqlite2/ (connections.sql news.sql uptime.sql): sql comments |
| 22:48.21 | TimRiker | I'm still a complete non-fan of normalization. I'm not convinced that any of them are appropriate. |
| 22:49.15 | troubled | what is the sqlite equiv of a auto increment/serial type? |
| 22:50.03 | troubled | thats ok, ill just be testing and giving you benchmark results so we can see for sure just how much of a diff it makes and if its worth it |
| 22:50.15 | troubled | consider all this R&D :) |
| 22:50.41 | troubled | thats why im just changing the db, not the code yet. and if I do, ill put the proposed in a branch anyways |
| 22:57.27 | TimRiker | well, gotta head home. back on later. I hope to get the bot moved over today. |
| 22:57.35 | TimRiker | cya |