04:21.40 | *** join/#uclibc sjhill (~sjhill@12-217-217-31.client.mchsi.com) |
04:47.13 | *** join/#uclibc landley (~landley@cs6625195-177.austin.rr.com) |
04:47.29 | landley | So I'm messing with bzip, and getting strange results. |
04:47.59 | landley | My code generates valid bzip files, but the sizes differ from the archive sizes produced by the original version. |
04:48.16 | landley | Usually mine are bigger, but I have at least one test case where it's smaller. |
04:48.53 | landley | The reason for that is the huffman code generation's ordering of tied entries (ones with the same weight). |
04:49.42 | landley | The other really oddball one is I made my buffer size bigger (used the full 900k instead of 900k-18 bytes), and the end result was a _larger_ archive. |
04:49.47 | landley | Darn butterfly effects... |
04:50.12 | landley | I suppose this is what I get for using random data as my test cases. :) |
04:55.12 | mjn3 | i haven't looked at huffman coding in a while. but if the weights are tied, why should there be a difference in size at that stage? |
04:56.15 | landley | There isn't at that stage. But the resulting files made from things done in jseward's order are smaller in every test case I have except one. |
04:56.33 | landley | If they were _uniformly_ smaller, I'd just switch over to his and assume he knew something I didn't. |
04:56.42 | landley | It's that one outlier that's got me scratching my head... |
04:57.06 | landley | (Both sets of code produce valid huffman trees, and both extract to give you your original data back just fine...) |
04:57.15 | landley | I have a _working_ bzip implementation, I'm just not happy with it yet... |
04:59.04 | mjn3 | even if you switch back to his buffer size (-18) |
04:59.53 | landley | Ah, you've looked at the code then. :) |
05:00.07 | landley | The buffer size is separate from the different huffman tables. |
05:00.13 | mjn3 | no... just noticed your comment earlier |
05:00.37 | landley | In a separate test I removed the -18 to see what it would do, and the result was a 1% larger archive. |
05:01.35 | landley | Not a clue why. I need a better range of test cases... :) |
06:26.02 | *** join/#uclibc landley (~landley@cs6625195-177.austin.rr.com) |
06:26.14 | landley | mjn3: You still awake? |
06:26.18 | mjn3 | yep |
06:27.05 | landley | I was wondering if it would be okay to shorten your hospice notice in the bzip code slightly. (I have a suggested new wording, lemme cut and paste...) |
06:27.09 | landley | <PROTECTED> |
06:27.09 | landley | <PROTECTED> |
06:27.09 | landley | <PROTECTED> |
06:27.09 | landley | <PROTECTED> |
06:27.31 | landley | Considering that their address and phone number and such are on their website, you see... |
06:27.41 | landley | ("No" is a perfectly acceptable answer here...) |
06:28.48 | landley | It's not a pressing issue, I just thought I'd ask... |
06:30.04 | landley | Ha! My code now compresses the linux-2.6.0 tarball to be 250 bytes shorter than jseward's code! |
06:30.11 | landley | (And I have almost no idea why...) |
06:30.30 | mjn3 | i suppose it would be okay to omit the address, phone number, etc |
06:30.43 | mjn3 | as you said, it is available on the web |
06:30.49 | landley | Thanks. |
06:31.01 | landley | I was thinking about it because I'm about to contribute the compression-side code. |
06:31.22 | landley | (Also, I just looked up the info because in lieu of sending you a christmas present I thought I'd just send the hospice a check.) |
06:31.34 | landley | (But I compared the info against the web page out of habit to make sure it was up to date...) |
06:31.53 | mjn3 | that would be great. thanks. they really were wonderful |
06:32.20 | landley | I know. My mother died of cancer just over a year ago, and a hospice gave us a live-in nurse for the last couple months... |
06:33.20 | landley | (And all the pain medication she wanted, which admittedly wasn't enough. She had chicken pox as a child. As her immune system went downhill, it came back as shingles. Just to add insult to injury, really...) |
06:33.37 | landley | Ah, christmas cheer. Right, back to data compression code... |
06:34.35 | mjn3 | i really should add a note in all the other applets i've written |
06:34.44 | mjn3 | prior to the 1.0 release |
06:35.47 | landley | I'd bounce an email off andersee. Maybe he can get you a HOSPICE.TXT file or something like that in the root dir... |
06:37.06 | mjn3 | well, i've been needing to put a dedication in uClibc. i suppose i should talk to him about busybox |
06:37.28 | landley | They're BOTH kinda close to a 1.0 release, aren't they? |
06:38.15 | landley | One reason I'm thumping on my home-grown linux from scratch system again is to have an environment to test out uclibc and busybox in "as part of this balanced breakfast", as it were... |
06:38.38 | landley | I'm focusing more on busybox. When it breaks, I can fix that. When uclibc breaks, I'm pretty much lost... |
06:38.44 | mjn3 | busybox is. i think there's more work to do for uClibc. because of delays, i haven't added the gettext stuff yet. plus i need to implement wcsftime and stabilize the internal locale api |
06:39.05 | landley | I switch locale and other foreign language support off, myself. |
06:39.19 | landley | No, the stuff that gets me is all the patches needed to make c++ work... |
06:39.54 | landley | Linux From Scratch 5.0 redid the entire stage one build process so that you can swap out the C library. (They manually edit the paths for binutils and gcc and such). |
06:40.07 | mjn3 | once i stabilize the locale api, i plan to add a uClibc-specific support directory to libstdc++ |
06:40.09 | landley | The main reason I can see for them doing this is making it easier to cross-build uclibc... |
06:40.15 | landley | Cool. |
06:41.09 | landley | I gave up on c++ almost 10 years ago myself (about the time templates went in). But there are some c++ packages I'd like to use. (Konqueror comes to mind...) |
06:41.25 | mjn3 | on the plus side, the core stdio rewrite is just about done. all my tests are passing. just testing a new (hopefully correct) popen implementation now |
06:41.53 | mjn3 | yeah, i stopped following c++ just prior to STL |
06:41.58 | landley | Cool. (What did the stdio rewrite accomplish, again? I'm not up to speed on that bit of the code...) |
06:42.22 | landley | It's a pity Java went down the rathole it did. I liked the language, circa 1.1 or so. |
06:42.35 | landley | These days, I use C for systems programming and python for apps. |
06:42.46 | mjn3 | fixed a couple of minor bugs. improved performance... especially for non-threaded apps that use getc/putc |
06:43.02 | landley | Cool. |
06:43.24 | landley | Did you hear that the 2.6 linux kernel now has a -tiny tree? (Which boots and runs comfortably in 4 megs again?) |
06:43.36 | landley | (Something that I'm not sure 2.4 _ever_ did... :) |
06:43.53 | mjn3 | simplified the core code a lot. the formatted i/o (printf/scanf) stuff i haven't touched other than necessary changes. but i plan to rewrite vfprintf early next year |
06:44.03 | mjn3 | i saw that |
06:44.27 | landley | I'm entirely in favor of simplifying. :) |
06:44.40 | mjn3 | btw... i fixed a long-standing bug wrt pthreads yesterday |
06:44.46 | landley | (That's actually my main attraction to busybox and uclibc. I can understand what they DO...) |
06:45.12 | landley | I just reinstalled my laptop with Fedora Core last week, and only just reinstalled the 2.6 kernel, so right now I haven't got a thing on here currently using uclibc yet... |
06:45.40 | landley | I'm going to get my distro to build itself using busybox, and then I'm going to shoehorn uclibc into it. |
06:45.55 | landley | So you've got yourself a bit of a breather before I start flooding you with bug reports. :) |
06:45.56 | mjn3 | because of the configurability, the core stdio codebase was a mess of preprocessor logic. i isolated a lot of that stuff now with macros in one internal header file |
06:46.05 | landley | Cool. |
06:46.23 | landley | That's one of the big things I learned from reading the linux kernel. Making #ifdefs go away. (Or at the very least seem to.) |
06:46.31 | mjn3 | but we'll probably get another bug release out prior to committing the new stdio core |
06:47.08 | landley | Darn it this depth tracking code is ugly no matter how I look at it... |
06:47.20 | landley | (It's three extra lines, but an UGLY three extra lines...) |
06:47.27 | landley | Time to step back and drink some tea, I believe... |
06:47.27 | mjn3 | perl 5.8.2 is failing one sigaction test out of 25. passing all the others though |
06:47.36 | landley | Cool. |
06:48.05 | landley | (Okay what IS the deal with perl? Modern perl is three times the size of the previous perl...) |
06:48.14 | landley | I've heard of miniperl, but haven't played with it... |
06:48.39 | mjn3 | i don't use perl myself. but lots of people do and i build it just for the self-tests |
06:48.51 | mjn3 | well, plus some things need it to build |
06:48.58 | mjn3 | like the linux test project |
06:49.01 | landley | Lots of things need it to build... |
06:49.09 | landley | Most of them are still happy with the older perl, though. |
06:49.15 | landley | Hopefully, some of them will be happy with miniperl. |
06:49.22 | landley | This is a to-do list item to find out in my new distro. :) |
06:49.36 | mjn3 | probably |
06:49.40 | landley | (I'm basically aiming it at my laptop, which means it needs a full desktop, web browser, email, etc.) |
06:50.06 | landley | And my server system on my cable modem, which means it needs apache, smtp, dns, iptables, etc... |
06:50.25 | landley | I've already done a complete LFS based server system. It's the desktop side, plus uclibc and busybox that are new. |
06:50.34 | landley | (And all the procedure changes in LFS 5.0...) |
06:52.10 | mjn3 | apparently there's been a lot of work on using uClibc with gentoo |
06:53.28 | landley | I saw that. |
06:53.34 | landley | And andersee got debian building with it. |
06:53.54 | landley | Personally, I have this goal of minimizing the number of FSF packages in use on my system. |
06:54.06 | landley | It started when I began reading their code. |
06:54.30 | landley | The main things I do NOT have replacements for are binutils and gcc. (tcc just ain't there yet...) |
06:55.43 | mjn3 | there's lcc. but last time i looked it didn't support 64 bit long longs and i don't know how good an optimizer it has |
06:55.55 | mjn3 | hard to get away from gcc though |
06:58.20 | landley | Yup. |
06:58.41 | landley | tcc at least has (as an explicit goal) being able to build the linux kernel. |
06:58.49 | landley | But it has NO optimizer to speak of... |
06:59.05 | mjn3 | and only supports x86 i think |
06:59.26 | landley | Yup. |
06:59.28 | mjn3 | we'll see. i want to find out what breaks |
06:59.39 | landley | It's odd, both glibc and gcc have explicitly forked away from the FSF. |
06:59.57 | landley | glibc is now maintained by Red Hat, and gcc is now maintained by the egcs committee. |
07:00.17 | landley | The FSF has largely proved itself incapable of maintaining projects past the initial coding stage. |
07:00.56 | landley | Yet somehow, the packages they claim credit for have this shared tendency to get bigger and bigger... |
07:00.59 | landley | Oh well. |
07:01.01 | mjn3 | glibc is controlled by drepper. he isn't high on my list |
07:01.09 | landley | I know. |
07:01.28 | landley | There's a number of people's lists he isn't high on. |
07:01.45 | landley | Oh, speaking of which: does the new thread library work with uclibc? |
07:01.51 | landley | (Whichever one IBM surrendered to...) |
07:02.06 | mjn3 | someone was working on porting it |
07:02.38 | landley | Darn it. |
07:02.38 | mjn3 | erik may have heard something |
07:02.55 | landley | I've got a "this will almost never happen, and if it does we don't really care anyway" case... |
07:03.09 | landley | And I'm slowing down a loop making sure it doesn't happen... |
07:03.33 | landley | This is the trouble with working from jseward's code. There are assumptions baked into it that influence your thinking. |
07:03.45 | landley | Half the problem isn't what he's actually doing, it's "should we be doing this at all"? |
07:03.49 | landley | Hmmm... |
07:04.16 | Lethal | the plan 9 compiler isn't bad, and is somewhat portable. gcc really isn't that bad though, once you run it through indent a few times.. |
07:04.29 | landley | Buffer size 900000, that's a little less than 2^20... |
07:04.52 | landley | So a pathologically unbalanced huffman code table could at worst have a depth of about 20... |
07:04.59 | landley | And I've got 8 bits reserved here. |
07:05.11 | landley | Therefore, adding it together is NEVER going to cause it to overflow out of the depth and into the weight. |
07:05.18 | landley | Right. |
07:05.35 | landley | I need to spend more lines commenting that than I just saved, but I don't care! |
07:06.10 | landley | Screw it, I'll write separate documentation on the algorithm. |
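A minimal sketch of the packing discussed above, assuming a 32-bit word with the weight in the upper 24 bits and the depth in the low 8 bits. The names (`pack_node`, `merge_nodes`, and so on) are hypothetical and are not taken from landley's or jseward's code. Adding two packed words directly would sum the depth fields, which only carries into the weight if the two depths total 256 or more; the version here keeps the fields separate and uses one more than the deeper child instead.

```c
#include <stdint.h>

/* Hypothetical layout: weight in the upper 24 bits, depth in the low 8 bits. */
#define DEPTH_BITS 8
#define DEPTH_MASK ((1u << DEPTH_BITS) - 1)

/* Combine a weight and a depth into one packed word. */
static uint32_t pack_node(uint32_t weight, uint32_t depth)
{
    return (weight << DEPTH_BITS) | (depth & DEPTH_MASK);
}

static uint32_t node_weight(uint32_t node) { return node >> DEPTH_BITS; }
static uint32_t node_depth(uint32_t node)  { return node & DEPTH_MASK; }

/* Merge two subtrees under a new parent: the weights add, and the
 * depth becomes one more than the deeper of the two children. */
static uint32_t merge_nodes(uint32_t a, uint32_t b)
{
    uint32_t da = node_depth(a), db = node_depth(b);
    uint32_t depth = 1 + (da > db ? da : db);
    return pack_node(node_weight(a) + node_weight(b), depth);
}
```

With a 900000-byte block the summed weight fits easily in 24 bits, and a depth anywhere near the figure mentioned above leaves plenty of headroom in the 8-bit field.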
07:07.39 | landley | I haven't looked at the plan 9 compiler. |
07:07.52 | landley | And I've tried to use gdb maybe a half-dozen times in my life. |
07:07.58 | landley | Each one was a disaster of one sort or another... |
07:08.14 | landley | (Everybody wants to do integrated debugging with emacs. I'm just not going there.) |
07:08.25 | Lethal | it doesn't get any worse than remote multithreaded debugging with gdb.. |
07:08.44 | mjn3 | that sounds pretty ugly |
07:08.47 | landley | Darn it, is there a standard "max(x,y)" macro anywhere? |
07:09.09 | landley | I've done remote multithreaded debugging under OS/2, but that was with an IBM debugger. |
07:09.24 | landley | (Don't remember if it was through the network or a serial cable. Network, I think. _TOKEN_RING_ network...) |
07:10.20 | Lethal | mjn3, that's not the half of it. gdbserver and gdb use two separate threaddbs that don't quite behave the same. the gdbserver side delays thread creation and slowly feeds it to the client, so you have to wait awhile for your threads to show up, sometimes they don't, etc. and the gdbserver side doesn't use the notification system that the host does.. |
07:12.15 | Lethal | landley, include/linux/kernel.h has appropriate min/max implementations. that's probably as standard as you'll get. |
07:13.02 | landley | That's not getting included into busybox. :) |
07:13.29 | mjn3 | oh... in busybox. i don't know. i thought you were talking about std c |
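For the std C side of the question: C99 has `fmax()` in `<math.h>` for floating point, but there is no standard integer max macro, so projects usually define their own. A minimal sketch of such a local macro follows; the helper `deeper` is a hypothetical illustration and is not from the busybox tree.

```c
/* Local fallback, guarded in case a system header already defines MAX.
 * Each argument may be evaluated twice, so avoid side effects
 * such as MAX(i++, j). */
#ifndef MAX
#define MAX(x, y) ((x) > (y) ? (x) : (y))
#endif

/* Example use: depth of a parent node given its two children. */
static int deeper(int left_depth, int right_depth)
{
    return 1 + MAX(left_depth, right_depth);
}
```

The kernel versions mentioned above additionally guard against double evaluation and mismatched types, at the cost of relying on gcc extensions.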
07:14.04 | landley | I got used to threads. Programming in OS/2 and then switching to java was actually pretty good preparation for debugging SMP deadlocks in the linux kernel... :) |
07:14.10 | landley | (Or better yet, avoiding them...) |
07:15.44 | landley | 13392 bytes for a bzip compressor executable. |
07:16.18 | landley | Over half of that is the block sorting stuff I haven't touched yet. |
07:17.06 | landley | What's the command to see how big each piece of something is? (argument to nm, but I forget what...) |
07:18.11 | landley | [landley@driftwood bzip2]$ nm --size-sort -S bznew |
07:18.11 | landley | 0804be44 00000001 b completed.1 |
07:18.11 | landley | 0804acc4 00000004 R _fp_hw |
07:18.11 | landley | 0804acc8 00000004 R _IO_stdin_used |
07:18.11 | landley | 080495af 00000017 T main |
07:18.12 | landley | 0804bd00 00000038 d incs |
07:18.14 | landley | 0804ac40 00000044 T __libc_csu_fini |
07:18.14 | mjn3 | nm -S --size-sort -t d will give sorted sizes in decimal |
07:18.16 | landley | 080486b2 00000045 T flush_outbuf |
07:18.18 | landley | 0804abf8 00000048 T __libc_csu_init |
07:18.20 | landley | 0804a8bc 0000008a T BZ2_blockSort |
07:18.22 | landley | 080486f7 000000c6 T put_bits |
07:18.24 | landley | 0804a946 000000de t fallbackSimpleSort |
07:18.26 | landley | 08048484 000000f9 T init_bzip_data |
07:18.28 | landley | 08049480 0000012f T compressStream |
07:18.30 | landley | 0804857d 00000135 T input_rle_data |
07:18.32 | landley | 08049cb9 0000019c t mainSimpleSort |
07:18.34 | landley | 0804aa24 000001d1 t mainGtU |
07:18.36 | landley | 080495c8 00000358 t fallbackQSort3 |
07:18.38 | landley | 08049920 00000399 t fallbackSort |
07:18.40 | landley | 08049e55 0000050a t mainQSort3 |
07:18.42 | landley | 0804a35f 0000055d t mainSort |
07:18.44 | landley | 080487bd 00000cc3 t output_mtf_data |
07:18.46 | landley | Heh heh heh. :) |
07:18.48 | landley | [landley@driftwood bzip2]$ nm -S --size-sort -t d bznew |
07:18.50 | landley | 134528580 00000001 b completed.1 |
07:18.52 | landley | 134524100 00000004 R _fp_hw |
07:18.54 | landley | 134524104 00000004 R _IO_stdin_used |
07:18.56 | landley | 134518191 00000023 T main |
07:18.58 | landley | 134528256 00000056 d incs |
07:19.00 | landley | 134523968 00000068 T __libc_csu_fini |
07:19.02 | landley | 134514354 00000069 T flush_outbuf |
07:19.04 | landley | 134523896 00000072 T __libc_csu_init |
07:19.06 | landley | 134523068 00000138 T BZ2_blockSort |
07:19.08 | landley | 134514423 00000198 T put_bits |
07:19.10 | landley | 134523206 00000222 t fallbackSimpleSort |
07:19.12 | landley | 134513796 00000249 T init_bzip_data |
07:19.14 | landley | 134517888 00000303 T compressStream |
07:19.16 | landley | 134514045 00000309 T input_rle_data |
07:19.18 | landley | 134519993 00000412 t mainSimpleSort |
07:19.20 | landley | 134523428 00000465 t mainGtU |
07:19.22 | landley | 134518216 00000856 t fallbackQSort3 |
07:19.24 | landley | 134519072 00000921 t fallbackSort |
07:19.26 | landley | 134520405 00001290 t mainQSort3 |
07:19.28 | landley | 134521695 00001373 t mainSort |
07:19.30 | landley | 134514621 00003267 t output_mtf_data |
07:19.53 | landley | My functions are init_bzip_data, input_rle_data, flush_outbuf, put_bits, output_mtf_data, compressStream, and main. |
07:20.05 | landley | output_mtf_data is the meat of the program. |
07:21.11 | mjn3 | cool |
07:21.41 | landley | Posting to the busybox list in about 5 mins, want to run some more tests first... |
07:21.43 | mjn3 | gcc final build failed without the fake glibc defines :-( |
07:22.13 | landley | My lack of surprise is boundless, unlimited... well, perhaps merely immense. |
07:22.33 | landley | "Incestuous" is the word that comes to mind when thinking about that code... |
07:22.35 | mjn3 | indeed. i'll look at it later |
07:22.39 | mjn3 | very true |
07:22.57 | landley | "The gcc's connected to the... binutils, the binutils' connected to the... glibc..." |
07:23.09 | landley | (There's no ascii musical notation. :( ) |
07:23.56 | landley | What exactly ARE the fake glibc defines? |
07:24.00 | landley | (How many of them are there?) |
07:24.17 | landley | I ask because it might be possible to pass them on the command line for the builds that need them. -DSHUT_UP_FSF |
07:24.34 | landley | If it's only one or two packages... |
07:25.47 | mjn3 | just 3. __GNU_LIBRARY__, __GLIBC__, and __GLIBC_MINOR__ |
07:26.22 | landley | I'd try building gcc with those forced in either the configure or make step... |
07:26.23 | mjn3 | probably need to get the configure stuff fixed before we can hope that gcc might behave |
07:26.41 | landley | Which configure stuff needs to be fixed? |
07:27.18 | mjn3 | recognizing something like <arch>-*-<{uc}linux>-uclibc |
07:27.47 | landley | Ah. |
07:27.51 | landley | Worthy goal. |
07:28.09 | landley | I'm fiddling with ./configure stuff trying to get busybox to work as part of a development environment. |
07:28.20 | landley | (That's how I got into fixing sed in the first place.) |
07:28.31 | landley | That's about to come off the back burner... |
07:29.09 | mjn3 | worthy goal... that other people seem to want to work on. i've got too much lib stuff i want to work on right now to mess much with the toolchains |
07:29.55 | landley | I know the feeling... |
07:30.06 | landley | Darn it, I just broke the thing. what did I do? |
07:31.53 | mjn3 | and seeing that other people can work on the toolchain stuff, but i'm the only one likely to work on the l10n/i18n issues... |
07:32.18 | mjn3 | which is really still a big mess, since it was meant as purely experimental code to feel my way around the issues involved |
07:33.09 | landley | Good luck there. I've never cared much for internationalization. (I know it's important, but deep down I can't help but view the world's persistent habit of not speaking English as a major annoyance, on a purely pragmatic level.) |
07:33.19 | landley | A personal failing, I know... |
07:33.36 | landley | (I've had to _DO_ internationalization. Been paid to do it even. But it's not something I work on for fun.) |
07:35.22 | landley | Wow, the amount of damage a missing ~ can do... :) |
07:35.38 | mjn3 | really all of that work was funded. the internal code really shouldn't take much more to get into shape |
07:36.51 | landley | Yeah, I generally find being paid to work on something one of the stronger motivations. :) |
07:36.52 | mjn3 | the painful thing is going to be writing a real localedef utility so that we can remove the dependency on glibc. right now, much of the locale data is being generated by running glibc-linked applications |
07:37.15 | landley | How much is involved in defining a locale? |
07:37.30 | landley | Does it work from some kind of source textfile or something? |
07:37.36 | mjn3 | yes |
07:37.46 | landley | (I vaguely remember dealing with this a looooong time ago...) |
07:37.57 | landley | Can we maybe just parse the source textfile directly? |
07:38.07 | landley | Does any system actually NEED 8 gazillion active locales at any given time? |
07:38.42 | landley | (I assume the system is using one, and that apps override it with some kind of library call if need be. So the parsing overhead shouldn't be too crazy...) |
07:38.56 | mjn3 | much of the data is shared. not in glibc's implementation of course |
07:39.11 | landley | Shared between apps? Between locales? |
07:39.20 | mjn3 | between locales |
07:39.39 | landley | So a locale could be defined as a delta vs another locale? |
07:39.40 | mjn3 | much of my work was in identifying and removing redundant data |
07:40.05 | landley | What kind of information is in a locale? If it could be some kind of rcfile with keyword=value pairs... |
07:40.32 | landley | Then the new ones could be parent=filename and some more keyword=value pairs to override stuff in the parent file... |
07:40.53 | landley | Or words to that effect... :) |
07:41.31 | mjn3 | the bulk of the data is for the collation tables |
07:41.39 | landley | What's a collation table? |
07:41.50 | mjn3 | of the current 250k or so, about 190k of that is for collation |
07:41.58 | landley | Fun. What's a collation table? |
07:42.12 | mjn3 | strcoll, strxfrm, wcscoll, wcsxfrm |
07:42.22 | mjn3 | sorting of strings per locale/codeset |
07:42.43 | landley | This is going to involve unicode, isn't it? |
07:42.49 | mjn3 | yes |
07:43.30 | landley | I thought there was one big unicode table that had all the unicode information in it? (I know sun lied to me about a lot of things, but this assumption has so far remained unchallenged...) |
07:44.04 | landley | How are strings even represented when passed to strcoll? |
07:44.17 | mjn3 | for the ctype stuff, yes. and there is a unicode collation algorithm. but that is tunable for different cultural sorting conventions |
07:44.41 | mjn3 | strcoll uses multibyte strings in the current locale's encoding |
07:44.56 | mjn3 | wcscoll uses wchar_t strings |
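As a rough illustration of the `strcoll` interface just described, the sketch below sorts an array of multibyte strings by the current locale's collation rules. `strcoll`, `qsort`, and `setlocale` are standard C; the helper names `cmp_collate` and `sort_words` are made up for this example.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two multibyte strings using the
 * collation rules of the current LC_COLLATE locale. */
static int cmp_collate(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Sort an array of strings in place, in locale collation order. */
static void sort_words(const char **words, size_t count)
{
    qsort(words, count, sizeof(*words), cmp_collate);
}
```

A caller would normally run `setlocale(LC_COLLATE, "")` first to pick up the user's locale; in the "C" locale `strcoll` orders strings the same way `strcmp` does. `wcscoll` would fill the same role for `wchar_t` strings.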
07:45.09 | landley | Is there a minilanguage for representing cultural unicode string sorting conventions? |
07:45.45 | landley | Both are translated to full sized unicode internally? |
07:45.57 | landley | (What's "full sized" these days? 32 bits, or still 16?) |
07:46.04 | mjn3 | kind of... but it is still table-driven. the trick is in identifying redundancies in the tables |
07:46.10 | mjn3 | 32 |
07:46.30 | landley | If they go to 64 bit unicode when we go to 64 bit processors, I'm going to point and laugh. |
07:46.31 | mjn3 | although you can actually select with gcc. but glibc and uClibc assume 32 |
07:46.34 | landley | I'm just warning you now... |
07:46.52 | mjn3 | actually, unicode is fixed at a little over 20 or 21 bits |
07:47.05 | landley | how are the tables represented? (Is there an example somewhere?) |
07:47.06 | mjn3 | UCS-4 is 32 bits |
07:47.36 | landley | Okay, strings are always sorted in some variant of "linearly", right? |
07:47.48 | landley | By which I mean get next symbol, compare, and have the ability to stop. |
07:48.13 | landley | (Taking into account that "get next symbol" could be going right to left...) |
07:48.15 | mjn3 | look at /usr/share/i18n/locales/iso14651_t1 as an example |
07:49.00 | mjn3 | strings are sorted in 4 passes where codes have weights specified at each level (pass) |
07:49.25 | mjn3 | at some levels, sorting is backwards (right to left) |
07:50.00 | mjn3 | it is also possible for a sort to depend on the number of ignored chars previous to the 2 being compared |
07:50.10 | landley | # The comment at the beginning of this section mentions characters which |
07:50.10 | landley | # are not otherwise covered. But this description cannot express this. |
07:50.10 | landley | # Therefore we add here a few entries which are used in older implementations |
07:50.10 | landley | # to be compatible. --drepper |
07:50.14 | landley | Ouch. |
07:50.32 | landley | What ARE ignored chars? |
07:50.49 | mjn3 | that file is just the basic template for a number of locales. they can include it and provide overrides |
07:51.03 | landley | I take it that order in this file is significant? |
07:51.15 | landley | <hamza> |
07:51.15 | landley | <alef> |
07:51.15 | landley | <beh> |
07:51.15 | landley | <peh> |
07:51.18 | landley | And all that? |
07:51.21 | mjn3 | gives the weight |
07:51.51 | landley | Some say collating-symbol <4> |
07:51.56 | landley | And some just say <4> |
07:51.57 | mjn3 | it took me about a month and a half to come up with a way to compress all that |
07:52.09 | landley | I'm still trying to figure out what it _means_ :) |
07:52.14 | mjn3 | there's a specification pdf somewhere. if you're interested, i can look it up |
07:52.34 | landley | I must admit to a certain morbid curiosity... |
07:52.41 | landley | But the word "rathole" is coming to mind as well... |
07:52.51 | landley | Lemme clear my current to do list a bit first. :) |
07:52.52 | mjn3 | as a comparison, glibc's locale archive for the locales we support in 250k or so is roughly 25-30MB |
07:53.16 | landley | The file I'm looking at is a glibc file. |
07:53.51 | mjn3 | yes... but that's a source file. localedef parses it and others to generate a locale file... /usr/share/locale |
07:54.02 | landley | That part I knew. |
07:54.07 | mjn3 | recent glibc's collect all locales in a locale archive |
07:54.19 | landley | It creates an evil binary blob that I pretty much wanted nothing to do with last time I built a system... |
07:54.33 | mjn3 | indeed |
07:54.40 | landley | a locale archive is not an improvement. Bundling together bad ideas gives you one big worse idea. |
07:55.11 | landley | (And if you're going to archive stuff, use zip. I think the zip format is underappreciated, myself.) |
07:55.40 | landley | (It's pretty close to a compressed filesystem. The index isn't compressed, so you can seek to any file immediately and extract just that file...) |
07:56.05 | mjn3 | the file gets mmap'd as i recall |
07:56.13 | landley | Making a tarballfs is virtually impossible, but making a zipfs (read-only) would be pretty straightforward... |
07:56.14 | mjn3 | or at least the relevant parts of it |
07:56.34 | landley | How many locales do you EVER need to look at at once? |
07:56.39 | landley | (What, three?) |
07:57.02 | mjn3 | anyway, the main goal for my stuff is to provide fairly comprehensive locale support in a small amount of space for something like a pda |
07:57.23 | landley | Well, you're an order of magnitude ahead of glibc so far. |
07:57.41 | landley | I gotta spend 5 minutes in another desktop sending my bzip implementation to the busybox list... |
07:58.02 | mjn3 | if you can budget 300k (less if compressed in flash) for locale data, then you can market your product unchanged in a lot more markets |
07:58.34 | landley | One more test... |
07:58.57 | landley | That's a nice selling point, yes. |
08:01.19 | mjn3 | another issue on my list to address is message object files for gettext. for a given application, you wind up storing the keys plus the translations for each language. that eats up a lot of space for redundant data, so i'm looking at something like a message object archive that stores the keys once for all the supported translations |
08:03.05 | landley | Hmmm... |
08:04.18 | landley | It could also be compressed pretty trivially. (It's sounding like strerror). |
08:04.25 | landley | What uclibc did with it, I mean... |
08:05.34 | mjn3 | on something like a pda, typically that stuff will be compressed anyway in a jffs2 filesystem or somesuch in flash |
08:05.45 | landley | True. |
08:06.01 | landley | In which case the duplicated keys are also compressed, although it's still a waste. |
08:06.20 | landley | Depending on how the data is grouped, there might be a benefit there anyway. |
08:06.32 | landley | Putting a lot of text together gives the compressor something to work with... |
08:06.52 | landley | (I dunno how gettext works. The key isn't an offset into an array or anything, is it?) |
08:07.01 | landley | No preprocessor magic to be done here...? |
08:07.30 | mjn3 | the key is the untranslated string. then there's a hash table that indexes into a translation table |
08:07.56 | landley | I vaguely remembered that, I was just wondering if all the hashing and stuff had to be done at runtime... |
08:08.08 | mjn3 | yes |
08:08.22 | landley | Pity. |
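A minimal sketch of the kind of runtime lookup described above: the untranslated string is hashed and the hash indexes a table of translations. This is not the real gettext message-object format; the struct, the table size, the hash function, and all function names are assumptions chosen for illustration.

```c
#include <string.h>

/* Hypothetical in-memory entry: untranslated key plus its translation. */
struct msg_entry {
    const char *key;
    const char *translation;
};

#define TABLE_SIZE 8 /* power of two; must always keep at least one free slot */

/* Simple multiplicative string hash (illustrative choice only). */
static unsigned hash_string(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert with linear probing; assumes the table is not full. */
static void table_insert(struct msg_entry *table, const char *key,
                         const char *translation)
{
    unsigned i = hash_string(key) & (TABLE_SIZE - 1);
    while (table[i].key)
        i = (i + 1) & (TABLE_SIZE - 1);
    table[i].key = key;
    table[i].translation = translation;
}

/* Hash the key at runtime and probe for it; if no translation is
 * found, return the key itself, as gettext does. */
static const char *lookup_message(const struct msg_entry *table,
                                  const char *key)
{
    unsigned i = hash_string(key) & (TABLE_SIZE - 1);
    while (table[i].key) {
        if (strcmp(table[i].key, key) == 0)
            return table[i].translation;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return key;
}
```

The hashing and string comparison on every call are the runtime cost being lamented here; a catgets-style scheme avoids it by using numeric message identifiers.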
08:08.44 | landley | Darn it, it made it bigger again! |
08:09.00 | landley | (Taking that -18 out makes the file bigger. Every time. I wonder why?) |
08:10.19 | mjn3 | there's also catgets stuff, which doesn't have a runtime penalty. but it is more intrusive. most apps (for linux anyway) use gettext |
08:11.30 | landley | I have some vague recollection of gettext being obsoleted in favor of gettext, which makes no sense. Some kind of .(blah) syntax was going away in favor of something else... |
08:11.39 | landley | This was a couple years ago, though... |
12:25.06 | *** join/#uclibc Qui_Gon (fox@81.185.48.139) |
12:25.33 | *** join/#uclibc ambassador_ (~ambassado@h72.149.40.69.ip.alltel.net) |
14:38.37 | *** join/#uclibc Qui_Gon (fox@83.21.185.81.internet9tcollecte.9massy1-1-ro-bas-2.9tel.net) |
14:48.51 | *** join/#uclibc dsmith (~user@mail.actron.com) |
17:20.57 | *** join/#uclibc randey (~randey@202.63.116.98) |
17:59.35 | *** join/#uclibc DavidM (~david@h24-207-7-221.dlt.dccnet.com) |
19:15.11 | *** join/#uclibc andersee (~andersee@codepoet.org) |
23:32.19 | *** join/#uclibc TheMasterMind1 (~aman@h-66-167-234-81.MCLNVA23.dynamic.covad.net) |