Feb. 11th, 2009

walkitout: (Default)
Once upon a time, in the early 1990s, when I was working for DEC, I volunteered to compare, word by word, a scanned-in copy of _Can Such Things Be_ (book and text file supplied to me) and correct the text file to match the book (OCR not being perfect back in the day). This was fun. It was part of the Wiretap project, and for a while it was included in Project Gutenberg's works, but it doesn't seem to be anymore (they have a newer version).

Should you be curious to view the results, they can be found here:

http://infomotions.com/etexts/literature/american/1900-/bierce-can-285.htm

It can also be found elsewhere.

(I had a different last name back then, when I was married to my first husband.)

As a result of this experience, I learned a variety of things, but for the purposes of this post, what matters is that I learned that scanned-in books had a lot of things that needed to be fixed.

Some time before that, while in college, I did a little work in a robotics lab (yeah, really), partly on text-to-speech. Among other things, I added a module so if the robot ran across a Roman numeral, it would read it correctly, instead of saying something like L-V-I-I-I. Or worse. In the course of doing that, I learned a little about the then-current state of the art in speech-to-text and boy howdy, was I unimpressed.
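
For the curious, the core of a Roman numeral module like that can be sketched in a few lines of Python. This is not the code I wrote back then (that is long gone); the name `roman_to_int` is made up for illustration, and it assumes the text-to-speech front end has already decided that the token really is a numeral and not, say, the pronoun "I".

```python
# Illustrative sketch only; not the original robotics-lab module.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'LVIII' to the integer it stands for."""
    total = 0
    largest_seen = 0
    # Walk right to left: a symbol smaller than the largest one already seen
    # is subtractive (the I in IV), otherwise it is added.
    for symbol in reversed(numeral.upper()):
        value = ROMAN_VALUES[symbol]  # raises KeyError on non-numeral letters
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


if __name__ == "__main__":
    # The speech side would then say the number instead of spelling letters.
    print(roman_to_int("LVIII"))
```

It does no validation, so malformed input like "IIII" or "VX" still produces a number; a real module would want to reject those.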

Time goes by. You can now say "Two" to an evil phone menu and sometimes it will recognize it (and if your toddler yells in the background, your selection will be misunderstood and hopefully generate a "sorry, didn't quite catch that," rather than being taken as a request for Spanish). I do get that OCR has improved, and that you can do all kinds of massaging to try to "understand" what's on the page via spell-checking and grammar-checking and all that stuff. There is, unfortunately, a big problem with applying spell-checkers and grammar-checkers to out-of-print books, particularly old ones. A little quote from Coleridge should make this clear:

"Water, water, every where
Nor any drop to drink."

Any reasonable grammar checker, and possibly some spell checkers, will "clean that up" in a way that damages the text.

How, then, is it possible to do a search of the text of Google Books? Well, my guess is that this is just Beautiful Magic. They've got a Pretty Darn Good idea of what the text says, and that's what they use for pattern matching against your search. But they don't ever _show_ you their idea of what the text says. They just map that back to the scan, and show you the entire scanned page. Hopefully, any error they made will be really hard to detect as a result. Beautiful Magic.
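
To make the guess concrete, here is a toy Python sketch of "search the OCR text, show the scan." It is only my guess rendered as code, not a description of what Google actually does, and the names (`build_index`, `search`, the page numbers) are invented for the example. The search runs against the imperfect OCR text, but all it ever returns is page numbers, which a viewer would then use to display the scanned images.

```python
from collections import defaultdict

PUNCTUATION = '.,;:!?"\''


def build_index(ocr_pages):
    """Map each lowercased OCR word to the set of page numbers containing it."""
    index = defaultdict(set)
    for page_number, text in ocr_pages.items():
        for word in text.lower().split():
            index[word.strip(PUNCTUATION)].add(page_number)
    return index


def search(index, query):
    """Return sorted page numbers whose OCR text contains every query word."""
    words = [w.lower().strip(PUNCTUATION) for w in query.split()]
    if not words:
        return []
    pages = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(pages)


if __name__ == "__main__":
    # Hypothetical OCR output keyed by page number; the reader never sees it.
    ocr_pages = {
        12: "Water, water, every where",
        13: "Nor any drop to drink.",
    }
    index = build_index(ocr_pages)
    # A viewer would show the scanned image for each page number returned.
    print(search(index, "water every"))
```

Because only the page image is shown, an OCR slip on some other word of the page never reaches the reader's eyes; it just makes that one word unfindable.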

I assume (and this may be a large assumption) that the version of Google Books available on the iPhone, Android and others is essentially the same as what I see on my laptop: an image, more or less, the scanned-in image of the text. This cannot be done on the kindle for a couple of reasons. First, I don't think even the new display is up to the task (altho I could be wrong about that). Second, the cost of sending that crap over Sprint's EVDO network would rapidly cause problems for the Sprint/Amazon partnership. There's no charge to the kindle user to download Amazon's DRMed Mobi/PRC/wtf files because that shit is tiny. You start schlepping images around and everyone sits down to have a little chat about Cost. A chat that may or may not need to happen with all-you-can-eat plans on iPhones. I have no idea how those work. I do know that the EVDO plan for my card, when I had it, and the plan for my Centro now both have some Very Interesting small print.

Obviously, it's easy enough now to take any Mobi/PRC/wtf book file (so, basically, any of the Project Gutenberg-like stuff) and stick it on your kindle and read it (just use the freaking USB cable). Plenty of people already have. But the Google plan to scan entire academic libraries sort of brings the game to a new level. I would love to hear any ideas about how Google Books could become something that might work on the kindle. Handwaving around the scanned image vs. text problem is amusing, but not tremendously helpful. I might buy a detailed technical explanation of how to massage OCR to get the text good enough -- but it's going to have to include innovation from the last decade-ish (because the technology did not exist around the time I retired). And it has to explain why Google would store and display the scanned image instead of showing you the textual interpretation. If it's good enough for the kindle, it should be good enough for Google to display.
walkitout: (Default)
I'm a big fan of the all-in-one device. Really! I know you're skeptical, given all this kindle crap I keep posting about. I bought the first Treo cell phone on the market, largely because it pissed me off that I was carrying around a few ounces of phone AND a few ounces of PDA. I put up with a couple rounds of it breaking on me, too, before I gave in and got a basic phone (had to when I moved out here where there is nothing but Verizon anyway) and eventually got a new PDA which I was never very happy with. Regular readers may recall my agonizing over the purchase of a Centro, and whether or not I should stall because I might move and blah, blah, bleeping blah. I ultimately decided that it would justify its existence quickly enough to make it worth the trouble, and indeed, I've used the web browser and e-mail features often enough while waiting in parking lots to be reasonably pleased.

In fact, the Centro plus a kindle makes a laptop superfluous, at least on short trips. At least, as long as there is a navigator in the vehicle we are driving. Which would be a form of the abandon-your-laptop argument for the kindle. But that's actually not where I'm going here.

One of the places I'm trying to explore is cost. I've pretty much beaten to death the cheaper device vs. savings on books argument. Now I'm after the monthly-cost aspect. Obviously, a non-connected PDA that you stuff Mobipocket books (or similar, ideally free public domain things, or Baen backlist or whatever) on is going to be the cheapest all-around: no/minimal cost for the books, cheap device cost, no monthly cost (okay, power to recharge something, and however you get stuff from the internet onto your device). The kindle adds a higher device cost, but no monthly cost if you stick to the same kind of books (which you can) and use the USB cable.

But that's not what a lot of people are talking about. People seem to really like to get the books onto the device without the intervening computer. I'm okay with that (hey, that's one of the benefits of moving -- I'll be able to use the EVDO connection on the kindle). And an iPhone or even another smartphone is probably a cheaper device to buy than the kindle to get that connectivity (and it's a phone, but we'll get back to that in a moment). But the monthly cost on the iPhone or other smartphone is kinda steep (figuring $70 and up for an all-you-can-eat data plan), whereas the monthly cost on the kindle is very small (and only if you trigger the $0.10 charge for transferring a file). (I'm not talking about buying content in this post, just the cost to use the wireless transfer.)
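
A quick back-of-the-envelope in Python, using only the two figures above: $70 a month as the low end for the smartphone plan, and $0.10 per wireless file transfer on the kindle. The function name is made up, and it works in whole cents to avoid floating-point fuzz. This is a rough comparison under those assumptions, not anybody's actual bill.

```python
PLAN_CENTS_PER_MONTH = 70 * 100  # the "$70 and up" plan, taken at its low end
TRANSFER_FEE_CENTS = 10          # the $0.10 per-file wireless transfer charge


def break_even_transfers(plan_cents: int, fee_cents: int) -> int:
    """Number of paid transfers per month that would cost as much as the plan."""
    return plan_cents // fee_cents


if __name__ == "__main__":
    print(break_even_transfers(PLAN_CENTS_PER_MONTH, TRANSFER_FEE_CENTS))
```

In other words, you would have to pay for hundreds of transfers a month before the kindle's wireless cost caught up with the data plan.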

These are obviously different fruits. And herein lies another area where the gadget press seriously disconnects from the standard audience for the kindle. Your basic nose-in-a-book person is not your basic hang-on-the-phone person -- or even your basic constantly-texting person. Your basic nose-in-a-book person may, in fact, be quite reluctant to use a phone, and if they have a cell phone at all, it's primarily for emergencies, kid pickup and drop off and similar. A phone may be useful, but it brings no joy. Similarly, a PDA is useful, but not typically a source of joy.

And _that_ is the beauty of the all-in-one device. I have to carry all this crap around with me (money, a pen, ID, diapers, a change of clothes for A., wipes, blah, blah bleeping blah), including a cell phone and probably a PDA. None of those things bring me joy. They are things I use.

The kindle, by contrast, I do not have to carry around with me and often I do not. But the kindle brings me joy. And let me tell you, if you've got the money, it's a lot easier to spend money on joy, than on useful.

If, however, I was not a nose-in-a-book person, but rather a hang-on-the-phone or constantly-text person, I'm betting I'd be a whole lot less interested in the kindle, and a whole lot more interested in figuring out a way to tack the utility of reading a book when I'm stuck somewhere onto my existing stash of useful items. Because _then_ it would not be about joy. It would be about useful. A kind of useful primarily associated with being stuck waiting somewhere unexpectedly. Not really $359 worth of disposable income material. Certainly not carry 10.whatever ounces worth of probably breakable and definitely dead battery when I need to use it material.
