wombats (was: GLLUG meeting topics)
Sean
picasso@madflower.com
Thu, 5 Apr 2001 13:55:45 -0400 (EDT)
On 4 Apr 2001, Matt Graham wrote:
> Sean <picasso@madflower.com> wrote:
> > On 4 Apr 2001, Ben Pfaff wrote:
> >> Hmm. That *is* interesting. But how would you tell the computer
> >> to search for a picture of a wombat?
> > There is actually a program that does that sort of thing used by graphic
> > artists. It is pricey though and I'm not sure exactly how it matches
> > stuff.
> > Personally I just hope its an eps with a text string in it, so i can do
> > a find by content on the database.
>
> Image recognition is hideously difficult. We can't even get OCR working
> very well currently (though the folks I work for have some ideas...),
> and ASCII-like characters on a printed page are far less complex than
> the average image.
It is a fairly complex algorithm, but I think you would be surprised: I
have had ~99% accuracy for about 6 years with OCR on a variety of
documents. *shrugs*
> So I'd say that the program Sean referred to doesn't
> work very well, if at all. Search /. for "porn" and you'll find a story
> where some folks claimed to have a net-nanny proxy-type thing that could
> detect whether an image was pornographic or not... it didn't work with
> any degree of accuracy.
I wasn't talking about scanning on the fly. It doesn't have to be done
in real time to be effective. I was talking about a system for cataloging
stock photos and images for large print houses and ad agencies, to help
find the ones you want. It was an expensive program and I won't vouch
for its accuracy. The point was that people are working diligently on the
technology and it is starting to show up at the high end.
> And to reply to Ben: Yes, the line lengths are too long. I'm damned
> lucky this f@#$ing web-mail service will let me wrap the lines of
> outgoing messages at all. That's another reason why I posted recently
> about DSL/cable in W.Lansing; I'd like to run my own mailserver and use
> mutt+vim for mail and not have to deal with the quoted-unprintable this
> thing puts out and such. Using my work mail, where I do run the
> mailserver, is not really an option--separate business from personal,
> etcetera.
Can you set it up with something like a proxy gateway? Like have a Perl
script grab and parse the web pages, dump the messages to your local
mailbox, and then resubmit your replies with the form info?
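
A rough sketch of the parse-and-dump half of that idea, in Python rather
than Perl. Everything specific here is an assumption for illustration: the
page layout (a div with class "msg" holding "from", "subject" and "body"
elements) and the names WebmailPageParser and dump_to_mbox are made up, and
a real webmail service's HTML will look different. Fetching the pages and
resubmitting the form are left out entirely.

```python
import mailbox
from email.message import EmailMessage
from html.parser import HTMLParser


class WebmailPageParser(HTMLParser):
    """Collect messages from a HYPOTHETICAL webmail page layout.

    Assumed markup (not any real service's): each message is a
    <div class="msg"> holding <span class="from">, <span class="subject">
    and <pre class="body"> elements, with no nested <div> inside it.
    """

    FIELDS = {"from", "subject", "body"}

    def __init__(self):
        super().__init__()
        self.messages = []    # finished messages, as dicts
        self._current = None  # message currently being assembled
        self._field = None    # field whose text is being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "msg":
            self._current = {name: "" for name in self.FIELDS}
        elif self._current is not None and cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag in ("span", "pre"):
            self._field = None
        elif tag == "div" and self._current is not None:
            self.messages.append(self._current)
            self._current = None


def dump_to_mbox(page_html, mbox_path):
    """Parse one already-fetched page and append its messages to an mbox."""
    parser = WebmailPageParser()
    parser.feed(page_html)
    parser.close()
    box = mailbox.mbox(mbox_path)
    try:
        for item in parser.messages:
            msg = EmailMessage()
            msg["From"] = item["from"].strip()
            msg["Subject"] = item["subject"].strip()
            msg.set_content(item["body"])
            box.add(msg)
        box.flush()
    finally:
        box.close()
    return len(parser.messages)
```

The resulting mbox file is something mutt can open directly, which is the
point of the exercise. The fragile part is the parser: any change to the
service's page layout breaks it.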
> Yes, tar was a bad choice wrt info, but I recall having similar problems
> while searching for definitive info on the special mode bits using "info
> chmod" many months ago.
>
> Anyway, my comment about the wombat picture was not meant to be a search
> key, but a recognition key--humans see the wombat on every page of
> "Chapter 4: The 10-Layer OSI Burrito Stack", and they make appropriate
> associations. Then when they need information about an obscure
> component of level 8 ("Guacamole"), they flail around confusedly for a
> bit, remember that there was a wombat by something related to the info
> they needed, and refine their search to those pages that have wombats.
> A search key is only useful when you either know exactly what you want
> or there's a really good fuzzy-matching algorithm on the backend.
> (Computers have perfect recall, but horrible fuzzy-matching, while
> most people are the other way around.)
>
> The ways by which people construct a mental map of their surroundings
> (whether physical or informational) are many and varied; it's best to
> provide a few different kinds of landmarks in both cases if possible.
>
I don't see this working without first having made the image association;
in the example you pointed to, you had already read the book.
Setting up something like info or the Apple help command isn't really
that hard. Apple uses HTML to format its help documents, and loading
them into a hierarchical system using keyword references isn't hard.
Adding a fuzzy logic system makes it harder, but you could do the fuzzy
matching up front against the keywords.
But you still haven't made the initial image association.
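
To make the keyword part concrete, here is a minimal sketch in Python of
doing the fuzzy matching up front against the keywords and then doing a
plain exact lookup. The index contents, the page paths and the lookup
function are all hypothetical; difflib is from the Python standard library,
and its similarity ratio is only one of many ways to do the fuzzy step.

```python
import difflib

# Hypothetical keyword index: keyword -> help pages that mention it.
HELP_INDEX = {
    "chmod": ["fileutils/permissions.html"],
    "setuid": ["fileutils/permissions.html", "security/special-bits.html"],
    "sticky bit": ["security/special-bits.html"],
    "tar": ["archiving/tar.html"],
}


def lookup(query, index=HELP_INDEX, cutoff=0.6):
    """Fuzzy-match the query to known keywords, then look pages up exactly."""
    keywords = difflib.get_close_matches(
        query.lower(), list(index), n=3, cutoff=cutoff
    )
    pages = []
    for keyword in keywords:
        for page in index[keyword]:
            if page not in pages:  # keep order, drop duplicates
                pages.append(page)
    return pages
```

A misspelled query such as lookup("chmdo") still lands on the chmod entry,
while a query with no nearby keyword returns an empty list. None of this
helps with the wombat, of course: the picture never enters the index.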
The _major_ problem with this system is basically that what you are
trying to do takes up a lot of space. Unless I am taking this completely
wrong, you are looking for something like the ORA series CD documents,
which are basically ORA books in electronic format, and searchable.