Extracting text from .png images

Hi group!

I need to extract some text (well, numbers actually) from a bunch of
similar-looking .png images. After extraction the numbers will be fed to a
Python script for further processing. Any good ideas on how to go about
this? I have no idea whatsoever about how to extract the numbers out of the
images...

Thanks in advance,

Henrik
Jul 18 '05 #1
"Henrik Berg Nielsen" <hb*@imada.sdu.dk> writes:
> I need to extract some text (well, numbers actually) from a bunch of
> similar-looking .png images. After extraction the numbers will be fed to a
> Python script for further processing. Any good ideas on how to go about
> this? I have no idea whatsoever about how to extract the numbers out of the
> images...


OCR is the TLA you're looking for ("Optical Character Recognition").

Dunno if there are any good free OCR engines. With these sorts of
hard algorithms, you tend to get what you pay for.
John
Jul 18 '05 #2
Henrik Berg Nielsen <hb*@imada.sdu.dk> spake thusly:

> I need to extract some text (well, numbers actually) from a bunch of
> similar-looking .png images. After extraction the numbers will be fed
> to a Python script for further processing. Any good ideas on how to go
> about this? I have no idea whatsoever about how to extract the
> numbers out of the images...

This might help you out...
http://www.pricelessware.org/2003/PL...tm#Convert-OCR

I'm not sure if it handles PNG; you might have to convert the file to TIFF
or BMP or something first.
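
A minimal sketch of that conversion step in Python, assuming the third-party Pillow library is installed (nobody in the thread mentions it, so treat it as one option among several). The helper names target_path and convert_image are made up for illustration.

```python
from pathlib import Path


def target_path(src, fmt="tiff"):
    """Return the path of src with its extension swapped for fmt."""
    return Path(src).with_suffix("." + fmt.lower())


def convert_image(src, fmt="tiff"):
    """Re-save an image in another format and return the new path."""
    # Assumption: Pillow (third-party) is installed. It is imported here
    # so that the path helper above still works without it.
    from PIL import Image

    dst = target_path(src, fmt)
    with Image.open(src) as img:
        # Convert to plain RGB so palette or alpha PNGs save cleanly;
        # Pillow picks the output format from the file extension.
        img.convert("RGB").save(dst)
    return dst
```

Calling convert_image("scan01.png", "bmp") should write scan01.bmp next to the original, which can then be handed to whichever OCR program you end up using.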
--
Audio Bible Online:
http://www.audio-bible.com/
Jul 18 '05 #3
In article <wb*****************@news.get2net.dk>, Henrik Berg Nielsen wrote:
> Hi group!
>
> I need to extract some text (well, numbers actually) from a bunch of
> similar-looking .png images. After extraction the numbers will be fed to a
> Python script for further processing. Any good ideas on how to go about
> this? I have no idea whatsoever about how to extract the numbers out of the
> images...

http://www.claraocr.org/

Jul 18 '05 #4
John> OCR is the TLA you're looking for ("Optical Character Recognition").

John> Dunno if there are any good free OCR engines. With these sorts of
John> hard algorithms, you tend to get what you pay for.

Which often means there's a piece of free software out there which works
better than the most expensive commercial solutions. <wink>

A little googling suggests this might be a candidate:

http://www.claraocr.org/

I have no idea if there's an exported library and/or a Python wrapper, but
it's probably worth a look.
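
Even without a Python wrapper, a command-line OCR program can be driven from Python. The sketch below is only a rough outline under stated assumptions: it assumes the program takes an image path as its last argument and prints the recognised text to stdout, and "ocr-tool" is a placeholder command, not the actual Clara OCR invocation. The function names are hypothetical as well.

```python
import re
import subprocess

# Matches integers and decimals such as 42, -3 or 17.5.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every number found in text as a float, in order."""
    return [float(m) for m in NUMBER_RE.findall(text)]


def ocr_numbers(image_path, command=("ocr-tool",)):
    """Run a command-line OCR program on one image and parse its output.

    Assumption: the program accepts the image path as its last argument
    and writes the recognised text to stdout. "ocr-tool" is a placeholder.
    """
    result = subprocess.run(
        [*command, str(image_path)],
        capture_output=True,
        text=True,
        check=True,  # raise if the OCR program exits with an error
    )
    return extract_numbers(result.stdout)
```

Keeping the parsing in extract_numbers separate from the subprocess call means the number handling can be checked on plain strings before any OCR program is installed.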

Skip

Jul 18 '05 #5
"Henrik Berg Nielsen" <hb*@imada.sdu.dk> wrote:

> I need to extract some text (well, numbers actually) from a bunch of
> similar-looking .png images. After extraction the numbers will be fed to a
> Python script for further processing. Any good ideas on how to go about
> this? I have no idea whatsoever about how to extract the numbers out of the
> images...


Are you hoping to extract the "password" characters from the pictures
presented by the whois checks? If so, you should give up now, because
those images are SPECIFICALLY designed to make them almost impervious to
automated recognition.
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jul 18 '05 #6
Henrik Berg Nielsen wrote:
> Hi group!
>
> I need to extract some text (well, numbers actually) from a bunch of
> similar-looking .png images. After extraction the numbers will be fed to a
> Python script for further processing. Any good ideas on how to go about
> this? I have no idea whatsoever about how to extract the numbers out of the
> images...


Hi,
I'm dealing with a similar problem now. My pictures are very complicated
(construction drawings). I am trying to use Gamera
(http://dkc.jhu.edu/gamera/) for OCR and it seems very promising.

--
-- Lukas
Jul 18 '05 #7
On Wed, 01 Oct 2003 20:25:45 -0700, Tim Roberts <ti**@probo.com> wrote:
> "Henrik Berg Nielsen" <hb*@imada.sdu.dk> wrote:
>
>> I need to extract some text (well, numbers actually) from a bunch of
>> similar-looking .png images. After extraction the numbers will be fed to a
>> Python script for further processing. Any good ideas on how to go about
>> this? I have no idea whatsoever about how to extract the numbers out of the
>> images...
>
> Are you hoping to extract the "password" characters from the pictures
> presented by the whois checks? If so, you should give up now, because
> those images are SPECIFICALLY designed to make them almost impervious to
> automated recognition.

Sounds interesting as a problem, but I wouldn't want to create a skeleton key
for any bad guys ;-)

Regards,
Bengt Richter
Jul 18 '05 #8
