By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,321 Members | 1,909 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,321 IT Pros & Developers. It's quick & easy.

scraping display to obtain all on-screen text using OCR

P: n/a
I would like to perform a more classical type of "screen scraping"
than what most people now associate with this term. I only want to
find all the text on the current screen, and obtain associated screen
coordinates. This probably must be done using OCR.

This need only run on Windows. A fairly-pure Python solution would be
ideal because most of the software which would use this functionality
is also written in Python.

The ideal output would consist of a list of tuples, where each tuple
consists of ("string found", a, b, c, d) where the latter four
constitute a bounding rectangle associated with the text that was
found. It might also be handy to throw in some font information.

Thanks in advance for any pointers.

Jonathan
Jul 18 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a
ja**********@yahoo.com (Jonathan Epstein) writes:
I would like to perform a more classical type of "screen scraping"
than what most people now associate with this term. I only want to
find all the text on the current screen, and obtain associated screen
coordinates. This probably must be done using OCR.
This need only run on Windows.


Usually you do that by intercepting the Windows text painting events,
rather than anything as horrendous as OCR'ing.
Jul 18 '05 #2

P: n/a
Jonathan Epstein wrote:
I would like to perform a more classical type of "screen scraping"
than what most people now associate with this term. I only want to
find all the text on the current screen, and obtain associated screen
coordinates. This probably must be done using OCR.

This need only run on Windows.
You can use the accessibility APIs to get that information. Start at
http://weblogs.asp.net/oldnewthing/a...23/118893.aspx
A fairly-pure Python solution would be
ideal because most of the software which would use this functionality
is also written in Python.


You may be able to do it using win32all, ctypes or worst case a
SWIG wrapper.

Roger
Jul 18 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.