scraping display to obtain all on-screen text using OCR

I would like to perform a more classical type of "screen scraping"
than what most people now associate with this term. I only want to
find all the text on the current screen, and obtain associated screen
coordinates. This probably must be done using OCR.

This need only run on Windows. A fairly-pure Python solution would be
ideal because most of the software which would use this functionality
is also written in Python.

The ideal output would consist of a list of tuples, where each tuple
consists of ("string found", a, b, c, d) where the latter four
constitute a bounding rectangle associated with the text that was
found. It might also be handy to throw in some font information.

Thanks in advance for any pointers.

Jonathan
Jul 18 '05 #1
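
The output format requested above could be modeled roughly as follows. This is only a sketch: the names TextHit and hits_from_boxes are hypothetical, the four numbers are assumed to mean (left, top, right, bottom) in screen pixels, and the input is assumed to come from some OCR engine or API that reports each piece of text with a (left, top, width, height) box.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TextHit(NamedTuple):
    """One piece of on-screen text plus its bounding rectangle.

    The field order mirrors the ("string found", a, b, c, d) tuple from
    the question; reading a..d as left/top/right/bottom is an assumption.
    """
    text: str
    left: int
    top: int
    right: int
    bottom: int


def hits_from_boxes(
    boxes: Iterable[Tuple[str, int, int, int, int]]
) -> List[TextHit]:
    """Convert (text, left, top, width, height) records into TextHit tuples.

    Entries whose text is empty or whitespace-only are skipped.
    """
    hits = []
    for text, left, top, width, height in boxes:
        text = text.strip()
        if not text:
            continue
        hits.append(TextHit(text, left, top, left + width, top + height))
    return hits


sample = [("File", 10, 5, 30, 12), ("  ", 0, 0, 0, 0), ("Edit", 50, 5, 28, 12)]
print(hits_from_boxes(sample))
```

Because TextHit is a NamedTuple, each result still compares equal to a plain tuple such as ("File", 10, 5, 40, 17), so code that expects the list-of-tuples shape keeps working. Font information, if available, could be added as an extra field.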
ja**********@yahoo.com (Jonathan Epstein) writes:
> I would like to perform a more classical type of "screen scraping"
> than what most people now associate with this term. I only want to
> find all the text on the current screen, and obtain associated screen
> coordinates. This probably must be done using OCR.
> This need only run on Windows.


Usually you do that by intercepting the Windows text painting events,
rather than anything as horrendous as OCR'ing.
Jul 18 '05 #2
Jonathan Epstein wrote:
> I would like to perform a more classical type of "screen scraping"
> than what most people now associate with this term. I only want to
> find all the text on the current screen, and obtain associated screen
> coordinates. This probably must be done using OCR.
>
> This need only run on Windows.

You can use the accessibility APIs to get that information. Start at
http://weblogs.asp.net/oldnewthing/a...23/118893.aspx

> A fairly-pure Python solution would be
> ideal because most of the software which would use this functionality
> is also written in Python.


You may be able to do it using win32all, ctypes or, in the worst case,
a SWIG wrapper.

Roger
Jul 18 '05 #3
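
As a rough illustration of the ctypes route mentioned in the last reply, the sketch below calls a few user32 functions directly from Python. It is much narrower than the accessibility APIs: it only collects the title text and screen rectangle of each visible top-level window, not every string drawn on screen. The function name visible_window_text is hypothetical, and on non-Windows platforms it simply returns an empty list.

```python
import ctypes
import sys


def visible_window_text():
    """Return [(title, left, top, right, bottom), ...] for visible
    top-level windows that have a non-empty title.

    Sketch only: window titles are a small subset of on-screen text.
    """
    if sys.platform != "win32":
        return []  # the Win32 calls below exist only on Windows

    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    enum_proc_type = ctypes.WINFUNCTYPE(
        wintypes.BOOL, wintypes.HWND, wintypes.LPARAM
    )

    # Declare argument types so window handles are passed correctly
    # on 64-bit Python.
    user32.EnumWindows.argtypes = [enum_proc_type, wintypes.LPARAM]
    user32.IsWindowVisible.argtypes = [wintypes.HWND]
    user32.GetWindowTextLengthW.argtypes = [wintypes.HWND]
    user32.GetWindowTextW.argtypes = [
        wintypes.HWND, wintypes.LPWSTR, ctypes.c_int
    ]
    user32.GetWindowRect.argtypes = [
        wintypes.HWND, ctypes.POINTER(wintypes.RECT)
    ]

    results = []

    def on_window(hwnd, _lparam):
        # Called once per top-level window; returning True continues.
        if user32.IsWindowVisible(hwnd):
            length = user32.GetWindowTextLengthW(hwnd)
            if length > 0:
                buf = ctypes.create_unicode_buffer(length + 1)
                user32.GetWindowTextW(hwnd, buf, length + 1)
                rect = wintypes.RECT()
                if buf.value and user32.GetWindowRect(hwnd, ctypes.byref(rect)):
                    results.append(
                        (buf.value, rect.left, rect.top, rect.right, rect.bottom)
                    )
        return True

    callback = enum_proc_type(on_window)  # keep a reference during the call
    user32.EnumWindows(callback, 0)
    return results


if __name__ == "__main__":
    for hit in visible_window_text():
        print(hit)
```

The result already has the ("string found", a, b, c, d) shape asked for in the question. Reaching the text inside controls and menus would need the accessibility interfaces (or win32all), which this sketch does not attempt.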
