Poor man's OCR: need performance improvement tips

qvx

Hi all,

I have a performance problem in my app. It is a poor man's version of
OCR app. I started this app as a prototype before implementing it in
C++. But now, after I have a working copy in Python, I don't feel like
doing the job again in C++. A speed improvement of at least 5 times
would be necessary (I have to process several hundreds of thousands of
images).

The application works like this:

1. Load a raster image of alphabet. This image is accompanied with xml
description of that image (positions of each letter etc.). No problems
here. Using this information load raster data (data[][]) for each
letter.

2. Load image which must be recognized and transform it into 1 bit
image (to avoid palette issues). No problems here.

3. Detect lines of text in input picture. No problems here.

4. Process each line: compare pixels of each letter of alphabet with
corresponding pixels in line of input picture. This consists of loops
comparing pixel by pixel. This is my performance bottleneck.

I'm using PIL for initial image processing. But then I use plain Python
loops for pixel matrix comparision. One nice optimization was to call
PIL.Image.getdata() at the begining and then use data[y*w+x] instead of
PIL.Image.getpixel(xy). I would like to compare each character raster
with corresponding image pixels in a "single operation" and avoid
(Python) loops.

Oh, one more thing. Letter raster matrices have varying width and
constant height (due to proportional width font which is used). This
compare function should signal if there is a single different pixel.

Any library that can do that?
Here is how I expected to solve this problem in C++. Each line of text
(and letter) has height below 16 pixels. It can be artificially made
into 16 pixels. I planned to linearize each letter's matrix by columns.
Imagine leter with pixel indices numbered like this:

00 10 20
01 11 21
02 12 22
03 13 23
.. .. ..
0f 1f 2f

I would convert it into 00 01 02 03 04 05 ... 2e 2f. Since each pixel
is one bit wide, each column would be 2 octets long. I would do the
same to the line of text of input picture. Then i would have to compare
two buffers of length equal to the length of character. After
successfull match, I would advance "input" stream by that number of
bytes.

Thanx
qvx

Sep 24 '05 #1

Subscribe Post Reply

2604

D.Hering

Hi Take a look at ADaM's image processing functionality. I'd also
suggest seeing if Numarray of Scipy can be utilized.

Here's ADaM's site. I'm sure your familiar with the others mentioned.

http://datamining.itsc.uah.edu/adam/

hth,
Dieter

Sep 24 '05 #2

Alex Stapleton

On 24 Sep 2005, at 19:14, qvx wrote:

Hi all,
<snip>

4. Process each line: compare pixels of each letter of alphabet with
corresponding pixels in line of input picture. This consists of loops
comparing pixel by pixel. This is my performance bottleneck.

I'm using PIL for initial image processing. But then I use plain
Python
loops for pixel matrix comparision. One nice optimization was to call
PIL.Image.getdata() at the begining and then use data[y*w+x]
instead of
PIL.Image.getpixel(xy). I would like to compare each character raster
with corresponding image pixels in a "single operation" and avoid
(Python) loops.

Oh, one more thing. Letter raster matrices have varying width and
constant height (due to proportional width font which is used). This
compare function should signal if there is a single different pixel.

Any library that can do that?
Here is how I expected to solve this problem in C++. Each line of text
(and letter) has height below 16 pixels. It can be artificially made
into 16 pixels. I planned to linearize each letter's matrix by
columns.
Imagine leter with pixel indices numbered like this:

00 10 20
01 11 21
02 12 22
03 13 23
.. .. ..
0f 1f 2f

I would convert it into 00 01 02 03 04 05 ... 2e 2f. Since each pixel
is one bit wide, each column would be 2 octets long. I would do the
same to the line of text of input picture. Then i would have to
compare
two buffers of length equal to the length of character. After
successfull match, I would advance "input" stream by that number of
bytes.

Presumably you don't care about alignment and kerning and other
things currently.

If you haven't tried Psyco yet, try it.
If you read the image in rotated 90 degrees then the data is
linearised how you want it already. You could then just pack it into
an integer and compare that, or look it up in a dictionary even.

e.g.
char = lookup[data[n:n+2]]

where n is the left (or bottom, we rotated in 90 degrees remember?)
and 2 is me assuming PIL will not encode each pixel as entire byte in
a 1bpp image. I would of thought that would be pretty quick as long
as you could get the alignment reliable enough.

I hope this makes some actual sense, I have 0 OCR experience tbh.

Sep 24 '05 #3

qvx

I also have 0 OCR experience, but the case is simple enough.

I know about scipy but I have 0 experience with it. I was actually
hoping somebody who knows about it might have some recipe.

I also tried psyco, but unfortunetly, the speedup is only few percent.

I will check out ADaM's site.

I was hoping to replace matrix comparison with something more efficient
(minimal code change). I like the idea of dictionary lookup but it
requires more code to be changed. If nothing else comes up I will
probably try this method. Of course I will have to check the wider
characters first so there will be presumably several lookups for each
position. The only problem here is how to efficiently transform
portions of input picture into suitable format (some kind of
list/buffer).

Thanks.

Sep 24 '05 #4

D.Hering

I'm working on essentially the same thing for a real-time context. No
formal schema developed yet, but I know that I'll be using some
combination of the following: ADaM, ESML (Binary data format
unification...xml config), Numarray/Scipy, Pytables (DataBase), PIL,
Cairo (svg), and MatPlotlib.

Sep 24 '05 #5

D.Hering

I thought I should correct and clarify the use of EMSL (even though
this isn't specific toward your request, qvx.)

http://esml.itsc.uah.edu/index.jsp

It's more for data format recognition and conversion. Here's a
description from the site. The sw was developed for earth science, but
is very useful for any domain.

ESML is an interchange technology that enables data (both structural
and semantic) interoperability with applications without enforcing a
standard format. Users can write external files using ESML schema to
describe the structure of the data file. Applications can utilize the
ESML Library to parse this description file and decode the data format.
As a result, software developers can now build data format independent
applications utilizing the ESML technology. Furthermore, semantic tags
can be added to the ESML files by linking different domain ontologies
to provide a complete machine understandable data description. This
ESML description file allows the development of intelligent
applications that can now understand and "use" the data.

Sep 24 '05 #6

John J. Lee

"qvx" <qv*****@gmail.com> writes:
[...]

4. Process each line: compare pixels of each letter of alphabet with
corresponding pixels in line of input picture. This consists of loops
comparing pixel by pixel. This is my performance bottleneck.

I'm using PIL for initial image processing. But then I use plain Python
loops for pixel matrix comparision. One nice optimization was to call
PIL.Image.getdata() at the begining and then use data[y*w+x] instead of
PIL.Image.getpixel(xy). I would like to compare each character raster
with corresponding image pixels in a "single operation" and avoid
(Python) loops.

[...]

I don't know what exactly "compare" means here, so no Numeric code,
but GIYF when it comes to the PIL<-->Numeric conversion (I imagine
numarray is almost identical here, though I've not used it in anger):

http://effbot.org/zone/pil-numpy.htm
Since you mention C++, scipy.weave may also be of interest to you.
John

Sep 24 '05 #7

tvrtko.sokolovski

Imagine a large matrix with dimensions [W,H], and a lots of smaller
matrices with dimensions [p,q1], [p,q1], [p,q2], [p,q1], ... I have to
slide a small window [p,q] horizontally over a larger matrix. After
each slide I have to compare smaller matrices with the data from larger
matrix (as defined by sliding window).

I'm currently trying to use other kinds of optimizations (linearize
data by columns), but the program no longer works, and it is so hard to
debug. But it works very fast :)

Here is an example of linearization by columns that i'm currently using
:

# setup: convert to 1 bit image
img = Image.open(file_name)
img2 = img.point([0]*255 + [1], "1")

# find ocr lines, and for each do ...

# extract OCR line
region = img2.crop((0, ocrline.base_y-13, width, ocrline.base_y+3)) #
h=16
region.load()

# clean up upper two lines which should be empty but
# sometimes contain pixels from other ocr line directly above
draw = ImageDraw.Draw(region)
draw.line((0,0,width,0), fill=1)
draw.line((0,1,width,1), fill=1)

# transpose data so I get columns as rows
region = region.transpose(Image.FLIP_LEFT_RIGHT)
region = region.transpose(Image.ROTATE_90)
ocrline.data = region.tostring() # packs 8 pixels into 1 octet

I do the same for known letters/codes (alphabet). Then I compare like
this:

def recognize (self, ocrline, x):
for code_len in self.code_lengths: # descending order
code = ocrline.data[x:x+code_len]
ltr = self.letter_codes.get(code, None)
if ltr is not None:
return ltr, code_len # So I can advance x

This currently "works" two orders of magnitude faster.

Sep 25 '05 #8

Similar topics

Database performance degrading (again)

by: teedilo | last post by:

We have an application with a SQL Server 2000 back end that is fairly database intensive -- lots of fairly frequent queries, inserts, updates -- the gamut. The application does not make use of...

Microsoft SQL Server

Access Performance Problems

by: Scott | last post by:

I have a customer that had developed an Access97 application to track their business information. The application grew significantly and they used the Upsizing Wizard to move the tables to SQL...

Microsoft Access / VBA

Split Database Poor Performance

by: Daveo | last post by:

Hi there, Since splitting my database, one form in particular takes about 10 times as long to load and refresh, compared to the unsplit version on the server. The code behind it contains 36...

Microsoft Access / VBA

What techniques are key to Access performance with separate front-endand back-end Access implementation via a LAN?

by: Bob Alston | last post by:

Most of my Access database implementations have been fairly small in terms of data volume and number of concurrent users. So far I haven't had performance issues to worry about. <knock on wood> ...

Microsoft Access / VBA

blocking i/o vs. non blocking i/o (performance)

by: Andre Kelmanson | last post by:

Hi, I'm writing a proxy application for rfb protocol (vnc), but i'm not satisfied with it's performance. I'm using blocking i/o and the app just read(...) from source and the write(...) to...

C / C++

Visual Inheritance - Got Fix - Need Problem?

by: Richard Myers | last post by:

Howdy, I haven't struck this before.... I am visually inheriting several forms from another. All works well but then i thought i'd create an enum that is used only by this form - not even its...

Visual Basic .NET

The performance of all kinds of C operations

by: mrby | last post by:

Hi, Does anyone know of any link which describes the (relative) performance of all kinds of C operations? e.g: how fast is "add" comparing with "multiplication" on a typical machine. Thanks!...

C / C++

Poor man's captcha: why wouldn't this work?

by: pittendrigh | last post by:

Let's say we're trying to keep blog and forum spammers out of our site--we're not trying to protect fort knox. 1) Step one is a one-time-only step. We create six different css files that define...

PHP

Poor python and/or Zope performance on Sparc

by: joa2212 | last post by:

Hello everybody, I'm posting this message because I'm quiet frustrated. We just bought a software from a small software vendor. In the beginning he hosted our application on a small server at...

Python

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing