Well, there are a number of printer drivers that will send the
output to a PDF file, and I am sure it should be easy to find
a component that will let you extract the text from those files.
Or install a standard PCL printer and configure it to print to
a file rather than to a local port. You could then parse the PCL
output directly.
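
To give an idea of what parsing the PCL output might involve, here is a rough sketch in Python (used only for brevity; the same logic ports to VB.NET). It assumes the driver writes text as plain characters and positions it with the PCL 5 cursor sequences ESC*p#X and ESC*p#Y. That is an assumption, not a given: a driver may download fonts or rasterize the page, and then there is no readable text to find. The function name extract_text_runs is made up for this example, and sequences that carry binary payloads are not handled.

```python
import re

# ESC followed by either a parameterized sequence (e.g. ESC*p300x400Y)
# or a simple two-character sequence (e.g. ESC E, printer reset).
SEQ = re.compile(
    r"\x1b(?:([!-/][`-~])((?:[+-]?[\d.]*[`-~])*[+-]?[\d.]*[@-^])|[0-~])"
)
# One value/letter pair inside a parameterized sequence.
PAIR = re.compile(r"([+-]?[\d.]*)([`-~@-^])")


def extract_text_runs(raw: bytes):
    """Return (x, y, text) tuples; x and y are the last cursor position set.

    Hypothetical helper: it ignores cursor movement caused by the text
    itself and by CR/LF, and it skips every sequence except ESC*p...X/Y.
    """
    data = raw.decode("latin-1")  # one byte per character, never fails
    runs = []
    x = y = 0.0
    pos = 0
    for match in SEQ.finditer(data):
        text = data[pos:match.start()].strip("\r\n\f")
        if text:
            runs.append((x, y, text))
        pos = match.end()
        if match.group(1) == "*p":  # cursor positioning in PCL units
            for value, letter in PAIR.findall(match.group(2)):
                if not value:
                    continue
                relative = value[0] in "+-"  # signed values are relative moves
                if letter.upper() == "X":
                    x = x + float(value) if relative else float(value)
                elif letter.upper() == "Y":
                    y = y + float(value) if relative else float(value)
    tail = data[pos:].strip("\r\n\f")
    if tail:
        runs.append((x, y, tail))
    return runs


sample = b"\x1bE\x1b*p300x400YHello\x1b*p300x450YWorld\x1bE"
runs = extract_text_runs(sample)
```

Whether a real driver produces anything this tidy is exactly what you would have to check by printing a test page to a file and looking at the bytes.
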
But scraping the data? Even if you could get the programs
to output the text at specific positions, I am not sure that
information would survive intact in the printer file. There are
often minor translations going on as the driver tries to reconcile
differences between the printer capabilities and the requirements
of the programs (I think). Font substitution should be expected,
but perhaps the positions are intact? Needs testing.
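
If the positions do drift a little, the reader would have to look for text near an expected position rather than at an exact one. A minimal sketch of that idea in Python, assuming you already have a list of (x, y, text) tuples from whatever parser you end up with. The name find_field and the tolerance of 5 units are made up for illustration; a sensible tolerance would have to come out of the testing.

```python
def find_field(runs, expected_x, expected_y, tolerance=5.0):
    """Return the text closest to the expected position, or None.

    runs is a list of (x, y, text) tuples. A run qualifies only if both
    coordinates are within `tolerance` of the expected position.
    """
    best_text = None
    best_distance = None
    for x, y, text in runs:
        dx = abs(x - expected_x)
        dy = abs(y - expected_y)
        if dx > tolerance or dy > tolerance:
            continue
        distance = dx + dy
        if best_distance is None or distance < best_distance:
            best_text, best_distance = text, distance
    return best_text


# Example data: the surname landed two units right and one unit up
# from where the form definition expects it.
example_runs = [(302.0, 399.0, "Smith"), (300.0, 450.0, "John")]
surname = find_field(example_runs, 300, 400)
missing = find_field(example_runs, 100, 100)
```

A lookup that returns nothing (None here) is the case to worry about, since it means the layout moved further than you allowed for.
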
Question is: If you can actually make the programs output
the text at specific positions, would it not be easier to make
them output something you could actually use directly?
Or are you saying that you do not have such control over
the programs, but want to specify those text positions to
the *reader*?
I do not think it is impossible, but screen scraping (in this case
print scraping?) is normally not the way to do things (not the
way to do anything). Even if you get it to work in a controlled
environment, it is a sure road to Support Hell.
Or am I just being pessimistic?
OK, let me put it another way: Screen/print scraping is not
the way to do anything *if* it can be avoided.
Then there is the whole deal of keeping track of what the
printer driver is doing and how it is configured. I guess something
like this could be useful (haven't tried it, just a quick Google
result):
http://www.blackice.com/Printer%20Dr...erProducts.htm
http://www.blackice.com/Printer%20Dr...Tool%20Kit.htm
/JB
On 25 Dec 2005 07:46:38 -0800,
pi*****@yahoo.com wrote:
Hi there,
We are thinking of developing a product that needs to be able to parse
data printed via some printer driver. The data will then be used to
fill a PDF form. We believe the printer solution will be best as there
are multiple applications that we need to retrieve data from.
This is a brand new area for us, so I'm basically wondering if there are
any out-of-the-box printer drivers that create files with a structure
that we can then extract data from? I guess the printer files
need to have a fixed position structure so that we can tell our
software to "get X characters starting at position Y".
I would appreciate any recommendations, ideas and experience regarding
this type of development. Am I missing something that makes this type
of software more or less impossible?
We plan to develop it using VB.NET.
Thank you in advance!
Peter