Aloha,
Paul Rubin wrote:
Simon Burton <si****@NOTTHISBIT.webone.com.au> writes:
http://www.reportlab.org/ handles pdf files.
Reportlab generates reports in pdf format, but I want to do the
opposite, namely read in pdf files that have already been generated by
a different program, and crunch on them. Any more ideas? Thanks.
The commercial version (reportlab.com) mentions a tool named
PageCatcher that seems to be able to extract pages and page descriptions
from .pdf documents. There is not much information on the web page.
If you read comp.text.tex you will find various solutions for composing
and a few for extracting data/content from .pdf documents. AFAIK there
is at the moment (read as: I'm working on it) no free, self-contained
Python solution. But as Python is very interface-friendly, you can
easily drive general tools like gs from it.
For your problem I would suggest using gs as a .pdf to .ps filter
in the first place, working on the .ps, and distilling back to .pdf with gs.
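
A rough sketch of that round trip from Python, not a finished tool. It assumes
a Ghostscript binary named gs is on your PATH and that your build provides the
ps2write and pdfwrite devices (older builds may only offer pswrite). The helper
names gs_command, pdf_to_ps and ps_to_pdf are made up for this example.

```python
import subprocess

GS = "gs"  # assumption: Ghostscript executable is reachable under this name


def gs_command(device, src, dst):
    """Build the argument list for one Ghostscript conversion run."""
    return [
        GS,
        "-q",                     # suppress the startup banner
        "-dNOPAUSE",              # do not pause after each page
        "-dBATCH",                # exit when the input has been processed
        "-dSAFER",                # restrict file access from the PostScript side
        "-sDEVICE=" + device,     # output device, e.g. ps2write or pdfwrite
        "-sOutputFile=" + dst,    # file the device writes to
        src,                      # input file (.pdf or .ps)
    ]


def pdf_to_ps(pdf_path, ps_path):
    """Convert a PDF file to PostScript; raises CalledProcessError on failure."""
    subprocess.run(gs_command("ps2write", pdf_path, ps_path), check=True)


def ps_to_pdf(ps_path, pdf_path):
    """Distill a PostScript file back to PDF; raises CalledProcessError on failure."""
    subprocess.run(gs_command("pdfwrite", ps_path, pdf_path), check=True)
```

You would call pdf_to_ps("in.pdf", "work.ps"), do your crunching on work.ps,
and then call ps_to_pdf("work.ps", "out.pdf"). The file names are placeholders.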
Wishing you a happy day
LOBI