
[Half-off] How to get textboxes (text blocks) from ps/pdf files?

Hi!

I need to get the textboxes/text blocks out of pdf files. I can convert
the files to ps if that helps.
Does anyone know a method, trick or routine for getting the textboxes
out of ps or pdf?
(Pythonic, COM or command line solutions are all welcome.)

I need to redraw the textboxes in my application so that the user can
reorder them; after that I concatenate all the texts and process the result.

For each textbox I need this information:
x, y, w, h, text

Example:
page1
textbox1{x:100,y:100;w:600;h:27;text:"TextBox1 /xfc /xfa"}
textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
page2
textbox1{x:100,y:100;w:600;h:27;text:"TextBox1"}
textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
....
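
To make the reorder-and-concatenate step concrete, here is a minimal Python sketch of the data I am after. The names `TextBox`, `reading_order` and `concat_texts` are made up for illustration and do not come from any library, and the top-to-bottom default order is only an assumption.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """One text block on a page (hypothetical name, not from any library)."""
    x: int
    y: int
    w: int
    h: int
    text: str


def reading_order(boxes):
    """Assumed default order: top to bottom, then left to right."""
    return sorted(boxes, key=lambda b: (b.y, b.x))


def concat_texts(boxes, order=None):
    """Join the box texts; `order` is a list of indexes chosen by the user."""
    if order is None:
        chosen = reading_order(boxes)
    else:
        chosen = [boxes[i] for i in order]
    return "\n".join(b.text for b in chosen)


page1 = [
    TextBox(x=100, y=180, w=600, h=27, text="TextBox2"),
    TextBox(x=100, y=100, w=600, h=27, text="TextBox1"),
]
print(concat_texts(page1))          # default reading order
print(concat_texts(page1, [0, 1]))  # order picked by the user
```

The hard part, getting x, y, w, h and text out of the pdf in the first place, is not covered by this sketch.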

Any solution?

Thanks in advance!
dd

ps1:
I tried every pdf2text and pdf2html application I could find. All of
them failed the test.
Only one gave usable information: pdftohtml, because it writes divs
with absolute position and size together with the text.
But that program does not handle the iso-8859-2 characters, so I lose them.
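
As far as I know, some pdftohtml builds also have an `-xml` switch that writes one `<text top=... left=... width=... height=...>` element per text block. Assuming that layout (the sample string below is hand-written, not real pdftohtml output, and `parse_textboxes` is a made-up name), reading it back into x, y, w, h, text could look roughly like this:

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed pdftohtml -xml layout.
SAMPLE_XML = """<pdf2xml>
<page number="1" top="0" left="0" height="1188" width="918">
<text top="100" left="100" width="600" height="27" font="0">TextBox1 ü ú</text>
<text top="180" left="100" width="600" height="27" font="0"><b>TextBox2</b></text>
</page>
<page number="2" top="0" left="0" height="1188" width="918">
<text top="100" left="100" width="600" height="27" font="0">TextBox1</text>
<text top="180" left="100" width="600" height="27" font="0">TextBox2</text>
</page>
</pdf2xml>"""


def parse_textboxes(xml_text):
    """Return {page number: [ {x, y, w, h, text}, ... ]}."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        boxes = []
        for node in page.iter("text"):
            boxes.append({
                "x": int(node.get("left")),
                "y": int(node.get("top")),
                "w": int(node.get("width")),
                "h": int(node.get("height")),
                # itertext() also collects text inside nested <b>/<i> tags
                "text": "".join(node.itertext()),
            })
        pages[int(page.get("number"))] = boxes
    return pages


for number, boxes in parse_textboxes(SAMPLE_XML).items():
    print("page%d" % number)
    for box in boxes:
        print(box)
```

This only reads what pdftohtml writes, so it does not fix the lost iso-8859-2 characters by itself.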

ps2:
The program must run under Windows XP, so the solution has to work on that OS.
Jan 3 '07 #1
