Text and Image extraction: XML conversion

Using the DocBook DTD, I follow a two-step conversion: first extract the text, then extract the images.

One challenge is extracting the image links from the source document into XML.
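One way to approach the image step is sketched below. It assumes the extraction stage can hand over an ordered list of blocks, each either a text paragraph or an image file path (that input format and the `build_docbook_section` helper are hypothetical). It uses Python's standard `xml.etree.ElementTree` to emit DocBook 4.x elements, nesting each image reference as `mediaobject` > `imageobject` > `imagedata fileref=...` so the image stays where it appeared in the source:

```python
import xml.etree.ElementTree as ET


def build_docbook_section(title, blocks):
    """Build a DocBook 4.x <section> from extracted content.

    Hypothetical input format: `blocks` is an ordered list of
    ("para", text) or ("image", file_path) tuples.
    """
    section = ET.Element("section")
    ET.SubElement(section, "title").text = title
    for kind, value in blocks:
        if kind == "para":
            ET.SubElement(section, "para").text = value
        elif kind == "image":
            # DocBook nests an external image as mediaobject > imageobject > imagedata.
            media = ET.SubElement(section, "mediaobject")
            image_obj = ET.SubElement(media, "imageobject")
            ET.SubElement(image_obj, "imagedata", fileref=value)
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    return section


section = build_docbook_section(
    "Scanned page 1",
    [
        ("para", "Text recovered by OCR."),
        ("image", "images/page1-fig1.png"),
        ("para", "Text that followed the figure."),
    ],
)
print(ET.tostring(section, encoding="unicode"))
```

Keeping text and images in one ordered list is what preserves placement; if the two steps run separately, you would need page or position data from each to merge them back in order.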

When using OCR to extract the text, is there a reliable approach to make sure artifacts and links are handled properly? I know a manual approach would be labor-intensive.
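For the OCR side, here is a rough sketch of automated cleanup, assuming the OCR engine returns plain text. It rejoins words hyphenated across line breaks, drops non-printing control characters, collapses whitespace, and wraps any visible URLs in DocBook 4.x `ulink` elements. The function names and the URL pattern are illustrative assumptions, not part of any OCR tool's API. Note that OCR can only recover links whose URL is actually printed on the page; hyperlinks hidden behind link text have to come from the source file itself.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative pattern (an assumption): http(s) URLs up to the next whitespace, <, > or quote.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def clean_ocr_text(raw):
    """Hypothetical cleanup pass for plain-text OCR output."""
    # Rejoin words split by an end-of-line hyphen: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\r?\n(\w)", r"\1\2", raw)
    # Drop control characters (e.g. BEL, NUL) but keep whitespace for the next step.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse every run of whitespace, including line breaks, to one space.
    return re.sub(r"\s+", " ", text).strip()


def para_with_links(text):
    """Hypothetical helper: return a DocBook 4.x <para>, wrapping visible URLs in <ulink> elements."""
    para = ET.Element("para")
    last_link = None  # text after a link belongs in that link's .tail
    pos = 0
    for match in URL_RE.finditer(text):
        # Trailing punctuation is usually sentence punctuation, not part of the URL.
        url = match.group(0).rstrip(".,;:!?)")
        chunk = text[pos:match.start()]
        if last_link is None:
            para.text = chunk
        else:
            last_link.tail = chunk
        last_link = ET.SubElement(para, "ulink", url=url)
        last_link.text = url
        pos = match.start() + len(url)
    rest = text[pos:]
    if last_link is None:
        para.text = rest
    else:
        last_link.tail = rest
    return para


raw = "The conver-\nsion\x07 keeps  text.\nSee http://www.w3schools.com/dtd/dtd_intro.asp."
para = para_with_links(clean_ocr_text(raw))
print(ET.tostring(para, encoding="unicode"))
```

The hyphen rule will also merge genuine compounds that happen to break at a line end (for example "two-" followed by "step" becomes "twostep"), so a spot check or a dictionary lookup is still worthwhile.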

Any suggestions?

Thank you.
Jul 14 '07 #1


Dököll
Let's start you off with this link, jartan:

http://www.w3schools.com/dtd/dtd_intro.asp

I have a feeling you have many questions; this can get you thinking. Please tell us what you find out. Good luck, and welcome!
Jul 24 '07 #2
