Text and Image extraction: XML conversion

I am using the DocBook DTD and following a two-step conversion: first extract the text, then extract the images.

One challenge is getting the image links from the source document into the XML.

When using OCR to extract the text, is there a solid approach to ensure that artifacts and links are handled properly? I know a manual approach would be labor-intensive.

Any suggestions?

Thank you
Jul 14 '07 #1
Dököll
2,364 Expert 2GB
Let's start you off with this link, jartan:

http://www.w3schools.com/dtd/dtd_intro.asp

I have a feeling you have many questions, and this can get you thinking. Please tell us what you find out. Good luck, and welcome!
Jul 24 '07 #2
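
For the image-link half of the question, one possible approach is sketched below. It is not from the thread, and it rests on an assumption: that the source document is available as HTML (or something HTML-like) with `<img src="...">` tags, which the original post does not state. The helper names `ImageLinkCollector`, `extract_image_links`, and `to_docbook` are hypothetical. The sketch collects each image link and writes it into a DocBook `mediaobject` / `imageobject` / `imagedata` element via the `fileref` attribute. The output is a bare fragment and is not validated against the DocBook DTD; OCR cleanup is out of scope here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ImageLinkCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags that carry no link
                self.links.append(src)


def extract_image_links(html_text):
    """Return the image links found in an HTML string (assumed source format)."""
    collector = ImageLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


def to_docbook(paragraphs, image_links):
    """Build a DocBook <article> fragment: one <para> per text paragraph,
    then one <mediaobject> per image link."""
    article = ET.Element("article")
    for text in paragraphs:
        ET.SubElement(article, "para").text = text
    for link in image_links:
        media = ET.SubElement(article, "mediaobject")
        image = ET.SubElement(media, "imageobject")
        ET.SubElement(image, "imagedata", fileref=link)
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    sample = '<p>Intro text</p><img src="figs/diagram.png"><img alt="no link">'
    links = extract_image_links(sample)
    print(to_docbook(["Intro text"], links))
```

Keeping link extraction separate from text extraction mirrors the two-step conversion described in the question: the paragraphs passed to `to_docbook` could come from the OCR step, while the links come from the source markup, so OCR artifacts never touch the file references.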
