Extracting data from a document

Hi Guys,

This isn't a problem with my code, but something I would like to add (ASP
VBScript). At the moment I have a form where a user uploads their details,
including a document (DOC, PDF, TXT, DOCX). The document is uploaded to a
folder on the server, with its address stored in the database, and I'm
tracking the user id through sessions.

What I would like to do after the upload is redirect to a blank page, where
a script extracts the data from the document and inserts it into another
field in the database associated with the user id. I think this may be called
parsing, but I'm at a complete loss. I don't suppose you guys have any ideas
on this, do you?
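
The final step described above, storing the extracted text against the user id, could be sketched roughly as follows. This is only an illustration under stated assumptions: the table name Users, the columns DocumentText and UserID, and the helper name SaveExtractedText are all hypothetical, and conn is assumed to be an already open ADODB.Connection.

```vbscript
<%
' Standard ADO constant values, declared here because Classic ASP
' does not define them unless adovbs.inc is included.
Const adInteger = 3
Const adLongVarWChar = 203
Const adParamInput = 1
Const adCmdText = 1
Const adExecuteNoRecords = 128

' Hypothetical helper: writes the extracted text into the row for one user.
' Table and column names are assumptions, not the poster's real schema.
Sub SaveExtractedText(conn, userId, extractedText)
    Dim cmd
    Set cmd = Server.CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = conn
    cmd.CommandType = adCmdText
    cmd.CommandText = "UPDATE Users SET DocumentText = ? WHERE UserID = ?"

    ' Parameters are bound in the order the ? placeholders appear.
    ' Len + 1 keeps the declared size above zero for an empty string.
    cmd.Parameters.Append cmd.CreateParameter("docText", adLongVarWChar, _
        adParamInput, Len(extractedText) + 1, extractedText)
    cmd.Parameters.Append cmd.CreateParameter("userId", adInteger, _
        adParamInput, , CLng(userId))

    cmd.Execute , , adExecuteNoRecords
    Set cmd = Nothing
End Sub
%>
```

It might be called as SaveExtractedText conn, Session("UserID"), documentText, where the session key is again an assumption. Using parameters rather than string concatenation keeps quotes in the uploaded document from breaking the SQL.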

I think this would probably make a really neat little extension also...

Look forward to your responses.

G
Jun 27 '08 #1
Given that Classic ASP is no longer being developed, you are unlikely to get
MS to consider any extensions to the framework. Also, how you obtain the
contents of the file will differ enormously depending on the file type. A
simple text file is easy: you just use the FileSystemObject to gain access to
the text. A PDF is totally different, and there are a number of third party
components available for messing around with PDFs. Microsoft haven't even
provided a native way to deal with PDFs in the .NET framework, which is the
technology they are now devoting all their development time to. You have to
dig around for third party stuff there too.
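
For the plain-text case, a minimal sketch of reading an uploaded file with the FileSystemObject might look like this. The helper name ReadTextFile and the /uploads/cv.txt path are assumptions for illustration, not part of the original poster's setup.

```vbscript
<%
' Hypothetical helper: returns the whole content of a text file,
' or an empty string if the file is missing or empty.
Function ReadTextFile(physicalPath)
    Const ForReading = 1
    Dim fso, stream

    ReadTextFile = ""
    Set fso = Server.CreateObject("Scripting.FileSystemObject")

    If fso.FileExists(physicalPath) Then
        Set stream = fso.OpenTextFile(physicalPath, ForReading)
        ' ReadAll raises an error on an empty file, so check first.
        If Not stream.AtEndOfStream Then
            ReadTextFile = stream.ReadAll
        End If
        stream.Close
        Set stream = Nothing
    End If

    Set fso = Nothing
End Function

' Assumed upload location; Server.MapPath turns it into a physical path.
Dim documentText
documentText = ReadTextFile(Server.MapPath("/uploads/cv.txt"))
%>
```

OpenTextFile reads the file as ASCII by default; a Unicode file would need the optional format argument, so treat this as a starting point only.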

We use a number of third party components for text parsing, and some
conditional code to identify the filetype, and then choose the component
accordingly. However, they wouldn't be of any use to you as they are
employed in a Delphi forms app.
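
The "conditional code to identify the filetype" idea can be sketched in VBScript along these lines. The function name ExtractorFor and the labels it returns are made up for illustration; they stand in for whichever components you actually end up using.

```vbscript
<%
' Hypothetical dispatcher: maps a file name to a label for the
' extraction route that should handle it.
Function ExtractorFor(fileName)
    Dim fso, ext
    Set fso = Server.CreateObject("Scripting.FileSystemObject")
    ' GetExtensionName returns the extension without the leading dot.
    ext = LCase(fso.GetExtensionName(fileName))
    Set fso = Nothing

    Select Case ext
        Case "txt"
            ExtractorFor = "filesystemobject"
        Case "pdf"
            ExtractorFor = "pdf-component"
        Case "doc", "docx"
            ExtractorFor = "word-component"
        Case Else
            ExtractorFor = "unsupported"
    End Select
End Function
%>
```

Checking only the extension trusts whatever name the user uploaded, so a real application would probably want to handle files that fail to open in the chosen component.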

--
Mike Brind
Microsoft MVP - ASP/ASP.NET
Jun 27 '08 #2
In addition to what Mike Brind said...

You *can* use ASP/VBScript to "script" MS Word and then you can use various
scripted commands within Word to locate specific text, etc.

To say that's a pain in the neck is a gross understatement. The docs for
doing this are poor, the inherent problems manifold. [Perhaps the easiest
way to do this would be to open a document with Word and then ask to do a
"Save as..." to a ".txt" file and then parse the resultant all-text file.]
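
As a rough sketch of that "Save as..." route, the script below drives Word through COM and writes a plain-text copy of a document. It assumes Word is installed and that the script runs on a desktop machine (for example with cscript.exe), since Microsoft discourages automating Office from server-side code. The Sub name SaveAsText and both file paths are hypothetical.

```vbscript
' Word enumeration values, declared here because VBScript
' cannot see Word's type library constants.
Const wdFormatText = 2
Const wdAlertsNone = 0
Const wdDoNotSaveChanges = 0

' Hypothetical helper: opens a document in Word and saves a .txt copy.
Sub SaveAsText(sourcePath, targetPath)
    Dim word, doc
    Set word = CreateObject("Word.Application")
    word.Visible = False
    word.DisplayAlerts = wdAlertsNone

    ' Arguments: FileName, ConfirmConversions, ReadOnly
    Set doc = word.Documents.Open(sourcePath, False, True)
    doc.SaveAs targetPath, wdFormatText
    doc.Close wdDoNotSaveChanges

    word.Quit
    Set doc = Nothing
    Set word = Nothing
End Sub

' Assumed paths, for illustration only.
SaveAsText "C:\docs\cv.doc", "C:\docs\cv.txt"
```

There is no error handling here: if the document fails to open, the Word process is left running, which is a small taste of the problems mentioned above.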

You'd probably be better off with PDF, thanks to a third party component
named "AspPDF", but be forewarned that it's not cheap and it also has a
fairly steep learning curve.

You are after one of the holy grails of database developers: The ability to
do "data mining" on non-database, non-text files. And each file type has to
be approached separately, using different tools, it seems. People make good
money producing tools to do this stuff, and generally they don't sell the
tools--they just sell the [expensive] service of doing the data mining for
you.

In short, if you are a newbie programmer, this probably isn't a project you
want to try tackling, yet.

Jun 27 '08 #3
The Delphi bods here use Word as a COM object and cause anything that isn't
a PDF to open in Word. That's ok on a desktop, where the user is able to
dismiss any dialogue or message boxes that might be instantiated, thus
allowing the app to close, but you can imagine what will happen if these
message boxes open on a web server (on Rack #364 in some unmanned room deep
in the bowels of some Data Centre God knows where...). That's one of the
primary reasons MS advise against automating Word in web applications.

--
Mike Brind
Microsoft MVP - ASP/ASP.NET
Jun 28 '08 #4
Thanks for your input guys, you've made me rethink the idea! I guess for
the project that we're working on it would be a nice add-on. I'm sure the
geniuses at MS will come up with something that makes the process a little
less hair-pulling in a couple of years or so, and that will be the time to
add it. Till then it's a nice add-on that can wait.

Thanks both...

GTN
Jun 28 '08 #5
GTN170777 wrote:
Thanks for your input guys, you've made me rethink the idea! I
guess for the project that we're working on it would be a nice
add-on. I'm sure the geniuses at MS will come up with something that
makes the process a little less hair-pulling in a couple of years or
so,
Don't count on it. They've had 30+ yrs now ...
--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"
Jun 28 '08 #6
