Extracting text from a "word document"-stream

I have a Word document as a stream, and I want to extract the text from it,
but I can't find anything to use for that purpose.

The "Microsoft office ?.object" COM reference only includes functionality to
read from a file (as far as I know).

I looked briefly at the structure of the document, but it doesn't seem to
have any common structure, especially when you compare files from different
Office versions.

Does anyone know of anything that could help me in my search?
Nov 17 '05 #1
Claus,

Why not save the contents of the stream to disk, and then read the
contents from that?

Also, I am pretty sure that the Document class in Word implements the
IPersistStream interface (I can't imagine that it doesn't). However, this
is a COM interface, and it doesn't work with .NET streams; it works with
the COM IStream interface instead. All in all, you are better off saving
the contents of the stream to a file on disk, and then working from that.
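
The save-to-disk step might look roughly like the sketch below. It is written in Python purely for illustration (a .NET version would copy the stream into a FileStream in the same way). The helper name `save_stream_to_temp_file` and the `.doc` suffix are assumptions made for the example, and the sample bytes are a placeholder rather than a real document.

```python
import io
import os
import shutil
import tempfile


def save_stream_to_temp_file(stream, suffix=".doc"):
    """Copy a binary stream into a new temporary file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file
    once the file-based reader has finished with it.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out:
        # Copies in chunks, so large documents are not loaded into memory at once.
        shutil.copyfileobj(stream, out)
    return path


if __name__ == "__main__":
    # Placeholder bytes standing in for the incoming document stream.
    incoming = io.BytesIO(b"placeholder bytes, not a real Word document")
    path = save_stream_to_temp_file(incoming)
    try:
        # At this point `path` could be passed to any reader that needs a file.
        print(path, os.path.getsize(path))
    finally:
        os.remove(path)
```

The temporary file gets a unique name, so concurrent requests do not overwrite each other, and the `finally` block removes it even if the reader fails.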

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

Nov 17 '05 #2
Hi Claus,

Do you have any control over the format of the Word document (and the
version of Word that creates it)? If you do, then you might consider using
the XML format supported by Word in Office 2003 Professional
(WordprocessingML is the format definition). You can look here for more
information on WordprocessingML.

http://msdn.microsoft.com/library/de...HV01113631.asp
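
If the documents can be produced in the Word 2003 XML format, the text can be read straight from the stream with an ordinary XML parser, with no Word installation involved. The following is a minimal sketch in Python, assuming the usual Word 2003 layout of paragraphs (`w:p`) containing text runs (`w:t`) in the `wordml` namespace. The function name `extract_text` and the sample document are made up for the example, and real documents may hold text in places this does not handle (headers, footers, field codes).

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_text(stream):
    """Return the document text, one line per paragraph.

    `stream` is any binary file-like object holding Word 2003 XML.
    """
    root = ET.parse(stream).getroot()
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t elements.
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


# Tiny hand-written sample, standing in for a real document stream.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    print(extract_text(io.BytesIO(SAMPLE)))
```

This only works for files saved as XML. The binary .doc format from earlier Word versions cannot be parsed this way.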

A second option is to use a third-party component to access and manipulate
Word documents. A quick search turned up this article, which touts
someone's product: http://www.csharp-station.com/Articles/WordReports.aspx
I suspect there are many more.

Otherwise you are probably stuck with using Word automation, which is
terrifyingly slow for some operations (like table manipulation) and requires
Word to be installed on the machine. The WordReports article referenced
above does discuss how to access the Word automation interfaces.

Good luck.

Tom Clement
Serena Software, Inc.

Nov 17 '05 #3