Extracting text from a "word document"-stream

I have a Word document as a stream, and I want to get the text out of it,
but I can't seem to find anything to use for that purpose.

The "Microsoft Office ?.object" COM reference only includes functionality to
read from a file (as far as I know).

I looked a little at the structure of the document, but it doesn't seem to
have any common structure, especially when you compare files from different
Office versions.

Does anyone know of anything that could help me in my search?
Nov 17 '05 #1
Claus,

Why not save the contents of the stream to disk, and then read the
contents from that?

Also, I am pretty sure that the Document class in Word implements the
IPersistStream interface (I can't imagine that it doesn't). However, this
is a COM interface, and it doesn't work with .NET streams; it works with
the COM IStream interface instead. All in all, you are better off saving
the contents of the stream to a file on disk and then working from that.
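
The save-to-disk approach suggested above can be sketched as follows. This is a minimal illustration in Python rather than .NET; the helper name `save_stream_to_temp_file` and the `.doc` suffix are assumptions made for the example, and the step that opens the resulting file in Word is left out.

```python
import io
import os
import shutil
import tempfile


def save_stream_to_temp_file(stream, suffix=".doc"):
    """Copy a readable binary stream to a temporary file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file
    once the file-based reader has finished with it.
    """
    # delete=False keeps the file on disk after the handle is closed, so a
    # separate file-based API can open it by path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


if __name__ == "__main__":
    # Stand-in bytes; a real caller would pass the stream holding the document.
    source = io.BytesIO(b"placeholder document bytes")
    path = save_stream_to_temp_file(source)
    try:
        print(os.path.getsize(path))
    finally:
        os.remove(path)
```

The temporary file is closed before its path is handed on, which matters on Windows, where a second process may not be able to open a file that is still held open for writing.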

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

Nov 17 '05 #2
Hi Claus,

Do you have any control over the format of the Word document (and the
version of Word that creates it)? If you do, then you might consider using
the XML format supported by the Office 2003 Professional version of Word
(WordProcessingML is the format definition). You can look here for more
information on WordProcessingML.

http://msdn.microsoft.com/library/de...HV01113631.asp
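
If the stream does hold Word 2003 XML, the text can be pulled out with an ordinary XML parser and no copy of Word. Below is a minimal sketch, in Python rather than .NET, assuming the document uses the Word 2003 `w:` namespace shown in the code and that visible text lives in `w:t` elements inside `w:p` paragraphs. The function name `extract_text` and the sample document are made up for illustration; tabs, fields, and other non-text content are ignored.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the Word 2003 XML format (assumed here; not the later .docx one).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_text(stream):
    """Return the paragraph texts found in a Word 2003 XML stream."""
    root = ET.parse(stream).getroot()
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


# Hand-written sample document, used only to exercise the function.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    for line in extract_text(io.BytesIO(SAMPLE)):
        print(line)
```

Because the parser reads directly from the stream, this route avoids the save-to-disk step, but it only applies when the document was saved as XML in the first place.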

A second option is to use a third-party component to access and manipulate
Word documents. A quick search turned up this article, which touts
someone's product: http://www.csharp-station.com/Articles/WordReports.aspx
I suspect there are many more.

Otherwise you are probably stuck with using Word automation, which is
terrifyingly slow for some operations (like table manipulation) and requires
Word to be installed on the machine. The WordReports article referenced
above does discuss how to access the Word automation interfaces.

Good luck.

Tom Clement
Serena Software, Inc.

Nov 17 '05 #3
