Extracting text from a "word document"-stream

I have a Word document as a stream, and I want to get the text out of it,
but I can't seem to find anything to use for that purpose.

The "Microsoft Office ?.object" COM reference only includes functionality to
read from a file (as far as I know).

I looked a little at the structure of the document, but it doesn't seem to
have any common structure, especially when you compare files from different
Office versions.

Does anyone know of anything that could help me in my search?
Nov 17 '05 #1
Claus,

Why not save the contents of the stream to disk, and then read the
contents from that?

Also, I am pretty sure that the Document class in Word implements the
IPersistStream interface (I can't imagine that it doesn't). However, this
is a COM interface, and it doesn't work with .NET streams; rather, it works
with the IStream interface in COM. All in all, you are better off saving
the contents of the stream to a file on disk, and then working from that.
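
For what it's worth, the save-to-disk idea might look something like the sketch below. It is written in Python only to keep it short; the same pattern applies to a .NET stream. The function name `spool_stream_to_file` and the `.doc` suffix are made up for illustration, and the bytes used here are a placeholder, not a real Word document.

```python
import io
import os
import shutil
import tempfile


def spool_stream_to_file(stream, suffix=".doc"):
    """Copy a binary stream to a temporary file and return the file's path.

    Hypothetical helper: the name and the default suffix are assumptions.
    """
    # delete=False keeps the file on disk after the handle is closed, so a
    # file-based API (for example Word automation) can open it by path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


# Placeholder content standing in for the real document stream.
doc_stream = io.BytesIO(b"placeholder bytes, not a real Word document")
path = spool_stream_to_file(doc_stream)
try:
    # Hand `path` to whatever file-based reader you use; here the file is
    # simply read back to show that the bytes arrived intact.
    with open(path, "rb") as f:
        data = f.read()
finally:
    # The caller owns the temporary file and must remove it.
    os.remove(path)
```

Cleaning up in a `finally` block matters here, since the temporary file otherwise outlives the process.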

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
Nov 17 '05 #2
Hi Claus,

Do you have any control over the format of the Word document (and the
version of Word that creates it)? If you do, then you might consider using
the XML format supported by the Office 2003 Professional version of Word
(WordprocessingML is the format definition). You can look here for more
information on WordprocessingML:

http://msdn.microsoft.com/library/de...HV01113631.asp
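
If the documents can be saved in that XML format, pulling the text out of a stream becomes an ordinary XML parsing job. Below is a rough sketch in Python (the same approach should carry over to an XML reader in .NET). It assumes the Word 2003 namespace URI shown in `W_NS` and that text lives in `w:t` elements inside `w:p` paragraphs; `extract_text` and `SAMPLE` are made-up names, and the sample is a hand-written fragment, not output from Word.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace of Word 2003 XML documents; check it against a real file.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_text(stream):
    """Return the document text, one line per w:p paragraph."""
    root = ET.parse(stream).getroot()
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        pieces = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


# Hand-written sample standing in for a real document stream.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

text = extract_text(io.BytesIO(SAMPLE))
print(text)
```

This ignores headers, footnotes, and any special handling of tables or text boxes, so treat it as a starting point only.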

A second option is to use a third-party component to access and manipulate
Word documents. A quick search turned up this article, which touts
someone's product: http://www.csharp-station.com/Articles/WordReports.aspx
I suspect there are many more.

Otherwise you are probably stuck with using Word automation, which is
terrifyingly slow for some operations (like table manipulation) and requires
Word to be installed on the machine. The WordReports article referenced
above does discuss how to access the Word automation interfaces.

Good luck.

Tom Clement
Serena Software, Inc.
Nov 17 '05 #3
