Extracting text from a "word document"-stream

I have a Word document as a stream, and I want to extract the text from it,
but I can't seem to find anything to use for that purpose.

The "Microsoft Office ?.object" COM reference only includes functionality to
read from a file (as far as I know).

I looked a little at the structure of the document, but it doesn't seem to
have any common structure, especially when you compare documents from
different Office versions.

Does anyone know of anything that could help me in my search?
Nov 17 '05 #1
Claus,

Why not save the contents of the stream to disk, and then read the
contents from that?

Also, I am pretty sure that the Document class in Word implements the
IPersistStream interface (I can't imagine that it doesn't). However, this
is a COM interface, and it doesn't work with .NET streams; it works
with the IStream interface in COM instead. All in all, you are better off
saving the contents of the stream to a file on disk, and then working from that.
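
A minimal sketch of the "save the stream to disk first" idea. It is written in Python purely to illustrate the pattern; the thread itself concerns .NET, where the same steps would apply with a temporary file. The function name `save_stream_to_temp_file` and the placeholder bytes are hypothetical, not taken from any library mentioned above.

```python
import io
import os
import shutil
import tempfile


def save_stream_to_temp_file(stream, suffix=".doc"):
    """Copy a binary stream to a temporary file and return the file's path.

    Hypothetical helper for illustration: the caller is responsible for
    deleting the file once the file-based API has finished with it.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out:
        # Copy in chunks so large documents are not held in memory twice.
        shutil.copyfileobj(stream, out)
    return path


# Placeholder bytes standing in for a real Word document stream.
stream = io.BytesIO(b"placeholder bytes, not a real Word document")
path = save_stream_to_temp_file(stream)
try:
    # At this point a file-only API could be handed `path`.
    with open(path, "rb") as f:
        data = f.read()
finally:
    os.remove(path)
```

The temporary file is removed in a `finally` block so that a failure while reading does not leave stray copies of the document on disk.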

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard. caspershouse.co m
Nov 17 '05 #2
Hi Claus,

Do you have any control over the format of the Word document (and the
version of Word that creates it)? If you do, then you might consider using
the XML format supported by the Office 2003 Professional version of Word
(WordprocessingML is the format definition). You can look here for more
information on WordprocessingML.

http://msdn.microsoft.com/library/de...HV01113631.asp
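
If the document can be delivered in the Word 2003 XML format, the text can be read straight from the stream with an ordinary XML parser, with no Word installation involved. The following is a rough sketch in Python, assuming the Word 2003 namespace `http://schemas.microsoft.com/office/word/2003/wordml` and that text sits in `w:t` elements inside `w:p` paragraphs. The function name `extract_text` and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by the Word 2003 XML format (assumed here; check it
# against the files you actually receive).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_text(stream):
    """Return the text of a Word 2003 XML document, one line per paragraph."""
    tree = ET.parse(stream)
    paragraphs = []
    for p in tree.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        pieces = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


# Tiny hand-written sample, not output from a real copy of Word.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

text = extract_text(io.BytesIO(SAMPLE))
print(text)
```

This only covers the XML format; binary .doc files from older Word versions would still need one of the other approaches.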

A second option is to use a third-party component to access and manipulate
Word documents. A quick search turned up this article, which touts
someone's product:
http://www.csharp-station.com/Articles/WordReports.aspx
I suspect there are many more.

Otherwise you are probably stuck with using Word automation, which is
terrifyingly slow for some operations (like table manipulation) and requires
Word to be installed on the machine. The WordReports article referenced
above does discuss how to access the Word automation interfaces.

Good luck.

Tom Clement
Serena Software, Inc.
Nov 17 '05 #3
