473,387 Members | 1,553 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

Extracting text from Word document (for regular expression matching)

I would be very grateful for any help with the following:

I currently have the code below. This opens a MS Word document, and
uses C#'s internal regular expressions library to find if there is a
match within this document. When I run the code I get a parser error
- I think there is an escape character in the Word doc format, or
perhaps trying to do a match with the entire document is not a good
idea.

public DataRow[] getMatches()
{
ArrayList matches = new ArrayList();

StreamReader sr = null;

foreach(DataRow dr in theData.Rows)
{
string rx = dr["Term Name"].ToString();
sr = File.OpenText(inputFilePath);

if(Regex.IsMatch(rx, sr.ReadToEnd()))
{
matches.Add(dr);
}
}

sr.Close();
return (DataRow[])matches.ToArray(typeof(DataRow));
}

Is there any way of either:

1) Extracting just the text from the word document programatically?
(I.e. I don't want all the extra stuff that MS stores)
2) Parsing it into 'words'?
3) Putting all the words into a string array?
4) All of the above

I can probably do 2, 3 and 4, but I am struggling to think of a way to
do 1.

Any help would be much appreciated...

Cheers,

Mark.
Nov 16 '05 #1
0 1619

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Richard L Rosenheim | last post by:
I have some text where I need to extract some pieces from. The text will be in a format like this: a string description color="red" type="unknown" In the above example, I would be looking to...
5
by: Michael Hill | last post by:
Hi, folks. I am writing a Javascript program that accepts (x, y) data pairs from a text box and then analyzes that data in various ways. This is my first time using text area boxes; in the past,...
1
by: Cognizance | last post by:
Hi gang, I'm an ASP developer by trade, but I've had to create client side scripts with JavaScript many times in the past. Simple things, like validating form elements and such. Now I've been...
5
by: Casey | last post by:
Hello, Can someone give me specific code to replace text on a page using server side javascript? I need to use server-side because I need the output to be recognized in the final HTML so that...
2
by: Kevin K | last post by:
Hi, I'm having a problem with extracting text from a Word document using StreamReader. As I'm developing a web application, I do NOT want the server to make calls to Word. I want to simply...
7
by: teo | last post by:
hallo, I need to extract a word and few text that precedes and follows it (about 30 + 30 chars) from a long textual document. Like the description that Google returns when it has found a...
3
by: Alois Treindl | last post by:
A simple XSL question from a newbie: In an xml document which I transform via xsl into html output, I have some text which I want to be suppressed. The tags looks like this <anchor_ref...
3
by: MCH | last post by:
hi there, I am working with a HTML-like text with boost:regex. For example, the following pattern might occur in my text <abc efg> <p>EFG</p 12<3> In this case, I would like to extract...
1
by: JosAH | last post by:
Greetings, Introduction This week we start building Query objects. A query can retrieve portions of text from a Library. I don't want users to build queries by themselves, because users make...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.