Bytes | Software Development & Data Engineering Community

Extracting text from Word document (for regular expression matching)

I would be very grateful for any help with the following:

I currently have the code below. It opens an MS Word document and
uses C#'s built-in regular expression library to check whether there is a
match within the document. When I run the code I get a parser error.
I think there is an escape character in the Word doc format, or
perhaps trying to match against the entire document is not a good
idea.

public DataRow[] getMatches()
{
    ArrayList matches = new ArrayList();

    StreamReader sr = null;

    foreach(DataRow dr in theData.Rows)
    {
        string rx = dr["Term Name"].ToString();
        sr = File.OpenText(inputFilePath);

        if(Regex.IsMatch(rx, sr.ReadToEnd()))
        {
            matches.Add(dr);
        }
    }

    sr.Close();
    return (DataRow[])matches.ToArray(typeof(DataRow));
}
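
A possible cause of the parser error, offered as a sketch rather than a confirmed diagnosis: `Regex.IsMatch` takes the input text first and the pattern second, so the call above hands the whole file to the regex parser as the pattern. The version below swaps the arguments and reads the file once instead of reopening it for every row. The class name `TermMatcher` and the two method parameters are assumptions made so the example is self-contained; in the original they are fields of the surrounding class.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;

public static class TermMatcher
{
    // Returns the rows whose "Term Name" pattern matches the file contents.
    // theData and inputFilePath are passed in here (assumption); the original
    // code reads them from fields.
    public static DataRow[] GetMatches(DataTable theData, string inputFilePath)
    {
        // Read the file once, outside the loop, so no reader is left open.
        string text = File.ReadAllText(inputFilePath);
        List<DataRow> matches = new List<DataRow>();

        foreach (DataRow dr in theData.Rows)
        {
            string rx = dr["Term Name"].ToString();

            // Regex.IsMatch(input, pattern): text to search first, pattern second.
            if (Regex.IsMatch(text, rx))
            {
                matches.Add(dr);
            }
        }

        return matches.ToArray();
    }
}
```

If the term names are meant as literal text rather than patterns, passing `Regex.Escape(rx)` as the pattern avoids surprises from characters such as `(` or `+`. This still searches the raw bytes of the Word file, so it does not solve the text-extraction question below.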

Is there any way of doing any of the following:

1) Extracting just the text from the Word document programmatically?
(i.e. I don't want all the extra stuff that MS stores)
2) Parsing it into 'words'?
3) Putting all the words into a string array?
4) All of the above
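
For items 2 and 3, one possible sketch, assuming the plain text is already available as a string. The names `WordSplitter` and `ToWords` are hypothetical, and treating a "word" as a run of `\w` characters is an assumption (it splits "don't" into two pieces, for example).

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordSplitter
{
    // Splits text into words and returns them as a string array.
    // A "word" here is any run of letters, digits, or underscores (\w+).
    public static string[] ToWords(string text)
    {
        MatchCollection found = Regex.Matches(text, @"\w+");
        string[] words = new string[found.Count];

        for (int i = 0; i < found.Count; i++)
        {
            words[i] = found[i].Value;
        }

        return words;
    }
}
```

Calling `WordSplitter.ToWords("Hello, world")` yields an array containing `Hello` and `world`.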

I can probably do 2, 3 and 4, but I am struggling to think of a way to
do 1.
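
For item 1, one option is to let Word itself open the file and hand back the text through automation. The sketch below assumes Word is installed on the machine, the project references the `Microsoft.Office.Interop.Word` assembly, and the compiler is C# 4 or later (for the named arguments). The names `WordTextExtractor` and `ExtractText` are hypothetical.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

public static class WordTextExtractor
{
    // Opens the document read-only in a background Word instance and
    // returns the text of the main document body, without formatting data.
    // path should be a full (absolute) path to the document.
    public static string ExtractText(string path)
    {
        Word.Application app = new Word.Application();
        try
        {
            Word.Document doc = app.Documents.Open(path, ReadOnly: true);
            try
            {
                // Content is a Range covering the main document story.
                return doc.Content.Text;
            }
            finally
            {
                // Close without saving any changes.
                doc.Close(SaveChanges: false);
            }
        }
        finally
        {
            // Shut down the Word instance started above.
            app.Quit();
        }
    }
}
```

The returned string can then be passed to the regular expression matching as the input text. Starting a Word instance is slow and is generally a poor fit for unattended server-side code, so this is better suited to a desktop tool.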

Any help would be much appreciated...

Cheers,

Mark.
Nov 16 '05 #1