Parsing an RTF file

Hi All

My current task is to go through an RTF file and extract text based on
its format, i.e. find and extract bold text as 'heading', treat the text
between this and the next bold text as 'content', and repeat until the
end of the document.

Is there a way to do this? I have looked at the RTF source (very
complicated) and RTFTextBox, and haven't worked out how to do it.

Does anybody have any hints on this one?

Thanks in advance.

James

Jan 3 '06 #1

Hi James,

Sounds like a job for the Regex class.
I have used it to parse RTF files into HTML, concatenate multiple RTF
files into a single RTF file, and translate multi-page RTF files for printing.
My approach was to split the RTF file into three parts:
- Header info
- Font table
- Text body
Then parse the text body into an array of text and control words.
Because control words start with \, you can tell them apart from the text
and then re-emit the text in a different format (e.g. XHTML).
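
The tokenizing step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than C# or PCRE, and it is not Mark's actual code: the names `TOKEN_RE` and `tokenize` are made up for the example, and it assumes simple RTF (it does not decode `\'hh` hex escapes or `\u` Unicode escapes, and it does not split off the header or font table).

```python
import re

# One alternative per token type. Control words look like \b, \b0 or \fs24;
# control symbols are a backslash followed by a single non-letter (e.g. \{ or \*).
TOKEN_RE = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\([^a-zA-Z])"            # control symbol
    r"|([{}])"                   # group open / close
    r"|([^\\{}]+)"               # run of plain text
)

def tokenize(rtf):
    """Yield (kind, value, param) tuples for an RTF string (hypothetical helper)."""
    for m in TOKEN_RE.finditer(rtf):
        word, param, symbol, brace, text = m.groups()
        if word is not None:
            yield ("word", word, int(param) if param else None)
        elif symbol is not None:
            yield ("symbol", symbol, None)
        elif brace is not None:
            yield ("group", brace, None)
        else:
            # Raw CR/LF characters in RTF source are not part of the text.
            text = text.replace("\r", "").replace("\n", "")
            if text:
                yield ("text", text, None)

if __name__ == "__main__":
    for token in tokenize(r"{\rtf1\ansi{\b Title}\par Body text}"):
        print(token)
```

Once the body is a flat list of tokens like this, re-emitting it in another format is a matter of walking the list and mapping each control word to whatever the target format uses.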

I did this using pcre.dll (the Perl-compatible regex library) in another language (not C#).

Mark

Jan 3 '06 #2
Hi,

You need to read and understand the RTF format. IIRC it's a markup format;
if so, you will need to parse the file.
A regex may help you, but I think that you may need something different,
like a parser (a la LEX).
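
To illustrate why a parser that tracks state fits better than a single regex here: bold can be switched on with `\b`, switched off with `\b0` or `\plain`, and is also undone when the enclosing `{...}` group closes, so the reader has to keep a stack of formatting states. The following is only a sketch in Python under simplifying assumptions, not a full RTF parser. The names `extract_sections`, `TOKEN_RE` and `SKIP_DESTINATIONS` are hypothetical, bold coming from a style sheet is ignored, and `\'hh` and `\u` escapes are not decoded.

```python
import re

# Control word | control symbol | brace | plain text run.
TOKEN_RE = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?|\\([^a-zA-Z])|([{}])|([^\\{}]+)"
)

# Destinations whose text is metadata rather than document content (partial list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}

def extract_sections(rtf):
    """Return (heading, content) pairs, treating bold text as headings."""
    runs = []                                  # (is_bold, text) in document order
    state = {"bold": False, "skip": False}
    stack = []                                 # one saved state per open group
    for m in TOKEN_RE.finditer(rtf):
        word, param, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(dict(state))          # remember formatting at group start
        elif brace == "}":
            if stack:
                state = stack.pop()            # closing a group restores it
        elif word is not None:
            if word == "b":
                state["bold"] = param != "0"   # \b turns bold on, \b0 turns it off
            elif word == "plain":
                state["bold"] = False
            elif word in SKIP_DESTINATIONS:
                state["skip"] = True
            elif word in ("par", "line") and not state["skip"]:
                runs.append((state["bold"], "\n"))
        elif symbol == "*":
            state["skip"] = True               # {\*\dest ...} is an ignorable destination
        elif symbol is not None:
            if symbol in "\\{}" and not state["skip"]:
                runs.append((state["bold"], symbol))   # escaped literal character
        elif not state["skip"]:
            text = text.replace("\r", "").replace("\n", "")
            if text:
                runs.append((state["bold"], text))

    # Group the runs: each stretch of bold text starts a new section.
    sections, heading, content, in_heading = [], [], [], False
    for bold, text in runs:
        if bold:
            if not in_heading:
                if heading or content:
                    sections.append(("".join(heading).strip(), "".join(content).strip()))
                heading, content, in_heading = [], [], True
            heading.append(text)
        else:
            in_heading = False
            content.append(text)
    if heading or content:
        sections.append(("".join(heading).strip(), "".join(content).strip()))
    return sections
```

Any text that appears before the first bold run comes back as a section with an empty heading, which seemed the least surprising choice for a sketch.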

First, do a search on Google, as this is probably something that has been
asked before.
--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation

Jan 3 '06 #3

Hi James & Ignacio,

The Rich Text Format (RTF) Version 1.5 Specification is freely
available.
It runs to 88 pages and may be found at
http://www.biblioscape.com/rtf15_spec.htm#Heading3

Mark

Jan 3 '06 #4
