Parsing an RTF file

Hi All

My current task is to go through an RTF file and extract text based on
its format, i.e. find and extract bold text as a 'heading', treat the text
between this and the next bold text as 'content', and repeat until the end
of the document.

Is there a way to do this? I have looked at the rtf source (very
complicated) and RTFTextBox, and haven't worked out how to do it.

Does anybody have any hints on this one?

Thanks in advance.

James

Jan 3 '06 #1

Hi James,

Sounds like a job for the Regex class.
I have used regular expressions to parse RTF files into HTML, concatenate
multiple RTF files into a single RTF file, and translate multi-page RTF files
for printing.
My approach was to split the RTF file into three parts:
. Header Info
. Font Table
. Text Body
Then parse the text body into an array of text and control words.
Because control words start with \, you can tell them apart from the text
and re-emit the text in a different format (e.g. XHTML).

I did this using pcre.dll (a Perl-compatible regex library) in another
language (not C#).
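
A minimal sketch of the tokenizing step described above, written in Python for brevity rather than with pcre.dll or the .NET Regex class. The names TOKEN_RE, WORD_RE and tokenize are made up for illustration, and the pattern only covers the basic RTF token kinds (groups, control words, control symbols, \'hh escapes and plain text), so treat it as a starting point and not a complete reader:

```python
import re

# One alternative per token kind. The hex escape must come before the
# generic control symbol, otherwise \'e9 would be read as the symbol \'.
TOKEN_RE = re.compile(
    r"""
      (?P<hex>\\'[0-9a-fA-F]{2})           # escaped 8-bit character, e.g. \'e9
    | (?P<word>\\[a-zA-Z]+(?:-?\d+)?[ ]?)  # control word, e.g. \b, \b0, \fs24, \par
    | (?P<symbol>\\[^a-zA-Z])              # control symbol, e.g. \\, \{, \}, \~
    | (?P<open>\{)                         # group start
    | (?P<close>\})                        # group end
    | (?P<text>[^\\{}]+)                   # plain text run
    """,
    re.VERBOSE,
)

# Splits a matched control word into its name and optional numeric parameter.
WORD_RE = re.compile(r"\\([a-zA-Z]+)(-?\d+)?")


def tokenize(rtf):
    """Return a list of tuples describing the RTF source in document order."""
    tokens = []
    for m in TOKEN_RE.finditer(rtf):
        kind, value = m.lastgroup, m.group()
        if kind == "word":
            name, param = WORD_RE.match(value).groups()
            tokens.append(("word", name, None if param is None else int(param)))
        elif kind == "hex":
            tokens.append(("hex", int(value[2:], 16)))
        elif kind == "text":
            # Raw line breaks in the RTF source are not part of the text.
            value = value.replace("\r", "").replace("\n", "")
            if value:
                tokens.append(("text", value))
        else:
            tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    for token in tokenize(r"{\rtf1 {\b Title}\par Body text}"):
        print(token)
```

Once the body is a flat list like this, re-emitting it in another format is a matter of walking the list and mapping each control word to the target markup.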

Mark


Jan 3 '06 #2
Hi,

You need to read and understand the RTF format. IIRC it's a markup format;
if so, you will need to parse the file.
A Regex may help you, but I think that you may need something different,
like a parser (a la LEX).
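
To illustrate the parser idea for the original question, here is a rough Python sketch of a small state machine that tracks the bold flag across RTF groups and pairs each bold run (heading) with the non-bold text that follows it (content). Everything in it is an assumption made for illustration: bold_sections and SKIP are hypothetical names, bold is assumed to be switched only by \b / \b0, \plain or group scope, \'hh escapes are assumed to be cp1252, the list of skipped destinations is partial, and \u Unicode escapes and style-sheet-driven bold are not handled:

```python
import re

TOKEN_RE = re.compile(
    r"\\'([0-9a-fA-F]{2})"          # 1: hex escape \'hh
    r"|\\([a-zA-Z]+)(-?\d+)? ?"     # 2, 3: control word and optional parameter
    r"|\\([^a-zA-Z])"               # 4: control symbol such as \\ \{ \} \*
    r"|([{}])"                      # 5: group open / close
    r"|([^\\{}]+)"                  # 6: plain text
)

# Destinations whose contents are not document text (a partial list).
SKIP = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def bold_sections(rtf):
    """Return [(heading, content), ...], splitting on runs of bold text."""
    stack = []                 # saved (bold, skip) state, one entry per open group
    bold = skip = False
    runs = []                  # [is_bold, text] runs in document order

    def emit(text):
        if skip or not text:
            return
        if runs and runs[-1][0] == bold:
            runs[-1][1] += text
        else:
            runs.append([bold, text])

    for m in TOKEN_RE.finditer(rtf):
        hexval, word, param, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append((bold, skip))
        elif brace == "}":
            if stack:
                bold, skip = stack.pop()   # formatting ends with its group
        elif word is not None:
            if word == "b":
                bold = param != "0"        # \b turns bold on, \b0 turns it off
            elif word == "plain":
                bold = False
            elif word in SKIP:
                skip = True
            elif word in ("par", "line"):
                emit("\n")
            elif word == "tab":
                emit("\t")
        elif symbol is not None:
            if symbol == "*":
                skip = True                # {\*\dest ...}: ignorable destination
            elif symbol in "\\{}":
                emit(symbol)               # escaped literal character
        elif hexval is not None:
            emit(bytes([int(hexval, 16)]).decode("cp1252", errors="replace"))
        elif text is not None:
            emit(text.replace("\r", "").replace("\n", ""))

    sections = []
    for is_bold, chunk in runs:
        chunk = chunk.strip()
        if not chunk:
            continue
        if is_bold:
            sections.append([chunk, ""])
        elif sections:
            sections[-1][1] = (sections[-1][1] + "\n" + chunk).strip()
        else:
            sections.append(["", chunk])   # text before the first heading
    return [tuple(s) for s in sections]
```

Calling bold_sections on the full text of an .rtf file returns the heading/content pairs in document order. A real document will need more cases than this, and the RTF specification is the place to check which control words matter for a given file.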

First, do a search on Google, as this has probably been asked before.
--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation

Jan 3 '06 #3

Hi James & Ignacio,

The Rich Text Format (RTF) Version 1.5 Specification is freely
available.
It runs to 88 pages and may be found at
http://www.biblioscape.com/rtf15_spec.htm#Heading3

Mark

Jan 3 '06 #4
