Alternatively, I seek a HTML to RTF Function

HTML to RTF conversion is done by the clipboard in certain circumstances.
Does anyone know of an API or possibly a Framework2 class.method that will
convert HTML to RTF...?

TIA
--
Timothy Casey GPEMC! >11950 is the nu****@fieldcraft.com.au 2email
Terms & conditions apply. See www.fieldcraft.biz/GPEMC
Discover valid interoperable web menus, IE security, TSR Control,
& the most advanced speed reading application @ www.fieldcraft.biz
Jun 23 '07 #1

Your best bet is probably automating Word to do the conversion
for you if we are talking about a small number of large conversions
with a great deal of compatibility of features. For a web service or
similar which requires a great number of small conversions to take
place without noticeable delay, this might not be the best solution.

I had to do HTML<->RTF conversion for a project (both
directions) and initially bought a $299 component to do it - one
which was supposed to support all the styles required.

Turned out it was a piece of *bleep*: It had problems with
overlapping styles, e.g. <B><I></B></I>. This is fairly typical
of such components out there. Really a bad piece of code
wrapped up in a component and sold on the web to unsuspecting
developers pressed for time.
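
The overlap problem can be shown in a few lines. The Python below is not the code from any converter mentioned in this thread. It is only a minimal sketch, assuming attribute-free inline tags, of how misnested markup such as <B><I></B></I> can be rewritten so that every element nests properly. The name `renest` is made up for this example.

```python
import re

# Assumption: only simple tags without attributes, e.g. <b>, </i>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>")

def renest(html):
    """Rewrite overlapping inline tags so that every element nests properly."""
    out = []
    stack = []  # names of currently open tags, outermost first
    pos = 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])  # text between tags is copied as-is
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            stack.append(name)
            out.append("<%s>" % name)
        elif name in stack:
            # Close everything opened after `name`, close `name` itself,
            # then reopen the tags that were closed early.
            idx = len(stack) - 1 - stack[::-1].index(name)
            reopened = stack[idx + 1:]
            for t in reversed(reopened):
                out.append("</%s>" % t)
            out.append("</%s>" % name)
            del stack[idx:]
            for t in reopened:
                stack.append(t)
                out.append("<%s>" % t)
        # A closing tag with no matching opener is silently dropped.
    out.append(html[pos:])
    for t in reversed(stack):  # close anything still open at the end
        out.append("</%s>" % t)
    return "".join(out)

print(renest("<B>one <I>two</B> three</I>"))
```

Tag names are lowercased on output, and tags that carry attributes are passed through untouched as text, so a real converter would need a proper tokenizer.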

I had to spend three weeks rolling my own converter with some
help from subcomponents out there.

For HTML->RTF I used TidyATL to convert HTML->XHTML.
Then I augmented a piece of XHTML->RTF code I found somewhere
on the internet. Finally, I fed the RTF into an instance of the
RichTextBox object and read it back again in order to clean up
some superfluous parentheses.
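
As a rough illustration of the XHTML->RTF step only (TidyATL and the RichTextBox round trip are .NET-side and not shown), here is a minimal Python sketch. It is not the code referred to above. The element-to-control-word mapping and the function names are assumptions made for this example, and it handles only bold, italic and strikethrough on ASCII text.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XHTML element names to RTF control words.
RTF_WORDS = {"b": r"\b", "strong": r"\b", "i": r"\i", "em": r"\i", "strike": r"\strike"}

def escape(text):
    """Escape the three characters that are special in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

def walk(elem):
    """Return the RTF for one element, its children, and its trailing text."""
    inner = escape(elem.text or "") + "".join(walk(child) for child in elem)
    word = RTF_WORDS.get(elem.tag.lower())
    # Each formatted element becomes its own RTF group, so formatting ends with the group.
    body = "{%s %s}" % (word, inner) if word else inner
    return body + escape(elem.tail or "")

def xhtml_to_rtf(fragment):
    """Convert a well-formed XHTML fragment to a minimal RTF document."""
    # ET.fromstring raises ET.ParseError if the fragment is not well formed,
    # which is why the HTML has to be tidied into XHTML first.
    root = ET.fromstring("<root>%s</root>" % fragment)
    return r"{\rtf1\ansi " + walk(root) + "}"

print(xhtml_to_rtf("<b>one <i>two</i></b>"))
```

Because the parser rejects anything that is not well formed, overlapping markup fails loudly at this stage instead of producing broken RTF.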

For RTF->HTML, I converted a (rather limited) RTF parser written
in C to .Net.

At least I think that was how I did everything. Been a while.
I only had to support font names and sizes, forecolor, backcolor,
bold, italic, strikethrough, indentation, and bulleted lists.
Not images or tables or other fancy things.

Again: Whichever component you find out there (better might have
come along in the past two years), make sure they do not choke
on overlapping styles. Also check how well they cope with certain
commonly used special characters outside the ASCII range.
And color names. And and ...
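
For the special-character and colour-name checks, a small Python sketch of what a converter has to do on the RTF side may help when testing a component. It assumes the usual RTF conventions: \uN escapes with a signed 16-bit value followed by a one-character fallback, and a \colortbl group of red/green/blue entries. The function names and the three-colour table are hypothetical and far from complete.

```python
# Hypothetical subset of HTML colour names; a real table would be much longer.
NAMED = {"red": (255, 0, 0), "green": (0, 128, 0), "navy": (0, 0, 128)}

def rtf_escape(text):
    """Escape RTF special characters and write non-ASCII characters as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Work on UTF-16 code units so characters beyond the BMP become surrogate pairs.
            data = ch.encode("utf-16-be")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "big")
                if unit > 32767:
                    unit -= 65536  # \uN takes a signed 16-bit number
                out.append("\\u%d?" % unit)  # "?" is the fallback for old readers
    return "".join(out)

def color_table(names):
    """Build an RTF colour table; raises KeyError for a name missing from NAMED."""
    entries = "".join("\\red%d\\green%d\\blue%d;" % NAMED[n.lower()] for n in names)
    return "{\\colortbl;" + entries + "}"

print(rtf_escape("caf\u00e9"))
print(color_table(["Red", "navy"]))
```

Feeding a component text such as "café", a euro sign, and a colour given by name, then comparing its output against escapes like these, is a quick way to spot the failures described above.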

Regards,

Joergen Bech

On Sun, 24 Jun 2007 01:10:12 +1000, "Number 11950 - GPEMC! Replace
number with 11950" <nu****@fieldcraft.biz> wrote:

>HTML to RTF conversion is done by the clipboard in certain circumstances.
>Does anyone know of an API or possibly a Framework2 class.method that will
>convert HTML to RTF...?
>
>TIA
Jun 23 '07 #2
"Joergen Bech" <jbech<NOSPAM>@<NOSPAM>post1.tele.dk> wrote in message
news:59********************************@4ax.com...
>Your best bet is probably automating Word to do the conversion
>for you if we are talking about a small number of large conversions
>with a great deal of compatibility of features. For a web service or
>similar which requires a great number of small conversions to take
>place without noticeable delay, this might not be the best solution.

Not so sure everyone has the Word component, and Word 2.0 doesn't so much
choke on cascading style sheets as get the runs! (Lots of unformatted plain
text interspersed with randomly recognised formatted features.)

>I had to do HTML<->RTF conversion for a project (both
>directions) and initially bought a $299 component to do it - one
>which was supposed to support all the styles required.
>
>Turned out it was a piece of *bleep*: It had problems with
>overlapping styles, e.g. <B><I></B></I>. This is fairly typical
>of such components out there. Really a bad piece of code
>wrapped up in a component and sold on the web to unsuspecting
>developers pressed for time.

This is precisely why I need this level of control. <B></B> and <I></I> are
deprecated and not particularly accessible even without the overlap, which is
akin to a single file trying to be shared by multiple parent directories. The
correct markup for such overlapping formatting is
<STRONG><EM></EM></STRONG><EM></EM>, so now the markup is well formed and a
Braille reader can render the formatting as well. Try to get a commercially
made algorithm to do this reliably - it's not going to happen unless you or I
do it ourselves. Converting back, however, is key when vital text management
algorithms are missing from the programming language.

>I had to spend three weeks rolling my own converter with some
>help from subcomponents out there.

Or hindrance? Three weeks sounds pretty good to me. There is a French method
for conversion but I'm not so sure it can handle compound markup (e.g. CSS
combined with HTML) even when running the markup through the server emulator
algorithm. Anyway, I've got just the pages to test with, & if it passes I'll
pass it on.

If not, I guess I'll have to wade through the RTF spec and write my own
converter for the elements I'll be allowing. By restricting what elements
can be used in markup, one can simplify the process of ensuring
security-compliance, standards-compliance, accessibility, and XHTML
conversion.
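
Restricting the allowed elements can be prototyped quickly. The following Python sketch uses the standard library's `html.parser` to list any elements in a piece of markup that fall outside a whitelist. The whitelist contents and the names `ALLOWED`, `WhitelistChecker` and `disallowed_elements` are assumptions for illustration only, and this is a check, not a full validator or sanitizer.

```python
from html.parser import HTMLParser

# Hypothetical whitelist; the real set would be whatever the editor UI permits.
ALLOWED = {"p", "strong", "em", "ul", "li", "br"}

class WhitelistChecker(HTMLParser):
    """Collects the names of start tags that are not on the whitelist."""

    def __init__(self):
        super().__init__()
        self.rejected = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag not in ALLOWED:
            self.rejected.append(tag)

def disallowed_elements(markup):
    """Return the disallowed element names in document order."""
    checker = WhitelistChecker()
    checker.feed(markup)
    checker.close()
    return checker.rejected

print(disallowed_elements("<p>ok <font>x</font><b>y</b></p>"))
```

An empty result means only whitelisted elements were used. Attributes are ignored here, so a security-minded version would need to check those as well.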
>For HTML->RTF I used TidyATL to convert HTML->XHTML.
>Then I augmented a piece of XHTML->RTF code I found somewhere
>on the internet. Finally, I fed the RTF into an instance of the
>RichTextBox object and read it back again in order to clean up
>some superfluous parentheses.
>
>For RTF->HTML, I converted a (rather limited) RTF parser written
>in C to .Net.
>
>At least I think that was how I did everything. Been a while.
>I only had to support font names and sizes, forecolor, backcolor,
>bold, italic, strikethrough, indentation, and bulleted lists.
>Not images or tables or other fancy things.
[SNIP]

RTF is multipart, so image binary streams are simply bracketed
appropriately in the file. Tables are always trickier (what is higher in the
hierarchy, columns or rows - the answer depends on the format definition!)
so this promises to be an interesting or at least challenging part of the
project - but alas one I cannot avoid!

>Again: Whichever component you find out there (better might have
>come along in the past two years), make sure they do not choke
>on overlapping styles. Also check how well they cope with certain
>commonly used special characters outside the ASCII range.
>And color names. And and ...

Overlapping styles won't be allowed, and simply won't be possible through
the user interface. I'll only give them access to the HTML if I can get an
(X)HTML Validator Class for .NET. As to special characters, there is another
specification I need to dig up unless .NET has a UTF object?

There is something appealing about moulding HTML and RTF into a hierarchy of
classes and sub-classes à la W3C but I'm uncertain of the benefits of such an
approach, other than intimately learning the finer points of classing and
subclassing...?

Jun 24 '07 #3
