
tidy ms word output as pure xhtml without css style and font styles

Hi,

I want MS Word's output as XHTML without any CSS styling. Tidy
(http://tidy.sourceforge.net/) helps quite a lot but leaves the CSS
styles in place as class attributes, like the following:

<p class="P11 c2">foo</p>
<ul class="c4">
<li class="P11 c3">
<p class="P11 c2">bar</p>
</li>
</ul>

And I do want to have:

<p>foo</p>
<ul>
<li>
<p>bar</p>
</li>
</ul>

In other words: I want all the attributes to be deleted.

Is there an option for Tidy to achieve this, or another small app?
TIA Martin

P.S.: I could do this with XSLT, but the input must be XML and I have not
used XSLT for some years...
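
Since Tidy's XHTML output is well-formed XML, one way to do what is asked here is a short script instead of XSLT. The following is a minimal sketch, not a Tidy feature: it assumes the markup is a fragment without a namespace declaration, and `strip_attributes` is a name made up for this illustration. Full documents that declare the XHTML namespace or use named entities such as `&nbsp;` would need extra handling.

```python
import xml.etree.ElementTree as ET


def strip_attributes(xhtml: str) -> str:
    """Return the markup with every attribute removed from every element."""
    root = ET.fromstring(xhtml)
    for element in root.iter():
        # attrib is a plain dict of attribute names to values
        element.attrib.clear()
    return ET.tostring(root, encoding="unicode")


# The example from the post, wrapped in a single root element (an assumption,
# because an XML parser needs exactly one root).
sample = (
    '<div>'
    '<p class="P11 c2">foo</p>'
    '<ul class="c4"><li class="P11 c3"><p class="P11 c2">bar</p></li></ul>'
    '</div>'
)

cleaned = strip_attributes(sample)
print(cleaned)
```

Note that this removes every attribute, including ones such as `href` or `src`, which matches the request above but may be more than is wanted for documents with links or images.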
--
http://www.bretschneidernet.de/me/contact OpenPGP-key: 0x4EA52583
_o)(o_ Philip R. Zimmermann:
-./\\//\.- If privacy is outlawed,
_\_VV_/_ only outlaws will have privacy.
Jul 10 '07 #1
In article <46**********************@newsspool2.arcor-online.net>,
Martin Bretschneider <sp**@bretschneidernet.de> wrote:

> [...] Tidy
> (http://tidy.sourceforge.net/) helps quite a lot but leaves the css
> styles like the following:
>
> <p class="P11 c2">foo</p>[...]
>
> And I do want to have:
>
> <p>foo</p>[...]

Have a look at Mihai Sucan's ReTidy:
<http://www.robodesign.ro/mihai/my-projects/retidy>. I haven't had a
chance to test it myself, but I expect it can do what you want:
<http://www.robodesign.ro/mihai/my-projects/retidy#dom_strip_attrs>

--
Sander Tekelenburg
The Web Repair Initiative: <http://webrepair.org/>
Jul 10 '07 #2
Scripsit Adrienne Boswell:

> Say Word does something like:
> <p style="font-weight:bold">Bold</p>
> <p style="font-style:italic">Italic</p>
>
> HTML-Tidy will do:
>
> <p class="c1">Bold</p>
> <p class="c2">Italic</p>

So is that a problem? If you wish to preserve the formatting, you use the
style sheet generated (as such or as modified). If you don't, you drop the
style sheet. I don't see why the class attributes would be a problem. On the
contrary, they might turn out to be part of a solution, if you later decide
that preserving some of the formatting is a good idea, after all - then you
just write some nice style sheet using the "handles" (class attributes) that
you already have in the markup.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jul 11 '07 #3
Gazing into my crystal ball I observed "Jukka K. Korpela"
<jk******@cs.tut.fi> writing in
news:s8********************@reader1.news.saunalahti.fi:

> Scripsit Adrienne Boswell:
>
>> Say Word does something like:
>> <p style="font-weight:bold">Bold</p>
>> <p style="font-style:italic">Italic</p>
>>
>> HTML-Tidy will do:
>>
>> <p class="c1">Bold</p>
>> <p class="c2">Italic</p>
>
> So is that a problem? If you wish to preserve the formatting, you use
> the style sheet generated (as such or as modified). If you don't, you
> drop the style sheet. I don't see why the class attributes would be a
> problem. On the contrary, they might turn up to be part of a solution,
> if you later decide that preserving some of the formatting is a good
> idea, after all - then you just write some nice style sheet using the
> "handles" (class attributes) that you already have in the markup.

It's a problem if you want no style attributes at all. So you can leave
the style out, and then it's an empty class. Okay. It's still there, and
it just doesn't sit well with me. That's just me.
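
For the narrower goal described here (no leftover styling hooks, but other attributes kept), a small post-processing step after Tidy is one possible approach. This is a rough sketch under assumptions, not a Tidy option: `drop_styling` and `PRESENTATIONAL` are hypothetical names, and the input is assumed to be well-formed markup without a namespace declaration.

```python
import xml.etree.ElementTree as ET

# Attributes treated as styling hooks in this sketch (an assumption).
PRESENTATIONAL = ("class", "style")


def drop_styling(xhtml: str) -> str:
    """Remove <style> elements plus class/style attributes; keep the rest."""
    root = ET.fromstring(xhtml)
    # Snapshot the elements first so the tree is not modified mid-iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "style":
                parent.remove(child)
        for name in PRESENTATIONAL:
            parent.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html><head><style type="text/css">p.c1 {font-weight: bold}</style></head>'
    '<body><p class="c1">Bold</p>'
    '<p><a href="http://tidy.sourceforge.net/" class="c2">Tidy</a></p>'
    '</body></html>'
)

cleaned = drop_styling(sample)
print(cleaned)
```

Unlike stripping every attribute, this leaves `href` and similar attributes alone, so links survive the cleanup.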

--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Jul 12 '07 #4
