Remove Microsoft Word formatting (in HTML doc) - regular expression

I have an html document created through MS Word (save as html).

I would like to find a regular expression that can be used to remove all of
the formatting. Any help would be greatly appreciated.
thx
dave
Jul 21 '05 #1
A single expression? That would be pretty darned difficult.

Perhaps a series of transformations, where each one applies an expression
and returns a slightly "better" stream of text than the one before it.
After going through them in sequence, you may get there.
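
A minimal sketch of that series-of-transformations idea, written in Python purely for illustration (the thread does not name a language). The step list, the patterns, and the names `WORD_CLEANUP_STEPS` and `clean_word_html` are assumptions, not a complete or authoritative set of rules for Word output. Regular expressions cannot parse HTML in general, so treat this as a rough cleanup that may need more steps for a given document.

```python
import re

# Hypothetical cleanup steps: each entry is (compiled pattern, replacement).
# They run in order, so each step sees the output of the one before it.
WORD_CLEANUP_STEPS = [
    # Comments, including conditional ones like <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--.*?-->", re.DOTALL), ""),
    # Embedded <style> and <xml> blocks, together with their contents
    (re.compile(r"<(style|xml)\b[^>]*>.*?</\1>", re.DOTALL | re.IGNORECASE), ""),
    # Namespaced Office tags such as <o:p> and </o:p>
    (re.compile(r"</?\w+:\w+[^>]*>"), ""),
    # class, style and lang attributes, whether quoted or unquoted
    (re.compile(
        r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
        re.IGNORECASE), ""),
    # <span> and <font> wrappers; the text inside them is kept
    (re.compile(r"</?(?:span|font)\b[^>]*>", re.IGNORECASE), ""),
]


def clean_word_html(html: str) -> str:
    """Apply every cleanup step in sequence and return the simplified markup."""
    for pattern, replacement in WORD_CLEANUP_STEPS:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    sample = (
        "<p class=MsoNormal><span style='font-size:10.0pt;"
        "font-family:\"Courier New\"'>Hello<o:p></o:p></span></p>"
    )
    print(clean_word_html(sample))
```

Keeping the steps in a list makes it easy to add, remove, or reorder an expression when a particular document still has leftovers after a pass.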

In the past, I've seen folks go at it the other way: writing a simple
engine that evaluates the stream as though it were a browser, allowing
the simple tags and replacing the complex ones with simpler ones. I don't
know if that was better or not, but it worked.
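
The "simple engine" approach might look roughly like the sketch below, which uses Python's standard `html.parser` module to walk the markup the way a browser would tokenize it. The allowlist, the replacement table, and the names `WordHtmlSimplifier` and `simplify_word_html` are hypothetical choices for illustration, not the engine described above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy tables; adjust them to the markup you want to keep.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li"}
REPLACEMENTS = {"div": "p"}                     # complex tag -> simpler tag
DROP_WITH_CONTENT = {"style", "script", "xml"}  # discard tag and its contents


class WordHtmlSimplifier(HTMLParser):
    """Re-emit only allowed tags, without attributes, plus the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a dropped block

    def _simplify(self, tag):
        tag = REPLACEMENTS.get(tag, tag)
        return tag if tag in ALLOWED_TAGS else None

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        simple = self._simplify(tag)
        if simple and not self.skip_depth:
            self.parts.append(f"<{simple}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        simple = self._simplify(tag)
        # <br> is a void element, so no closing tag is written for it
        if simple and simple != "br" and not self.skip_depth:
            self.parts.append(f"</{simple}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def simplify_word_html(html: str) -> str:
    parser = WordHtmlSimplifier()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(simplify_word_html(
        "<p class=MsoNormal><span style='font-size:10.0pt'>"
        "Hello <b>world</b><o:p></o:p></span></p>"
    ))
```

Because the parser handles quoting, comments, and nesting itself, the policy lives in three small tables rather than in a growing list of expressions.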

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--

Jul 21 '05 #2

