Editor to clean up MS Word-generated HTML table

I have a very large html table created by MS Word, saved as its "Web
Page, Filtered" file type. Every html table cell has lots of
formatting tags. Most of the file size is that formatting.

Is there a free or inexpensive editor that can quickly remove all
formatting to minimize the file size?

I tried a few freeware editors, but wasn't able to find a way to clean
it up.
Thanks,

Greg

Oct 24 '07 #1
On 2007-10-24, Greg Lovern wrote:
> I have a very large html table created by MS Word, saved as its "Web
> Page, Filtered" file type. Every html table cell has lots of
> formatting tags. Most of the file size is that formatting.
>
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?
>
> I tried a few freeware editors, but wasn't able to find a way to clean
> it up.

Use "lynx -dump" to extract the text, then mark it up in any text
editor.
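
If lynx is not at hand, roughly the same text dump can be sketched in Python using only the standard library's html.parser. This is an illustration, not a description of what lynx does internally: the names TextDump and dump_table_text are made up for this example, nested tables are not handled, and the choice to emit each row as one tab-separated line is an assumption.

```python
from html.parser import HTMLParser


class TextDump(HTMLParser):
    """Collect the text of each table cell; all tags and attributes are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row currently being read
        self.cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []
        elif tag in ("p", "br", "div") and self.cell is not None:
            # Treat a paragraph or line break inside a cell as a space.
            self.cell.append(" ")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Collapse runs of whitespace, including line breaks inside the cell.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def dump_table_text(html):
    """Return the table text, one line per row, cells separated by tabs."""
    parser = TextDump()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)
```

The result is plain text that can be marked up again by hand, as suggested above.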

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
Oct 24 '07 #2
Greg Lovern wrote:
> I have a very large html table created by MS Word, saved as its "Web
> Page, Filtered" file type. Every html table cell has lots of
> formatting tags. Most of the file size is that formatting.
>
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?

First--don't!

1) In Word select the table:
2) Convert table to text and use tabs for the table cells
3) Use Word's Search and Replace feature:
3a) Find what: ^t
Replace with: </td></td>
Replace all
3b) Find what: ^p
Replace with: </td></tr>^p<tr><td>
Replace all
4) Add to the beginning of your former table:
<table>
<tr><td>
5) Add to end:
</table>
6) Select all and paste into your template HTML with any text editor.
Style to taste...
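
The same search-and-replace idea can be sketched in a few lines of Python, for text that has already been exported with tabs between cells and one row per line. The function name tsv_to_table is hypothetical, and the sketch assumes each cell is closed and a new one opened at every tab, with special characters escaped along the way.

```python
from html import escape


def tsv_to_table(text):
    """Turn tab-separated lines into a bare HTML table with no formatting."""
    lines = ["<table>"]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines rather than emit empty rows
        # Each tab ends one cell and starts the next.
        cells = "".join(
            f"<td>{escape(cell.strip())}</td>" for cell in line.split("\t")
        )
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Like the Word steps, this only works when no cell contains a line break of its own.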

--
Take care,

Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com
Oct 24 '07 #3
In article <4e***************************@NAXS.COM>,
"Jonathan N. Little" <lw*****@centralva.net> wrote:
> Greg Lovern wrote:
> > I have a very large html table created by MS Word, saved as its "Web
> > Page, Filtered" file type. Every html table cell has lots of
> > formatting tags. Most of the file size is that formatting.
> >
> > Is there a free or inexpensive editor that can quickly remove all
> > formatting to minimize the file size?
>
> First--don't!

Agreed - if at all possible, avoid using Word to generate any html.

> 1) In Word select the table:
> 2) Convert table to text and use tabs for the table cells
> 3) Use Word's Search and Replace feature:
> 3a) Find what: ^t
> Replace with: </td></td>

I think you mean </td><td>??

As an alternative, the OP could look at something like
Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/

Depending on the flavour of OS and tastes/talents of the
user, there's always grep of course...
Oct 24 '07 #4
Greg Lovern wrote:
> I have a very large html table created by MS Word, saved as its "Web
> Page, Filtered" file type. Every html table cell has lots of
> formatting tags. Most of the file size is that formatting.
>
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?

I wrote a Win32 program called xtag that might work for you:
www.industrologic.com/basic/

Oct 24 '07 #5
Greg Lovern wrote:
> Then xtag.exe crashed while I was writing this.

Whoops! Oh, well, I asked for feedback, and I got it didn't I?

If you are in a hurry you might try splitting the file into
smaller files and running them through it.

Send me your file if you want and I'll see what the problem is.
pe**@industrologic.com
Oct 24 '07 #6
On Oct 24, 3:31 pm, "Jonathan N. Little" <lws4...@centralva.net>
wrote:
> Then you need to convert the table to text formatted at tabs.
> "Table Convert Table to Text..." with "Separate text with Tabs"

Thanks, but because there are carriage returns (thousands of them)
within table cells, converting to text, then later trying to convert
back to a table, mangles it.

I found that nvu will remove some of the formatting. After doing that
to the small file, I was able to completely clean it manually in
notepad. I'm going to try again with the large file. I had tried that
with the large file before but it seemed like hardly any of the
strings to delete were duplicated. This time I'll try running it
through nvu's cleanup first.

Once clean, I'll work with them going forward in nvu. I found that nvu
adds absolutely no formatting to the table, at least after removing
some formatting with its settings.

Time to catch the bus now; I'll be back on this tomorrow morning.

Thanks to all for the help.
Thanks,

Greg

Oct 24 '07 #7
Greg Lovern wrote:
> On Oct 24, 3:31 pm, "Jonathan N. Little" <lws4...@centralva.net>
> wrote:
> > Then you need to convert the table to text formatted at tabs.
> > "Table Convert Table to Text..." with "Separate text with Tabs"
>
> Thanks, but because there are carriage returns (thousands of them)
> within table cells, converting to text, then later trying to convert
> back to a table, mangles it.

Sounds like the original source data is far too large for a single
webpage. You should break it up into smaller, logical, "digestible" pages.
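
One rough way to do the splitting mechanically is sketched below in Python. It cuts by row count only, so choosing logical break points is still up to the author. The function name split_rows and the rows_per_page parameter are invented for this example, and the regular expression assumes every row is closed with </tr> and that tables are not nested.

```python
import re


def split_rows(html, rows_per_page):
    """Split one big table into bare tables of at most rows_per_page rows each."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    # Assumption: rows are not nested, so a non-greedy match finds each <tr>...</tr>.
    rows = re.findall(r"<tr\b.*?</tr>", html, flags=re.IGNORECASE | re.DOTALL)
    return [
        "<table>\n" + "\n".join(rows[i:i + rows_per_page]) + "\n</table>"
        for i in range(0, len(rows), rows_per_page)
    ]
```

Each returned string can then be pasted into its own page template.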
--
Take care,

Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com
Oct 24 '07 #8
On Oct 23, 11:37 pm, Greg Lovern <gr...@gregl.net> wrote:
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?

Is there anything out there that will do the html equivalent of
Notepad -- remove all formatting, leaving only the bare html table and
its bare text contents?
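
For what it is worth, that kind of stripping can be sketched in Python with the standard library's html.parser: keep only the table, tr, td and th tags, drop every attribute and every other tag, and keep the text. This is a sketch under assumptions, not a tested tool. The names BareTable and strip_formatting are made up here, the set of tags to keep is a guess at what "bare" means, and line breaks inside a cell are flattened to a single space.

```python
import re
from html import escape
from html.parser import HTMLParser

KEEP = ("table", "tr", "td", "th")  # assumption: the only tags worth keeping
BREAKS = ("p", "br", "div")         # block breaks inside a cell become a space


class BareTable(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.depth = 0  # number of <table> elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        if tag in KEEP:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped
        elif tag in BREAKS and self.depth:
            self.out.append(" ")

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")
        if tag == "table" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside any table (title, style rules) is discarded.
        if self.depth:
            self.out.append(escape(data, quote=False))


def strip_formatting(html):
    """Return only the bare table markup and its text, one row per line."""
    parser = BareTable()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", "".join(parser.out))
    text = re.sub(r"\s*(</?(?:table|tr|td|th)>)\s*", r"\1", text)
    return text.replace("</tr>", "</tr>\n")
```

To preserve the breaks inside cells instead of flattening them, the BREAKS branch could emit "<br>" rather than a space.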
Thanks,

Greg

Oct 25 '07 #9
On Nov 7, 10:15 pm, Rob Hick <rsjh...@googlemail.com> wrote:
[...]
> A very nice little utility. I used it for something entirely
> different (to clean SPSS HTML output) and it worked great. I did have
> to run it in firefox though because in IE just pasting the text in
> made IE crash (there was a lot of text ~1mb)!
>
> So thanks RobG

Glad it got some use. :-)

--
Rob

Nov 8 '07 #10
