Editor to clean up MS Word-generated HTML table

I have a very large html table created by MS Word, saved as its "Web
Page, Filtered" file type. Every html table cell has lots of
formatting tags. Most of the file size is that formatting.

Is there a free or inexpensive editor that can quickly remove all
formatting to minimize the file size?

I tried a few freeware editors, but wasn't able to find a way to clean
it up.
Thanks,

Greg

Oct 24 '07 #1
On 2007-10-24, Greg Lovern wrote:
> I have a very large html table created by MS Word, saved as its "Web
> Page, Filtered" file type. Every html table cell has lots of
> formatting tags. Most of the file size is that formatting.
>
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?
>
> I tried a few freeware editors, but wasn't able to find a way to clean
> it up.

Use "lynx -dump" to extract the text, then mark it up in any text
editor.

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
================================================== =================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
Oct 24 '07 #2
Greg Lovern wrote:
> I have a very large html table created by MS Word, saved as its "Web
> Page, Filtered" file type. Every html table cell has lots of
> formatting tags. Most of the file size is that formatting.
>
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?

First--don't!

1) In Word, select the table.
2) Convert table to text and use tabs for the table cells
3) Use Word's Search and Replace feature:
3a) Find what: ^t
Replace with: </td></td>
Replace all
3b) Find what: ^p
Replace with: </td></tr>^p<tr><td>
Replace all
4) Add to the beginning of your former table:
<table>
<tr><td>
5) Add to end:
</table>
6) Select all and paste into your template HTML with any text editor.
Style to taste...
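
The search-and-replace steps above can also be sketched in a few lines of Python. This is only an illustration: it assumes the table has already been converted to tab-separated text with one row per line, and the function name tsv_to_html_table is made up for the example.

```python
from html import escape


def tsv_to_html_table(text: str) -> str:
    """Turn tab-separated lines into a bare HTML table."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines so they do not become empty rows
        # One <td> per tab-separated field; escape &, < and > in the text.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in line.split("\t"))
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    sample = "Name\tQty\nWidget\t3\nA & B\t7"
    print(tsv_to_html_table(sample))
```

Cells that contain their own line breaks would be split into separate rows by this sketch, so it only suits tables with single-line cells.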

--
Take care,

Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com
Oct 24 '07 #3
In article <4e***************************@NAXS.COM>,
"Jonathan N. Little" <lw*****@centralva.net> wrote:
> Greg Lovern wrote:
>> I have a very large html table created by MS Word, saved as its "Web
>> Page, Filtered" file type. Every html table cell has lots of
>> formatting tags. Most of the file size is that formatting.
>>
>> Is there a free or inexpensive editor that can quickly remove all
>> formatting to minimize the file size?
>
> First--don't!

Agreed - if at all possible, avoid using Word to generate any html.

> 1) In Word, select the table.
> 2) Convert table to text and use tabs for the table cells
> 3) Use Word's Search and Replace feature:
> 3a) Find what: ^t
> Replace with: </td></td>

I think you mean </td><td>??

As an alternative, the OP could look at something like
Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/

Depending on the flavour of OS and tastes/talents of the
user, there's always grep of course...
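
For anyone who would rather script it, here is a minimal sketch of the parser approach. It uses Python's standard-library html.parser instead of Beautiful Soup so that it runs without installing anything. The names TableCleaner, clean_table and KEEP are invented for the example, and the list of tags to keep is an assumption. Everything outside a table is dropped, and inside a table only the listed tags and the text survive.

```python
import re
from html import escape
from html.parser import HTMLParser

# Tags kept in the output (an assumed list; adjust to taste).
KEEP = {"table", "thead", "tbody", "tfoot", "tr", "th", "td", "p"}


class TableCleaner(HTMLParser):
    """Keep table structure and text; drop attributes and all other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.depth = 0  # number of <table> elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        if tag in KEEP and self.depth:
            self.out.append(f"<{tag}>")  # attributes are discarded here

    def handle_endtag(self, tag):
        if tag in KEEP and self.depth:
            self.out.append(f"</{tag}>")
        if tag == "table" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            # Collapse runs of whitespace, then re-escape &, < and >.
            self.out.append(escape(re.sub(r"\s+", " ", data), quote=False))


def clean_table(html: str) -> str:
    """Return only the bare tables and their text from an HTML document."""
    parser = TableCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Paragraph tags are kept (without attributes) so that cells holding several paragraphs do not run together; remove "p" from KEEP if that is not wanted.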
Oct 24 '07 #4
Greg Lovern wrote:
> I have a very large html table created by MS Word, saved as its "Web
> Page, Filtered" file type. Every html table cell has lots of
> formatting tags. Most of the file size is that formatting.
>
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?

I wrote a Win32 program called xtag that might work for you:
www.industrologic.com/basic/

Oct 24 '07 #5
Greg Lovern wrote:
> Then xtag.exe crashed while I was writing this.

Whoops! Oh, well, I asked for feedback, and I got it, didn't I?

If you are in a hurry you might try splitting the file into
smaller files and running them through it.

Send me your file if you want and I'll see what the problem is.
pe**@industrologic.com
Oct 24 '07 #6
On Oct 24, 3:31 pm, "Jonathan N. Little" <lws4...@centralva.net>
wrote:
> Then you need to convert the table to text formatted at tabs.
> "Table > Convert > Table to Text..." with "Separate text with Tabs"

Thanks, but because there are carriage returns (thousands of them)
within table cells, converting to text and then converting back to a
table mangles it.

I found that nvu will remove some of the formatting. After doing that
to the small file, I was able to completely clean it manually in
notepad. I'm going to try again with the large file. I had tried that
with the large file before but it seemed like hardly any of the
strings to delete were duplicated. This time I'll try running it
through nvu's cleanup first.

Once clean, I'll work with them going forward in nvu. I found that nvu
adds absolutely no formatting to the table, at least after removing
some formatting with its settings.

Time to catch the bus now; I'll be back on this tomorrow morning.

Thanks to all for the help.
Thanks,

Greg

Oct 24 '07 #7
Greg Lovern wrote:
> On Oct 24, 3:31 pm, "Jonathan N. Little" <lws4...@centralva.net>
> wrote:
>> Then you need to convert the table to text formatted at tabs.
>> "Table > Convert > Table to Text..." with "Separate text with Tabs"
>
> Thanks, but because there are carriage returns (thousands of them)
> within table cells, converting to text, then later trying to convert
> back to a table, mangles it.

Sounds like the original source data is far too large for a single
webpage. You should break it up into smaller, logical, "digestible" pages.
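
Breaking the table up could be scripted roughly like this. It is a sketch only: it assumes a single table with no nested tables and a closing </tr> on every row. The function name split_table and the rows_per_page parameter are made up for the example.

```python
import re


def split_table(html: str, rows_per_page: int) -> list[str]:
    """Split one table's rows into several smaller tables."""
    # Assumes no nested tables and that every row has a closing </tr>.
    rows = re.findall(r"<tr\b.*?</tr>", html, flags=re.IGNORECASE | re.DOTALL)
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        pages.append("<table>\n" + "\n".join(chunk) + "\n</table>")
    return pages
```

Each returned string can then be pasted into its own page template.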
--
Take care,

Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com
Oct 24 '07 #8
On Oct 23, 11:37 pm, Greg Lovern <gr...@gregl.net> wrote:
> Is there a free or inexpensive editor that can quickly remove all
> formatting to minimize the file size?

Is there anything out there that will do the html equivalent of
Notepad -- remove all formatting, leaving only the bare html table and
its bare text contents?
Thanks,

Greg

Oct 25 '07 #9
On Nov 7, 10:15 pm, Rob Hick <rsjh...@googlemail.com> wrote:
> [...]
> A very nice little utility. I used it for something entirely
> different (to clean SPSS HTML output) and it worked great. I did have
> to run it in Firefox though, because in IE just pasting the text in
> made IE crash (there was a lot of text, ~1 MB)!
>
> So thanks RobG

Glad it got some use. :-)

--
Rob

Nov 8 '07 #10
On Oct 24, 18:09, Greg Lovern <gr...@gregl.net> wrote:
> On Oct 24, 3:31 pm, "Jonathan N. Little" <lws4...@centralva.net>
> wrote:
>> Then you need to convert the table to text formatted at tabs.
>> "Table > Convert > Table to Text..." with "Separate text with Tabs"
>
> Thanks, but because there are carriage returns (thousands of them)
> within table cells, converting to text, then later trying to convert
> back to a table, mangles it.
>
> I found that nvu will remove some of the formatting. After doing that
> to the small file, I was able to completely clean it manually in
> notepad. I'm going to try again with the large file.

Greg,

Nvu 1.0 and KompoZer 0.77 won't do miracles. HTML Tidy won't either. I
know lots of advanced text editors (open source, multi-platform, free,
etc.) which can "find and replace" a string of text with whatever you
want, including control characters like carriage returns.

http://en.wikipedia.org/wiki/Text_ed...ch_and_replace
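
The same kind of find-and-replace can be done with regular expressions in a script. The two helpers below are a sketch with invented names (strip_attributes, join_wrapped_lines). The first assumes that no attribute value contains a ">" character, which a regex cannot handle reliably.

```python
import re


def strip_attributes(html: str) -> str:
    """Remove every attribute from opening tags, e.g. <td width=100> -> <td>."""
    # Tag name, then whitespace, then anything up to the closing ">".
    return re.sub(r"<([a-zA-Z][a-zA-Z0-9]*)\s[^>]*>", r"<\1>", html)


def join_wrapped_lines(html: str) -> str:
    """Replace each run of carriage returns and line feeds with one space."""
    return re.sub(r"[\r\n]+", " ", html)
```

Many editors with regex search accept much the same patterns in their replace dialog, though the syntax for the backreference (\1 here) varies between editors.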

It is best to avoid the HTML export features of FrontPage and MS Word.

Regards, Gérard
P.S. Note that KompoZer 0.77 is more recent and more advanced than
Nvu 1.0, with more bug fixes.

Nov 9 '07 #11
