TSV to HTML

I was wondering if anyone here on the group could point me in a
direction that would explain how to use python to convert a tsv file
to html. I have been searching for a resource but have only seen
information on dealing with converting csv to tsv. Specifically I want
to take the values and insert them into an html table.

I have been trying to figure it out myself, and in essence, this is
what I have come up with. Am I on the right track? I really have the
feeling that I am re-inventing the wheel here.

1) in the code define a css
2) use a regex to extract the info between tabs
3) wrap the values in the appropriate tags and insert into table.
4) write the .html file

Thanks again for your patience,
Brian

May 31 '06 #1
> I was wondering if anyone here on the group could point me
> in a direction that would explain how to use python to
> convert a tsv file to html. I have been searching for a
> resource but have only seen information on dealing with
> converting csv to tsv. Specifically I want to take the
> values and insert them into an html table.
>
> I have been trying to figure it out myself, and in
> essence, this is what I have come up with. Am I on the
> right track? I really have the feeling that I am
> re-inventing the wheel here.
>
> 1) in the code define a css
> 2) use a regex to extract the info between tabs
> 3) wrap the values in the appropriate tags and insert into
> table.
> 4) write the .html file


Sounds like you just want to do something like

    print "<table>"
    for line in file("in.tsv"):
        print "<tr>"
        items = line.split("\t")
        for item in items:
            print "<td>%s</td>" % item
        print "</tr>"
    print "</table>"

It gets a little more complex if you need to clean each item
for HTML entities/scripts/etc...but that's usually just a
function that you'd wrap around the item:

print "<td>%s</td>" % escapeEntity(item)

using whatever "escapeEntity" function you have on hand.
E.g.

from xml.sax.saxutils import escape
:
:
print "<td>%s</td>" % escape(item)

It doesn't gracefully attempt to define headers using
<thead>, <tbody>, and <th> sorts of rows, but a little
toying should solve that.
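
A minimal sketch of that header handling, written for Python 3 rather than the Python 2 used above. It assumes the first line of the file is the header row; the function name `tsv_lines_to_table` is made up for illustration, and `html.escape` stands in for whatever escaping function you have on hand.

```python
from html import escape


def tsv_lines_to_table(lines):
    """Build an HTML table from tab-separated lines.

    Assumption: the first line holds the column headings.
    """
    out = ["<table>"]
    body_open = False
    for row_num, line in enumerate(lines):
        cells = line.rstrip("\n").split("\t")
        if row_num == 0:
            # Header row: <th> cells inside <thead>, then open <tbody>.
            heads = "".join("<th>%s</th>" % escape(c) for c in cells)
            out.append("<thead>")
            out.append("<tr>%s</tr>" % heads)
            out.append("</thead>")
            out.append("<tbody>")
            body_open = True
        else:
            data = "".join("<td>%s</td>" % escape(c) for c in cells)
            out.append("<tr>%s</tr>" % data)
    if body_open:
        out.append("</tbody>")
    out.append("</table>")
    return "\n".join(out)


print(tsv_lines_to_table(["name\tnote\n", "a<b\tR&D\n"]))
```

Plain `split("\t")` is kept here to match the loop above; it does not understand quoted fields.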

-tim

May 31 '06 #2
> 1) in the code define a css
> 2) use a regex to extract the info between tabs

In place of this, you might want to look at
http://effbot.org/librarybook/csv.htm
Around the middle of that page you'll see how to use a delimiter other
than a comma.

> 3) wrap the values in the appropriate tags and insert into table.
> 4) write the .html file
>
> Thanks again for your patience,
> Brian
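
As a rough Python 3 illustration of the delimiter idea (not taken from the linked page), the csv module can be told to split on tabs instead of commas. The sample data is invented, and `io.StringIO` stands in for a real file opened with `open("in.tsv", newline="")`.

```python
import csv
import io

# Stand-in for an open TSV file; the second row has a quoted field
# that itself contains a tab.
sample = io.StringIO('id\tcomment\n1\t"tab\there"\n')

# delimiter="\t" replaces the default comma; quoting rules stay the same.
rows = list(csv.reader(sample, delimiter="\t"))
print(rows)
```

No regex is needed: each row comes back as a list of strings, with the quotes already stripped.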


May 31 '06 #3
Brian wrote:
> I was wondering if anyone here on the group could point me in a
> direction that would explain how to use python to convert a tsv file
> to html. I have been searching for a resource but have only seen
> information on dealing with converting csv to tsv. Specifically I want
> to take the values and insert them into an html table.


    import csv
    from xml.sax.saxutils import escape

    def tsv_to_html(input_file, output_file):
        output_file.write('<table><tbody>\n')
        for row in csv.reader(input_file, 'excel-tab'):
            output_file.write('<tr>')
            for col in row:
                output_file.write('<td>%s</td>' % escape(col))
            output_file.write('</tr>\n')
        output_file.write('</tbody></table>')

Usage example:

    >>> from cStringIO import StringIO
    >>> input_file = StringIO('"foo"\t"bar"\t"baz"\n'
    ...                       '"qux"\t"quux"\t"quux"\n')
    >>> output_file = StringIO()
    >>> tsv_to_html(input_file, output_file)
    >>> print output_file.getvalue()
    <table><tbody>
    <tr><td>foo</td><td>bar</td><td>baz</td></tr>
    <tr><td>qux</td><td>quux</td><td>quux</td></tr>
    </tbody></table>
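
The code above is Python 2 (`cStringIO`, `print` as a statement). A sketch of the same function for Python 3 might look like the following; the only assumed substitutions are `io.StringIO` for `cStringIO.StringIO` and `html.escape` for `xml.sax.saxutils.escape`.

```python
import csv
import io
from html import escape


def tsv_to_html(input_file, output_file):
    """Write the rows of a tab-separated file as an HTML table."""
    output_file.write("<table><tbody>\n")
    # "excel-tab" is the csv module's built-in tab-delimited dialect.
    for row in csv.reader(input_file, "excel-tab"):
        output_file.write("<tr>")
        for col in row:
            output_file.write("<td>%s</td>" % escape(col))
        output_file.write("</tr>\n")
    output_file.write("</tbody></table>")


input_file = io.StringIO('"foo"\t"bar"\t"baz"\n"qux"\t"quux"\t"quux"\n')
output_file = io.StringIO()
tsv_to_html(input_file, output_file)
print(output_file.getvalue())
```

With real files, the csv documentation recommends opening the input with `newline=""` so that newlines embedded in quoted fields are handled correctly.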
May 31 '06 #4

First let me say that I appreciate the responses that everyone has
given.

A friend of mine is a ruby programmer but knows nothing about python.
He gave me the script below and it does exactly what I want, only it is
in Ruby. Not knowing ruby this is greek to me, and I would like to
re-write it in python.

I ask then, is this essentially what others here have shown me to do,
or is it in a different vein all together?

Code:

    class TsvToHTML
      @@styleBlock = <<-ENDMARK
        <style type='text/css'>
        td {
          border-left:1px solid #000000;
          padding-right:4px;
          padding-left:4px;
          white-space: nowrap;
        }
        .cellTitle {
          border-bottom:1px solid #000000;
          background:#ffffe0;
          font-weight: bold;
          text-align: center;
        }
        .cell0 { background:#eff1f1; }
        .cell1 { background:#f8f8f8; }
        </style>
      ENDMARK

      def TsvToHTML::wrapTag(data,tag,modifier = "")
        return "<#{tag} #{modifier}>" + data + "</#{tag}>\n"
      end # wrapTag

      def TsvToHTML::makePage(source)
        page = ""
        rowNum = 0
        source.readlines.each { |record|
          row = ""
          record.chomp.split("\t").each { |field|
            # replace blank fields with &nbsp;
            field.sub!(/^$/,"&nbsp;")
            # wrap in TD tag, specify style
            row += wrapTag(field,"td","class=\"" +
              ((rowNum == 0)?"cellTitle":"cell#{rowNum % 2}") +
              "\"")
          }
          rowNum += 1
          # wrap in TR tag, add row to page
          page += wrapTag(row,"tr") + "\n"
        }
        # finish page formatting
        [ [ "table","cellpadding=0 cellspacing=0 border=0" ], "body","html"
        ].each { |tag|
          page = wrapTag(@@styleBlock,"head") + page if tag == "html"
          page = wrapTag(page,*tag)
        }
        return page
      end # makePage
    end # class

    # stdin -> convert -> stdout
    print TsvToHTML.makePage(STDIN)

Jun 1 '06 #5
Brian wrote:
> First let me say that I appreciate the responses that everyone has
> given.
>
> A friend of mine is a ruby programmer but knows nothing about python.
> He gave me the script below and it does exactly what I want, only it is
> in Ruby. Not knowing ruby this is greek to me, and I would like to
> re-write it in python.
>
> I ask then, is this essentially what others here have shown me to do,
> or is it in a different vein all together?

Leif's Python example uses the csv module which understands a lot more
about the peculiarities of the CSV/TSV formats.
The Ruby example prepends a <style>...</style> block.

The Ruby example splits each line to form a table row and each row on
tabs, to form the cells.
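
For anyone who wants to see that logic in Python, here is a rough Python 3 sketch of what the Ruby script appears to do. It is not the translation that was posted to the group; the names `wrap_tag` and `make_page` are chosen here for illustration, and the whitespace of the output differs slightly from the Ruby version. Like the Ruby, it splits on tabs directly and does not escape the field contents.

```python
STYLEBLOCK = """
<style type="text/css">
td {
  border-left: 1px solid #000000;
  padding-right: 4px;
  padding-left: 4px;
  white-space: nowrap;
}
.cellTitle {
  border-bottom: 1px solid #000000;
  background: #ffffe0;
  font-weight: bold;
  text-align: center;
}
.cell0 { background: #eff1f1; }
.cell1 { background: #f8f8f8; }
</style>
"""


def wrap_tag(data, tag, modifier=""):
    """Wrap data in <tag modifier>...</tag> followed by a newline."""
    opening = "<%s %s>" % (tag, modifier) if modifier else "<%s>" % tag
    return opening + data + "</%s>\n" % tag


def make_page(source):
    """Turn an iterable of tab-separated lines into a full HTML page."""
    rows = []
    for row_num, record in enumerate(source):
        cells = []
        for field in record.rstrip("\r\n").split("\t"):
            if field == "":
                field = "&nbsp;"  # replace blank fields with &nbsp;
            # First row gets the title style; the rest alternate cell0/cell1.
            css = "cellTitle" if row_num == 0 else "cell%d" % (row_num % 2)
            cells.append(wrap_tag(field, "td", 'class="%s"' % css))
        rows.append(wrap_tag("".join(cells), "tr"))
    table = wrap_tag("".join(rows), "table",
                     "cellpadding=0 cellspacing=0 border=0")
    body = wrap_tag(table, "body")
    return wrap_tag(wrap_tag(STYLEBLOCK, "head") + body, "html")


print(make_page(["name\tqty\n", "apple\t\n", "pear\t3\n"]))
```

To mimic the Ruby script's stdin-to-stdout behaviour, `make_page(sys.stdin)` can be written to `sys.stdout` after an `import sys`.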

The thing about TSV/CSV formats is that there is no one format. You
need to check how your TSV creator generates the TSV file:
Does it put quotes around text fields?
What kind of quotes?
How does it represent null fields?
Might you get fields that include newlines?
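
A small Python 3 sketch of why those questions matter. The sample data is invented: it has double-quoted text fields, an empty field, and a quoted field containing a newline. Splitting on newlines and tabs by hand miscounts the rows, while `csv.reader` with a tab delimiter and its default quoting rules does not.

```python
import csv
import io

raw = 'name\tnote\n"Ann"\t"line one\nline two"\nBob\t\n'

# Naive approach: split into lines, then split each line on tabs.
naive = [line.split("\t") for line in raw.splitlines()]

# csv module: double quotes are stripped and may enclose a newline.
parsed = list(csv.reader(io.StringIO(raw, newline=""), delimiter="\t"))

print(len(naive), naive)
print(len(parsed), parsed)
```

If your TSV creator writes quote characters literally instead of using them to wrap fields, passing `quoting=csv.QUOTE_NONE` to `csv.reader` is one way to turn the quote handling off.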

- P.S. I'm not a Ruby programmer, just read the source ;-)

Jun 1 '06 #6

Dennis,

Thank you for that response. Your code was very helpful to me. I
think that actually seeing how it should be done in Python was a lot
more educational than spending hours with trial and error.

One question (and this is a topic that I still have trouble getting my
arms around). Why is the text in STYLEBLOCK triple quoted?

Thanks again,
Brian

Jun 1 '06 #7
Brian wrote:
> One question (and this is a topic that I still have trouble getting my
> arms around). Why is the text in STYLEBLOCK triple quoted?


Because triple-quoted strings can span lines and include single quotes
and double quotes.
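
A tiny illustration (the CSS text is invented for the example):

```python
# One literal, two lines, both kinds of quote inside, nothing escaped.
css = """td { font-family: 'Courier New', "Lucida Console"; }
.cell0 { background: #eff1f1; }
"""

print(css.count("\n"))            # the line breaks are part of the string
print("'" in css and '"' in css)  # so are both kinds of quote
```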

--
--Scott David Daniels
sc***********@acm.org
Jun 1 '06 #8

Dennis Lee Bieber wrote:
On 1 Jun 2006 03:29:35 -0700, "Brian" <bn******@gmail.com> declaimed the
following in comp.lang.python:
> Thank you for that response. Your code was very helpful to me. I
> think that actually seeing how it should be done in Python was a lot
> more educational than spending hours with trial and error.

It's not the best code around -- I hacked it together pretty much
line-for-line from an assumption of what the Ruby was doing (I don't do
Ruby -- too much Perl idiom in it).

> One question (and this is a topic that I still have trouble getting my
> arms around). Why is the text in STYLEBLOCK triple quoted?

Triple quotes allow: 1) use of single quotes within the block
without needing to escape them; 2) strings that span multiple
lines. Plain string quoting must be one logical line to the parser.

I've practically never seen anyone use a line continuation character
in Python. And triple quoting looks cleaner than parser concatenation.

The alternatives would have been:

Line Continuation:
STYLEBLOCK = '\n\
<style type="text/css">\n\
td {\n\
border-left:1px solid #000000;\n\
padding-right:4px;\n\
padding-left:4px;\n\
white-space: nowrap; }\n\
.cellTitle {\n\
border-bottom:1px solid #000000;\n\
background:#ffffe0;\n\
font-weight: bold;\n\
text-align: center; }\n\
.cell0 { background:#3ff1f1; }\n\
.cell1 { background:#f8f8f8; }\n\
</style>\n\
'
Note the \n\ as the end of each line; the \n is to keep the
formatting on the generated HTML (otherwise everything would be one long
line) and the final \ (which must be the physical end of line)
signifying "this line is continued". Also note that I used ' rather than
" to avoid escaping the " on text/css.

Parser Concatenation:
STYLEBLOCK = (
    '<style type="text/css">\n'
    "td {\n"
    " border-left:1px solid #000000;\n"
    " padding-right:4px;\n"
    " padding-left:4px;\n"
    " white-space: nowrap; }\n"
    ".cellTitle {\n"
    " border-bottom:1px solid #000000;\n"
    " background:#ffffe0;\n"
    " font-weight: bold;\n"
    " text-align: center; }\n"
    ".cell0 { background:#3ff1f1; }\n"
    ".cell1 { background:#f8f8f8; }\n"
    "</style>\n"
)

Note the use of ( ) where the original had """ """. Also note that
each line has quotes at start/end (the first has ' to avoid escaping
text/css). There are no commas separating each line (and the \n is still
for formatting). Using the ( ) creates an expression, and Python is nice
enough to let one split expressions inside (), [lists], or {dicts} over
multiple lines (I used that feature in a few spots to put call arguments
on multiple lines). Two strings that are next to each other

"string1" "string2"

are parsed as one string

"string1string2"

Using """ (or ''') is the cleanest of those choices, especially if
you want to do preformatted layout of the text. It works similarly to the
Ruby/Perl heredoc construct that basically says: copy all text up to the
next occurrence of MARKER_STRING.
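
A short check that the three spellings really do build the same string, using a cut-down style block so the comparison is easy to read. The variable names are made up for the example.

```python
# 1) Triple quoting: line breaks in the source become "\n" in the string.
triple = """<style type="text/css">
td { white-space: nowrap; }
</style>
"""

# 2) Line continuation: "\n" is the newline we want in the string; the
#    trailing backslash joins the physical lines and adds nothing itself.
continued = '<style type="text/css">\n\
td { white-space: nowrap; }\n\
</style>\n\
'

# 3) Parser concatenation: adjacent literals are merged into one string.
concatenated = (
    '<style type="text/css">\n'
    "td { white-space: nowrap; }\n"
    "</style>\n"
)

assert triple == continued == concatenated
assert "string1" "string2" == "string1string2"
print(triple)
```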


Thank you for your explanation, now it makes sense.

Brian

Jun 1 '06 #9
