I was wondering if anyone here on the group could point me in a
direction that would explain how to use Python to convert a tsv file
to html. I have been searching for a resource but have only seen
information on dealing with converting csv to tsv. Specifically I want
to take the values and insert them into an html table.
I have been trying to figure it out myself, and in essence, this is
what I have come up with. Am I on the right track? I really have the
feeling that I am re-inventing the wheel here.
1) in the code define a css
2) use a regex to extract the info between tabs
3) wrap the values in the appropriate tags and insert into table.
4) write the .html file
Thanks again for your patience,
Brian
> I was wondering if anyone here on the group could point me in a direction that would explain how to use Python to convert a tsv file to html. I have been searching for a resource but have only seen information on dealing with converting csv to tsv. Specifically I want to take the values and insert them into an html table.
> I have been trying to figure it out myself, and in essence, this is what I have come up with. Am I on the right track? I really have the feeling that I am re-inventing the wheel here.
> 1) in the code define a css
> 2) use a regex to extract the info between tabs
> 3) wrap the values in the appropriate tags and insert into table.
> 4) write the .html file
Sounds like you just want to do something like
print "<table>"
for line in file("in.tsv"):
    print "<tr>"
    items = line.split("\t")
    for item in items:
        print "<td>%s</td>" % item
    print "</tr>"
print "</table>"
It gets a little more complex if you need to clean each item
for HTML entities/scripts/etc...but that's usually just a
function that you'd wrap around the item:
print "<td>%s</td>" % escapeEntity(item)
using whatever "escapeEntity" function you have on hand.
E.g.
from xml.sax.saxutils import escape
:
:
print "<td>%s</td>" % escape(item)
It doesn't gracefully attempt to define headers using
<thead>, <tbody>, and <th> sorts of rows, but a little
toying should solve that.
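As a minimal sketch of that toying, written for current Python 3 rather than the Python 2 used in this thread: the function name `tsv_lines_to_table` and the decision to treat the first line as the header row are assumptions made for illustration, not something from the post above.

```python
from html import escape


def tsv_lines_to_table(lines):
    """Return an HTML table; the first line is treated as the header row."""
    out = ["<table>"]
    body_open = False
    for index, line in enumerate(lines):
        # Strip the line ending, split on tabs, and escape each cell.
        cells = [escape(cell) for cell in line.rstrip("\r\n").split("\t")]
        if index == 0:
            # Header row: <thead> with <th> cells.
            row = "".join("<th>%s</th>" % cell for cell in cells)
            out.append("<thead><tr>%s</tr></thead>" % row)
        else:
            if not body_open:
                out.append("<tbody>")
                body_open = True
            row = "".join("<td>%s</td>" % cell for cell in cells)
            out.append("<tr>%s</tr>" % row)
    if body_open:
        out.append("</tbody>")
    out.append("</table>")
    return "\n".join(out)


# Made-up sample data, just to show the shape of the output.
print(tsv_lines_to_table(["name\tqty\n", "nuts & bolts\t3\n"]))
```

Any iterable of lines works, so an open file object can be passed in directly.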
-tim
> 1) in the code define a css
> 2) use a regex to extract the info between tabs
In place of this, you might want to look at http://effbot.org/librarybook/csv.htm
Around the middle of that page you'll see how to use a delimiter other
than a comma.
> 3) wrap the values in the appropriate tags and insert into table.
> 4) write the .html file
> Thanks again for your patience, Brian
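For anyone who does not want to follow the link, the delimiter idea can be sketched in a few lines of Python 3. The sample data is made up, and an `io.StringIO` stands in for a real file; with a file on disk one would likely pass `open("in.tsv", newline="")` instead.

```python
import csv
import io

# Stand-in for an open TSV file (hypothetical sample data).
sample = io.StringIO("name\tqty\nwidget\t3\n")

# delimiter="\t" tells the reader to split fields on tabs instead of commas.
rows = list(csv.reader(sample, delimiter="\t"))
print(rows)  # [['name', 'qty'], ['widget', '3']]
```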
Brian wrote:
> I was wondering if anyone here on the group could point me in a direction that would explain how to use Python to convert a tsv file to html. I have been searching for a resource but have only seen information on dealing with converting csv to tsv. Specifically I want to take the values and insert them into an html table.
import csv
from xml.sax.saxutils import escape

def tsv_to_html(input_file, output_file):
    output_file.write('<table><tbody>\n')
    for row in csv.reader(input_file, 'excel-tab'):
        output_file.write('<tr>')
        for col in row:
            output_file.write('<td>%s</td>' % escape(col))
        output_file.write('</tr>\n')
    output_file.write('</tbody></table>')
Usage example:
>>> from cStringIO import StringIO
>>> input_file = StringIO('"foo"\t"bar"\t"baz"\n'
...                       '"qux"\t"quux"\t"quux"\n')
>>> output_file = StringIO()
>>> tsv_to_html(input_file, output_file)
>>> print output_file.getvalue()
<table><tbody>
<tr><td>foo</td><td>bar</td><td>baz</td></tr>
<tr><td>qux</td><td>quux</td><td>quux</td></tr>
</tbody></table>
First let me say that I appreciate the responses that everyone has
given.
A friend of mine is a Ruby programmer but knows nothing about Python.
He gave me the script below and it does exactly what I want, only it is
in Ruby. Not knowing Ruby, this is Greek to me, and I would like to
re-write it in Python.
I ask then, is this essentially what others here have shown me to do,
or is it in a different vein altogether?
Code:
class TsvToHTML
  @@styleBlock = <<-ENDMARK
    <style type='text/css'>
    td {
      border-left:1px solid #000000;
      padding-right:4px;
      padding-left:4px;
      white-space: nowrap;
    }
    .cellTitle {
      border-bottom:1px solid #000000;
      background:#ffffe0;
      font-weight: bold;
      text-align: center;
    }
    .cell0 { background:#eff1f1; }
    .cell1 { background:#f8f8f8; }
    </style>
  ENDMARK
  def TsvToHTML::wrapTag(data,tag,modifier = "")
    return "<#{tag} #{modifier}>" + data + "</#{tag}>\n"
  end # wrapTag
  def TsvToHTML::makePage(source)
    page = ""
    rowNum = 0
    source.readlines.each { |record|
      row = ""
      record.chomp.split("\t").each { |field|
        # replace blank fields with &nbsp;
        field.sub!(/^$/,"&nbsp;")
        # wrap in TD tag, specify style
        row += wrapTag(field,"td","class=\"" +
          ((rowNum == 0)?"cellTitle":"cell#{rowNum % 2}") +
          "\"")
      }
      rowNum += 1
      # wrap in TR tag, add row to page
      page += wrapTag(row,"tr") + "\n"
    }
    # finish page formatting
    [ [ "table","cellpadding=0 cellspacing=0 border=0" ], "body","html"
    ].each { |tag|
      page = wrapTag(@@styleBlock,"head") + page if tag == "html"
      page = wrapTag(page,*tag)
    }
    return page
  end # makePage
end # class
# stdin -> convert -> stdout
print TsvToHTML.makePage(STDIN)
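For comparison, a rough Python 3 rendering of the Ruby script above might look like the following. It is a sketch only: the names `STYLE_BLOCK`, `wrap_tag`, and `make_page` are chosen here for illustration, the call to `html.escape` is an addition the Ruby does not have, and the sample input is made up.

```python
from html import escape

STYLE_BLOCK = """<style type='text/css'>
td {
  border-left:1px solid #000000;
  padding-right:4px;
  padding-left:4px;
  white-space: nowrap;
}
.cellTitle {
  border-bottom:1px solid #000000;
  background:#ffffe0;
  font-weight: bold;
  text-align: center;
}
.cell0 { background:#eff1f1; }
.cell1 { background:#f8f8f8; }
</style>
"""


def wrap_tag(data, tag, modifier=""):
    """Wrap data in <tag modifier>...</tag>, roughly like the Ruby wrapTag."""
    opening = "<%s %s>" % (tag, modifier) if modifier else "<%s>" % tag
    return opening + data + "</%s>\n" % tag


def make_page(source):
    """Convert an iterable of tab-separated lines into a full HTML page."""
    rows = []
    for row_num, record in enumerate(source):
        cells = []
        for field in record.rstrip("\r\n").split("\t"):
            # Blank fields become &nbsp; so the empty cell still renders.
            text = escape(field) if field else "&nbsp;"
            # Row 0 is the title row; later rows alternate cell1/cell0.
            css = "cellTitle" if row_num == 0 else "cell%d" % (row_num % 2)
            cells.append(wrap_tag(text, "td", 'class="%s"' % css))
        rows.append(wrap_tag("".join(cells), "tr"))
    table = wrap_tag("".join(rows), "table",
                     "cellpadding=0 cellspacing=0 border=0")
    # Same nesting as the Ruby: html(head(style) + body(table)).
    return wrap_tag(wrap_tag(STYLE_BLOCK, "head") + wrap_tag(table, "body"),
                    "html")


# Demo on an in-memory sample; for stdin -> stdout, pass sys.stdin instead.
page = make_page(["name\tqty\n", "widget\t\n"])
print(page)
```

The structure mirrors the Ruby fairly closely: one helper that wraps text in a tag, and one function that builds rows, then the table, then the page around it.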
Brian wrote:
> First let me say that I appreciate the responses that everyone has given.
> A friend of mine is a Ruby programmer but knows nothing about Python. He gave me the script below and it does exactly what I want, only it is in Ruby. Not knowing Ruby, this is Greek to me, and I would like to re-write it in Python.
> I ask then, is this essentially what others here have shown me to do, or is it in a different vein altogether?
Leif's Python example uses the csv module which understands a lot more
about the peculiarities of the CSV/TSV formats.
The Ruby example prepends a <style>...</style> block.
The Ruby example splits the input into lines to form the table rows,
and splits each line on tabs to form the cells.
The thing about TSV/CSV formats is that there is no one format. You
need to check how your TSV creator generates the TSV file:
Does it put quotes around text fields?
What kind of quotes?
How does it represent null fields?
Might you get fields that include newlines?
- P.S. I'm not a Ruby programmer, just read the source ;-)
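To make those questions concrete, here is a small Python 3 sketch comparing a plain split on tabs with the csv module. The sample record is made up: its second field is quoted and contains both a tab and a newline, which is exactly the kind of input the questions above are about.

```python
import csv
import io

# Made-up sample: the quoted field holds an embedded tab and newline.
raw = 'id\tnote\n1\t"two\tparts,\nsecond line"\n'

# Naive approach: split into lines, then split each line on tabs.
naive = [line.split("\t") for line in raw.splitlines()]

# csv approach: the excel-tab dialect understands the double quotes.
parsed = list(csv.reader(io.StringIO(raw, newline=""), dialect="excel-tab"))

print(len(naive))   # 3 -- the embedded newline broke the record in two
print(len(parsed))  # 2 -- header plus one intact record
print(parsed[1])    # ['1', 'two\tparts,\nsecond line']
```

If the TSV creator never quotes fields, the two approaches agree; the difference only shows up with quoting or embedded separators.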
Dennis,
Thank you for that response. Your code was very helpful to me. I
think that actually seeing how it should be done in Python was a lot
more educational than spending hours with trial and error.
One question (and this is a topic that I still have trouble getting my
arms around). Why is the text in STYLEBLOCK triple quoted?
Thanks again,
Brian
Brian wrote:
> One question (and this is a topic that I still have trouble getting my arms around). Why is the text in STYLEBLOCK triple quoted?
Because triple-quoted strings can span lines and include single quotes
and double quotes.
--
--Scott David Daniels sc***********@acm.org
Dennis Lee Bieber wrote:
On 1 Jun 2006 03:29:35 -0700, "Brian" <bn******@gmail.com> declaimed the following in comp.lang.python:
> Thank you for that response. Your code was very helpful to me. I think that actually seeing how it should be done in Python was a lot more educational than spending hours with trial and error.
It's not the best code around -- I hacked it together pretty much line-for-line from an assumption of what the Ruby was doing (I don't do Ruby -- too much PERL idiom in it).
> One question (and this is a topic that I still have trouble getting my arms around). Why is the text in STYLEBLOCK triple quoted?
Triple quotes: 1) allow single quotes within the block without needing to escape them; 2) allow the string to span multiple lines. Plain string quoting must be one logical line to the parser.
I've practically never seen anyone use a line continuation character in Python. And triple quoting looks cleaner than parser concatenation.
The alternatives would have been:
Line Continuation:
STYLEBLOCK = '\n\
<style type="text/css">\n\
td {\n\
border-left:1px solid #000000;\n\
padding-right:4px;\n\
padding-left:4px;\n\
white-space: nowrap; }\n\
.cellTitle {\n\
border-bottom:1px solid #000000;\n\
background:#ffffe0;\n\
font-weight: bold;\n\
text-align: center; }\n\
.cell0 { background:#3ff1f1; }\n\
.cell1 { background:#f8f8f8; }\n\
</style>\n\
'
Note the \n\ at the end of each line; the \n is to keep the formatting on the generated HTML (otherwise everything would be one long line) and the final \ (which must be the physical end of line) signifies "this line is continued". Also note that I used ' rather than " to avoid escaping the " on text/css.
Parser Concatenation:
STYLEBLOCK = (
    '<style type="text/css">\n'
    "td {\n"
    " border-left:1px solid #000000;\n"
    " padding-right:4px;\n"
    " padding-left:4px;\n"
    " white-space: nowrap; }\n"
    ".cellTitle {\n"
    " border-bottom:1px solid #000000;\n"
    " background:#ffffe0;\n"
    " font-weight: bold;\n"
    " text-align: center; }\n"
    ".cell0 { background:#3ff1f1; }\n"
    ".cell1 { background:#f8f8f8; }\n"
    "</style>\n"
)
Note the use of ( ) where the original had """ """. Also note that each line has quotes at start/end (the first has ' to avoid escaping text/css). There are no commas separating each line (and the \n is still for formatting). Using the ( ) creates an expression, and Python is nice enough to let one split expressions inside (), [lists], or {dicts} over multiple lines (I used that feature in a few spots to put call arguments on multiple lines). Two strings that are next to each other
"string1" "string2"
are parsed as one string
"string1string2"
Using """ (or ''') is the cleanest of those choices, especially if you want to do preformatted layout of the text. It works similar to the Ruby/PERL construct that basically said: Copy all text up to the next occurrence of MARKER_STRING.
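A quick way to convince yourself that the three spellings are equivalent is to build the same short string each way and compare. This is a small Python 3 sketch with made-up CSS, not the STYLEBLOCK from the thread.

```python
# 1) Triple quotes: the line breaks in the source become part of the string.
triple = """td { color: red; }
.x { color: blue; }
"""

# 2) Line continuation: \n is the newline in the string, the trailing
#    backslash tells the parser the literal continues on the next line.
continued = 'td { color: red; }\n\
.x { color: blue; }\n\
'

# 3) Parser concatenation: adjacent literals inside ( ) are joined.
concatenated = (
    "td { color: red; }\n"
    ".x { color: blue; }\n"
)

print(triple == continued == concatenated)  # True

# Triple quotes also let both kinds of quote appear unescaped.
mixed = """type="text/css" and 'single' both fit"""
print(mixed)
```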
Thank you for your explanation, now it makes sense.
Brian