Hi,
I am looking for a library that will give me very simple text
representation of HTML.
For example
<div><h1>Title</h1><p>This is a <br />test</p></div>
will be transformed to:
Title
This is a
test
I want to send a plain-text alternative to an HTML email, and would prefer
to generate it automatically from the HTML source.
Any hints?
Thanks!
Ksenia.
html2text is a command-line tool. You can invoke it from Python using
subprocess.
Diez
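In current Python, that suggestion could look roughly like the sketch below. It assumes an html2text executable is on the PATH and that it reads HTML on standard input and writes text to standard output; check the options of whichever html2text you have installed. The function name html_to_text_cli and its command parameter are made up for illustration.

```python
import subprocess


def html_to_text_cli(html, command=("html2text",)):
    """Pipe HTML through an external converter and return its output.

    Assumption: the command reads HTML on stdin and writes text on stdout.
    """
    completed = subprocess.run(
        list(command),
        input=html,
        capture_output=True,  # collect stdout and stderr
        text=True,            # pass and return str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Passing the command as a parameter makes it easy to swap in a different converter, or a stand-in command while testing.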
Hi,
I guess stripogram would be more Pythonic: http://sourceforge.net/project/showf...?group_id=1083
Regards,
Laurent
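Something like this:
import re
text = '<div><h1>Title</h1><p>This is a <br />test</p></div>'
# Collapse runs of whitespace into a single space.
text = re.sub(r'[\n \t]+', ' ', text)
# Turn paragraph, line-break and heading tags into newlines
# (this also matches <br/>, <br /> and tags that carry attributes).
text = re.sub(r'(?i)<(p|br|h[1-6])\b[^>]*>', '\n', text)
# Drop every remaining tag.
result = re.sub(r'<.+?>', '', text)
print(result)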
--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
Use htmllib:
>>> import htmllib, formatter, StringIO
>>> def cleanup(s):
...     out = StringIO.StringIO()
...     p = htmllib.HTMLParser(
...         formatter.AbstractFormatter(formatter.DumbWriter(out)))
...     p.feed(s)
...     p.close()
...     if p.anchorlist:
...         print >>out
...         for idx,anchor in enumerate(p.anchorlist):
...             print >>out, "\n[%d]: %s" % (idx+1,anchor)
...     return out.getvalue()
...
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br
... />test</p></div>''')
Title
This is a
test
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br />test with <a
... href="http://python.org">a link</a> to the Python homepage</p></div>''')
Title
This is a
test with a link[1] to the Python homepage
[1]: http://python.org
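The code above is Python 2 only: htmllib was removed in Python 3. A rough equivalent built on the standard-library html.parser might look like the sketch below. The class name TextExtractor and the set of tags treated as line breaks are assumptions made for illustration, not a port of htmllib's formatter, so the spacing will not necessarily match the output above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text, break lines at block-level tags, and number the links."""

    # Assumed set of tags that should start a new line.
    BREAK_TAGS = {"p", "br", "div", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []        # text fragments and newlines, in document order
        self.anchors = []      # href of every link seen so far
        self._in_link = False  # True while inside an <a href=...> element

    def handle_starttag(self, tag, attrs):
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.anchors.append(href)
                self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Footnote-style marker right after the link text.
            self.parts.append("[%d]" % len(self.anchors))
            self._in_link = False

    def handle_data(self, data):
        self.parts.append(data)


def cleanup(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Trim each line and drop the empty ones.
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    body = "\n".join(line for line in lines if line)
    notes = "\n".join(
        "[%d]: %s" % (idx, url) for idx, url in enumerate(parser.anchors, 1)
    )
    return body + ("\n\n" + notes if notes else "")
```

Calling cleanup() on the second example in this post gives the three text lines followed by a blank line and the "[1]: http://python.org" footnote.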
On 20 Jul 2006 15:12:27 GMT, Duncan Booth <du**********@invalid.invalid> wrote:
> Use htmllib:
> [cleanup() code and sample output snipped]
cleanup() doesn't handle scripts and styles too well. html2text will
do a much better job of these and gives a more structured output
(compatible with Markdown): http://www.aaronsw.com/2002/html2text/
>>> import html2text
>>> print html2text.html2text('''<div><h1>Title</h1><p>This is a <br
... />test with <a href="http://python.org">a link</a> to the Python
... homepage</p></div>''')
# Title
This is a
test with [a link][1] to the Python homepage
[1]: http://python.org
HTH :)
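For anyone who wants to stay in the standard library, the script and style problem can be handled by ignoring character data while the parser is inside those elements. The sketch below shows the idea with Python 3's html.parser; it is not how html2text works internally, and the names VisibleText and visible_text are invented for this example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect character data, ignoring anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs left behind by the removed blocks.
    return " ".join("".join(parser.chunks).split())
```

This version flattens everything onto one line; it only illustrates the filtering step and would need the line-break handling from the other replies to be useful for email.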
Sorry for the late reply... better late than never :)
Thanks to all for the tips. Stripogram is the winner, since it is the
most configurable and accepts a line-length parameter, which is handy for
email...
Ksenia.
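Whichever converter produces the text, the last step is attaching it next to the HTML. A minimal sketch with the Python 3 standard library follows; the conversion itself is left out because stripogram's API is not shown in this thread. The function name build_message and the default width of 72 are assumptions for illustration.

```python
import textwrap
from email.message import EmailMessage


def build_message(html, text, width=72):
    """Return a multipart/alternative message with text and HTML parts."""
    # Wrap each line separately so existing line breaks survive.
    wrapped = "\n".join(
        textwrap.fill(line, width=width) for line in text.splitlines()
    )
    msg = EmailMessage()
    msg.set_content(wrapped)                   # text/plain part
    msg.add_alternative(html, subtype="html")  # text/html part
    return msg
```

The plain-text part goes first and the HTML part second, which is the usual order for multipart/alternative: mail clients pick the last part they can display.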
On 7/19/06, Laurent Rahuel <lr*************@voila.fr> wrote:
> I guess stripogram would be more pythonic : http://sourceforge.net/project/showf...?group_id=1083