Bytes | Software Development & Data Engineering Community

extracting HTML fragments and counting words

Hi,

I want to show a preview of several HTML-formatted news items on one
page, keeping the markup (and images) intact, but showing no more
than the first X _readable_ words of each item. Is anyone aware of a
Python library that makes this easy to program? I have already started
writing it with Beautiful Soup, but maybe there is an easier way...

Thanks!
--
Ksenia
Jul 18 '05 #1
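One possible approach, sketched here under stated assumptions rather than as an existing library: Python's standard html.parser module can re-emit the markup tag by tag while counting words in the text between tags, stop at the limit, and then close whatever elements are still open so the preview stays well-formed. The names truncate_html, _Truncator, VOID_ELEMENTS and NON_TEXT_ELEMENTS are hypothetical, and a "readable word" is assumed here to be any whitespace-separated run of text outside <script> and <style>.

```python
import re
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}
# Assumption: text inside these elements is not readable prose and is not counted.
NON_TEXT_ELEMENTS = {"script", "style"}


class _Truncator(HTMLParser):
    """Re-emit HTML until `max_words` readable words have been written."""

    def __init__(self, max_words):
        super().__init__(convert_charrefs=True)
        self.max_words = max_words
        self.words = 0            # readable words emitted so far
        self.out = []             # pieces of the output document
        self.open_tags = []       # elements opened but not yet closed
        self.done = max_words <= 0

    @staticmethod
    def _attrs(attrs):
        # Re-serialize attributes; values arrive unescaped, so escape them again.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(f"<{tag}{self._attrs(attrs)}>")
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Explicitly self-closed tags such as <br/>: nothing to push.
        if not self.done:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # past the cutoff, or a stray end tag: skip it
        # Close any children left unclosed, then the element itself.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if self.open_tags and self.open_tags[-1] in NON_TEXT_ELEMENTS:
            self.out.append(data)  # copy script/style bodies verbatim, uncounted
            return
        # Split into word / whitespace runs so the original spacing is kept.
        for token in re.findall(r"\S+|\s+", data):
            if token.isspace():
                self.out.append(token)
                continue
            self.out.append(escape(token, quote=False))
            self.words += 1
            if self.words >= self.max_words:
                self.done = True
                return


def truncate_html(html, max_words):
    """Return the first `max_words` readable words of `html`, markup intact."""
    parser = _Truncator(max_words)
    parser.feed(html)
    parser.close()
    # Close every element still open at the cutoff.
    for tag in reversed(parser.open_tags):
        parser.out.append(f"</{tag}>")
    return "".join(parser.out)


if __name__ == "__main__":
    item = '<p>Hello <b>big</b> world, this is news.</p><img src="x.png">'
    print(truncate_html(item, 3))  # <p>Hello <b>big</b> world,</p>
```

Images and other tags that appear before the cutoff are copied through unchanged. In this sketch, comments and the doctype are dropped, and a lone symbol such as "&" counts as a word.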

Similar topics

1
by: cassandra.flowers | last post by:
Hi, I am using VB6 and want to extract text from a string. But ONLY take out words that begin with 't' or 'd'. The mainstring is input by the user into 'txtMain' and then by clicking a command...
2
by: mtp1032 | last post by:
I need to be able to extract the values from an XmlRpcValue where I do not know in advance what the keys are, or how many exist. For example, suppose I have an XmlRpcValue, object, returned by...
8
by: John M. Gabriele | last post by:
I'm putting together a small site using Python and cgi. (I'm pretty new to this, but I've worked a little with JSP/servlets/Java before.) Almost all pages on the site will share some common...
0
by: Mico | last post by:
I would be very grateful for any help with the following: I currently have the code below. This opens a MS Word document, and uses C#'s internal regular expressions library to find if there is a...
3
by: Nhd | last post by:
I have a question which involves reading from cin and counting the number of words read until the end of file(eof). The question is as follows: Words are delimited by white spaces (blanks,...
4
by: bigbagy | last post by:
Notes The programs will be compiled and tested on the machine which runs the Linux operating system. V3.4 of the GNU C/C++ compiler (gcc ,g++) must be used. A significant amount coding is...
3
by: Frank Potter | last post by:
There are ten web pages I want to deal with. from http://www.af.shejis.com/new_lw/html/125926.shtml to http://www.af.shejis.com/new_lw/html/125936.shtml Each of them uses the charset of...
2
by: Lee Crabtree | last post by:
When reading fragments, it seems like XmlReaders try to read too much. I'm working on a file parser for a new file format, and I've run into a problem. The format has an XML fragment for a header,...
3
by: Magnus.Moraberg | last post by:
Hi, I wish to extract all the words on a set of webpages and store them in a large dictionary. I then wish to produce a list with the most common words for the language under consideration. So,...
0
by: DolphinDB | last post by:
Tired of spending countless minutes downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
