Starting with Python, need some pointers with manipulating strings

Hi guys, I'm starting to learn Python and so far am very
impressed with its possibilities. I do, however, need some help
with certain things I'm trying to do that I haven't yet
managed to figure out by myself. Hopefully someone will be
able to give me some pointers :)

First, my background: I haven't programmed seriously in over 5
years, but I recently started programming again in
Delphi/Pascal scripting, and that's what I'm most familiar with
right now. I'm also much more comfortable with structured
programming than with OO (which isn't helping much with
Python :))

Anyway, I have a very specific project in mind which I've mostly
implemented in Pascal and I'd like to implement it in Python
since the possibilities after that are much more interesting.

Basically, I'm getting the HTML source from a URL and need to
a.) find specific URLs
b.) find specific data
c.) with specific URLs, load new html pages and repeat.
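Steps a.) through c.) together describe a small crawler. A minimal sketch, assuming Python 3, a hypothetical `fetch(url)` callable that returns the page source as a string, and a hypothetical `want(url)` predicate that decides which links to follow. Links are pulled out with a deliberately simple regular expression rather than a real HTML parser, so treat it as a starting point, not a robust tool.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Simple href matcher: fine for a sketch, not for arbitrary real-world HTML.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def crawl(start_url, fetch, want, max_pages=50):
    """Breadth-first crawl starting at start_url.

    fetch(url) -> str   returns a page's source (hypothetical callable)
    want(url) -> bool   decides whether a link is worth following (step a)
    Returns {url: source} for every page visited, so specific data can be
    extracted from each page afterwards (step b).
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        source = fetch(url)
        pages[url] = source
        for href in HREF_RE.findall(source):
            link = urljoin(url, href)  # resolve relative links against the page URL
            if want(link) and link not in seen:
                seen.add(link)
                queue.append(link)  # step c: load this page later and repeat
    return pages
```

Passing `fetch` in as a parameter keeps the crawl logic separate from the network code, so it can be tried out on a dictionary of fake pages (for example `crawl(start, pages_dict.__getitem__, lambda u: u.startswith(start))`) before pointing it at a real site.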

I've managed to load the HTML source I want into an object
called htmlSource using:
import urllib
sock = urllib.urlopen("URL Link")
htmlSource = sock.read()
sock.close()
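The snippet above is Python 2. In Python 3, `urlopen` lives in `urllib.request`, and `read()` returns bytes rather than a string, so the source needs decoding before any string searching. A minimal sketch, assuming the server declares a charset in its Content-Type header and otherwise falling back to UTF-8 (an assumption, not a guarantee); `fetch_html` is a hypothetical helper name.

```python
from urllib.request import urlopen


def fetch_html(url, default_encoding="utf-8"):
    """Return the page at url as a str (Python 3)."""
    # The with-block closes the connection, like sock.close() in the original.
    with urlopen(url) as response:
        raw = response.read()  # bytes, not str, in Python 3
        # Prefer the charset the server declared; fall back to the default.
        encoding = response.headers.get_content_charset() or default_encoding
    # errors="replace" keeps going if the page lies about its encoding.
    return raw.decode(encoding, errors="replace")
```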


I'm assuming that htmlSource is a string with \n at the end of
each line.
NOTE: I've become very accustomed to the TStringList class in
Delphi, so forgive me if I'm trying to work that way in
Python...
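A plain Python list of strings covers most of what a TStringList is used for here. A minimal sketch; the Delphi names in the comments are rough analogies only, and the sample HTML is made up for illustration.

```python
html_source = "<html>\n<body>\n<a href='/news'>News</a>\n</body>\n</html>\n"

# Roughly like assigning TStringList.Text; splitlines() also copes with \r\n.
lines = html_source.splitlines()

count = len(lines)                # ~ TStringList.Count
third = lines[2]                  # ~ TStringList.Strings[2] (also zero-based)
position = lines.index("<body>")  # ~ TStringList.IndexOf('<body>')
lines.append("<!-- added -->")    # ~ TStringList.Add('<!-- added -->')
text_again = "\n".join(lines)     # ~ reading TStringList.Text back
```

One difference worth knowing: `IndexOf` returns -1 when the string is missing, whereas `list.index` raises `ValueError`, so either test with `in` first or catch the exception.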

Basically, I want to search through the whole string
(htmlSource) for a specific keyword. When it's found, I want to
know which line it's on so that I can retrieve that line; then
I should be able to parse/extract what I need using regular
expressions (which I'm getting quite comfortable with). How
can this be accomplished?
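One way to do exactly this: split the source into lines, use `enumerate` to keep each line's number, test each line with `in`, and run a regular expression only on the lines that match. A minimal sketch; `find_lines`, the sample markup, and the price pattern are hypothetical examples, not from the thread.

```python
import re


def find_lines(source, keyword):
    """Return (line_number, line) pairs for every line containing keyword.

    Line numbers start at 1, as in a text editor.
    """
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if keyword in line
    ]


# Hypothetical page fragment and pattern, purely for illustration.
sample = "<tr>\n<td>Widget</td><td>Price: $12.50</td>\n</tr>\n<td>Price: $3.00</td>\n"
price_re = re.compile(r"Price: \$(\d+\.\d{2})")

for number, line in find_lines(sample, "Price"):
    match = price_re.search(line)
    if match:
        print(number, match.group(1))
```

If the line number itself is not needed, `price_re.finditer(sample)` over the whole string gives the same matches without splitting at all.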

The second main thing I'd like to know about has to do with urllister. I'm
very intrigued by its use in automatically grabbing URL links
from the source, but I've only managed to get it to retrieve
everything, which is a lot. What are my options in terms of
getting it to be more specific? Can I tell it to retrieve a URL
only IF a keyword is found on the same line?
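Assuming `urllister` refers to the `SGMLParser`-based example from *Dive Into Python* (an assumption; `sgmllib` no longer exists in Python 3), the same idea can be rebuilt on the standard library's `html.parser.HTMLParser`. Its `getpos()` method reports the source line of the tag currently being handled, which turns "keep a URL only if the keyword is on the same line" into a simple filter. A minimal sketch; `KeywordURLLister` and `urls_near_keyword` are hypothetical names.

```python
from html.parser import HTMLParser


class KeywordURLLister(HTMLParser):
    """Collect href values of <a> tags whose source line contains a keyword."""

    def __init__(self, lines, keyword):
        super().__init__()
        self.lines = lines      # the page source, already split into lines
        self.keyword = keyword
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        lineno, _offset = self.getpos()  # 1-based line where this tag starts
        if self.keyword in self.lines[lineno - 1]:
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def urls_near_keyword(source, keyword):
    """Return the hrefs of links on lines of source that contain keyword."""
    # Split on "\n" only, since that is what HTMLParser counts for getpos().
    lister = KeywordURLLister(source.split("\n"), keyword)
    lister.feed(source)
    lister.close()
    return lister.urls
```

Note that "same line" depends on how the page's HTML happens to be wrapped, and only the line where the `<a>` tag starts is checked. Filtering on the link text or on the href itself is often sturdier.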

Hopefully someone will be able/willing to give me a hand. I
think with these roadblocks out of the way, I should be able to
figure out the rest of what I need. Thanks in advance!

Benji99

Jul 18 '05 #1
"Benji99" <bo*@nospam.net> wrote in message
news:41***********************@unlimited.newshosting.com...

> Basically, I'm getting the HTML source from a URL and need to
> a.) find specific URLs
> b.) find specific data
> c.) with specific URLs, load new html pages and repeat.
<snip>
> Basically, I want to search through the whole string
> (htmlSource) for a specific keyword. When it's found, I want to
> know which line it's on so that I can retrieve that line; then
> I should be able to parse/extract what I need using regular
> expressions (which I'm getting quite comfortable with). How
> can this be accomplished?

If you download pyparsing (at http://pyparsing.sourceforge.net), you'll find
an example very close to this among its examples, called urlextractor.py (it
lists all the hrefs and their associated links on the page at www.yahoo.com).
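pyparsing is a third-party package, so rather than guess at its API, here is a standard-library sketch of the same kind of output (each href paired with the text of its link) using `html.parser.HTMLParser`. This is a swapped-in technique, not the code of pyparsing's urlextractor.py; `LinkTextLister` is a hypothetical name, and nested `<a>` tags are not handled.

```python
from html.parser import HTMLParser


class LinkTextLister(HTMLParser):
    """Collect (href, link text) pairs from <a ...>text</a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


lister = LinkTextLister()
lister.feed('<p><a href="http://www.python.org/">Python <b>home</b></a> and '
            '<a href="/faq">FAQ</a></p>')
lister.close()
for href, text in lister.links:
    print(href, "->", text)
```

Text inside child tags such as `<b>` is still collected, because `handle_data` keeps appending until the matching `</a>` arrives.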

-- Paul
Jul 18 '05 #2
Benji99 wrote:
> I've managed to load the HTML source I want into an object
> called htmlSource using:
>
> import urllib
> sock = urllib.urlopen("URL Link")
> htmlSource = sock.read()
> sock.close()
>
> I'm assuming that htmlSource is a string with \n at the end of
> each line.
> NOTE: I've become very accustomed to the TStringList class in
> Delphi, so forgive me if I'm trying to work that way in
> Python...
>
> Basically, I want to search through the whole string
> (htmlSource) for a specific keyword. When it's found, I want to
> know which line it's on so that I can retrieve that line; then
> I should be able to parse/extract what I need using regular
> expressions (which I'm getting quite comfortable with). How
> can this be accomplished?


The Pythonic way to do this is to iterate through the lines of htmlSource and process them one at a
time:

lines = htmlSource.splitlines()  # Split into a list of lines (also handles \r\n)
for line in lines:
    # Check whether the line has the text of interest ('keyword' is a placeholder)
    if 'keyword' in line:
        print(line)

You might want to look at Beautiful Soup. If you can identify the links of interest by the tags around
them, it might do what you want:
http://www.crummy.com/software/BeautifulSoup/

Kent
Jul 18 '05 #3