Spider: why isn't it finding the URL?

This program doesn't produce any output; however, I know from testing
that the URL regexp matches URLs...

import urllib
import re

site = urllib.urlopen("http://www.python.org")

email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
url = re.compile(
    r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}"
    r"([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
    r"((\?\w+=\w+)?(&\w+=\w+)*)?")

for row in site:
    obj = url.search(row)
    if obj != None:
        print obj.group()
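One likely cause, assuming the pattern is otherwise as posted: it starts with `^`, so `search` can only match at the very start of each row, and most rows of an HTML page start with whitespace or `<`. The sketch below is written for Python 3 and compares the posted pattern with and without the leading `^`. The sample row is made up for illustration and is not taken from python.org.

```python
import re

# The pattern from the post, split over adjacent raw strings for readability.
BODY = (
    r"((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}"
    r"([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
    r"((\?\w+=\w+)?(&\w+=\w+)*)?"
)
anchored = re.compile("^" + BODY)  # as posted: may only match at position 0
unanchored = re.compile(BODY)      # same pattern without the leading ^

# Hypothetical sample row; a real page will contain different markup.
row = '<a href="http://www.python.org/about/">About</a>'

anchored_match = anchored.search(row)
unanchored_match = unanchored.search(row)

print(anchored_match)            # None: the row starts with "<", not a URL
print(unanchored_match.group())  # http://www.python.org/about/
```

If that is the cause, dropping the `^` (or matching only inside `href="..."` attributes) should be enough for the loop to start printing matches.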
Jun 27 '08 #1
On 23 May, 02:02, notnorweg...@yahoo.se wrote:
This program doesn't produce any output; however, I know from testing
that the URL regexp matches URLs...

import urllib
import re

site = urllib.urlopen("http://www.python.org")

email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
url = re.compile(
    r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}"
    r"([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
    r"((\?\w+=\w+)?(&\w+=\w+)*)?")

for row in site:
    obj = url.search(row)
    if obj != None:
        print obj.group()
Hmm, OK, it is printing it row by row. Not what I expected.
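The post does not say what output was expected. If the goal was to collect every URL on the page in one pass instead of handling one row at a time, a minimal Python 3 sketch could look like the following. `URL_RE` is a deliberately simplified, hypothetical pattern (not the one from the post), `extract_urls` is an illustrative helper name, and the sample text stands in for the downloaded page.

```python
import re

# Simplified, hypothetical pattern: an http(s) URL runs until whitespace,
# a quote, or an angle bracket. It is not the pattern from the post.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(html):
    """Return each URL found in the text once, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_RE.finditer(html):
        found = match.group()
        if found not in seen:
            seen.add(found)
            urls.append(found)
    return urls


# Hypothetical sample text spanning several rows.
sample = (
    '<a href="http://www.python.org/about/">About</a>\n'
    '<p>Docs: <a href="https://docs.python.org/">here</a> and '
    '<a href="http://www.python.org/about/">again</a></p>\n'
)

print(extract_urls(sample))
# ['http://www.python.org/about/', 'https://docs.python.org/']
```

For a live page, the text could be obtained in Python 3 with `urllib.request.urlopen(url).read().decode("utf-8", "replace")` and passed to `extract_urls` as a single string. The encoding here is an assumption and may differ per site.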

Jun 27 '08 #2
