how can I extract all urls in a string by using re.findall() ?

I want to retrieve all urls in a string. When I use re.fiandall, I get
a list of tuples.
My code is like below:

Expand|Select|Wrap|Line Numbers

 
url=unicode(r"((http|ftp)://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)")

m=re.findall(url,html)

for i in m:

print i

html is a variable of string type which contains many urls in it.
the code will print many tuples, and each tuple seems not to represent
a url. e.g, one of them is as below:

(u'http://', u'http', u'image.zhongsou.com/image/netchina.gif', u'',
u'', u'', u'', u'image.zhongsou.com', u'.com', u'.com',
u'/netchina.gif')

Why is there two "http" in it? and why are there so many ampty strings
in the tupe above? It's obviously not a url. How can I get the urls
correctly?

Thanks in advance.
--
ðÐðÄ´ÏÃ÷¾ø¶¥¡¢¸ãÐ¦Ö®¼«£¬ÊÇÈËÀàµÄºÃÅóÓÑ¡£
Ö±µ½ÓÐÒ»Ìì£¬ÎÒ²Å·¢¾õ£¬ÎÒÊÇðÐðÄ¡£
ÎÒÊÇ·*Ç½µÄðÐðÄ¡£

Jul 18 '05 #1

Subscribe Post Reply

1164

by: Logical | last post by:

I wanted to do: include('page.htm?id=12&foo=bar'); But since I can't (and don't want to make another seperate HTTP request with include('http://...')); I was wondering if there's a function...

PHP

How to extract 2 integers from a string in python?

by: yinglcs | last post by:

Hi, how can I extract 2 integers from a string in python? for example, my source string is this: Total size: 173233 (371587) I want to extract the integer 173233 and 371587 from that...

Python

Best way to extract URL from random string?

by: deko | last post by:

If I have random and unpredictable user agent strings containing URLs, what is the best way to extract the URL? For example, let's say the string looks like this: registered NYSE 943 <a...

PHP

Extract URL from String

by: markmcgookin | last post by:

Hi Folks, I am writing a program to analyse an html page in java, I am connecting to a website, then going to extract ALL the links from it. I think the best way to do this is using the <a...

Java

BeautifulSoup - extract the <object tag

by: gcmartijn | last post by:

I'm trying to extract something like this: <object classid=clsid:D27CDB6E-AE6D-11cf-96B8-444553540000 codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/...

Python

List(Of String)

by: shapper | last post by:

Hello, How can I filter a List(Of String)? I need to get the list elements which start with the letters contained in the variable Text. Thanks, Miguel

ASP.NET

Extract all Img Src tags using Java Regular Expression

by: dkasyap | last post by:

Hi, I have a huge string containing html tags, some of these tags being <img src="URL"> ones. I need to extract the urls from all the occurences of these tags in the input string. This is what I...

Java

Re: Extract Information from Tables in html

by: Walter Cruz | last post by:

On Fri, Sep 5, 2008 at 11:29 AM, Jackie Wang <jackie.python@gmail.comwrote: Use BeautifulSoup. 's - Walter

Python

how do I extract data from html file ? - web scraping

by: masterinex | last post by:

Hi guys , Im a little unfamiliar with Python . Hope you can take a look at this: Im trying to extract the number 7.2 from the html string below using python: '''<a...

Python

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

how can I extract all urls in a string by using re.findall() ?

Similar topics