Bytes | Software Development & Data Engineering Community

How to get the content of 20000 HTML pages quickly from one server?

Hi,
I want to get the content of 200000 HTML pages from one server. As you know,
urllib.urlopen has to set up a network connection for every page, which makes
it very slow. How can I speed this up?
I tried using multiple threads, and that sped it up, but I want it to be
faster still. Any ideas?
Thanks!
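The multi-threaded approach mentioned above can be sketched with a bounded thread pool. This is a minimal sketch in Python 3, where `urllib.request.urlopen` replaces Python 2's `urllib.urlopen`; the helper names `fetch_url` and `fetch_all` and the default of 8 workers are illustrative assumptions, not anything from the thread. Threads help here because each worker spends most of its time waiting on the network rather than running Python code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one page (hypothetical helper; opens one connection per call)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch many URLs concurrently using a fixed-size pool of worker threads.

    Returns a dict mapping each successfully fetched URL to its body;
    URLs that raise an exception are reported and skipped.
    """
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):  # handle pages as they finish
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # network errors, HTTP errors, timeouts
                print(f"failed: {url}: {exc}")
    return results
```

Calling `fetch_all(list_of_urls)` returns the page bodies keyed by URL. Since every request goes to the same server, a small `max_workers` is the safer choice; raising it mostly adds load on that one server.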

Mar 15 '06 #1
JuHui wrote:
> Hi
> I want to get 200000 html pages content from one server, you know
> urllib.urlopen need construct network connection, it will be very
> slowly, how to speed up this function?
> I try to using multi-thread, it speed up, but I want to quickly more,
> any idea about it?
> Thanks!

You are most likely constrained by the speed of your Internet
connection. If that is the case, there is nothing you can do.
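Whether the connection really is the limit can be checked with a rough lower bound. Assume N pages of average size S fetched over a link of bandwidth B; the numbers below (20 kB per page, 1000 kB/s) are purely illustrative, not figures from the thread:

```latex
T_{\min} \approx \frac{N \cdot S}{B}
= \frac{200\,000 \times 20\ \text{kB}}{1000\ \text{kB/s}}
= 4000\ \text{s} \approx 67\ \text{min}
```

If the measured total is far above this bound, the link itself is probably not the bottleneck; per-request latency (connection setup plus server response time) is a more likely candidate.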

-Larry Bates
Mar 15 '06 #2
On Mar 15, 2006, at 11:22 AM, JuHui wrote:
> Hi
> I want to get 200000 html pages content from one server, you know
> urllib.urlopen need construct network connection, it will be very
> slowly, how to speed up this function?
> I try to using multi-thread, it speed up, but I want to quickly more,
> any idea about it?

Physically remove the harddrive and reinstall it locally?

> Thanks!


Happy to help.

Zac

Mar 15 '06 #3
....
I will do that later, but I want to optimize the script first.
After switching to multiple threads, the time went down from 8 s to 2.3 s per page.
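Taking the reported times as the average cost per page (an assumption, since the post does not say how they were measured), the totals for 200,000 pages come to (one day is 86,400 s):

```latex
\begin{aligned}
200\,000 \times 8\ \text{s}   &= 1.6 \times 10^{6}\ \text{s} \approx 18.5\ \text{days} \\
200\,000 \times 2.3\ \text{s} &= 4.6 \times 10^{5}\ \text{s} \approx 5.3\ \text{days}
\end{aligned}
```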

Any other suggestions?
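One further option along the lines of the original question: `urllib.urlopen` opens a fresh connection for every page, so reusing a single persistent HTTP/1.1 connection for many requests to the same host avoids repeating the connection setup. A minimal sketch with the standard library's `http.client`, assuming the server supports keep-alive; `fetch_paths` is a hypothetical helper name. An `HTTPConnection` should not be shared between threads, so a threaded version would give each worker its own connection.

```python
import http.client
from typing import Dict, Iterable


def fetch_paths(host: str, paths: Iterable[str], port: int = 80,
                timeout: float = 30.0) -> Dict[str, bytes]:
    """Fetch several paths from one host over a single persistent connection.

    Only pages answered with HTTP 200 are kept; others are skipped.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    pages: Dict[str, bytes] = {}
    try:
        for path in paths:
            # HTTPConnection speaks HTTP/1.1, so the socket stays open between
            # requests unless the server asks to close it (in which case
            # http.client reconnects automatically on the next request).
            conn.request("GET", path)
            resp = conn.getresponse()
            # The whole body must be read before the next request is sent.
            body = resp.read()
            if resp.status == 200:
                pages[path] = body
    finally:
        conn.close()
    return pages
```

For example, `fetch_paths("www.example.com", ["/news/1.html", "/news/2.html"])` would fetch both pages over one TCP connection (the host and paths here are placeholders).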

Mar 15 '06 #4

JuHui wrote:
> Hi
> I want to get 200000 html pages content from one server, you know
> urllib.urlopen need construct network connection, it will be very
> slowly, how to speed up this function?
> I try to using multi-thread, it speed up, but I want to quickly more,
> any idea about it?
> Thanks!


Why don't you try and use wget?

Mar 15 '06 #5

JuHui wrote:
> Hi
> I want to get 200000 html pages content from one server, you know
> urllib.urlopen need construct network connection, it will be very
> slowly, how to speed up this function?
> I try to using multi-thread, it speed up, but I want to quickly more,
> any idea about it?
> Thanks!


Bad: use BackStreet Browser and see if your IP gets blacklisted.

Good: ask permission.
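A middle ground between those two options is to honour the site's robots.txt and throttle requests. A minimal sketch using the standard library's `urllib.robotparser`; the user-agent string `"news-script"`, the one-second default delay and the `polite_urls` helper are illustrative assumptions. In real use you would load the live file with `RobotFileParser.set_url(...)` and `read()` instead of parsing a string.

```python
import time
from typing import Callable, Iterable, Iterator
from urllib.robotparser import RobotFileParser


def polite_urls(
    urls: Iterable[str],
    robots_txt: str,
    user_agent: str = "news-script",
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield only the URLs robots.txt allows, pausing between them."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # Prefer the site's own Crawl-delay if it declares one; else use ours.
    delay = rp.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay
    first = True
    for url in urls:
        if not rp.can_fetch(user_agent, url):
            continue  # disallowed for this user agent: skip it
        if not first:
            sleep(delay)  # wait before handing out the next allowed URL
        first = False
        yield url
```

The actual download would happen inside a loop such as `for url in polite_urls(urls, robots_text): ...`, one request at a time.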

Mar 15 '06 #6
In fact, I want to write a script that gets news from other sites.
The script must fetch the content and analyze the HTML code: where the title
is, where the body is, and so on.
So I can't ask permission, use wget, or "Physically remove the
harddrive and reinstall it locally".
:)
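For the analysis step described above (finding the title and body in each page), the standard library's `html.parser` is enough for a first pass. A minimal sketch that collects the `<title>` text and the text of `<p>` elements; treating the paragraphs as "the body" is an assumption, since real news sites usually need site-specific rules. `ArticleParser` and `extract_article` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ArticleParser(HTMLParser):
    """Collect the <title> text and the text inside each <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self._in_title = False
        self._in_p = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._buf).strip()
            if text:  # ignore empty paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text inside nested tags such as <b> still lands in the open <p>.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_article(html: str) -> Tuple[str, List[str]]:
    """Return (title, list of paragraph texts) for one HTML page."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.paragraphs
```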

Mar 15 '06 #7
On Mar 15, 2006, at 11:58 AM, JuHui wrote:
> in fact, I want to do a script to get news on others site.
> I must use script get the content and analyze the html code, where is
> the title, where is the body....
> so, I can't ask permission, use wget and "Physically remove the
> harddrive and reinstall it locally"


The only one it looks like you *can't* do is physically remove the
hard drive and reinstall it locally. Seems more like you *won't* do
the other two.

Zac

Mar 15 '06 #8
JuHui wrote:
> in fact, I want to do a script to get news on others site.

Then ask the webmasters of these sites whether they have an RSS feed...
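If a site does publish an RSS feed, headlines and links arrive in a structured form and no HTML scraping is needed. A minimal sketch for an RSS 2.0 document (`<rss><channel><item>`), using the standard library's `xml.etree.ElementTree`; fetching the feed is left out, `rss_items` is a hypothetical name, and Atom feeds use different element names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def rss_items(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items: List[Tuple[str, str]] = []
    for item in root.iter("item"):  # <item> elements anywhere under <rss>
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items
```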
Mar 17 '06 #9
