Bytes IT Community

Help me optimize my feed script.

I wrote my own feed reader using feedparser.py, but it takes about 14
seconds to process 7 feeds (on a Windows box), which seems slow over my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

# UTF-8
import feedparser

rss = [
    'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
    'http://www.techmeme.com/index.xml',
    'http://feeds.feedburner.com/slate-97504',
    'http://rss.cnn.com/rss/money_mostpopular.rss',
    'http://rss.news.yahoo.com/rss/tech',
    'http://www.aldaily.com/rss/rss.xml',
    'http://ezralevant.com/atom.xml'
]
s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'

s += '<style>\n'\
     'h3{margin:10px 0 0 0;padding:0}\n'\
     'a.x{color:black}'\
     'p{margin:5px 0 0 0;padding:0}'\
     '</style>\n'

s += '</head>\n<body>\n<br />\n'

for url in rss:
    d = feedparser.parse(url)
    title = d.feed.title
    link = d.feed.link
    s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
    # aldaily.com has a weird feed: everything lives in one entry's description
    if 'aldaily.com' in link:
        s += d.entries[0].description + '\n'
        continue
    for entry in d.entries[:3]:
        s += '<a href="'+ entry.link +'">'+ entry.title +'</a><br />\n'

s += '<br /><br />\n</body>\n</html>'

f = open('c:/scripts/myFeeds.htm', 'w')
f.write(s)
f.close()  # note the parentheses: f.close alone never closes the file

print
print 'myFeeds.htm written'
Jun 27 '08 #1
2 Replies


On Jun 26, 3:30 pm, bsag...@gmail.com wrote:
[snip quoted post]
Using the += operator on strings is a common bottleneck in programs.
The first thing you should try is to get rid of that. (Recent versions
of CPython optimize in-place concatenation, but the optimization
doesn't always apply -- for example, when more than one reference to
the string is alive.)

Instead, create a list like this:

s = []

And append substrings to the list, like this:

s.append('</head>\n<body>\n<br />\n')

Then, when writing the string out (or otherwise using it), join all
the substrings with the str.join method:

f.write(''.join(s))
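That advice can be seen end-to-end in a small sketch (the names build_concat and build_join are illustrative, not from the thread):

```python
# Build a string with repeated += ; each concatenation may copy the
# whole string built so far, which can degrade to quadratic time.
def build_concat(parts):
    s = ''
    for p in parts:
        s += p
    return s

# Append the pieces to a list and join once at the end; this stays
# linear in the total length of the output.
def build_join(parts):
    chunks = []
    for p in parts:
        chunks.append(p)
    return ''.join(chunks)

parts = ['<p>entry %d</p>\n' % i for i in range(1000)]
# Both strategies produce the same string; only the cost differs.
assert build_concat(parts) == build_join(parts)
```

For a handful of feeds the difference is small; the list-and-join form matters most when the loop body runs thousands of times.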
Carl Banks
Jun 27 '08 #2

On Jun 26, 12:30 pm, bsag...@gmail.com wrote: [snip quoted post]
I can 100% guarantee you that the extended run time is network I/O
bound. Investigate using a thread pool to load the feeds in parallel.
Some code you might be able to shim in:

# Extra imports
import threading
import Queue

# Worker: fetch one feed and push the parsed result onto the queue
def parse_and_put(url, queue_):
    parsed_feed = feedparser.parse(url)
    queue_.put(parsed_feed)

# Set up some variables
my_queue = Queue.Queue()
threads = []

# Set up a thread for fetching each URL
for url in rss:
    url_thread = threading.Thread(target=parse_and_put, name=url,
                                  args=(url, my_queue))
    threads.append(url_thread)
    url_thread.setDaemon(False)  # the method is setDaemon, not setDaemonic
    url_thread.start()

# Wait for threads to finish
for thread in threads:
    thread.join()

# Pull the results into a list
feeds_list = []
while not my_queue.empty():
    feeds_list.append(my_queue.get())

# Do what you were doing before, replacing "for url in rss" with
# "for d in feeds_list"
for d in feeds_list:
    title = d.feed.title
    link = d.feed.link
    # ... build the HTML as before ...

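The fan-out/fan-in pattern above can be exercised without feedparser or a network connection. The sketch below is written for Python 3 (where the Queue module is named queue), with a hypothetical fake_parse stub standing in for feedparser.parse:

```python
import threading
import queue  # named Queue in Python 2

# Hypothetical stand-in for feedparser.parse: no network access,
# just returns a dict describing the "feed".
def fake_parse(url):
    return {'url': url, 'title': 'feed for ' + url}

# Worker: fetch one feed and push the result onto the shared queue.
def fetch_into(url, out_q):
    out_q.put(fake_parse(url))

urls = ['http://a.example/rss', 'http://b.example/rss', 'http://c.example/rss']
out_q = queue.Queue()
threads = [threading.Thread(target=fetch_into, args=(u, out_q)) for u in urls]

for t in threads:   # start all fetches in parallel
    t.start()
for t in threads:   # wait for every fetch to finish
    t.join()

# Drain the queue; order reflects completion order, not list order.
results = []
while not out_q.empty():
    results.append(out_q.get())

assert sorted(r['url'] for r in results) == sorted(urls)
```

Because feedparser.parse spends almost all of its time waiting on the network, running the fetches in threads lets the slowest feed, rather than the sum of all seven, dominate the total run time.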
Jun 27 '08 #3

This discussion thread is closed.