Bytes | Software Development & Data Engineering Community

slow import

54 New Member
I'm importing a script that I made and it's literally taking 10+ minutes to run or import into PythonWin.

I've put the script at the bottom. But I'm also having a problem with it.
What I'm trying to do:
1.) Go to the SEC's website and look for recently filed 10-Qs (that's a financial report)
2.) Collect all the links for these new 10-Qs
3.) Add each link to the end of what I call pageroot (which is www.sec.gov)
4.) On each newly formed full web address, go one page at a time and look for the piece of source code that reads <td nowrap="nowrap"><a href=" which will lead me to the next linked address I need (the actual 10-Q is 2 or 3 links away from the original search)
5.) Also write these second-level linked addresses to a file, so that I can check that it is working the intended way
6.) Clean up the linked addresses with a bunch of regex
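
Steps 2, 3 and 6 can often be done in one pass by capturing the href value directly and joining it to the page root, instead of cleaning up with several regexes afterwards. A minimal sketch, assuming the listing rows contain the same marker string the script below searches for; the sample HTML and its paths are made up for illustration, and `extract_links` is a hypothetical helper name:

```python
import re

try:                                   # Python 3
    from urllib.parse import urljoin
except ImportError:                    # Python 2, as used in this thread
    from urlparse import urljoin

PAGEROOT = 'http://www.sec.gov'

# Capture only the href value of anchors that follow the listing cell marker.
LINK_RE = re.compile(
    r'<td bgcolor="#E6E6E6" valign="top" align="left"><a href="([^"]+)"')


def extract_links(html, root=PAGEROOT):
    """Return absolute URLs for every listing link found in html."""
    return [urljoin(root, path) for path in LINK_RE.findall(html)]


# Made-up sample rows (not real EDGAR paths), one per company.
sample = (
    '<td bgcolor="#E6E6E6" valign="top" align="left">'
    '<a href="/example/a-index.htm">10-Q</a></td>\n'
    '<td bgcolor="#E6E6E6" valign="top" align="left">'
    '<a href="/example/b-index.htm">10-Q</a></td>\n'
)

links = extract_links(sample)
```

Because the capture group returns only the path inside the quotes, no further stripping of `="` or `">` is needed before joining.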

Once I get that working I'll add more, but my problem is this...
It seems to be doing this (purely for example):
"Google, Apple, eBay and IBM filed 10-Qs, now let's collect a history of 10-Qs filed for just Google"
when it should be doing this:
"Google, Apple, eBay and IBM filed 10-Qs, now let's collect the link for each of them so that I can redirect my scrape to the actual 10-Q"

If anyone could help I'd be very appreciative.

Here's the code.

```python
import urllib
import re
page = 'http://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=10-Q&owner=include&count=100&action=getcurrent'
raw = []
for line in urllib.urlopen(page):
    if '<td bgcolor="#E6E6E6" valign="top" align="left"><a href="' in line:
        raw.append(line)

codestring = ' '.join(raw)
pattern = re.compile('/\S+')
results = re.findall(pattern, codestring)

pageroot = 'http://www.sec.gov'
count= len(results)

fn = open("c://Python25/tmp.txt", 'w')

line10q = []
number = 0
while number < count:
    newpage = pageroot + results[number]
    for line in urllib.urlopen(newpage):
        if '<td nowrap="nowrap"><a href="' in line:
            line10q.append(line)
        fn.write(line)
    number += 1

fn.close()

line10qstring = ' '.join(line10q)
pattern2 = re.compile('="/\S+">')
results10q = re.findall(pattern, line10qstring)

newstring = ' '.join(results10q)
pattern3 = re.compile('/\S+.htm')
linkresults = re.findall(pattern3, newstring)

pattern4 = re.compile('/\S+.[a-z]{3}"')
linktest2 = ' '.join(linkresults)
link2 = re.findall(pattern4, linktest2)

link2string = ' '.join(link2)
pattern5 = re.compile('/\S+.htm')
link4 = re.findall(pattern5, link2string)
link4string = ' '.join(link4)

linkNumber = len(link4)
```
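
Two things stand out in the script above: the inner loop keeps every matching line from each page, and `pattern2` is compiled but `pattern` is what gets passed to `re.findall` on the following line. If the goal is one link per company, one possible approach is to stop at the first match on each page. The sketch below is only that; `first_link` is a hypothetical helper, the sample lines are made up, and whether the first match is the right one on the real pages is an assumption that would need checking:

```python
import re

# Capture the href value from the cell the script looks for.
HREF_RE = re.compile(r'<td nowrap="nowrap"><a href="([^"]+)"')


def first_link(lines):
    """Return the first matching href in lines, or None if there is none."""
    for line in lines:
        match = HREF_RE.search(line)
        if match:
            return match.group(1)   # stop at the first hit for this page
    return None


# Made-up lines standing in for one fetched page.
page_lines = [
    '<tr><td>header</td></tr>',
    '<td nowrap="nowrap"><a href="/example/first.htm">doc</a></td>',
    '<td nowrap="nowrap"><a href="/example/older.htm">doc</a></td>',
]

link = first_link(page_lines)
```

Calling `first_link` once per fetched page yields at most one path per company, which can then be appended to `pageroot` as before.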
Aug 31 '07 #1
William Manley
56 New Member
It's because all of your code sits at module level, so it is executed when the module is imported. Enclosing it in a function would fix the problem.
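
To see the effect in isolation, here is a small sketch that writes a throwaway module to a temporary directory and imports it. The module name `slowmod_demo` and its contents are made up for the demonstration, and the code is written for Python 3; on the Python 2 used in this thread, the built-in `reload()` plays the role of `importlib.reload()`.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module whose top-level code counts how often it runs.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'slowmod_demo.py'), 'w') as f:
    f.write(
        "RUNS = globals().get('RUNS', 0) + 1   # top level: runs on import\n"
        "def work():\n"
        "    return 'runs only when called'\n"
    )

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import slowmod_demo            # top-level code runs here
import slowmod_demo            # already in sys.modules, does not run again
runs_after_import = slowmod_demo.RUNS

importlib.reload(slowmod_demo)  # reloading executes the top level again
runs_after_reload = slowmod_demo.RUNS
```

The body of `work()` never runs during the import itself, which is why moving slow code into a function makes the import fast.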

So change the script you posted to:

```python
import urllib
import re

def myfunc():
    page = 'http://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=10-Q&owner=include&count=100&action=getcurrent'
    # rest of code....
```
That way you just do:

```python
import myscript
myscript.myfunc()
```

and you're done!
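
A common companion to this fix is the `if __name__ == '__main__':` guard, which lets the same file be imported cheaply and also run directly as a script. A minimal sketch: `collect_links` is a hypothetical name standing in for `myfunc`, and the network call is replaced by a made-up stub (`fake_fetch`) so the example runs offline.

```python
PAGE = ('http://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=10-Q'
        '&owner=include&count=100&action=getcurrent')


def collect_links(fetch):
    """Do the slow work only when called.

    fetch is any callable that takes a URL and returns an iterable of lines,
    so the real network call can be swapped for a stub.
    """
    marker = '<a href="'
    return [line for line in fetch(PAGE) if marker in line]


def fake_fetch(url):
    # Stub standing in for urllib.urlopen(url); the lines are made up.
    return ['<p>no link here</p>', '<a href="/example/a.htm">10-Q</a>']


if __name__ == '__main__':
    # Runs when the file is executed directly, not when it is imported.
    for line in collect_links(fake_fetch):
        print(line)
```

Importing this file only defines the functions; the loop under the guard is skipped unless the file is run as the main program.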
Sep 1 '07 #2
