Bytes | Software Development & Data Engineering Community

Help running BeautifulSoup script

Hi,
I'm very new to Python. Basically, all I want to do is run this script, but I don't know how.

My OS is MS Windows XP.
Any help is much appreciated.

P.S. I have already installed BeautifulSoup.
Aug 22 '07 #1
bartonc
6,596 Expert 4TB
You'll need to add the Python directory to your PATH environment variable. After that, I'd say decompress the backup_fotopic.tar.gz file into a folder. Start a DOS shell, cd to that folder, and type

backup_fotopic.py

Assuming that Python is installed correctly (.py files have the Python icon), the script will run.
Aug 23 '07 #2
You'll need to add the Python directory to your PATH environment variable.
How do I do this?

Thanks
Aug 23 '07 #3
bartonc
Right-click My Computer and go to Properties. On the Advanced tab, click the Environment Variables button. In the "User variables for <your name>" list, find the one called PATH, select it, and click the Edit button. Add something like

C:\python24;

(depending on the actual path and version of your Python installation)
to the beginning of the line (yours may not have anything in it yet).
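Once you've edited PATH, a quick way to confirm the change took effect is to open a fresh console and list the entries Python sees. This is just a sanity-check sketch using the standard library; the C:\python24 directory is only the example from above:

```python
# Sanity check: print the PATH entries visible to the current process.
# Run this from a newly opened console so the edited PATH is picked up.
import os

entries = os.environ.get("PATH", "").split(os.pathsep)
for entry in entries:
    print(entry)
# If the Python directory you added (e.g. C:\python24) appears here,
# typing "backup_fotopic.py" in that console should find the interpreter.
```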
Aug 23 '07 #4
Got the PATH variable set, thanks.

When I run the script I get:
Traceback (most recent call last):
  File "C:/Python25/backup_fotopic", line 30, in <module>
    title = image_soup.first('title').contents[0]
AttributeError: 'NoneType' object has no attribute 'contents'
How do I fix this?

Thank you for your continued support.
Aug 24 '07 #5
bartonc
This is telling you that the result of

image_soup.first('title')

is None.
My guess is that 'title' is meaningless to the function.

You'll need to include some code and an explanation of what you expect it to do.
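To make the failure mode concrete, here is a minimal sketch; `title_tag` is just a stand-in for whatever `image_soup.first('title')` returned, not part of the script itself:

```python
# Reproducing the error: None has no .contents attribute.
title_tag = None  # stand-in for image_soup.first('title') on a bad page
try:
    title = title_tag.contents[0]
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'contents'

# A defensive pattern: check for None before indexing.
title = title_tag.contents[0] if title_tag is not None else "untitled"
print(title)  # untitled
```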
Aug 24 '07 #6
  1. #! /usr/bin/env python
  2.  
  3. import urllib, string
  4. from BeautifulSoup import BeautifulSoup
  5.  
  6. collections_soup = BeautifulSoup()
  7.  
  8. # Replace the example URL below with the address of the pictures you want to backup
  9. base_url = 'http://andiday.fotopic.net/c1343336.html'
  10.  
  11. f = urllib.urlopen(base_url + '/list_collections.php')
  12. result = f.read()
  13. f.close()
  14. collections_soup.feed(result)
  15. for collection in collections_soup('a'):
  16.     print '>>>' + base_url + collection['href']
  17.     f = urllib.urlopen(base_url + collection['href'])
  18.     result = f.read()
  19.     f.close()
  20.     collection_soup = BeautifulSoup()
  21.     collection_soup.feed(result)
  22.     for thumb in collection_soup('td', {'class' : 'thumbs'}):
  23.         for image in thumb('a'):
  24.             if string.find(image['href'], 'javascript') == -1 and string.find(image['href'], 'title') == -1:
  25.                 f = urllib.urlopen(base_url + image['href'])
  26.                 result = f.read()
  27.                 f.close()
  28.                 image_soup = BeautifulSoup()
  29.                 image_soup.feed(result)
  30.                 title = image_soup.first('title').contents[0]
  31.                 filename = string.split(title.string, '.JPG')[0]
  32.                 print filename
  33.                 for photo_div in image_soup('div', {'class' : 'photo-image'}):
  34.                     for img in photo_div('img'):
  35.                         print img
  36.                         print filename
  37.                         f = urllib.urlopen(img['src'])
  38.                         result = f.read()
  39.                         f.close()
  40.  
  41.             # Replace the path below with a folder on your hard drive
  42.                         img = open('C:\\Downloads\\' + filename + '.JPG', 'wb+')
  43.                         img.write(result)
  44.                         img.close()
  45.  
The script is designed to scrape the URL (in this case http://andiday.fotopic.net/c1343336.html) and download all the .jpg files from it to a folder on the hard disk (in this case, C:\Downloads).
Aug 24 '07 #7
bartonc
When debugging/troubleshooting, always start at the source.
I'm not too internet-savvy, but I'm pretty sure that this says something about "couldn't find the file":
>>> import urllib
>>> f = urllib.urlopen('http://andiday.fotopic.net/c1343336.html/list_collections.php')
>>> f.read()
'\n\n<!-- /export/fotopic.net/userland/www/css/18102.css -->\n\n<link rel="stylesheet" href="http://media.fotopic.net/virtualv1/25/style.css" type="text/css">\n<body>\n\n<table border=0 cellpadding=4 cellspacing=2 width=100%>\n<tr><td class="content">\n<center><h2><div class="photo">404: Page Not found</h2></center>\n<div class="photo">We\'re sorry but we couldn\'t find the file you requested:\n<ul>\n<strong><div class="photo">http://andiday.fotopic.net/c1343336.html/list_collections.php</strong>\n</ul>\n<div class="photo">So either it doesn\'t exist, or it\'s been moved.\n<p>\n\n<div class="photo">If you\'re looking for a particular person\'s gallery, you could try looking at<br/>\nour <a href="http://fotopic.net/community/">Community</a> section, or\nalternatively take a look at <a href="http://fotopic.net/">the main Fotopic\nsite</a>.\n\n<p>\n\n</td></tr>\n</table>\n\n</body>\n'
>>> f.close()
>>> del f
>>> del urllib
Aug 24 '07 #8
What should I do with this code?
Aug 25 '07 #9
bartonc
Find the address of a valid list_collections.php.
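One thing worth checking (an assumption on my part, since I don't know fotopic's site layout): the failing request went to http://andiday.fotopic.net/c1343336.html/list_collections.php, i.e. the .php path was appended to a page URL rather than to the site root. A sketch of deriving the root first (the module is `urllib.parse` on current Python; in the Python 2 of this thread it was called `urlparse`):

```python
# Sketch: derive the site root from the page URL before appending paths.
from urllib.parse import urlsplit, urlunsplit

base_url = 'http://andiday.fotopic.net/c1343336.html'
parts = urlsplit(base_url)
# Keep only scheme and host; drop the /c1343336.html page path.
site_root = urlunsplit((parts.scheme, parts.netloc, '', '', ''))
print(site_root + '/list_collections.php')
# -> http://andiday.fotopic.net/list_collections.php
```

Whether the server actually serves list_collections.php at the root is still something to verify in a browser first.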
Aug 25 '07 #10
I don't understand :(

Sorry for being so noobish! lol
Aug 26 '07 #11
bartonc
The script will do what you want, but the message from f.read() is telling you that the address that you are using is not valid. It looks like there is information in that message that may help you find a valid address. Sorry to be so vague, but, as I've said, I'm not a web-scraping kind of developer.
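Since that error page comes back as ordinary HTML rather than raising an exception, one option is to bail out when the body looks like fotopic's "Page Not found" page. A sketch, with the marker string taken from the dump above and `body` standing in for the string returned by f.read():

```python
# Sketch: detect fotopic's "Page Not found" body before parsing.
body = '<h2><div class="photo">404: Page Not found</h2>'  # example response text

if '404: Page Not found' in body:
    print('bad address - fix base_url before scraping')
else:
    print('looks OK, hand the page to BeautifulSoup')
```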
Aug 26 '07 #12
