Hi,
I'm very new to Python. Basically all I want to do is run this script, but I don't know how.
My OS is MS Windows XP.
Any help is much appreciated.
P.S. I have already installed BeautifulSoup.
You'll need to add the Python directory to your PATH environment variable. After that, I'd say decompress the backup_fotopic.tar.gz file into a folder. Start a DOS shell, cd to that folder, and type
backup_fotopic.py
Assuming that Python is installed correctly (.py files have the Python icon), the script will run.
You'll need to add the Python directory to your PATH environment variable.
How do I do this?
Thanks
You'll need to add the Python directory to your PATH environment variable.
Right-click My Computer and go to Properties. On the Advanced tab, click the Environment Variables button. In the "User variables for <your name>" list, find the one called PATH, select it, and click the Edit button. Add something like
C:\python24;
(depending on the actual path and version of your Python installation)
to the beginning of the line (yours may not have anything in it yet).
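If you just want to test it before editing the permanent setting, you can also set the variable for the current DOS session only. A sketch (cmd.exe syntax; adjust C:\python24 to wherever your Python is actually installed):

```shell
rem Prepend the Python folder to PATH for this cmd.exe session only
set PATH=C:\python24;%PATH%

rem Verify the interpreter is now found
python -V
```

The `set` only lasts until you close that DOS window; the Environment Variables dialog above is the permanent way.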
Got the PATH variable set, thanks.
When I run the script I get:

Traceback (most recent call last):
  File "C:/Python25/backup_fotopic", line 30, in <module>
    title = image_soup.first('title').contents[0]
AttributeError: 'NoneType' object has no attribute 'contents'

How do I fix this?
Thank you for your continued support.
This is telling you that the result of image_soup.first('title') is None.
My guess is that 'title' is meaningless to the function.
You'll need to include some code and an explanation of what you expect it to do.
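To see the failure mode in isolation: chaining an attribute access onto a lookup that can return None raises exactly this AttributeError. A minimal sketch (not the original script; `find_first`, `Tag`, and `safe_title` are hypothetical stand-ins for what BeautifulSoup's first()/find() do when nothing matches):

```python
def find_first(results):
    """Return the first element of results, or None when there is no match,
    mimicking a BeautifulSoup lookup that found nothing."""
    return results[0] if results else None

class Tag:
    """A toy tag with a .contents attribute, standing in for a parsed <title>."""
    def __init__(self, contents):
        self.contents = contents

def safe_title(results, default='untitled'):
    """Check the lookup result before touching .contents, instead of chaining
    blindly as in: title = image_soup.first('title').contents[0]"""
    tag = find_first(results)
    if tag is None:          # this is the case the traceback above hit
        return default
    return tag.contents[0]
```

With an empty result list, safe_title([]) returns 'untitled' instead of crashing; the original line blows up because the page it fetched apparently has no <title> tag to find, so the first() call hands back None.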
#! /usr/bin/env python

import urllib, string
from BeautifulSoup import BeautifulSoup

collections_soup = BeautifulSoup()

# Replace the example URL below with the address of the pictures you want to back up
base_url = 'http://andiday.fotopic.net/c1343336.html'

f = urllib.urlopen(base_url + '/list_collections.php')
result = f.read()
f.close()
collections_soup.feed(result)

for collection in collections_soup('a'):
    print '>>>' + base_url + collection['href']
    f = urllib.urlopen(base_url + collection['href'])
    result = f.read()
    f.close()
    collection_soup = BeautifulSoup()
    collection_soup.feed(result)
    for thumb in collection_soup('td', {'class' : 'thumbs'}):
        for image in thumb('a'):
            if string.find(image['href'], 'javascript') == -1 and string.find(image['href'], 'title') == -1:
                f = urllib.urlopen(base_url + image['href'])
                result = f.read()
                f.close()
                image_soup = BeautifulSoup()
                image_soup.feed(result)
                title = image_soup.first('title').contents[0]
                filename = string.split(title.string, '.JPG')[0]
                print filename
                for photo_div in image_soup('div', {'class' : 'photo-image'}):
                    for img in photo_div('img'):
                        print img
                        print filename
                        f = urllib.urlopen(img['src'])
                        result = f.read()
                        f.close()
                        # Replace C:\Downloads below with the path to a folder on your hard drive
                        # (note the doubled backslashes and the trailing separator)
                        img = open('C:\\Downloads\\' + filename + '.JPG', 'wb+')
                        img.write(result)
                        img.close()
The script is designed to scrape the URL (in this case http://andiday.fotopic.net/c1343336.html) and download all the .JPG files from it to a folder on the hard disk (in this case, C:\Downloads).
When debugging/troubleshooting, always start at the source.
I'm not too internet savvy, but I'm pretty sure that this says something about "couldn't find the file":

>>> import urllib
>>> f = urllib.urlopen('http://andiday.fotopic.net/c1343336.html/list_collections.php')
>>> f.read()
'\n\n<!-- /export/fotopic.net/userland/www/css/18102.css -->\n\n<link rel="stylesheet" href="http://media.fotopic.net/virtualv1/25/style.css" type="text/css">\n<body>\n\n<table border=0 cellpadding=4 cellspacing=2 width=100%>\n<tr><td class="content">\n<center><h2><div class="photo">404: Page Not found</h2></center>\n<div class="photo">We\'re sorry but we couldn\'t find the file you requested:\n<ul>\n<strong><div class="photo">http://andiday.fotopic.net/c1343336.html/list_collections.php</strong>\n</ul>\n<div class="photo">So either it doesn\'t exist, or it\'s been moved.\n<p>\n\n<div class="photo">If you\'re looking for a particular person\'s gallery, you could try looking at<br/>\nour <a href="http://fotopic.net/community/">Community</a> section, or\nalternatively take a look at <a href="http://fotopic.net/">the main Fotopic\nsite</a>.\n\n<p>\n\n</td></tr>\n</table>\n\n</body>\n'
>>> f.close()
>>> del f
>>> del urllib
>>>
What should I do with this code?
Find the address of a valid list_collections.php.
I don't understand :(
Sorry for being so noobish! lol
The script is designed to scrape the url (in this case http://andiday.fotopic.net/c1343336.html) and download all the .jpg files from it to a folder on the hard disk (in this case, C:\Downloads)
The script will do what you want, but the message from f.read() is telling you that the address that you are using is not valid. It looks like there is information in that message that may help you find a valid address. Sorry to be so vague, but, as I've said, I'm not a web-scraping kind of developer.
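One concrete thing the 404 points at: the script builds the address by gluing '/list_collections.php' onto a URL that already ends in a page name (c1343336.html), so the server treats that page as a folder. A sketch of the difference, using urljoin (this is Python 3's urllib.parse spelling; in the Python 2 that the script targets, the same function lives in the urlparse module):

```python
from urllib.parse import urljoin

base_url = 'http://andiday.fotopic.net/c1343336.html'

# Naive concatenation, as in the script -- the bad address the 404 page echoes back:
broken = base_url + '/list_collections.php'

# urljoin replaces the final path segment instead of nesting under it:
fixed = urljoin(base_url, 'list_collections.php')

print(broken)  # http://andiday.fotopic.net/c1343336.html/list_collections.php
print(fixed)   # http://andiday.fotopic.net/list_collections.php
```

Whether that better-formed candidate actually exists on the site is something you'd still have to check in a browser; this only shows why the current one can't.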