Bytes | Software Development & Data Engineering Community

Python - why doesn't this script work?

I am trying to use this cool script that some MIT guy wrote, and it just
does not work: I get a stream of errors when I try to run it. It is
supposed to visit a URL and snag all of the pictures on the site. Here is
the script:
http://web.mit.edu/pgbovine/www/imag...e-harvester.py

Here is my output when I try to run it on my Fedora 6 machine:

[ohmster@ohmster bench]$ image-harvester.py http://public.fotki.com/DaGennelman/
/home/ohmster/scripts/image-harvester.py: line 59: from: command not found
[ohmster@ohmster bench]$

The script is meant to be followed up with another one that weeds out the small
thumbnails and banner images. Here is the base URL:
http://web.mit.edu/pgbovine/www/image-harvester/
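The follow-up script's own filtering rule is not shown in this thread. As a rough sketch only, assuming that small files are the thumbnails, a Python 3 helper could walk the harvest directory and keep the larger image files. The name large_images and the 20000-byte cut-off are hypothetical.

```python
import os


def large_images(directory, min_bytes=20000):
    """Return image files under `directory` that are at least `min_bytes` long.

    File size is only a rough stand-in for pixel size; the default cut-off
    is an arbitrary assumption, not the follow-up script's actual rule.
    """
    image_extensions = (".jpg", ".jpeg", ".png", ".gif")
    keep = []
    # Walk subdirectories as well, since harvested files may be spread out
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(image_extensions):
                path = os.path.join(root, name)
                if os.path.getsize(path) >= min_bytes:
                    keep.append(path)
    return sorted(keep)
```

Calling large_images(".") from the harvest directory lists the candidates. Banner images are often wide rather than small in bytes, so a real filter would probably need the pixel dimensions too.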

Line 59 of image-harvester.py and the import lines after it read as follows:

from sgmllib import SGMLParser
import urllib
from urlparse import urlparse, urljoin
import re
import os

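The imports listed above are Python 2 modules: sgmllib was removed in Python 3, and urlparse became urllib.parse. As a rough illustration of the kind of work such a harvester does (not the MIT script's actual code), here is a Python 3 sketch that collects image URLs from a page with html.parser and resolves relative links with urljoin. The names ImageCollector and collect_image_urls are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute URLs of <img src="..."> tags found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is (name, value) pairs
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative paths such as "pics/a.jpg" against the page URL
                    self.image_urls.append(urljoin(self.base_url, value))


def collect_image_urls(html, base_url):
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


if __name__ == "__main__":
    page = '<p><img src="pics/a.jpg"><img src="/b.png" alt="b"></p>'
    print(collect_image_urls(page, "http://example.com/gallery/"))
    # ['http://example.com/gallery/pics/a.jpg', 'http://example.com/b.png']
```

A full harvester would then download each URL, for example with urllib.request.urlretrieve.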
Can anyone tell me what is wrong with this script and why it will not run?
It does not like the command "from". Is there such a command in Python?
Does this mean that Python has an "import" command but not a "from"
command, or can we not tell yet because it hangs right away on the
very first word of the script, "from"? Maybe this is not a Linux script or
something? I also wonder why it needs an X server at all. I tried running it
from an ssh terminal window and it had a fit about there being no X server, so now I am
running it in a GNOME terminal window. This looked so cool, too. :(

Please be patient with me. I do not know Python at all; I just want
this script to work. If I see enough working examples of Python, I may
take up studying it, but for right now I do not know the language.
Total newbie.

Thanks.
--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 23 '07 #1
On Oct 22, 9:47 pm, Ohmster <r...@dev.nul.invalid> wrote:
[snip]
I think you're executing it as a shell script. Run "python image-harvester.py",
or add "#!/usr/bin/env python" to the top of the file.
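A hedged reading of the symptoms: with no "#!" line, bash runs the file as a shell script. "from" is not a command, so line 59 fails and bash carries on; if ImageMagick is installed, line 60 ("import urllib") then starts ImageMagick's "import" screen-capture program, which would explain both the apparent hang and the complaint about a missing X server over ssh. The small Python 3 helper below (shebang_interpreter is a hypothetical name) reports which interpreter, if any, a script's first line names.

```python
def shebang_interpreter(path):
    """Return the interpreter named on a script's "#!" line, or None if absent."""
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        # No shebang: bash falls back to running the file as a shell script
        return None
    # b"#!/usr/bin/env python\n" -> "/usr/bin/env python"
    return first_line[2:].strip().decode("utf-8", errors="replace")
```

The "#!" must be the very first bytes of the file. After adding it, the file also needs execute permission ("chmod +x image-harvester.py") before "./image-harvester.py URL" will work.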

Oct 23 '07 #2
Ohmster wrote:
[snip]

Your Linux shell thinks it is running a shell script ("from" is not a
valid command in bash).

To execute this script with the Python interpreter, type this at a shell
prompt:

python image-harvester.py http://some.url.whatever/images_page

Read the comments at the beginning of the script and you will discover
all sorts of important usage information.
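The script's header comment gives its usage as "# Usage: python image-harvester.py <url-to-harvest>". When run that way, Python hands the script its own name and the URL in sys.argv. The sketch below is not the script's code; main() is a hypothetical Python 3 function showing how such a usage check typically looks.

```python
import sys

USAGE = "Usage: python image-harvester.py <url-to-harvest>"


def main(argv):
    """Hypothetical entry point that enforces the documented usage line."""
    # argv[0] is the script name; argv[1] should be the URL to harvest
    if len(argv) != 2:
        print(USAGE, file=sys.stderr)
        return 2  # a non-zero exit status tells the shell something went wrong
    url = argv[1]
    print("Would harvest images from", url)
    return 0
```

A real script would finish with sys.exit(main(sys.argv)) so the shell sees the exit status.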

Regards,

John
Oct 23 '07 #3
Ohmster <ro**@dev.nul.invalid> wrote in
news:Xn************************@194.177.96.26:
Here is my output when I try to run it on my Fedora 6 machine:

[ohmster@ohmster bench]$ image-harvester.py http://public.fotki.com/DaGennelman/
/home/ohmster/scripts/image-harvester.py: line 59: from: command not found
[ohmster@ohmster bench]$

The original page for this script is here:
http://web.mit.edu/pgbovine/www/image-harvester.htm

I figured it out: I think I have to run python first, then the script and
the URL, like this:
$ python image-harvester.py http://public.fotki.com/DaGennelman/

Now it actually seems to be doing something, and it sure is busy. It
is making a lot of little subdirectories in my test directory. I had to
copy image-harvester.py to the test directory first, then run python with
image-harvester.py and the URL, and it is going to town: tons of subfolders,
but no images yet, though it is not done. At least it is doing something now
instead of bitching and hanging. I guess I had to call up python and pass it
the script, as the script does not seem to pull up python on its own. So
far I have 60 directories and about 45 robots.txt files but no jpg files yet. I
will let you know what happens.

Feel free to jump right in with your input on how this should or won't work
and what can be done to make it better. I keep all of my scripts in a
$HOME/scripts/ directory, which is in my path, but running this from another
directory does not work unless image-harvester.py is in the harvest
directory I run it from. I can right-click on an image and
save it, but the amazing script trips all over itself with these wacky file
names. I am all ears if someone figures it out.
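A likely explanation for the "must be in the current directory" problem: "python image-harvester.py" treats the name as a path relative to the current directory and does not search $PATH. Only the shell searches $PATH, and only for files marked executable. So adding the "#!/usr/bin/env python" line suggested in this thread and running "chmod +x ~/scripts/image-harvester.py" should let "image-harvester.py URL" work from any directory (as should "python ~/scripts/image-harvester.py URL"). The Python 3 sketch below demonstrates the executable-bit rule with shutil.which; a temporary directory stands in for $HOME/scripts, and path_lookup_demo is a hypothetical name.

```python
import os
import shutil
import stat
import tempfile


def path_lookup_demo():
    """Show that a PATH-style search only finds scripts marked executable."""
    scripts_dir = tempfile.mkdtemp()  # stand-in for $HOME/scripts
    script = os.path.join(scripts_dir, "image-harvester.py")
    with open(script, "w") as f:
        f.write("#!/usr/bin/env python\nprint('harvesting')\n")

    os.chmod(script, 0o644)  # readable but not executable
    before = shutil.which("image-harvester.py", path=scripts_dir)

    # Same effect as "chmod +x image-harvester.py"
    mode = os.stat(script).st_mode
    os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    after = shutil.which("image-harvester.py", path=scripts_dir)
    return before, after
```

On a typical Linux system the first lookup returns None and the second returns the script's full path.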

Oct 23 '07 #4
Ohmster wrote:
Here is my output when I try to run it on my Fedora 6 machine:
[ohmster@ohmster bench]$ image-harvester.py http://public.fotki.com/DaGennelman/
/home/ohmster/scripts/image-harvester.py: line 59: from: command not found
Check line 59 in the Python script and you will see which command you are missing.
I bet you didn't read what it says on the page
http://web.mit.edu/pgbovine/www/image-harvester.htm

Install the programs mentioned on the page before you use the Python script.

--

//Aho
Oct 23 '07 #5
Steve Ackman <st***@SNIP-THIS.twoloonscoffee.com> wrote in
news:sl******************@sorceror.wizard.dyndns.org:

[snip]
Did you bother reading the comments? If you had, you'd
know that's not how you run it.
When run as directed (and as common sense dictates),
it works fine.
[snip]

I figured it out: I think I have to run python first, then the script and
the URL, like this:

$ python image-harvester.py http://public.fotki.com/DaGennelman/

Now it actually seems to be doing something, and it sure is busy. It
is making a lot of little subdirectories in my test directory. I had to
copy image-harvester.py to the test directory first, then run python with
image-harvester.py and the URL, and it is going to town: tons of subfolders,
but no images yet, though it is not done. At least it is doing something now
instead of bitching and hanging. I guess I had to call up python and pass it
the script, as the script does not seem to pull up python on its own. So
far I have 60 directories and about 45 robots.txt files but no jpg files yet. I
will let you know what happens. I think these images are protected by
script: you never get a valid URL to the image file, just referrers,
numbers, and whatnot. Only once the image is finally displayed in your browser
can you save it. Pretty good way to stop a
harvester. Is this assumption pretty much correct, or is there a way to make
this work? Now that I use python as the first command, I can run it in an
ssh window and do not need an X server.

Feel free to jump right in with your input on how this should or won't work
and what can be done to make it better. I keep all of my scripts in a
$HOME/scripts/ directory, which is in my path, but running this from another
directory does not work unless image-harvester.py is in the harvest
directory I run it from. I can right-click on an image and
save it, but the amazing script trips all over itself with these wacky file
names. I am all ears if someone figures it out.
Oct 23 '07 #6
John McMonagle <jm********@velseis.com.au> wrote in
news:ma**************************************@python.org:
[snip]
Thanks, John.

Oct 23 '07 #7
"J.O. Aho" <us**@example.net> wrote in
news:5o************@mid.individual.net:
[snip]
I figured it out; see my other reply. I have to run this command beginning
with "python". I still don't get the results I want, but I think that is
because the images are protected with script. My other post in this thread
gives the details. If you have more ideas, I am all ears.

Thanks, Aho.

Oct 23 '07 #8
Adam Atlas <ad**@atlas.st> wrote in
news:1193108392.089611.91170@v29g2000prd.googlegroups.com:
I think you're executing it as a shell script. Run "python image-harvester.py",
or add "#!/usr/bin/env python" to the top of the file.
Hey, that is a cool idea; I think I will try it. I found out what was wrong,
but I did not get the results I want. I think the images are protected with
script. See my other post in this thread for details.

Shoot, the followup might have gone to alt.os.linux. I will repost it for you
here.

[snip]

Oct 23 '07 #9
On Oct 23, 6:50 am, Ohmster <r...@dev.nul.invalid> wrote:
[snip]
Do note that the reason you may not see images is that the website
has *correctly* identified your program as an automated bot and
blocked its access to things...
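Whether or not that is what happened here, robots.txt (the harvester saved about 45 copies of it) is where a site states which paths automated clients may fetch, and a polite harvester can check it before downloading anything. A minimal Python 3 sketch with the standard urllib.robotparser module follows; the rules are hypothetical, not fotki's actual robots.txt, and allowed() is an illustrative helper.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every bot is kept out of /photos/
rules = [
    "User-agent: *",
    "Disallow: /photos/",
]
print(allowed(rules, "image-harvester", "http://example.com/photos/1.jpg"))  # False
print(allowed(rules, "image-harvester", "http://example.com/about.html"))  # True
```

Against a live site, RobotFileParser.set_url() followed by read() fetches the real file instead of a hand-written list.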

Oct 23 '07 #10
co*********@gmail.com wrote in
news:1193127053.740024.144730@q5g2000prf.googlegroups.com:
Do note that the reason you may not see images is that the website
has *correctly* identified your program as an automated bot and
blocked its access to things...
Probably so. Even with all of that flurry of activity, I did not get
anything: the result was 0 images. :(

Oct 24 '07 #11

Similar topics

Thread by dmbkiwi (11 posts)
Thread by Logan (4 posts)
Thread by Erik Johnson (34 posts)
Thread by Olivier Scalbert (52 posts)
Thread by Paul Prescod (16 posts)
Thread by Darren Dale (33 posts)
Thread by Lad (68 posts)
Thread by John Salerno (19 posts)
Thread by Ronak mishra (1 post)
