Python - why don't this script work?

I am trying to use this cool script that some MIT guy wrote and it just
does not work, I get a stream of errors when I try to run it. It is
supposed to visit a URL and snag all of the pictures on the site. Here is
the script:
http://web.mit.edu/pgbovine/www/imag...e-harvester.py

Here is my output when I try to run it on my Fedora 6 machine:

[ohmster@ohmster bench]$ image-harvester.py http://public.fotki.com/DaGennelman/
/home/ohmster/scripts/image-harvester.py: line 59: from: command not found
[ohmster@ohmster bench]$

The script is to be followed up with another one to weed out the small
thumbnails and banner images, here is the base URL:
http://web.mit.edu/pgbovine/www/image-harvester/

Line 59 in image-harvester.py reads as follows:

59: from sgmllib import SGMLParser
60: import urllib
70: from urlparse import urlparse, urljoin
71: import re
72: import os
Can anyone tell me what is wrong with this script and why it will not run?
It does not like the command "from", is there such a command in python?
Does this mean that python has the "import" command but not the "from"
command or do we not know this yet as it hangs right away when it hits the
very first word of the script, "from"? Maybe this is not a Linux script or
something? I wonder why it needs the x-server anyway, I tried running it
from an ssh term window and it had a fit about no x-server so now I am
doing this in a gnome term window. This looked so cool too. :(

Please be patient with me, I do not know python at all, I just want for
this script to work and if I see enough working examples of python, I may
just take up study on it, but for right now, I do not know the language.
Total newbie.

Thanks.
--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 23 '07 #1
On Oct 22, 9:47 pm, Ohmster <r...@dev.nul.invalid> wrote:
[snip]
I think you're executing it as a shell script. Run "python
image-harvester.py", or add "#!/usr/bin/env python" to the top of the file.
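
To illustrate the second suggestion, here is a minimal sketch, assuming a Unix-like system. The helper name has_python_shebang and the temporary sample file are made up for illustration and are not part of image-harvester.py. It checks whether a script's first line is a "#!" line naming python, which is what lets the shell hand the file to the interpreter instead of trying to run "from" as a shell command.

```python
import os
import tempfile


def has_python_shebang(path):
    """Return True if the file's first line is a '#!' line that mentions python."""
    with open(path, "rb") as f:
        first_line = f.readline()
    return first_line.startswith(b"#!") and b"python" in first_line


def demo():
    # Hypothetical stand-in for a script that has no shebang line.
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write("from os import path\n")
    before = has_python_shebang(path)

    # Prepend the line suggested above, keeping the rest of the file intact.
    with open(path) as f:
        body = f.read()
    with open(path, "w") as f:
        f.write("#!/usr/bin/env python\n" + body)
    after = has_python_shebang(path)

    os.remove(path)
    return before, after


if __name__ == "__main__":
    print(demo())
```

The shebang only matters when the file is run by name, which also requires the execute bit. Since the shell already tried to run the script in the output above, that bit appears to be set, so adding the line at the very top should be enough.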

Oct 23 '07 #2
Ohmster wrote:
[snip]
72: import os

# Usage: python image-harvester.py <url-to-harvest>

[snip]

Your linux shell thinks it is running a shell script (from is not a
valid command in bash).

To execute this script with the python interpreter type (from a shell
prompt):

python image-harvester.py http://some.url.whatever/images_page

Read the comments at the beginning of the script and you will discover
all sorts of important usage information.

Regards,

John
Oct 23 '07 #3
Ohmster <ro**@dev.nul.invalid> wrote in
news:Xn************************@194.177.96.26:
> Here is my output when I try to run it on my Fedora 6 machine:
>
> [ohmster@ohmster bench]$ image-harvester.py http://public.fotki.com/DaGennelman/
> /home/ohmster/scripts/image-harvester.py: line 59: from: command not found
> [ohmster@ohmster bench]$

The original page for this script is here:
http://web.mit.edu/pgbovine/www/image-harvester.htm

I figured it out, I have to run python I think first then the script and
the URL like this:
$ python image-harvester.py http://public.fotki.com/DaGennelman/

Now that actually seems to be doing something and it sure is busy now. It
is making a lot of little subdirectories in my test directory. I had to
copy image-harvester.py to the test directory first, then run python and
image-harvester.py w/URL and it is going to town. Tons of subfolders, so
far not images yet but it is not done. At least it is doing something now
and not bitching and hanging. I guess I had to call up python and pass it
to the script as the script does not seem to pull up python on its own. So
far I have 60 directories and about 45 robots.txt but no jpg files yet. I
will let you know what happens.

Feel free to jump right in with your input on how this should or won't work
and what can be done to make it better. I have all of my scripts in a
$HOME/scripts/ directory and it is in my path but running this from another
directory does not work if image-harvester.py is not in the harvest
directory where I run the script from. I can right click on the image and
save it but the amazing script trips all over itself with these wacky file
names. I am all ears if someone figures it out.

--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 23 '07 #4
Ohmster wrote:
> Here is my output when I try to run it on my Fedora 6 machine:
> [ohmster@ohmster bench]$ image-harvester.py http://public.fotki.com/DaGennelman/
> /home/ohmster/scripts/image-harvester.py: line 59: from: command not found

Check line 59 in the Python script and you will see which command you are missing.
I bet you didn't read what it said on the page
http://web.mit.edu/pgbovine/www/image-harvester.htm

Install the programs that are mentioned on the page before you use the Python script.

--

//Aho
Oct 23 '07 #5
Steve Ackman <st***@SNIP-THIS.twoloonscoffee.com> wrote in
news:sl******************@sorceror.wizard.dyndns.org:

[snip]
> Did you bother reading the comments? If you had, you'd
> know that's not how you run it.
> When run as directed (and common sense dictates),
> it works fine.
[snip]

I figured it out, I have to run python I think first then the script and
the URL like this:

$ python image-harvester.py http://public.fotki.com/DaGennelman/

Now that actually seems to be doing something and it sure is busy now. It
is making a lot of little subdirectories in my test directory. I had to
copy image-harvester.py to the test directory first, then run python and
image-harvester.py w/URL and it is going to town. Tons of subfolders, so
far not images yet but it is not done. At least it is doing something now
and not bitching and hanging. I guess I had to call up python and pass it
to the script as the script does not seem to pull up python on its own. So
far I have 60 directories and about 45 robots.txt but no jpg files yet. I
will let you know what happens. I think that these images are protected by
script, you never get a valid URL to the image file, just referrers, and
numbers and what not. When the image is finally displayed in your browser,
then you can save it but not until then. Pretty good way to stop a
harvester. Is this assumption pretty much correct or is there a way to make
this work? Now that I use python as the first command, I can run it in an
ssh window now and do not require an x-server.
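
On the question of never seeing a valid image URL, here is a rough sketch of how a harvester of this kind usually finds images, assuming it collects the src attribute of <img> tags from the static HTML. I have not checked that image-harvester.py works exactly this way. The sketch uses Python 3's html.parser and urllib.parse in place of the old sgmllib and urlparse modules; the class name ImgCollector, the function collect_images and the sample page are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, value))


def collect_images(html, base_url):
    """Parse an HTML string and return the image URLs found in it."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


# Made-up page: one plain <img>, plus one image that only a script would load.
SAMPLE_HTML = """
<html><body>
<img src="/thumbs/cat.jpg">
<script>loadImage("photo-12345");</script>
</body></html>
"""

if __name__ == "__main__":
    print(collect_images(SAMPLE_HTML, "http://example.com/gallery/"))
```

Only the plain <img> tag is picked up. An image whose address is built by JavaScript in the browser never appears in the static markup, so a parser like this cannot see it, which would fit the symptoms described above.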

Feel free to jump right in with your input on how this should or won't work
and what can be done to make it better. I have all of my scripts in a
$HOME/scripts/ directory and it is in my path but running this from another
directory does not work if image-harvester.py is not in the harvest
directory where I run the script from. I can right click on the image and
save it but the amazing script trips all over itself with these wacky file
names. I am all ears if someone figures it out.
--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 23 '07 #6
John McMonagle <jm********@velseis.com.au> wrote in
news:ma**************************************@python.org:
> Your linux shell thinks it is running a shell script (from is not a
> valid command in bash).
>
> To execute this script with the python interpreter type (from a shell
> prompt):
>
> python image-harvester.py http://some.url.whatever/images_page
>
> Read the comments at the beginning of the script and you will discover
> all sorts of important usage information.
>
> Regards,
>
> John

Thanks John.

--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 23 '07 #7
"J.O. Aho" <us**@example.net> wrote in
news:5o************@mid.individual.net:
> Check line 59 in the python script and you see which command you are
> missing. I bet you didn't read what it said on the page
> http://web.mit.edu/pgbovine/www/image-harvester.htm
>
> Install the programs that mentioned on the page before you use the
> python script.

I figured it out, see my other reply. I have to run this command beginning
with "python". I still don't get the results I want but I think it is
because the images are protected with script. My other post in this thread
gives the details. If you have more ideas, I am all ears.

Thanks AHO.

--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 23 '07 #8
Adam Atlas <ad**@atlas.st> wrote in
news:1193108392.089611.91170@v29g2000prd.googlegroups.com:
> I think you're executing it as a shell script. Run "python
> image-harvester.py", or add "#!/usr/bin/env python" to the top of the file.

Hey that is a cool idea, I think I will try it. I found out what is wrong
and did not get the results I want, I think the images are protected with
script. See my other post in this thread for details.

Shoot, the followup might have gone to alt.os.linux. I will repost for you
here.

I figured it out, I have to run python I think first then the script and
the URL like this:

$ python image-harvester.py http://public.fotki.com/DaGennelman/

Now that actually seems to be doing something and it sure is busy now. It
is making a lot of little subdirectories in my test directory. I had to
copy image-harvester.py to the test directory first, then run python and
image-harvester.py w/URL and it is going to town. Tons of subfolders, so
far not images yet but it is not done. At least it is doing something now
and not bitching and hanging. I guess I had to call up python and pass it
to the script as the script does not seem to pull up python on its own. So
far I have 60 directories and about 45 robots.txt but no jpg files yet. I
will let you know what happens. I think that these images are protected by
script, you never get a valid URL to the image file, just referrers, and
numbers and what not. When the image is finally displayed in your browser,
then you can save it but not until then. Pretty good way to stop a
harvester. Is this assumption pretty much correct or is there a way to make
this work? Now that I use python as the first command, I can run it in an
ssh window now and do not require an x-server.

Feel free to jump right in with your input on how this should or won't work
and what can be done to make it better. I have all of my scripts in a
$HOME/scripts/ directory and it is in my path but running this from another
directory does not work if image-harvester.py is not in the harvest
directory where I run the script from. I can right click on the image and
save it but the amazing script trips all over itself with these wacky file
names. I am all ears if someone figures it out.

--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 23 '07 #9
On Oct 23, 6:50 am, Ohmster <r...@dev.nul.invalid> wrote:
[snip]
Do note that the reason you may not see images is that the website
has, '''correctly''', identified your program as an automated bot and
blocked its access to things...
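
Since the run pulled down a lot of robots.txt files, a short sketch of how such a file is read may help. It uses Python 3's urllib.robotparser on a made-up robots.txt: the rules, the agent name "image-harvester" and the example.com URLs are all assumptions, not what the site in question actually serves. The helper name allowed is hypothetical too.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /albums/
"""


def allowed(robots_text, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(SAMPLE_ROBOTS, "image-harvester", "http://example.com/albums/a.jpg"))
    print(allowed(SAMPLE_ROBOTS, "image-harvester", "http://example.com/index.html"))
```

A site can also block bots in ways that robots.txt does not show, for example by checking referrers, so an "allowed" answer here would not guarantee that the images can be fetched.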

Oct 23 '07 #10
co*********@gmail.com wrote in
news:1193127053.740024.144730@q5g2000prf.googlegroups.com:
> Do note that the reason you may not see images is that the website
> has, '''correctly''', identified your program as an automated bot and
> blocked its access to things...

Probably so, I did not get anything, even with all of that flurry of
activity, the results were 0 images. :(

--
~Ohmster | ohmster /a/t/ ohmster dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
Oct 24 '07 #11
