Bytes | Software Development & Data Engineering Community
Using Python to visit web sites and save images of the web sites to files

imx
Hi there,

I wonder whether Python can be used to simulate a real user doing the
following:
1) open a web site in a browser;
2) press Print Screen to copy the image of the current active window
to the clipboard;
3) save the image to a real file

Any pointers will be appreciated!

Xiong

Mar 12 '07 #1
On Mar 12, 7:32 am, "imx" <xiong.xu...@gmail.com> wrote:
> I wonder whether python can be used to simulate a real user to do the
> following:
> 1) open a web site in a browser;
> 2) printscreen, so to copy the current active window image to
> clipboard;
> 3) save the image file to a real file
> Any pointer will be appreciated!
> Xiong

Google pywinauto.

HTH

Davy
Mar 12 '07 #2
>
> I wonder whether python can be used to simulate a real user to do the
> following:
> 1) open a web site in a browser;
> 2) printscreen, so to copy the current active window image to
> clipboard;
> 3) save the image file to a real file
>
> Any pointer will be appreciated!

Which OS?
Mar 12 '07 #3
You can definitely create a web bot with Python. It doesn't require
that you "drive" a real web browser. There are libraries to open web
pages, scrape their contents, and download files. That would make your
bot platform neutral. Driving a GUI browser risks producing a brittle
script that might not handle different browsers, different platforms,
or even different versions of the same browser.

I run a MediaWiki web site, and found a handy Python-based library
written to manage it called pywikipediabot at http://sourceforge.net/projects/pywikipediabot/.

Okay, this library won't do your legwork for you, but it has pieces
and parts that demonstrate how to use Python to surf a web site. Then,
with an HTML parser, you can hunt down images.
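
A minimal sketch of the "HTML parser" idea, using only Python's standard
library rather than pywikipediabot. The names ImageCollector,
find_images and fetch_images are hypothetical, chosen for illustration;
the sketch only collects the addresses of <img> tags and does not
download or render anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageCollector(HTMLParser):
    """Collect absolute URLs of every <img src=...> seen in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page address.
            self.images.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Return the image URLs found in an HTML string."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


def fetch_images(url):
    """Download a page and return the image URLs it references."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return find_images(html, url)
```

Calling fetch_images with a page address returns a list of image
addresses that could then be downloaded one by one. Note that this
finds images embedded in a page; it does not produce a picture of the
page itself.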

Greg

Mar 12 '07 #4
Goldfish wrote:
>
> I run a mediawiki web site, and found a handy python-based library
> written to manage it called pywikipediabot at
> http://sourceforge.net/projects/pywikipediabot/.
>
This sounds interesting. My daughter had a nightmare that a hacker
invaded her Orkut and blanked all 1500+ scraps. This is not impossible.
Maybe I should save the contents to a file...

Alberto Monteiro

Mar 12 '07 #5
Goldfish wrote:
> You can definitely create a web bot with python. It doesn't require
> that you "drive" a real web browser.

That's true, but if you want to print the page to a file, you need
something that can reproduce the intended layout. The Pyglet library
developers mention "XML/HTML+CSS" as something the layout engine can
deal with, which sounds quite impressive if its support of CSS is
comprehensive:

http://pyglet.org/

Paul

Mar 12 '07 #6
imx
On Mar 13, 4:26 am, "Paul Boddie" <p...@boddie.org.uk> wrote:
> Goldfish wrote:
> > You can definitely create a web bot with python. It doesn't require
> > that you "drive" a real web browser.
>
> That's true, but if you want to print the page to a file, you need
> something that can reproduce the intended layout. The Pyglet library
> developers mention "XML/HTML+CSS" as something the layout engine can
> deal with, which sounds quite impressive if its support of CSS is
> comprehensive:
>
> http://pyglet.org/
>
> Paul

Thanks for all the replies.
I will check pyglet to see if it can help.

The reason I want to simulate a user rather than just crawl is that we
have to check the front pages of many web sites to see whether they
conform to our visual standard, e.g., that a search box appears in the
top part of the page. This is tedious work for a human. So I want to
crawl and save the visual presentation of each web site automatically,
and check these image files later by eye.
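
A rough sketch of how that batch job could look on a desktop machine,
following the three steps from the first post. It assumes the Pillow
imaging library is installed (Pillow is not mentioned anywhere in this
thread) and uses the standard webbrowser module to open pages. The
names screenshot_name and capture_pages, the "shots" folder and the
five-second wait are all made up for illustration.

```python
import re
import time
import webbrowser
from pathlib import Path
from urllib.parse import urlparse


def screenshot_name(url):
    """Turn a URL into a safe PNG file name for later review."""
    parts = urlparse(url)
    raw = parts.netloc + parts.path
    safe = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return (safe or "page") + ".png"


def capture_pages(urls, out_dir="shots", wait_seconds=5.0):
    """Open each URL in the default browser and save a screen grab."""
    # Assumption: Pillow is installed. Imported here so that
    # screenshot_name() stays usable without it.
    from PIL import ImageGrab

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        webbrowser.open(url)
        # Crude fixed wait for the page to load and render.
        time.sleep(wait_seconds)
        target = out / screenshot_name(url)
        # Grabs the whole screen, not only the browser window.
        ImageGrab.grab().save(target)
        saved.append(target)
    return saved
```

Something like capture_pages(["http://example.com/"]) would leave one
PNG per site in the output folder. The fixed sleep is the weak point:
a slow page may be captured half-loaded, and the browser window has to
be in the foreground, so this is the kind of brittle GUI driving warned
about earlier in the thread.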

-Xiong

Mar 13 '07 #7
imx
On Mar 13, 12:39 am, "daftspan...@gmail.com" <daftspan...@gmail.com>
wrote:
> On Mar 12, 7:32 am, "imx" <xiong.xu...@gmail.com> wrote:
> > I wonder whether python can be used to simulate a real user to do the
> > following:
> > 1) open a web site in a browser;
> > 2) printscreen, so to copy the current active window image to
> > clipboard;
> > 3) save the image file to a real file
> > Any pointer will be appreciated!
> > Xiong
>
> Google pywinauto.
>
> HTH
>
> Davy

I checked pyglet; it's at an early stage of development. Since I'm
using Windows, I will try pywinauto.

Thanks,
Xiong

Mar 13 '07 #8
> The reason I want to simulate a user rather than just crawl is that we
> have to check the front pages of many web sites to see whether they
> conform to our visual standard, e.g., that a search box appears in the
> top part of the page. This is tedious work for a human. So I want to
> crawl and save the visual presentation of each web site automatically,
> and check these image files later by eye.
>
> -Xiong

Hi Xiong,

I have been working on a program to do something very similar to
generate thumbnails of websites.

The code is in IronPython (which may put you off!) and would need to be
modified or scripted with pywinauto to deal with multiple images.

Let me know if it is of use to you and I will upload it.

Cheers,
Davy

Mar 13 '07 #9
imx
On Mar 14, 5:44 am, "daftspan...@gmail.com" <daftspan...@gmail.com>
wrote:
> > The reason I want to simulate a user rather than just crawl is that we
> > have to check the front pages of many web sites to see whether they
> > conform to our visual standard, e.g., that a search box appears in the
> > top part of the page. This is tedious work for a human. So I want to
> > crawl and save the visual presentation of each web site automatically,
> > and check these image files later by eye.
> > -Xiong
>
> Hi Xiong,
>
> I have been working on a program to do something very similar to
> generate thumbnails of websites.
>
> The code is in IronPython (which may put you off!) and would need to be
> modified or scripted with pywinauto to deal with multiple images.
>
> Let me know if it is of use to you and I will upload it.
>
> Cheers,
> Davy

Cool, but does that mean I will need .NET to run the code?

Xiong

Mar 14 '07 #10
