Open and process remote page

I want to be able to ask users for a URL, open that page, change some of the
contents and then display that page as if they had typed the URL into a
browser. I have toyed with some of the PHP functions for opening URLs, but
what I am not clear on is how much work my script will have to do (do I need
to fully emulate a browser, for example?).

The net effect I am after is very similar to the page translation feature
that Google offers. Does anyone have any examples of this kind of technique?
Any ideas how much work is involved? (My 'translation' is pretty trivial, so
really it is mostly a question of how much work it takes to display the
remote page.)

Thanks in advance!

Regards,

William
Dec 9 '05 #1
Following on from William Hudson's message. . .
Personally I'd approach this by writing a proxy server in Java and dealing
with HTTP requests/responses at the header level, rather than trying to be
an arm's-length browser/server.

With your PHP approach: if you want to 'serve' mydomain.com/index.htm
then you need to fetch it, then parse it recursively looking for URLs inside
framesets, CSS and JavaScript, and at least fetch those too (and perhaps
rewrite them). A 'web page' is not necessarily a single entity. Suppose
index.htm is just a frameset. You could 'translate' this as much as you
like but the 'real' content would be missed.
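
To illustrate the frameset point, here is a minimal PHP sketch, not a complete crawler. It lists the frame sources of a fetched page and resolves them against the page URL so that each one could be fetched and translated in turn. The helper names resolve_url and list_frame_urls are hypothetical, and the resolver deliberately ignores "../" paths, CSS and JavaScript.

```php
<?php
// Hypothetical helper: resolve a link against the URL of the page it came from.
// Covers absolute, protocol-relative, root-relative and same-directory links only
// (no "../" handling).
function resolve_url(string $base, string $link): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;                       // already absolute
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($p['host'] ?? '')
            . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strncmp($link, '//', 2) === 0) {
        return $scheme . ':' . $link;       // protocol-relative
    }
    if ($link !== '' && $link[0] === '/') {
        return $origin . $link;             // root-relative
    }
    $path = $p['path'] ?? '/';
    $pos  = strrpos($path, '/');
    $dir  = $pos === false ? '/' : substr($path, 0, $pos + 1);
    return $origin . $dir . $link;          // relative to the page's directory
}

// Hypothetical helper: list the absolute URLs of every frame/iframe in a page.
function list_frame_urls(string $html, string $pageUrl): array
{
    $previous = libxml_use_internal_errors(true);   // tolerate sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach (['frame', 'iframe'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            $src = $el->getAttribute('src');
            if ($src !== '') {
                $urls[] = resolve_url($pageUrl, $src);
            }
        }
    }
    return $urls;
}
```

Each URL returned would then need the same fetch-and-translate treatment as the top-level page, which is where the recursion comes from.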

With the proxy server you look for HTTP responses with MIME types of
interest, translate the data as appropriate, then pass it on.
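
The MIME-type check itself is small whatever language the proxy is written in. A hedged PHP sketch follows, using a hypothetical helper name is_translatable and assuming that only HTML and XHTML bodies are of interest:

```php
<?php
// Hypothetical helper: decide from a Content-Type header value whether the
// response body should be translated or passed through untouched.
function is_translatable(string $contentType): bool
{
    // "text/html; charset=utf-8" -> "text/html"
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return in_array($mime, ['text/html', 'application/xhtml+xml'], true);
}
```

Images, stylesheets and other binary responses would fail this check and be relayed as they are.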

Java Examples in a Nutshell (pub. O'Reilly) shows how straightforward it
is. You'd need to bootstrap a proxy session from your normal PHP pages
by telling the proxy who's calling and what they want to see.

--
PETER FOX Not the same since the borehole business dried up
pe******@eminent.demon.co.uk.not.this.bit.no.html
2 Tees Close, Witham, Essex.
Gravity beer in Essex <http://www.eminent.demon.co.uk>
Dec 9 '05 #2
Hi William,

Some thoughts:

- Opening a remote URL is very easy in PHP, as you probably found out.
(Just call fopen with a URL; in most cases PHP wraps the whole complex
request in a handle that can be treated as a read-only file.)

Beware, however, of the paranoid web designer.
Many people have this twisted idea that they want to offer content to the
world, but try to make it difficult for you to read the source.
Often JavaScript is used to make things more difficult.
(Beats me why, but they come in masses.)

If you only want normal plain HTML pages, I think you can just fopen,
replace the stuff you want, and deliver that (in a frame, for example, or
however you like).
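
A minimal sketch of that fetch, replace and deliver flow, assuming allow_url_fopen is enabled. The helper names (fetch_page, translate_page, add_base_href) and the "url" request parameter are hypothetical. The base tag is an extra assumption of this sketch: it makes the page's relative links and images resolve against the original site rather than against your script.

```php
<?php
// Hypothetical helper: fetch a remote page through PHP's URL wrapper.
// Returns null on failure or when the URL is not http/https.
function fetch_page(string $url): ?string
{
    // Only accept http/https so a user cannot request local files.
    if (!preg_match('#^https?://#i', $url)) {
        return null;
    }
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Hypothetical helper: apply plain string replacements (search => replace).
function translate_page(string $html, array $replacements): string
{
    return str_replace(array_keys($replacements), array_values($replacements), $html);
}

// Hypothetical helper: insert <base href> right after the opening <head> tag.
// Pages without a <head> tag are returned unchanged.
function add_base_href(string $html, string $url): string
{
    $tag = '<base href="' . htmlspecialchars($url, ENT_QUOTES) . '">';
    return preg_replace_callback(
        '#<head\b[^>]*>#i',
        function (array $m) use ($tag): string {
            return $m[0] . $tag;
        },
        $html,
        1
    );
}

// Usage in a web script (left commented out so the file has no side effects):
// $url  = $_GET['url'] ?? '';
// $html = fetch_page($url);
// if ($html !== null) {
//     echo translate_page(add_base_href($html, $url), ['colour' => 'color']);
// }
```

Plain string replacement also touches tag names and attributes, so it only suits replacements that cannot collide with markup.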

Also be aware of redirects by the server (page moved).
I have seen a few situations where PHP doesn't handle that very well.
Or maybe it was the web server sending something strange. I do not remember
for sure; I only remember that PHP had some issues with redirects when using
the fopen wrapper around a URL.
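
On PHP versions that support HTTP stream context options, redirect handling can be made explicit instead of left to the defaults. The sketch below is an assumption about how one might set that up, not a fix for the specific problem described above. redirect_context and final_status are hypothetical helper names; follow_location, max_redirects, timeout and ignore_errors are standard "http" context options.

```php
<?php
// Hypothetical helper: build a stream context that follows redirects up to a
// limit and still returns the body on 4xx/5xx responses.
function redirect_context(int $maxRedirects = 5, float $timeout = 10.0)
{
    return stream_context_create([
        'http' => [
            'follow_location' => 1,
            'max_redirects'   => $maxRedirects,
            'timeout'         => $timeout,
            'ignore_errors'   => true,
        ],
    ]);
}

// Hypothetical helper: return the status code of the last response found in a
// list of raw header lines (a redirect chain yields several status lines).
function final_status(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Usage (commented out; needs network access and allow_url_fopen):
// $html   = file_get_contents($url, false, redirect_context());
// $status = final_status($http_response_header ?? []);
```

Checking the final status lets the script tell a real page apart from an error page before it starts replacing text.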
Besides the possible traps above, I do not expect you will run into a lot of
trouble. I once wrote something similar, albeit simpler than what you
are doing, and it was all very straightforward.

You could also get yourself into trouble (with regular expressions, substring
searching, etc.) when trying to replace some pieces of the HTML if the HTML is
not coded as it 'should' be: think about missing end tags and the like.
Browsers are very forgiving, but the programmers of the browsers had
headaches before their programs were forgiving enough. :-)

But maybe you can get away with just replacing the stuff you understand and
leaving the remaining HTML as it was. Then the browser can display it the way
it was meant (probably).
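
One way to "replace only the stuff you understand" is to let a forgiving parser read the page and then touch nothing but the text nodes. This is a sketch under the assumption that PHP's DOM extension is available; it swaps in a DOM-based approach in place of the regular expression or substring approach mentioned above. replace_in_text_nodes is a hypothetical helper name. Note that the parser normalises the markup (adds missing end tags, a doctype and so on), so the output is not byte-for-byte the original.

```php
<?php
// Hypothetical helper: replace $search with $replace in visible text only,
// leaving tag names, attributes, scripts and styles alone.
// Assumes $html is non-empty; without a charset declaration in the page,
// loadHTML treats the input as ISO-8859-1.
function replace_in_text_nodes(string $html, string $search, string $replace): string
{
    $previous = libxml_use_internal_errors(true);   // silence warnings about sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($nodes as $node) {
        $node->nodeValue = str_replace($search, $replace, $node->nodeValue);
    }
    return $doc->saveHTML();
}
```

With this, replacing "the" changes the sentence text but not an attribute such as class="the".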

Just my 2 cents.

Good luck.

Regards,
Erwin Moller

Dec 9 '05 #3
Ian
Erwin Moller wrote:
You could also get yourself into trouble (with regular expressions, substring
searching, etc.) when trying to replace some pieces of the HTML if the HTML is
not coded as it 'should' be: think about missing end tags and the like.
Browsers are very forgiving, but the programmers of the browsers had
headaches before their programs were forgiving enough. :-)

But maybe you can get away with just replacing the stuff you understand and
leaving the remaining HTML as it was. Then the browser can display it the way
it was meant (probably).

HTML Tidy is your friend here. It has saved me from many a nasty
FrontPage-generated HTML page :)
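
From PHP, Tidy is reachable through the optional tidy extension. A minimal sketch, assuming that extension may or may not be installed: tidy_html is a hypothetical wrapper name, and the two options shown are just one plausible configuration.

```php
<?php
// Hypothetical wrapper: repair markup with the tidy extension when it is
// loaded, otherwise hand the input back unchanged.
function tidy_html(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html;                       // tidy extension not available
    }
    $config = [
        'output-xhtml' => true,             // emit well-formed XHTML
        'wrap'         => 0,                // do not re-wrap long lines
    ];
    $clean = tidy_repair_string($html, $config, 'utf8');
    return is_string($clean) ? $clean : $html;
}
```

Running the fetched page through this before doing any replacements gives the later steps markup with matching end tags.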

Ian
Dec 9 '05 #4
Krustov wrote:
$rocky=str_replace("BBC","Bungholes",$contents); $contents=$rocky;
$rocky=str_replace("bbc","Bungholes",$contents); $contents=$rocky;
$rocky=str_replace("the","its only a word",$contents); $contents=$rocky;


What's all this $rocky junk?

$contents = str_replace("BBC","Bungholes",$contents);
$contents = str_replace("bbc","Bungholes",$contents);
$contents = str_replace("the","its only a word",$contents);
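
As a further hedged variation, str_replace also accepts arrays, so the three calls can be collapsed into one. The sample string is made up for illustration. Replacements are applied in array order, so a later pair can alter the result of an earlier one.

```php
<?php
// Sample input, invented for this example.
$contents = "the BBC and bbc";

// One call: each search string is paired with the replacement at the same index.
$contents = str_replace(
    ["BBC", "bbc", "the"],
    ["Bungholes", "Bungholes", "its only a word"],
    $contents
);

echo $contents, "\n";   // its only a word Bungholes and Bungholes
```

Note that "the" is matched anywhere, including inside words such as "other"; a word-boundary regular expression would be needed to avoid that.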
--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Dec 11 '05 #5
