Bytes | Software Development & Data Engineering Community

capturing and using html from website

Hi all. I'm a bit of a novice in this arena, so please forgive me if this
question reflects that. I am trying to grab the HTML from a website and
display it within another webpage (once I get this to work I am going to
manipulate the HTML in other ways; this isn't the end purpose of this
effort). To do this I am trying to open another window containing the
source HTML from a URL and then capture the HTML from that window. I
can open the window fine but get an "access denied" error when trying to
assign the HTML to a variable. The basic code follows. Basically, any
way that I can assign the HTML that results from an entered URL to a
JavaScript variable or object that I can then manipulate should work for
me. Suggestions?
Thanks in advance
Larry

<html>
<form>
Paste URL here: <input name=url value='http://www.yahoo.com'>
<input type=button onclick="grab()" value=Go>
<input type=reset>
</form>
<p id=here></p>
<script>
// "try" is a reserved word in JavaScript, so the function is renamed grab().
function grab() {
  var url = document.forms[0].url.value;
  if (url == '') { return; }
  // open a new window with the url from the user.
  var window2 = window.open(url, "", "height=0,width=0");
  // get the content of the new page. NEXT IS THE LINE THAT GETS THE
  // ACCESS DENIED ERROR: the new page comes from a different site, and the
  // browser's same-origin policy blocks reading its document.
  var t = window2.document.body.innerHTML;
  // display the content in this page.
  document.getElementById('here').innerHTML = t;
  // close the new page.
  window2.close();
}
</script>
</html>
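For readers hitting the same "access denied" error: a script may read another window's document only when both pages share an origin, meaning the same scheme, host and port. A minimal sketch of that comparison, using the standard `URL` class available in modern browsers and Node.js; `isSameOrigin` and the `my.server.example` host are hypothetical, and the check is a simplification of what the browser enforces (it ignores, for example, `document.domain` relaxation).

```javascript
// Simplified same-origin check (hypothetical helper, for illustration only).
// Two URLs share an origin when scheme, host and port all match;
// URL.origin drops default ports, so http://a.com and http://a.com:80 agree.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// A form page served from one host cannot read a window showing another host.
console.log(isSameOrigin('http://my.server.example/grab.html', 'http://www.yahoo.com')); // false
console.log(isSameOrigin('http://www.yahoo.com/', 'http://www.yahoo.com/news'));          // true
```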
Jul 23 '05 #1
Larry Asher wrote:
Hi all. I'm a bit of a novice in this arena, so please forgive me if this
question reflects that. I am trying to grab the HTML from a website and
display it within another webpage


http://jibbering.com/faq/#FAQ4_19

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Jul 23 '05 #2
VK
It's called "content stealing", the oldest and worst sin of the WWW.
This group is not an information source for novice thieves. Try
www.astalavista.com or similar.

Jul 23 '05 #3
VK wrote:
It's called "content stealing", the oldest and worst sin of the WWW.
This group is not an information source for novice thieves. Try
www.astalavista.com or similar.


Thank you for your reply. However, just because "content stealing" is
one application of this doesn't mean it is the only one. If you are
interested, we are applying AI algorithms and genetic search techniques
to analyze semantic representations and navigation paths of large
complex corporate intranets - WITH permission. The hard part (the AI
algorithms and such) is done. We just need to automate the process of
navigating through the links.
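Automating "navigating through the links" amounts to a breadth-first crawl of the site's link graph. A minimal sketch, assuming some way of obtaining a page's HTML already exists: the `getHtml` callback is hypothetical and could be backed by any of the approaches discussed in this thread. Link extraction here uses a regular expression, which is only a rough stand-in for a real HTML parser and will miss links generated by scripts.

```javascript
// Rough link extraction with a regular expression (assumption: adequate for
// simple, well-formed pages; a real HTML parser is more robust).
function extractLinks(html, baseUrl) {
  const links = [];
  const re = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    try {
      const u = new URL(m[1], baseUrl); // resolve relative hrefs
      u.hash = '';                      // page#a and page#b count as one page
      links.push(u.href);
    } catch (e) {
      // skip hrefs that do not form a valid URL
    }
  }
  return links;
}

// Breadth-first walk that stays on the start URL's origin, so javascript:,
// mailto: and external links are skipped.
// getHtml(url) is a hypothetical callback returning the page's HTML, or null.
function crawl(startUrl, getHtml, maxPages = 100) {
  const origin = new URL(startUrl).origin;
  const queue = [new URL(startUrl).href];
  const seen = new Set(queue);
  const order = [];  // pages in the order they were visited
  const edges = [];  // [from, to] pairs: the navigation graph
  while (queue.length > 0 && order.length < maxPages) {
    const url = queue.shift();
    const html = getHtml(url);
    if (html == null) continue;
    order.push(url);
    for (const link of extractLinks(html, url)) {
      if (new URL(link).origin !== origin) continue;
      edges.push([url, link]);
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return { order, edges };
}
```

The `edges` list is the kind of navigation-path data the analysis described above would consume; the `maxPages` cap is an arbitrary safety limit.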
Jul 23 '05 #4
David Dorward wrote:
Larry Asher wrote:

Hi all. I'm a bit of a novice in this arena, so please forgive me if this
question reflects that. I am trying to grab the HTML from a website and
display it within another webpage

http://jibbering.com/faq/#FAQ4_19


Thank you for the link. That is quite useful. Does anyone know a way
around this?
Jul 23 '05 #5
Larry Asher wrote:
http://jibbering.com/faq/#FAQ4_19
Thank you for the link. That is quite useful. Does anyone know a way
around this?


The page at that link describes the ways around it.

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Jul 23 '05 #6
Larry Asher wrote:
Thank you for your reply. However, just because "content stealing" is
one application of this doesn't mean it is the only one. If you are
interested, we are applying AI algorithms and genetic search techniques
to analyze semantic representations and navigation paths of large
complex corporate intranets - WITH permission.


... and you're writing the application in JavaScript that runs in a
web browser? Blimey.

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Jul 23 '05 #7
David Dorward wrote:
Larry Asher wrote:

Thank you for your reply. However, just because "content stealing" is
one application of this doesn't mean it is the only one. If you are
interested, we are applying AI algorithms and genetic search techniques
to analyze semantic representations and navigation paths of large
complex corporate intranets - WITH permission.

... and you're writing the application in JavaScript that runs in a
web browser? Blimey.


We are only using javascript to collect and archive the html
(alternative suggestions appreciated). The algorithms have been written
in C++. I'm a mathematician with some programming skills, not the other
way around. I am particularly inexperienced at web-based programming,
hence my question.

On the workaround, I somehow missed the link - mental fatigue no doubt.
Thanks again for that information.
Jul 23 '05 #8
VK
However, just because "content stealing" is
one application of this doesn't mean it is the only one. If you are
interested, we are applying AI algorithms and genetic search techniques
to analyze semantic representations and navigation paths of large
complex corporate intranets - WITH permission. The hard part (the AI
algorithms and such) is done. We just need to automate the process of
navigating through the links.


It's like "I need full local drive access without prompts. I'm not doing
anything malicious, I just want to provide more convenience to our
users" (which pops up from time to time in this group). Unfortunately, the
browser has its security policy, and it doesn't accept any oaths, whether
verbal, written or signed in blood.

If the sites involved are *really* cooperating, they have to change their
pages accordingly (onload="report the content to the parent").
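One way the cooperating-page idea can look in a current browser is `window.postMessage`, a mechanism that postdates this thread. A minimal sketch, assuming each intranet page is edited to post its own HTML to the collecting window; the message shape `{ type: 'page-html', html }`, the helper `makeReportHandler`, and both origins are hypothetical choices for illustration.

```javascript
// --- In each cooperating page (runs in the opened window), e.g. from onload:
//   window.opener.postMessage(
//     { type: 'page-html', html: document.documentElement.outerHTML },
//     'http://collector.example');  // hypothetical origin of the collecting page
//
// --- In the collecting page: accept reports only from trusted origins.
function makeReportHandler(trustedOrigins, onHtml) {
  return function handleMessage(event) {
    // Ignore messages from origins we have not agreed to trust.
    if (!trustedOrigins.includes(event.origin)) return false;
    const data = event.data;
    // Ignore anything that does not match the agreed message shape.
    if (!data || data.type !== 'page-html' || typeof data.html !== 'string') return false;
    onHtml(event.origin, data.html);
    return true;
  };
}

// Wire the handler up only when running in a browser.
if (typeof window !== 'undefined') {
  window.addEventListener('message', makeReportHandler(
    ['http://intranet.example'],                         // hypothetical intranet origin
    (origin, html) => console.log(origin, html.length)   // replace with real archiving
  ));
}
```

Checking `event.origin` matters because any window can post a message; without it, an unrelated page could feed content into the archive.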

Or, if your project is as serious as you say, you can surely find an
extra $199 for a code-signing certificate and read what you want
from wherever you want (putting your good name on it).
<http://www.thawte.com/codesign/index.html>

Jul 23 '05 #9
On Sun, 17 Jul 2005 14:16:35 GMT, Larry Asher <la***@nowhere.com>
wrote:
We are only using javascript to collect and archive the html
(alternative suggestions appreciated).


Just automate a browser from C++, JavaScript, or whatever you prefer. The
options are Zeepe or HTA-type constructs to do it in script with IE,
or IWebBrowser2 automation in Windows C++. For Mozilla, automate
it using a Mozilla plugin. It's all simple; there are lots of ways of
collecting HTML for such purposes. Pure JavaScript is probably not
the best choice, unless you want a quick knock-up to automate sites in IE.
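A simpler alternative to the browser automation described above is to fetch the raw HTML outside the browser, where the same-origin policy does not apply at all. This is a different technique, not Jim's. A minimal sketch, assuming Node.js 18 or later (which provides a global `fetch`); `collectPages` is a hypothetical name. Unlike an automated browser, it does not run the pages' own scripts, so content built by JavaScript will be missing, and intranet pages behind Windows authentication may need extra handling.

```javascript
// Fetch each URL in turn and keep a record of what came back.
// fetchImpl defaults to Node's global fetch; it can be swapped for testing.
async function collectPages(urls, fetchImpl = globalThis.fetch) {
  const records = [];
  for (const url of urls) {
    try {
      const res = await fetchImpl(url);
      const html = await res.text();
      records.push({ url, status: res.status, html, fetchedAt: new Date().toISOString() });
    } catch (err) {
      // Network errors are recorded rather than aborting the whole run.
      records.push({ url, status: null, html: null, error: String(err) });
    }
  }
  return records;
}
```

For example, `collectPages(['http://intranet.example/']).then((records) => console.log(records.length));` with a hypothetical intranet URL. Fetching sequentially keeps the load on the intranet servers low; the records can then be written to disk for the C++ analysis code.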

Jim.
Jul 23 '05 #10
