Bytes | Software Development & Data Engineering Community

Help with encrypted web pages?

I am trying to write a personal spider to crawl through websites and create
a highly specialized personal list of sites and pages that I may like to see,
based on preferences that I have supplied. I have found some interesting
pages - interesting in that they use JavaScript to encrypt the
pages to block people from "stealing their content".

There are JavaScript tricks that you can use on the downloaded encrypted
page to get around these irritations. You have to run a JavaScript line in
the browser's address bar and you get another window with the unencrypted
HTML in it. But I want to see the HTML unencrypted without downloading
every image, wav, ActiveX object and flash thingy on the page into an actual
WebBrowser control. Using a WebBrowser control for this, and having to
download all the images and such, would dramatically decrease the speed the
spider can crawl at.

An example of an encrypted page can be found at
http://www.aw-soft.com/htmlguard-sample.html. A simple JavaScript way to
defeat it is by pasting
"javascript:window.open('about:blank').document.write('<pre>' +
document.documentElement.outerHTML.replace(/</g, '&lt;') + '</pre>')" in the
IE address bar and clicking Go.
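
Spelled out, that one-liner does two things: it serializes the live DOM (which, by the time you run it, already contains whatever the page's own decoder wrote) and it escapes every "<" so the markup is displayed as text instead of being rendered. Here is a minimal sketch of the string-handling half. The name toPreformatted is made up for illustration; it is not part of the page or of IE.

```javascript
// Hypothetical helper: mirrors the string handling in the address-bar one-liner.
function toPreformatted(html) {
  // Escape every "<" so the browser shows the tags as text instead of rendering them.
  return '<pre>' + html.replace(/</g, '&lt;') + '</pre>';
}

// In a browser, the one-liner is equivalent to:
//   window.open('about:blank').document.write(
//       toPreformatted(document.documentElement.outerHTML));
console.log(toPreformatted('<b>hi</b>'));
// prints: <pre>&lt;b>hi&lt;/b></pre>
```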

I have no interest in (or drive space for) mass web page content theft.

But, is there anything in the .Net framework that will help with viewing an
encrypted web page's source for my spider? It seems I need to be able to
run Javascript to decode the page into readable HTML.....but, as I may have
said, I am only after the HTML....I really don't want to DL the pics and all
that other stuff - it kills my speed.

Any ideas?
Oct 5 '06 #1
"smerf" <sm***@shroom.com> wrote in
news:zE******************@bignews2.bellsouth.net:
> But, is there anything in the .Net framework that will help with
> viewing an encrypted web page's source for my spider? It seems I need
> to be able to run Javascript to decode the page into readable
> HTML.....but, as I may have said, I am only after the HTML....I really
> don't want to DL the pics and all that other stuff - it kills my
> speed.
No, you'll need to run the javascript to decode it.
Oct 5 '06 #2
Hello smerf,

If you studied the js and did a lil investigation you could easily figure
this out.

Here's some of the guts to get you started:

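function p(y) {
    // b, n and k are globals defined elsewhere in the page's script.
    var d = '', i, r, m, g;
    for (i = 1; i <= y.length; i++) {
        r = y.charAt(i - 1);
        m = b.indexOf(r);
        if (m > -1) {
            // r is in the alphabet b: step back one position, wrapping around
            g = ((m + 1) % n - 1);
            if (g <= 0) {
                g += n;
            }
            d += b.charAt(g - 1);
        } else {
            // r is not in b: copy it unchanged
            d += r;
        }
    }
    k += d;
}

function fff() {
    // write the accumulated decoded markup into the page
    document.write(k);
    e = "";
}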
-Boo
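
To see what p() actually does, here is a self-contained sketch of the same arithmetic. The posted fragment depends on globals (b, n, k) that the page defines elsewhere; the alphabet and the encoded string below are made up for illustration and are not the values used by the sample page. Assuming n equals the length of b, each character found in b is replaced by the one just before it (wrapping around at the start), and anything else passes through unchanged.

```javascript
// Assumed values, for illustration only; the real page defines its own b and n.
const b = 'abcdefghijklmnopqrstuvwxyz'; // substitution alphabet (hypothetical)
const n = b.length;                      // assumed to equal the alphabet length

// Same arithmetic as the posted p(), but returning the result
// instead of appending it to a global k.
function decode(y) {
  let d = '';
  for (let i = 1; i <= y.length; i++) {
    const r = y.charAt(i - 1);
    const m = b.indexOf(r);
    if (m > -1) {
      let g = ((m + 1) % n) - 1; // 1-based position, shifted back by one
      if (g <= 0) {
        g += n;                  // wrap around the start of the alphabet
      }
      d += b.charAt(g - 1);
    } else {
      d += r;                    // characters outside the alphabet are kept
    }
  }
  return d;
}

console.log(decode('<iunm>')); // prints: <html>
```

In other words, with these assumptions it is a shift-by-one substitution, not encryption in any strong sense.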
Oct 5 '06 #3
Code is not what I need. I need a JavaScript interpreter that I can include
in my app. Then I could grab the HTML (encoded or not) and run the
JavaScript to decode the HTML page.

But I can't find one that I can include with my app.
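
For illustration only, the "grab the HTML, then run just its script" idea can be sketched outside .NET. The example below uses Node.js's built-in vm module with a stub document object that simply collects whatever the script writes, so nothing is rendered and no images are fetched. The page script here is a made-up stand-in, not the sample page's actual code, and vm is not a security boundary, so this is not something to point at untrusted pages as-is.

```javascript
const vm = require('vm');

// Run a page's inline script against a stub `document` and
// return everything it passed to document.write().
function runPageScript(scriptSource) {
  const written = [];
  const sandbox = {
    document: {
      write: (...parts) => { written.push(parts.join('')); },
    },
  };
  // timeout (in ms) stops scripts that loop forever.
  vm.runInNewContext(scriptSource, sandbox, { timeout: 1000 });
  return written.join('');
}

// Hypothetical "protected" page script: the markup is stored escaped
// and only written out at run time.
const pageScript =
  "document.write(decodeURIComponent('%3Cp%3Ehello%3C/p%3E'));";

console.log(runPageScript(pageScript)); // prints: <p>hello</p>
```

Real pages usually poke at window, navigator and other browser objects too, so a stub like this breaks on anything it does not anticipate.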

Of course, even if I did this, there's also VBscript Encoding and God only
knows what else to contend with......

I may just be relegated to doing it the slow way. Even then, it seems that
there should be some way to tell the WebBrowser control to load a page into
the DOM but not to retrieve images or audio or (*insert bandwidth-wasting
object name here*).

Can you turn off downloading of everything but the text in a WebBrowser
control?

I just want the unencrypted, unobfuscated HTML code to scan. Nothing else.
Oct 5 '06 #4
Hello smerf,

A browser is really the only thing that will give you reliable results.

-Boo
Oct 6 '06 #5
Smerf,

Do you really think that we are supplying code here to hack people's email
addresses to help spammers?

It is a fool who supplies that.

Cor
Oct 6 '06 #6
And if I was looking to "hack", do you think I would have come here instead
of alt.hack or one of a thousand websites with black art experts?

Get a life Cor.
Oct 6 '06 #7
Smerf,

You take it very personally. If we supply that code, it is freely available on
the Internet. One search on Google would then make hiding our email addresses
pointless.

Why do you take it so personally? I thought that there was nothing personal in
my reply.

Cor
Oct 6 '06 #8
"Do you really think that we are supplying code here to hack people's email
addresses to help spammers?"

I was the one requesting the help. So, you basically accused me of being a
spammer. You assumed the code would be used for crap like email
collections - that we ALL (me included) HATE. (If I had my way, we could
all kill spammers and hackers in the streets.)

If you had simply meant that *others* may use it for such, you could have
offered help via email or some other venue. (My guess is that you have no
such code - which makes your accusation even more personal.)

You should be more careful in how you reply to others' posts.

I would never assume the worst of someone when I didn't know all the facts. If
I thought the info they were asking for was inappropriate, I would simply keep
moving. I wouldn't reply at all. Maybe you should do the same.

And, FYI, there are ways to obfuscate only your email addresses, in web
pages, that help to hide them from bots. Personally, I don't even post my
email address on my sites in the HTML. It is on the page as a
human-readable image.

If someone wants to email me, they have to type in my email address or use
the reply form on the site that sends me an email from server side script -
also effectively hiding my email address.
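
One common form of this address-only obfuscation is to write the address as HTML numeric character references, which a browser renders normally but which a naive text-matching bot may miss. A minimal sketch follows; the function name and the address are made up, and this only deters the simplest harvesters.

```javascript
// Hypothetical helper: encode each character as a decimal HTML character reference.
function toCharRefs(text) {
  return Array.from(text, (ch) => '&#' + ch.codePointAt(0) + ';').join('');
}

console.log(toCharRefs('a@b.c'));
// prints: &#97;&#64;&#98;&#46;&#99;
```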

Now, move along. I have work to do.
Oct 6 '06 #9
