Tricks to help prevent site ripping

I know that it's impossible to completely prevent somebody from ripping a
site (or cracking software) if that person has the skills and the
time/patience, but there are tricks that can be employed in software which
slow crackers down, from things like self-decrypting code to anti-debug
tricks, and most people have a breaking point - if you can slow them down
and waste enough of their time, they usually move on to easier targets (and
as there are so many easy targets out there, most people won't waste too
much time on hard targets).

I was wondering if there are any such tricks that can be used to slow down
people who want to rip your website to modify and use as their own? The
dynamic server-side nature of PHP would suggest that something could be
done, because obviously when somebody rips a site they only get the HTML
from the PHP and not its source code.

Any ideas?
Aug 15 '05 #1
You can use different templates (simple and plain) for your site - so when someone goes to another page it doesn't have to be the exact same design and colour.
This way there won't be a pattern for site extraction.
Aug 15 '05 #2
The only thing I can think of is to write your PHP code and, when it's
complete, remove all the newlines - so the entire page is on one line.
However, a simple search and replace could undo your work (replace all ; with
;\n).
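If the goal is really to flatten the HTML that gets served (the PHP source never reaches the visitor anyway), a minimal sketch would be an output buffer callback along these lines - the function name is made up, and as noted the effect is trivially reversible:

<?php
// Strip newlines from everything the script outputs, so the markup
// that reaches the browser is one long, hard-to-read line.
function flatten_html($html)
{
    return str_replace(array("\r", "\n"), '', $html);
}

ob_start('flatten_html');   // buffer all output through the callback above
?>
<html>
  <body>
    <p>Example page</p>
  </body>
</html>
<?php
ob_end_flush();             // send the flattened markup to the client
?>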

Rick
www.e-connected.com

Aug 15 '05 #3
Dave Turner (not@dave) wrote:
: I know that its impossible to completely prevent somebody from ripping a
: site (or cracking software) if that person has the skills and the
: time/patience, but there are tricks that can be employed in software which
: slow crackers down, from things like self-decrypting code to anti-debug
: tricks, and most people have a breaking point - if you can slow them down
: and waste enough of their time they usually move on to easier targets (and
: as there are so many easy targets out there most people wouldn't waste too
: much time on hard targets).

: I was wondering if there are any such tricks that can be used to slow down
: people who want to rip your website to modify and use for their own? The
: dynamic server-side nature of PHP would suggest that something could be
: done, because obviously when somebody rips a site they only get the HTML
: from the PHP and not its source code

: Any ideas?

When somebody rips a site they only get the HTML from the PHP and not its
source code.
--

This space not for rent.
Aug 15 '05 #4
On 2005-08-15, Dave Turner <not@dave> wrote:
I was wondering if there are any such tricks that can be used to slow down
people who want to rip your website to modify and use for their own? The
dynamic server-side nature of PHP would suggest that something could be
done, because obviously when somebody rips a site they only get the HTML
from the PHP and not its source code


So you want to prevent people from reading your HTML?

--
Cheers,
- Jacob Atzen
Aug 15 '05 #5
Dave Turner wrote:
I know that its impossible to completely prevent somebody from ripping a
site (or cracking software) if that person has the skills and the
time/patience, but there are tricks that can be employed in software which
slow crackers down, from things like self-decrypting code to anti-debug
tricks, and most people have a breaking point - if you can slow them down
and waste enough of their time they usually move on to easier targets (and
as there are so many easy targets out there most people wouldn't waste too
much time on hard targets).

I was wondering if there are any such tricks that can be used to slow down
people who want to rip your website to modify and use for their own? The
dynamic server-side nature of PHP would suggest that something could be
done, because obviously when somebody rips a site they only get the HTML
from the PHP and not its source code

Any ideas?


Yes, lots (maybe I should write a book). The first problem you're going to
have is discriminating between searchbots (which you probably want on your
site) and harvesters. But you'd also need to be a lot clearer about the
architecture of your site and what exactly you want to protect. Is it
really the HTML?

You should also be addressing the issue of *why* people might want to rip
your site. Perhaps providing syndicated content might be a better solution.

To get you started on anti-harvesting - think honeypot.
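To sketch that idea very roughly (the file names and ban mechanism are invented for illustration): link to a trap page that no human should ever reach - hide the link from visitors and disallow it in robots.txt so well-behaved searchbots skip it - then flag any client that fetches it.

<?php
// trap.php - only harvesters that blindly follow every link should land here.
file_put_contents('banned_ips.txt', $_SERVER['REMOTE_ADDR'] . "\n", FILE_APPEND);
header('HTTP/1.1 403 Forbidden');
exit;
?>

<?php
// At the top of the real pages: refuse service to previously trapped IPs.
if (file_exists('banned_ips.txt')) {
    $banned = array_map('trim', file('banned_ips.txt'));
    if (in_array($_SERVER['REMOTE_ADDR'], $banned)) {
        header('HTTP/1.1 403 Forbidden');
        exit('Access denied.');
    }
}
?>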

HTH

C.
Aug 15 '05 #6
Jacob Atzen wrote:
On 2005-08-15, Dave Turner <not@dave> wrote:
I was wondering if there are any such tricks that can be used to slow
down people who want to rip your website to modify and use for their own?
The dynamic server-side nature of PHP would suggest that something could
be done, because obviously when somebody rips a site they only get the
HTML from the PHP and not its source code


So you want to prevent people from reading your HTML?


Exactly, Jacob.

What is the OP trying to achieve?
I do not see the point...

Regards,
Erwin Moller
Aug 15 '05 #7
Dave Turner wrote:
I know that its impossible to completely prevent somebody from ripping a
site (or cracking software) if that person has the skills and the
time/patience, but there are tricks that can be employed in software which
slow crackers down,


If you use sessions, you can track the number of page requests in a
time interval. If you see an unreasonable number of requests per
second, you could stop serving pages to that session. That would
prevent many harvesters from obtaining more than a slice of your site
at any one time.
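Something along these lines, as a rough sketch (the window and limit are arbitrary numbers):

<?php
session_start();

$window = 10;   // seconds
$limit  = 20;   // max requests allowed per window before we stop serving

// Start a fresh counting window once the previous one has expired.
if (!isset($_SESSION['rate_start']) || time() - $_SESSION['rate_start'] > $window) {
    $_SESSION['rate_start'] = time();
    $_SESSION['rate_count'] = 0;
}

$_SESSION['rate_count']++;

if ($_SESSION['rate_count'] > $limit) {
    header('HTTP/1.1 503 Service Unavailable');
    exit('Too many requests - please slow down.');
}

// ...normal page output continues here...
?>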

Is that the kind of impediment you had in mind?

Aug 15 '05 #8
>> I know that its impossible to completely prevent somebody from ripping a
site (or cracking software) if that person has the skills and the
time/patience, but there are tricks that can be employed in software which
slow crackers down,
If you use sessions, you can track the number of page requests in a
time interval.


If you use sessions, the harvesters probably don't use cookies. Or
they have a bunch of harvesters running in parallel with different
sessions and different session cookies. Or, if you're using trans_sid,
there's a bunch of harvesters using different session IDs in the URLs.
If you see an unreasonable amount of requests per
second, you could stop serving pages to that session. That would
prevent many harvesters from obtaining more than a slice of your site
at any one time.

Is that the kind of impediment you had in mind?


I don't think it's much of an impediment. You might try detecting
a lot of requests from the same IP, but that means you will slow down
or deny service to proxies like those AOL and other large ISPs use.

Gordon L. Burditt
Aug 15 '05 #9
> So you want to prevent people from reading your HTML?

err, no ... (obviously). Re-read my question. I didn't ask how to stop
people reading HTML, I asked if anyone knew of any tricks (of which there
are at least several) which can be used to slow people down from ripping
your site's content (i.e. to use your website design for themselves).
Obviously this is something which can't be achieved 100% - if somebody has
enough time, skill and patience then they can copy/rip any site, but slowing
them down will deter most people.
Aug 16 '05 #10
Dave Turner wrote:
So you want to prevent people from reading your HTML?


err, no ... (obviously). Re-read my question. I didn't ask how to stop
people reading HTML, I asked if anyone knew of any tricks (of which
there are at least several) which can be used to slow people down from
ripping your sites content (ie. to use your website design for
themselves). Obviously this is something which can't be achieved 100%
- if somebody has enough time skill and patience then they can
copy/rip and site, but slowing them down will deter most people.


Your question is still not clear. Can you give an example of "ripping
your site's content"? Do you mean copying the text of a site? All browsers
support copy and paste of text, as do most windowed programs. Do you mean
copying the images you might have? Hey, if the browser knows how to get the
image (and it must) then so too can the user. Heck, browsers support File >
Save As, so people can just do that. Not achievable 100%? Try not
achievable at all! The best you can do is copyright your material, as
many do already. Then again, the whole concept of copyright and web pages
never made much sense to me. The mere act of rendering a page is indeed
violating the copyright, in that a copy has been made! Besides, the net is
a place where information is freely exchanged. If you participate in the
net then you are participating in the free exchange of ideas and
material. That's the way it works!

If you mean copy your PHP code, well, they can't. Your page is served out
to them as HTML, not the PHP code that produced the HTML, so at least
there you are safe.

--
A Messy Kitchen Is A Happy Kitchen And This Kitchen Is Delirious

Aug 16 '05 #11
I agree that "professional" harvester programs aren't going to be
stopped by session lock-ups if some bandwith limit is exceeded (lock up
the session if more than X number of page requests are sent per second,
etc).

However I'm pretty sure this approach would effectively stall casual
site ripping with packages like WebLeech or Web Stripper. Haven't tried
it though-- not yet at least.

Much depends on what Dave Turner (the OP) actually needs. So far it
isn't clear whether he's trying to protect the content of a commercial
site from being undercut by an unscrupulous competitor or if he's trying
to keep his wawoo-neat design ideas from showing up on half of Yahoo's
free web sites.

Aug 16 '05 #12
Following on from Will Woodhull's message. . .
I agree that "professional" harvester programs aren't going to be
stopped by session lock-ups if some bandwith limit is exceeded (lock up
the session if more than X number of page requests are sent per second,
etc).

However I'm pretty sure this approach would effectively stall casual
site ripping with packages like WebLeech or Web Stripper. Haven't tried
it though-- not yet at least.

Much depends on what Dave Turner (the OP) actually needs. So far it
isn't clear whether he's trying to protect the content of a commercial
site from being undercut by an unscrupulous competitor or if he's tryin
to keep his wawoo-neat design ideas from showing up on half of Yahoo's
free web sites.

THOUGHT!
Obviously if a plain browser can see all the stuff on your pages they
are in the wild and fair game.

You could /try/ the following but beware of the effect of caching.

The object is to put a poison pill into the pages which triggers when it
isn't being shown live and on your site. I can think of two ways of
doing this - both purely theoretical and both require you to dynamically
tweak some javascript.

1. Your javascript (which might be loading images or other OnLoad()
activities) tests the date and time on the client against the 'now' on your
server. The 'now' on your server is hard coded into the javascript -
hence the need for dynamic creation. Your js code might say "if the time
the page was created is more than two weeks behind the client's current
time, then open a 'this site has been ripped' window".

2. Have js ask for some resource from your server which has to be
dynamic. For example an image that gives today's date or a news feed
extract. This will look a bit weird when looked at statically.

So my conclusion is: you can't stop ripping, but you might be able to flag
it to an unsuspecting viewer of the HTML.
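To make idea 1 concrete, here's a bare-bones sketch - PHP bakes the server's 'now' into the page's javascript, and the script complains if the viewer's clock is far ahead of it (the two-week threshold and the wording are just examples):

<?php
// The server-side generation time gets hard coded into the javascript below.
$generated = time();
?>
<script type="text/javascript">
// If the viewer's clock is more than two weeks past the page's generation
// time, assume this is a stale, ripped copy rather than the live site.
var generatedAt = <?php echo $generated; ?> * 1000;   // milliseconds
var twoWeeks    = 14 * 24 * 60 * 60 * 1000;
if (new Date().getTime() - generatedAt > twoWeeks) {
    alert('This page appears to be an out-of-date copy of the original site.');
}
</script>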
In a slightly different vein.
How about getting js to call some resources in a way that makes it hard to
deduce automatically, just by looking at the code, that something is a URL?
Basically some low-level encryption. I don't know how rippers work, but surely
one of the things they would do is try to redirect all <a
href="my.site/page.htm"> to
<a href="ripped.site/page.htm">. Your js could be written to trap this
sort of thing by (a) not fetching stuff for a 'real time' (as in 1
above) load, but having the url clearly present in the js along the lines
of "ha ha, this will break the page 'cos now you're trying to link to a
resource on ripped.site which doesn't exist", or (b) (say) ROT13-ing complete
URLs, including the http bit, and calling them via an OnClick(). This
brings them onto your site and what you do then is up to you. Perhaps
it is a page that has a lifetime of a week and then gets filled with
poisonous content. Or just examine the referrer in the header to
discover that the come-from page was somewhere out in cyberspace, and
then you could decide what to do.

I'm trying to think of a way to call a style sheet with variable (js -
date based) parameters. Then you could really upset the page layout
after a fortnight.

All the above is just thoughts.

--
PETER FOX Not the same since the bookshop idea was shelved
pe******@eminent.demon.co.uk.not.this.bit.no.html
2 Tees Close, Witham, Essex.
Gravity beer in Essex <http://www.eminent.demon.co.uk>
Aug 17 '05 #13
Hi Dave,

Another trick that can help a little:

Use javascript, at least for the creation of parts of the internal links.
Most harvesters don't execute javascript and will therefore never follow
links that are created by javascript.
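For instance (a rough sketch - the page name is made up), PHP could split an internal URL into pieces and let javascript reassemble it, so the link never appears as a plain href in the served markup:

<?php
// Emit an internal link that only javascript-capable clients will see.
$target = '/members/list.php';       // hypothetical internal page
$parts  = str_split($target, 4);     // break the path into small chunks
?>
<script type="text/javascript">
var parts = ['<?php echo implode("','", $parts); ?>'];
document.write('<a href="' + parts.join('') + '">Member list</a>');
</script>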

bye

nkf

Dave Turner wrote:
So you want to prevent people from reading your HTML?

err, no ... (obviously). Re-read my question. I didn't ask how to stop
people reading HTML, I asked if anyone knew of any tricks (of which there
are at least several) which can be used to slow people down from ripping
your sites content (ie. to use your website design for themselves).
Obviously this is something which can't be achieved 100% - if somebody has
enough time skill and patience then they can copy/rip and site, but slowing
them down will deter most people.

Aug 17 '05 #14
Yes - and it's almost 100% and fairly easy to implement:

AJAX

Each page is made up of nothing but an AJAX loader. Doing view source,
save page, etc. will view/save only the AJAX loader.

The AJAX loader simply makes requests (one or many) to your PHP code.
By requiring your AJAX loader to request the page within a relatively short
period of time (e.g. 1 second), the URL that AJAX asks for is only valid
for that "short period of time"
(for instance
http://www.yourpage.com/CoolSite?get...2&key=234A32CD) where ts
is a unix timestamp and key is an MD5 hash of the timestamp plus a magic word.

This will allow users to copy/paste your CONTENT but not your HTML (and
of course your PHP is safe as long as you are not distributing it and
as long as you don't have a server breach).
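To sketch the server side of that check (the parameter names, "magic word", and time window are invented for illustration):

<?php
// content.php - the endpoint the AJAX loader calls. It only serves the
// real markup while the signed, time-limited URL is still valid.
$magic = 'some-secret-word';                        // shared with the loader
$ts    = isset($_GET['ts'])  ? (int) $_GET['ts'] : 0;
$key   = isset($_GET['key']) ? $_GET['key'] : '';

$maxAge = 2;   // seconds the URL stays valid (arbitrary)

if (abs(time() - $ts) > $maxAge || $key != md5($ts . $magic)) {
    header('HTTP/1.1 403 Forbidden');
    exit('Link expired.');
}

// The URL checks out: emit the page fragment for the loader to inject.
echo '<div id="content">...real page markup here...</div>';
?>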

-CF

Aug 17 '05 #15
ChronoFish (de**@chronofish.com) wrote:
: Yes - and it's almost 100% and fairly easy to implement:

: AJAX

: Each page is made up nothing but an AJAX loader. Doing view source,
: save page, etc will view/save the AJAX loader.

: The AJAX loader simply makes requests (one or many) to your PHP code.
: By requiring your AJAX loader to request the page in a relatively short
: period of time (i.e. 1 second) the URL that AJAX asks for is only valid
: for the "short period of time"
: (for instance
: http://www.yourpage.com/CoolSite?get...2&key=234A32CD) where ts
: is a unix timestmp and key is a MD5 encrypted timestamp+magic word.

: This will allow users to copy/paste your CONTENT but not your HTML (and
: of course your PHP is safe as long as you are not distributing it and
: as long you don't have a server breach).

Methods such as this do nothing to stop a person who has a desire to steal
your html.

A trivial proxy will allow a programmer to save all your html no matter
what little inconveniences you put in the way.

The typical result of things like the above is simply to inconvenience
regular people when something unexpected happens, such as (in this
example) a slow connection perhaps.

Some people have suggested JavaScript to hide urls. Again, they can
inconvenience a regular user (e.g. preventing them from bookmarking some
pages) but do nothing to a programmer who decides to "rip" your site.
--

This space not for rent.
Aug 17 '05 #16
This is not 100% foolproof, but if you don't give anyone on the
internet access to your content, then the only way they could get to it
would be to break into your house and steal it. :)

Aug 17 '05 #17
ChronoFish wrote:
Yes - and it's almost 100% and fairly easy to implement:

AJAX

Each page is made up nothing but an AJAX loader. Doing view source,
save page, etc will view/save the AJAX loader.

The AJAX loader simply makes requests (one or many) to your PHP code.
By requiring your AJAX loader to request the page in a relatively
short period of time (i.e. 1 second) the URL that AJAX asks for is
only valid for the "short period of time" (for instance
http://www.yourpage.com/CoolSite?get...2&key=234A32CD) where ts
is a unix timestmp and key is a MD5 encrypted timestamp+magic word.

This will allow users to copy/paste your CONTENT but not your HTML
(and of course your PHP is safe as long as you are not distributing it
and as long you don't have a server breach).


I don't understand. If I wget(1) this URL, wouldn't I be left with a file
on my system that is essentially the HTML handed to the browser after whatever
block you put in place?
--
I once wanted to become an atheist but I gave up...They have no holidays

Aug 17 '05 #18
Malcolm Dew-Jones wrote:

<snip>
Methods such as this do nothing to stop a person who has a desire to steal
your html.

A trivial proxy will allow a programmer to save all your html no matter
what little inconveniences you put in the way.

The typical result of things like the above is simply to inconvenience
regular people when something unexpected happens, such as (in this
example) a slow connection perhaps.

Some people have suggested JavaScript to hide urls. Again, they can
inconvenience a regular user (e.g. preventing them from bookmarking some
pages) but do nothing to a programmer who decides to "rip" your site.
Exactly!

If the browser can display the HTML needed, so can a skilled programmer.
That is the bottom line for ALL tricks presented here.

That is why I think creating a few hurdles is not worth the trouble.
It costs you a lot of time, it will surely not stop a seriously skilled ripper
for longer than a few minutes, and you might accidentally hurt normal visitors
too with all these 'smart tricks'.

If your content is so precious, don't publish it.
Or simply copyright it, and send a lawyer their way if they rip it.

Regards,
Erwin Moller




Aug 18 '05 #19
cameron7 wrote:
This is not 100% foolproof, but if you don't give anyone on the
internet access to your content, then the only way they could get to it
would be to break into your house and steal it. :)


A smarter foolproofing would be to not create the content, but
keep the ideas in your head. However, this would not prevent
somebody getting at it by breaking into your house and
performing the Vulcan mind meld on you. The safest way, as
far as I can see, is to not think of any content at all.

--
Jock
Aug 18 '05 #20
"....I don't understand. If I wget(1) this URL wouldn't I be left with
a file..."

No - If you wget the page URL you get the AJAX loader. You could
investigate the HTML and see that there is an HTTPObject call specified
with a URL, but that URL would only be accurate for a short period of
time.
It is absolutely true that it's not 100% - and I didn't say it was.
But to rip it you would have to have something that was capable of
executing javascript - and of magically knowing when the javascript is
done executing. As javascript is event driven, this increases the
difficulty. Unless of course a human is sitting there assisting the
ripping - and then that's not really "ripping" as much as cracking.

The content is not protected - as I noted - but it doesn't sound like
Dave wants to protect the content as much as the site design.

As I noted the content can still be copied/pasted very easily. But the
HTML itself does require more effort.

I have built a browser that would get around this by dumping the HTML
that is currently displayed. So I know first hand that this method is
not foolproof. But again it would be much harder to automate - (again
because it doesn't know when to dump the HTML) - not impossible - just
harder.

In terms of inconveniencing users - I didn't hear any problems that
couldn't be solved. Dave asked for tricks. It's a good trick; it
would thwart most of today's rippers (software rippers) and certainly
the layman who does the "file->saveAs" or "view source" using one of
the standard (firefox, opera, ie, etc.) browsers.

I understand that it's not very PC - as in it runs counter to the
OpenSource Culture (of which I am a proud contributor) - hence the "if you
don't want people to use it, don't put it on the net" comments - but
don't shoot the messenger - AJAX does give you the ability to display
HTML without *easily* subjecting it to being ripped.

-CF

Aug 18 '05 #21
On 2005-08-18, ChronoFish <de**@chronofish.com> wrote:
I have built a browser that would get around this by dumping the HTML
that is currently displayed. So I know first hand that this meathod is
not full-proof. But again it would be much harder to automate - (again
because it doesn't know when to dump the HTML) - not impossible - just
harder.

In terms of inconviencing users - I didn't hear any problems that
couldn't be solved. Dave asked for tricks. It's a good trick, it
would thrawrt most of today's rippers (software rippers) and certainly
the layman who does the "file->saveAs" or "view source" using one of
the standard (firefox, opera, ie, etc) browsers.

I understand that it's not very PC - as in runs counter to the
OpenSource Culture (which I am a proud contributor ) hence the "if you
don't want people to use it don't put it on the net" comments - but
don't shoot the messenger - AJAX does give you the ability to display
HTML without *easily* subjecting it to being ripped.


tcpdump. And I'm sure it would be quite easy to manufacture something
for Firefox to get around it too...

--
Cheers,
- Jacob Atzen
Aug 18 '05 #22
As I've mentioned on a number of occasions, getting around something
like that is extremely easy. The DOM can give you the HTML of the current
document as it appears on screen. Just type
javascript:alert(document.body.innerHTML) into the URL bar and you'll see
what I mean. With a few lines of Javascript code, a frameset page, and
cross-site scripting turned on, you can fully automate the process.

Aug 19 '05 #23
tcpdump would help only if the AJAX loader is dumb. In other words, only
if the AJAX loader spits out exactly what is given to it. But in
general - you're right, tcpdump or another 3rd-party stream listener would
be able to capture whatever gets streamed. Not sure if that would
help you across an https connection.

Manufacturing something for FireFox probably would get you closer to
hijacking the site with an AJAX loader - but it would still run into the
same problem of not knowing when to dump (before or after the AJAX has
requested a secondary page? Before or after the secondary page has
been displayed? Before or after all timers have stopped? Would it be
the same for every page? What if it's waiting for user input? How
would the "automation" know?)

I appreciate the "you can't secure your HTML". And I understand that.
But the "trick" I presented is much more difficult to work around than
you (you=those who dismiss it) are giving it credit for.

-CF

Aug 19 '05 #24
Again, if you've got a user there monitoring the process - then yeah -
it can be beaten. But if you're ripping (my idea of ripping is stealing
all/most of the site with no human interaction), it's more difficult.
For instance, if you wait for a page load, your automation will get
kicked off as soon as the AJAX loader is done loading. And that's all
you get. If you wait 5 seconds, then you might be right - if the AJAX
loader is loading into the same document....

I'll try to get a demo set up. Check back in a few days...

Aug 19 '05 #25
How hard is it to just keep checking the innerHTML property until
something appears?

Aug 20 '05 #26
