Bytes IT Community

How to limit the number of web pages downloaded from a site?

Nad
I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles? Some people want to download the entire site.

Any hints or pointers would be appreciated.

Aug 8 '08 #1
16 Replies


In article <g7**********@aioe.org>, na*@invalid.com (Nad) wrote:
I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.
Password protect folders or pages, make users register to get the
passwords, that would slow them down a bit. But really, if you make
stuff available publicly...

--
dorayme
Aug 8 '08 #2

Gazing into my crystal ball I observed na*@invalid.com (Nad) writing in
news:g7**********@aioe.org:
I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.

You could store their IP address in a session, and check to see the length
of time between requests.
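
A minimal sketch of that idea (the 5-second interval, the function names, and the in-memory dict are illustrative assumptions, not anyone's actual code; a real site would keep this in a session store or database, as the reply suggests):

```python
import time

# Sketch of the suggestion above: remember the last request time per IP
# address and refuse requests that arrive too quickly. The 5-second
# minimum interval is an arbitrary assumption for illustration.

MIN_INTERVAL = 5.0  # minimum seconds between requests from one IP

_last_seen = {}

def allow_request(ip, now=None):
    """Return True if this IP has waited long enough since its last request."""
    if now is None:
        now = time.time()
    last = _last_seen.get(ip)
    _last_seen[ip] = now
    return last is None or (now - last) >= MIN_INTERVAL
```

A browsing human rarely requests pages seconds apart, while a site-grabber does, so even a crude interval check like this slows bulk downloads without bothering normal readers.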

--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Aug 9 '08 #3

On 08 Aug 2008, na*@invalid.com (Nad) wrote:
I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.
Change the articles' text to Olde Englishe.

--
Neredbojias
http://www.neredbojias.net/
Public Website
Aug 9 '08 #4

Nad
In article <Xn****************************@69.16.185.247>, Adrienne Boswell
<ar****@yahoo.com> wrote:
>Gazing into my crystal ball I observed na*@invalid.com (Nad) writing in
news:g7**********@aioe.org:
>I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.


You could store their IP address in a session, and check to see the length
of time between requests.
Well, something along those lines.
The problem is server-side support.
Some servers do not allow CGI, PHP, JavaScript, or even SSI
executable commands, and I'd like it to work on ANY server.
Aug 9 '08 #5

Nad
In article <Xn*****************************@194.177.96.78>, Neredbojias
<Sc********@gmail.com> wrote:
>On 08 Aug 2008, na*@invalid.com (Nad) wrote:
>I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.

Change the articles' text to Olde Englishe.
:--}

I like that!!!
Aug 9 '08 #6

In our last episode, <g7**********@aioe.org>, the lovely and talented Nad
broadcast on alt.html:
I have a very large site with valuable information. Is there any way to
prevent downloading a large number of articles. Some people want to
download the entire site.
It depends upon what you mean by 'articles.' If you put your HTML documents on a
web server, you are pretty much inviting the public to view/download as much
of it as they want. If it is 'valuable', why are you giving it away? And
if you are giving away valuable stuff, what did you expect? What is your
real concern here?

If you are only worried about server load, why not zip or tar and gzip it up
and put it on an FTP server? This is most practical for related documents,
such as parts of a tutorial or parts of a spec. If you are a philanthropist
who is giving away valuable stuff, you can give it away in big chunks so
the nickel and dime requests don't bug you.

Well-behaved download-the-whole-site spiders will obey robots.txt, but that
is pretty much a courtesy thing, and it won't stop anyone who is manually
downloading a page at a time, and it won't stop rogue or altered spiders.
Likewise, you can block nice spiders which send a true user-agent ID, but
not so nice spiders can spoof their ID. That's kind of pointless, because
most of the nice spiders will obey robots.txt anyway.
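
For what it's worth, blocking the better-known grabbers by User-Agent can be sketched in an Apache .htaccess file (assuming mod_setenvif and Apache 2.2-style access control; the agent names are examples, and as noted above any client can spoof its ID):

```
# .htaccess sketch: flag well-known site-grabber User-Agents and deny
# them. Any client can lie about its User-Agent, so this only stops
# the polite ones.
SetEnvIfNoCase User-Agent "Teleport" bad_bot
SetEnvIfNoCase User-Agent "HTTrack"  bad_bot
SetEnvIfNoCase User-Agent "Wget"     bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```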

You can make pages available through php or cgi which keeps track of the
number of documents with hidden controls. This is easily defeated by
anyone determined to do so, and like a cheap lock, will only keep the honest
people out. Beyond that, you can go to various user account schemes up to
putting your documents on a secure server.
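
The counting idea can be sketched like this (a hypothetical illustration, not the poster's code; the limit of 100 pages per day and the in-memory store are assumptions, and as the reply says, a determined user can evade it):

```python
# Sketch of a count-based limit: serve at most DAILY_LIMIT pages per
# visitor per day, keyed by IP address. Easily defeated by switching
# IPs or clearing cookies -- a "cheap lock", as the reply puts it.

DAILY_LIMIT = 100

_counts = {}  # (ip, day) -> pages served so far

def serve_page(ip, day):
    """Return True if the page may be served, False once the cap is reached."""
    key = (ip, day)
    if _counts.get(key, 0) >= DAILY_LIMIT:
        return False
    _counts[key] = _counts.get(key, 0) + 1
    return True
```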

But I think what you are asking is 'Can I keep my documents public and still
limit public access?' And the answer to that is, of course not because
there is a fundamental contradiction in what you want.
Any hints or pointers would be appreciated.
--
Lars Eighner <http://larseighner.com/> us****@larseighner.com
War hath no fury like a noncombatant.
- Charles Edward Montague
Aug 9 '08 #7

On 08 Aug 2008, na*@invalid.com (Nad) wrote:
In article <Xn*****************************@194.177.96.78>, Neredbojias
<Sc********@gmail.com> wrote:
>>On 08 Aug 2008, na*@invalid.com (Nad) wrote:
>>I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.

Change the articles' text to Olde Englishe.

:--}

I like that!!!
<grin>

Seriously, I don't think there's much you can do that is practical. With
server-side support, you could implement some kind of time limit and/or p/w,
but you indicated you didn't want to rely on that. An off-the-wall
"non-solution" would be to use reasonably long meta page redirects, but the
user could always come back with a new time limit.
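
The meta-redirect idea would look something like this in the page head (the 900-second delay and the target URL are placeholder assumptions; as the reply says, a visitor or robot can simply request the page again):

```html
<!-- After 15 minutes, send the visitor away from the article. -->
<meta http-equiv="refresh" content="900;url=/session-expired.html">
```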

--
Neredbojias
http://www.neredbojias.net/
Public Website
Aug 9 '08 #8

Nad
In article <do**********************************@news-vip.optusnet.com.au>,
dorayme <do************@optusnet.com.au> wrote:
>In article <g7**********@aioe.org>, na*@invalid.com (Nad) wrote:
>I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.

Password protect folders or pages, make users register to get the
passwords, that would slow them down a bit.
It doesn't work. For example, Teleport Pro (a program for downloading
entire sites) allows you to specify a login/passwd.
So, once they register, they can enter this info and boom...
>But really, if you make
stuff available publicly...
Well, the site is 150 megs, over 20k articles.
And there are plenty of people who would LOVE to have
the entire site on their own box.
Then you have a problem. Providers usually charge for
the amount of traffic. In one month, you'd have to shell
out some bux, just to give the information to the
"gimme free Coke" zombies.
That does not make sense.

Aug 9 '08 #9

Nad
In article <sl*******************@debranded.larseighner.com >, Lars Eighner
<us****@larseighner.com> wrote:
>In our last episode, <g7**********@aioe.org>, the lovely and talented Nad
broadcast on alt.html:
>I have a very large site with valuable information. Is there any way to
prevent downloading a large number of articles. Some people want to
download the entire site.

>It depends upon what you mean by 'articles.' If you put your HTML documents on a
web server, you are pretty much inviting the public to view/download as much
of it as they want. If it is 'valuable', why are you giving it away? And
if you are giving away valuable stuff, what did you expect? What is your
real concern here?
Downloading the entire 150+ meg site, which translates into
all sorts of things.
>If you are only worried about server load, why not zip or tar and gzip it up
and put it on an FTP server? This is most practical for related documents,
such as parts of a tutorial or parts of a spec. If you are a philanthropist
who is giving away valuable stuff, you can give it away in big chunks so
the nickel and dime requests don't bug you.

Well-behaved download-the-whole-site spiders will obey robots.txt,
That doesn't work. Some random user may come and download the
entire site. By the time you put him into robots.txt, it is too late.
>but that
is pretty much a courtesy thing, and it won't stop anyone who is manually
downloading a page at a time,
That is not a problem. They can manually download as much as they want.
But no automated downloads.
>and it won't stop rogue or altered spiders.
Likewise, you can block nice spiders which send a true user-agent ID, but
not so nice spiders can spoof their ID. That's kind of pointless, because
most of the nice spiders will obey robots.txt anyway.
>You can make pages available through php or cgi which keeps track of the
number of documents with hidden controls. This is easily defeated by
anyone determined to do so,
How?
>and like a cheap lock, will only keep the honest
people out. Beyond that, you can go to various user account schemes up to
putting your documents on a secure server.
Well, no account schemes, no user verification, no limits beyond
trying to automatically download the entire site pretty much.
>But I think what you are asking is 'Can I keep my documents public and still
limit public access?'
Not really. AUTOMATED download.
>And the answer to that is, of course not because
there is a fundamental contradiction in what you want.
I do not see it at the moment.
>Any hints or pointers would be appreciated.
Aug 9 '08 #10

In article <g7**********@aioe.org>, na*@invalid.com (Nad) wrote:
In article <do**********************************@news-vip.optusnet.com.au>,
dorayme <do************@optusnet.com.au> wrote:
In article <g7**********@aioe.org>, na*@invalid.com (Nad) wrote:
I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.
Password protect folders or pages, make users register to get the
passwords, that would slow them down a bit.

It doesn't work. For example, Teleport Pro (a program for downloading
entire sites) allows you to specify a login/passwd.
So, once they register, they can enter this info and boom...
I understand your concerns and it is natural to worry a bit. But
consider again.

That there is Teleport Pro does not actually show that my suggestion
would not work. Perhaps you are looking at every stage at worst case
possibilities. It would limit it to people who knew about this program
or be prepared to get it. That is one thing. The other thing is that
granting passwords might be conditional on them agreeing not to do what
you fear. Is your site a serious site liable to attract serious people?
You might be surprised how decent most people are if you make things
clear.

But really, if you make
stuff available publicly...

Well, the site is 150 megs, over 20k articles.
And there are plenty of people who would LOVE to have
the entire site on their own box.
Then you have a problem. Providers usually charge for
the amount of traffic. In one month, you'd have to shell
out some bux, just to give the information to the
"gimme free Coke" zombies.
That does not make sense.
How sure are you of the likelihood of a whole bunch of people wanting to
download the whole lot? Most people are wary of overexposing themselves
to information and will get what they are interested in. So I guess, you
need to do some guessing and some analysis. Perhaps you are worrying
excessively?

Presumably you would be hoping your site is used and is useful. If a
bunch of folk download a small bunch of articles each, this might well
be the biggest factor rather than a few who download the lot. You would
have to make some projections concerning this, you would be in the best
position to crunch some numbers as it is your field. If you are more
successful than you imagine via people doing reasonable things rather
than unreasonable things, you perhaps ought to be preparing yourself for
the possibility of serious server charges. I understand your concern to
limit things, but a huge site carves out a certain territory and you may
need to consider charging for access?

The other suggestion I might make is that you provide for the odd
possibility of some people wanting the lot by employing compressed
archives and utilising other than your own server, there might be some
free servers or cheap servers for this express purpose.

--
dorayme
Aug 9 '08 #11


Nad
In article <sl*******************@debranded.larseighner.com >, Lars Eighner
<us****@larseighner.com> wrote:
>In our last episode, <g7**********@aioe.org>, the lovely and talented Nad
broadcast on alt.html:
>I have a very large site with valuable information. Is there any way to
prevent downloading a large number of articles. Some people want to
download the entire site.

It depends upon what you mean by 'articles.' If put you html documents on a
web server. you are pretty much inviting the public to view/download as much
of it as they want. If it is 'valuable', why are you giving it away? And
if you are giving it away valuable stuff, what did you expect? What is your
real concern here?
Downloading the entire 150+ meg site, which translates into
all sorts of things.
>If you are only worried about server load, why not zip or tar and gzip it up
and put it on an FTP server? This is most practical for related documents,
such as parts of a tutorial or parts of a spec. If you are a philanthropist
who is giving away valuable stuff, you can give it away in big chunks so
the nickel and dime requests don't bug you.

Well-behaved download-the-whole-site spiders will obey robots.txt,
That doesn't work. Some random user may come and download the
entire site. By the time you put him into robots.txt, it is too late.
>but that
is pretty much a courtesy thing, and it won't stop anyone who is manually
downloading a page at a time,
That is not a problem. They can manually download as much as they want.
But no automated downloads.
>and it won't stop rogue or altered spiders.
Likewise, you can block nice spiders which send a true user-agent ID, but
not so nice spiders can spoof their ID. That's kind of pointless, because
most of the nice spiders will obey robots.txt anyway.
>You can make pages available through php or cgi which keeps track of the
number of documents with hidden controls. This is easily defeated by
anyone determined to do so,
How?
>and like a cheap lock, will only keep the honest
people out. Beyond that, you can go to various user account schemes up to
putting your documents on a secure server.
Well, no account schemes, no user verification, no limits beyond
trying to automatically download the entire site pretty much.
>But I think what you are asking is 'Can I keep my documents public and still
limit public access?'
Not really. AUTOMATED download.
>And the answer to that is, of course not because
there is a fundamental contradiction in what you want.
I do not see it at the moment. Can you expand on that?
>Any hints or pointers would be appreciated.
Aug 9 '08 #13

Nad
In article <Xn*****************************@194.177.96.78>, Neredbojias
<Sc********@gmail.com> wrote:
>On 08 Aug 2008, na*@invalid.com (Nad) wrote:
>In article <Xn*****************************@194.177.96.78>, Neredbojias
<Sc********@gmail.com> wrote:
>>>On 08 Aug 2008, na*@invalid.com (Nad) wrote:

I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.

Change the articles' text to Olde Englishe.

:--}

I like that!!!

<grin>

Seriously, I don't think there's much you can do that is practical.
Well, Google does it. Sure, it is a slightly different setup,
but they limit the number of queries to 100.
With
server-side support, you could implement some kind of time limit
Time limit on high bandwidth does not work.
>and/or p/w
but you indicated you didn't want to rely on that. An off-the-wall "non-
solution" would be to use reasonably long meta page redirects, but the user
could always come back with a new time limit.
Could you expand on that idea?
Aug 9 '08 #14

On 09 Aug 2008, na*@invalid.com (Nad) wrote:
>>I like that!!!

<grin>

Seriously, I don't think there's much you can do that is practical.

Well, Google does it. Sure, it is slightly a different setup,
but they limit the number of queries to 100.
Sure, but tell me they do it without server-side techniques which you so
explicitly eschewed...
>>but you indicated you didn't want to rely on that. An off-the-wall
"non- solution" would be to use reasonably long meta page redirects, but
the user could always come back with a new time limit.

Could you expand on that idea?
I don't think it would work, but just a 10-15 minute meta refresh in the page head.

--
Neredbojias
http://www.neredbojias.net/
Public Website
Aug 9 '08 #15

na*@invalid.com (Nad) wrote:
>In article <48**************@news.individual.net>, Ra************@pircarre.be
(Raymond SCHMIT) wrote:
>>On Sat, 09 Aug 2008 11:11:07 GMT, na*@invalid.com (Nad) wrote:
>>>In article <dt********************************@4ax.com>, richard
<me****@newsguy.com> wrote:
On Fri, 08 Aug 2008 22:42:51 GMT, na*@invalid.com (Nad) wrote:

>I have a very large site with valuable information.
>Is there any way to prevent downloading a large number
>of articles. Some people want to download the entire site.
If it is done with PHP, what exactly is the code that would do it?

If it is done some other way, what is the exact code to do it?
As you have explained in a separate thread you want to obtain the
information on your site free by downloading the content of this and
other newsgroups. You even want to obtain free advice on how to edit
this content. You expect someone to provide, without charge, the
"exact code" to prevent others from downloading the valuable
information that you have obtained without paying for it. Of course,
all this should run on a server you don't pay for. I'm not surprised
that you are paranoid about someone stealing your site.

You have answered your own question. If you make the content public it
is not possible to prevent downloading it. It is relatively easy, and
quite common, to write a program that emulates a human downloading the
contents of a site. As you point out, if you charge for the
information, then your audience will look for the information
elsewhere and find it from the same sources that you used for your
site.

Live with it. Someone will "build on" your published work just as you
have "built on" other people's work. You can use copyright laws to
protect the exact content of your site and you can use your skills in
maintaining and updating your content to make sure that your site is
more attractive than theirs.
Aug 10 '08 #16

In comp.lang.javascript message <g7**********@aioe.org>, Fri, 8 Aug 2008
22:42:51, Nad <na*@invalid.com> posted:
>I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.
If you have a well-crafted index.htm page, and a robots.txt file that
allows robot access only to that page, then the proportion of accesses
from searchers who found something that looked as if it might be of
interest but was not should be significantly reduced. Certainly using
such a robots.txt works for me to reduce total download.
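
Such a robots.txt might look like this (a sketch; note that `Allow` is a widely supported extension rather than part of the original robots.txt standard, and compliance by robots is entirely voluntary):

```
# Let well-behaved robots fetch only the index page.
User-agent: *
Allow: /index.htm
Disallow: /
```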

Keep page sizes down, so that a page access which turned out to be
uninteresting or only partly interesting does not cost you so many
bytes.

Omit inessential figures from the text pages, link to them instead, so
that a click is needed and will open a new tab or window. Maybe do
similar with tables.

Check how the access is counted. If a page in plain HTML requires 50 kB
but can be compressed to 25 kB, is it delivered compressed and is it
counted as 25 or 50 kB?
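
If the host runs Apache with mod_deflate, on-the-fly compression can be enabled with a sketch like the following (whether the provider's traffic accounting then counts the compressed or the uncompressed size is a question only they can answer):

```
# .htaccess sketch: compress text responses before sending them.
AddOutputFilterByType DEFLATE text/html text/plain text/css
```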

Consider zipping material, as a possible means of deterring mere
passers-by. Consider compressing material in a manner less easy of
access - zip with password or a rarer compressing tool. Consider
encoding material by writing not in English but, say, in German. You
can always if necessary rephrase your German so that translate tools
make reasonable sense of it.

Don't expect any of these to prevent all downloading of the whole site;
they are merely ways likely to reduce downloading by those who don't
need the material.

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk IE7 FF2 Op9 Sf3
news:comp.lang.javascript FAQ <URL:http://www.jibbering.com/faq/index.html>.
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/TP/BP/Delphi/jscr/&c, FAQ items, links.
Aug 10 '08 #17
