User agent fingerprinting

As part of a data collecting system (basically a system that collects
submissions from forms), I was planning on logging a bit of data about
the submitting user agent for the purposes of spotting things like
multiple submissions.

We don't want to block multiple submissions altogether; we just want
to be able to spot them easily.

My idea was to use the $_SERVER fields to get some data about the user
agent, md5 the collected data and store it as a field in the
database. I know that this isn't 100% guaranteed to spot all multiple
submissions but it should be able to catch the majority of them.

I'm planning to use REMOTE_ADDR, HTTP_USER_AGENT and
HTTP_ACCEPT_LANGUAGE, concatenate them and store the result as an MD5
hash.
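
A minimal sketch of that plan, assuming PHP 7 or later. The function
name ua_fingerprint and the '|' separator are illustrative choices, not
anything fixed by the plan itself; the request headers show up in
$_SERVER under the HTTP_ prefix used below.

```php
<?php
// Hypothetical helper: builds a fingerprint from the three request fields.
function ua_fingerprint(array $server): string
{
    $parts = [
        $server['REMOTE_ADDR'] ?? '',          // missing fields become ''
        $server['HTTP_USER_AGENT'] ?? '',
        $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ];
    // A separator keeps ("ab", "c") distinct from ("a", "bc").
    return md5(implode('|', $parts));
}

// Usage in a request handler: $fingerprint = ua_fingerprint($_SERVER);
```

The result is a 32-character hex string, so a CHAR(32) column would be
enough to hold it.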

Are there any other fields that it would be useful to use as well?
Jul 24 '08 #1
5 Replies


Hi Gordon,

Should work.
Although I do not see the benefit of storing them as an MD5 hash.
You could also just glue the three parts together and store that
(saves a bit of CPU).
Or maybe you are trying to make the stored strings a little smaller?

Some possible problems:
- Many people behind one IP, all using the same browser.
- False contents for USER_AGENT.
- Proxies that strip things out.

But if you use it for a rough estimate, I don't see why this shouldn't
work. :-)

Regards,
Erwin Moller
Jul 24 '08 #2

Won't help much. For instance, everyone at a corporation is probably
running the same user agent, and since they're all behind one server,
they'll all have the same IP address.

Or, they may have an ISP (such as AOL) which uses round-robin servers,
which may route any particular request through one of several servers.
Whichever IP address you get will be solely dependent on the server used
for *that request*.
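
One way to soften the round-robin case, offered as a sketch and not as
a fix: store a second fingerprint that leaves the IP address out, so a
visitor whose address changes between requests can still match on the
headers. The function name and array keys below are assumptions for
illustration. The header-only value will also match many unrelated
visitors, which is the corporate problem described above.

```php
<?php
// Hypothetical helper: returns one fingerprint with the IP and one without.
function fingerprints(array $server): array
{
    $headers = ($server['HTTP_USER_AGENT'] ?? '') . '|'
             . ($server['HTTP_ACCEPT_LANGUAGE'] ?? '');

    return [
        // Changes whenever the request arrives through a different proxy.
        'with_ip'    => md5(($server['REMOTE_ADDR'] ?? '') . '|' . $headers),
        // Stable across proxies, but shared by every identical browser setup.
        'without_ip' => md5($headers),
    ];
}
```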

In short, there is no good way to prevent duplicates without requiring
things like signups. And even that isn't foolproof.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 25 '08 #3

Two main reasons for the MD5ing. First, like you say, it reduces the
size of the data that needs storing. Second, all I need to know is
whether or not the browser is unique(ish); I don't need to know the IP
address, user agent, or other data that might identify a user. I'm
not entirely sure where the Data Protection Act stands on storing such
data, but as I don't need it I figure I'm better off not storing it.
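
A caveat on the second reason, with a sketch: a plain MD5 of an IP
address plus a common user agent string could in principle be matched
by hashing candidate values, because the set of likely inputs is small.
A keyed hash makes that harder for anyone who sees only the database.
The function name and the $secretKey parameter are assumptions; the key
would have to live outside the database.

```php
<?php
// Hypothetical helper; $secretKey is an assumed application secret.
function keyed_fingerprint(array $server, string $secretKey): string
{
    $data = implode('|', [
        $server['REMOTE_ADDR'] ?? '',
        $server['HTTP_USER_AGENT'] ?? '',
        $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ]);
    // HMAC-SHA256 yields 64 hex characters; same input and key, same output.
    return hash_hmac('sha256', $data, $secretKey);
}
```

Changing the key invalidates every stored fingerprint, so it has to stay
fixed for as long as old and new submissions are compared.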

It is only intended as a rough estimate, just to flag some responses
for further scrutiny. We're not going to use the fingerprints to
prevent users from submitting if their fingerprint matches a stored
one.
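
The flagging step could look roughly like this. It is a sketch that
assumes the fingerprints have already been loaded into an array keyed
by submission id; the function name is made up for the example.

```php
<?php
// Hypothetical helper: $rows maps submission id => stored fingerprint.
// Returns the ids whose fingerprint occurs more than once.
function flag_repeats(array $rows): array
{
    $counts = array_count_values($rows); // fingerprint => occurrences
    $flagged = [];
    foreach ($rows as $id => $fingerprint) {
        if ($counts[$fingerprint] > 1) {
            $flagged[] = $id;
        }
    }
    return $flagged;
}
```

Nothing is rejected here; the returned ids are only candidates for a
closer look.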

I seem to remember seeing an article a while back that advised using a
few more browser-sent fields to increase the possible uniqueness of
the fingerprint, but I can't find it again now.
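
I don't have the article's list, so the extra fields below are my own
guess at candidates (HTTP_ACCEPT and HTTP_ACCEPT_ENCODING are standard
request headers as PHP exposes them). A sketch that takes the field
list as a parameter, so fields can be added or dropped later:

```php
<?php
// Assumed candidate fields; how much uniqueness each adds is untested here.
const FINGERPRINT_KEYS = [
    'REMOTE_ADDR',
    'HTTP_USER_AGENT',
    'HTTP_ACCEPT_LANGUAGE',
    'HTTP_ACCEPT',
    'HTTP_ACCEPT_ENCODING',
];

// Hypothetical helper: hashes "KEY=value" lines for the chosen fields.
function extended_fingerprint(array $server, array $keys = FINGERPRINT_KEYS): string
{
    $parts = [];
    foreach ($keys as $key) {
        $parts[] = $key . '=' . ($server[$key] ?? '');
    }
    return md5(implode("\n", $parts));
}
```

Changing the field list changes every fingerprint, so old and new
values would no longer be comparable.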
Jul 25 '08 #4

The UA can change over time and, as Jerry points out, the other details
are not unique identifiers either. If you need unique identifiers, a
better solution would be a validated email address, but how easy is
it to get more than one email address?

Really, the most pragmatic solution is to check that the browser
accepts cookies and drop one that won't expire for a long time. If you
need something more sophisticated than that, then you'll need to go
down the road of authenticated identity management, and neither most
governments nor financial institutions have got that one sorted yet.
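
A rough sketch of the cookie approach, assuming PHP 7 or later for
random_bytes(). The cookie name visitor_token, the helper name and the
five-year lifetime are all arbitrary choices for the example.

```php
<?php
// Hypothetical helper: returns [token, isNew] for the current visitor.
// 'visitor_token' is an assumed cookie name.
function visitor_token(array $cookies): array
{
    $existing = $cookies['visitor_token'] ?? '';
    // Accept only values in the format this code issues (32 hex characters).
    if (is_string($existing) && preg_match('/^[0-9a-f]{32}$/', $existing) === 1) {
        return [$existing, false];
    }
    return [bin2hex(random_bytes(16)), true];
}

// Usage, before any output is sent:
// [$token, $isNew] = visitor_token($_COOKIE);
// if ($isNew) {
//     setcookie('visitor_token', $token, time() + 5 * 365 * 24 * 60 * 60, '/');
// }
```

A visitor who clears cookies or switches browser gets a new token, so
this also only gives a rough signal.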

C.
Jul 25 '08 #5

You could serialize the $_SERVER array (or selected elements), and
store that in your DB.
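
Something along these lines, as a sketch; the function name and the
choice of keys are only examples. Unlike the hash, this keeps the raw
values, so it trades the privacy and size benefits for being able to
inspect the data later.

```php
<?php
// Hypothetical helper: serializes only the chosen $_SERVER elements.
function selected_server_fields(array $server, array $keys): string
{
    // Keep only the requested keys that are actually present.
    $subset = array_intersect_key($server, array_flip($keys));
    // Sort by key so the same data always serializes to the same string.
    ksort($subset);
    return serialize($subset);
}

// Usage: $blob = selected_server_fields($_SERVER, ['REMOTE_ADDR', 'HTTP_USER_AGENT']);
```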

As for the DPA, the contents of $_SERVER (with the possible exception
of the IP address) are primarily data about the browser client and the
host OS, not the user of the browser, and hence I expect they wouldn't
fall within the remit of the Data Protection Act anyway.

Chris
Jul 25 '08 #6
