Bytes | Software Development & Data Engineering Community

User agent fingerprinting

As part of a data collecting system (basically a system that collects
submissions from forms), I was planning on logging a bit of data about
the submitting user agent for the purposes of spotting things like
multiple submissions.

We don't want to block multiple submissions altogether, we just want
to be able to easily spot them.

My idea was to use the $_SERVER fields to get some data about the user
agent, md5 the collected data and store it as a field in the
database. I know that this isn't 100% guaranteed to spot all multiple
submissions but it should be able to catch the majority of them.

I'm planning to use REMOTE_ADDR, HTTP_USER_AGENT and HTTP_ACCEPT_LANGUAGE,
concatenate them and store the result as an MD5 hash.
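
A minimal sketch of that idea, not a definitive implementation. It assumes the user agent and language values are read from the `HTTP_USER_AGENT` and `HTTP_ACCEPT_LANGUAGE` keys of `$_SERVER` (PHP exposes request headers with an `HTTP_` prefix). The function name `ua_fingerprint` and the `|` separator are made up for illustration.

```php
<?php
// Hypothetical helper (name and separator are assumptions, not from the thread).
// Builds a rough fingerprint from three $_SERVER values.
// Headers the client did not send are treated as empty strings.
function ua_fingerprint(array $server): string
{
    $parts = [
        $server['REMOTE_ADDR'] ?? '',
        $server['HTTP_USER_AGENT'] ?? '',
        $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ];

    // The separator keeps "ab" + "c" distinct from "a" + "bc".
    return md5(implode('|', $parts));
}

// Usage in a request handler: $fingerprint = ua_fingerprint($_SERVER);
```

Passing the array in as a parameter, rather than reading `$_SERVER` inside the function, makes it easy to try with made-up values.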

Are there any other fields that it would be useful to use as well?
Jul 24 '08 #1
Gordon wrote:
> Are there any other fields that it would be useful to use as well?
Hi Gordon,

Should work.
Although I do not see the benefit of storing them as an MD5.
You can also just glue the 3 parts together and store that.
(Saves a bit of CPU.)
Or maybe you are trying to make the stored strings a little smaller?

Some possible problems:
- Many people behind one IP, all using the same browser.
- False contents for USER_AGENT.
- Proxies that strip things out.

But if you use it for a rough estimate, I don't see why this shouldn't
work. :-)

Regards,
Erwin Moller
Jul 24 '08 #2
Gordon wrote:
> Are there any other fields that it would be useful to use as well?
Won't help much. For instance, everyone at a corporation is probably
running the same user agent, and since they're all behind one server,
they'll all have the same IP address.

Or, they may have an ISP (such as AOL) which uses round-robin servers,
which may route any particular request through one of several servers.
Whichever IP address you get will be solely dependent on the server used
for *that request*.

In short, there is no good way to prevent duplicates without requiring
things like signups. And even that isn't foolproof.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 25 '08 #3
On Jul 24, 5:15 pm, Erwin Moller wrote:
> Although I do not see the benefit of storing them as an MD5.
> Or maybe you are trying to make the stored strings a little smaller?
Two main reasons for the MD5ing. First, like you say, it reduces the
size of the data that needs storing. Second, all I need to know is
whether or not the browser is unique(ish); I don't need to know the IP
address, user agent, or other data that might identify a user. I'm
not entirely sure where the Data Protection Act stands on storing such
data, but as I don't need it I figure I'm better off not storing it.

It is only intended as a rough estimate, just to flag some responses
for further scrutiny. We're not going to use the fingerprints to
prevent users from submitting if their fingerprint matches a stored
one.
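
To make the "flag, don't block" idea concrete, here is a rough sketch. The row layout (`id` and `fingerprint` keys) and the function name are assumptions for illustration only; with the fingerprints in a database, a `GROUP BY fingerprint HAVING COUNT(*) > 1` query would be another way to do the same thing.

```php
<?php
// Hypothetical helper: given rows shaped like ['id' => ..., 'fingerprint' => ...]
// (an assumed layout), return the ids whose fingerprint occurs more than once.
// Nothing is rejected; the ids are only reported for manual review.
function flag_repeated_fingerprints(array $rows): array
{
    // Count how many times each fingerprint string appears.
    $counts = array_count_values(array_column($rows, 'fingerprint'));

    $flagged = [];
    foreach ($rows as $row) {
        if ($counts[$row['fingerprint']] > 1) {
            $flagged[] = $row['id'];
        }
    }

    return $flagged;
}
```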

I seem to remember seeing an article a while back that advised using a
few more browser-sent fields to increase the possible uniqueness of
the fingerprint, but I can't find it again now.
Jul 25 '08 #4
On Jul 25, 9:26 am, Gordon wrote:
> I seem to remember seeing an article a while back that advised using a
> few more browser-sent fields to increase the possible uniqueness of
> the fingerprint, but I can't find it again now.
The UA can change over time and, as Jerry points out, the other details
are not unique identifiers either. If you need unique identifiers, a
better solution would be a validated email address - but how easy is
it to get more than one email address?

Really the most pragmatic solution is to check that the browser accepts
cookies and drop one that won't expire for a long time. If you need
something more sophisticated than that then you'll need to go down the
road of authenticated identity management, and neither most governments
nor financial institutions have got that one sorted yet.
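
A rough sketch of the cookie approach, under stated assumptions: the cookie name `submitter_id`, the 32-character hex token format and the one-year lifetime are all made up for illustration. The token logic is kept separate from `setcookie()` so it can be tried without sending headers.

```php
<?php
// Hypothetical helper: reuse the visitor's token if the cookie is present
// and well-formed, otherwise mint a new random one.
// The cookie name and token format are assumptions, not from the thread.
function submitter_token(array $cookies, string $name = 'submitter_id'): array
{
    $existing = $cookies[$name] ?? '';

    // Accept only exactly 32 lowercase hex characters.
    if (is_string($existing) && preg_match('/\A[0-9a-f]{32}\z/', $existing) === 1) {
        return ['token' => $existing, 'is_new' => false];
    }

    return ['token' => bin2hex(random_bytes(16)), 'is_new' => true];
}

// Usage in a request handler, before any output is sent:
// $result = submitter_token($_COOKIE);
// if ($result['is_new']) {
//     // Expires in roughly one year; adjust to taste.
//     setcookie('submitter_id', $result['token'], time() + 365 * 24 * 60 * 60, '/');
// }
```

The token would be stored alongside each submission, so repeats from the same browser share a value. A visitor who clears or refuses cookies simply gets a fresh token each time.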

C.
Jul 25 '08 #5
On 24 Jul, 16:52, Gordon wrote:
> Are there any other fields that it would be useful to use as well?

You could serialize the $_SERVER array (or selected elements), and
store that in your DB.
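
For the "selected elements" variant, something along these lines might do. It is only a sketch: the function name and the idea of passing the wanted keys as a list are assumptions for illustration.

```php
<?php
// Hypothetical helper: keep only the named $_SERVER entries and serialize them.
// Keys missing from the input are simply left out of the result.
function serialize_server_fields(array $server, array $keys): string
{
    $selected = array_intersect_key($server, array_flip($keys));

    // Sort by key so equal inputs always produce the same string.
    ksort($selected);

    return serialize($selected);
}

// Usage in a request handler:
// $stored = serialize_server_fields($_SERVER, ['REMOTE_ADDR', 'HTTP_USER_AGENT']);
```

Unlike the MD5 approach, this keeps the raw values readable in the database.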

As for the DPA, the contents of $_SERVER (with the possible exception of
the IP address) are primarily data about the browser client and the
host OS, not the user of the browser, and hence I expect they wouldn't
fall within the remit of the Data Protection Act anyway.

Chris
Jul 25 '08 #6
