User agent fingerprinting

Gordon

As part of a data collecting system (basically a system that collects
submissions from forms), I was planning on logging a bit of data about
the submitting user agent for the purposes of spotting things like
multiple submissions.

We don't want to block multiple submissions altogether, we just want
to be able to easily spot them.

My idea was to use the $_SERVER fields to get some data about the user
agent, md5 the collected data and store it as a field in the
database. I know that this isn't 100% guaranteed to spot all multiple
submissions but it should be able to catch the majority of them.

I'm planning to use REMOTE_ADDR, HTTP_USER_AGENT and HTTP_ACCEPT_LANGUAGE,
concatenate them and store the result as an MD5 hash.
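
A minimal sketch of that idea, under a few assumptions: the function name ua_fingerprint is made up for illustration, the request data is passed in as an array (normally $_SERVER) so it can be tried from the command line, and the "|" separator is my own addition. Note that PHP exposes the two headers under the keys HTTP_USER_AGENT and HTTP_ACCEPT_LANGUAGE.

```php
<?php
// Hypothetical helper: builds a rough fingerprint from request data.
// $server is normally $_SERVER; passing it in keeps the function testable.
function ua_fingerprint(array $server): string
{
    $parts = [
        $server['REMOTE_ADDR'] ?? '',
        $server['HTTP_USER_AGENT'] ?? '',
        $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ];
    // The separator keeps "ab" + "c" from hashing the same as "a" + "bc".
    return md5(implode('|', $parts));
}

// In a form handler this would be called as:
// $fingerprint = ua_fingerprint($_SERVER);
```

Missing headers fall back to an empty string, so a request that sends no Accept-Language still gets a 32-character hash rather than a notice.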

Are there any other fields that it would be useful to use as well?

Jul 24 '08 #1

Erwin Moller

Hi Gordon,

Should work.
Although I do not see the benefit of storing them as an MD5 hash.
You can also just glue the 3 parts together and store that.
(Saves a bit of CPU.)
Or maybe you are trying to make the stored strings a little smaller?

Some possible problems:
- Many people behind 1 IP, all using the same browser.
- False contents for USER_AGENT
- Proxies that strip things out

But if you use it for a rough estimate, I don't see why this shouldn't
work. :-)

Regards,
Erwin Moller

Jul 24 '08 #2

Jerry Stuckle

Won't help much. For instance, everyone at a corporation is probably
running the same user agent, but since they're all behind one server,
they'll all have the same IP address.

Or, they may have an ISP (such as AOL) which uses round-robin servers,
which may route any particular request through one of several servers.
Whichever IP address you get will be solely dependent on the server used
for *that request*.

In short, there is no good way to prevent duplicates without requiring
things like signups. And even that isn't foolproof.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jul 25 '08 #3

Gordon

Two main reasons for the MD5ing. First, like you say, it reduces the
size of the data that needs storing. Second, all I need to know is
whether or not the browser is unique(ish); I don't need to know the IP
address, user agent, or other data that might identify a user. I'm
not entirely sure where the Data Protection Act stands on storing such
data, but as I don't need it I figure I'm better off not storing it.

It is only intended as a rough estimate, just to flag some responses
for further scrutiny. We're not going to use the fingerprints to
prevent users from submitting if their fingerprint matches a stored
one.
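
To make the "flag for further scrutiny" step concrete, here is a rough sketch. It assumes the stored fingerprints have already been read into a plain PHP array; the function name flag_repeated is hypothetical.

```php
<?php
// Hypothetical helper: returns the fingerprints that occur more than once,
// mapped to the number of submissions sharing each one.
function flag_repeated(array $fingerprints): array
{
    // array_count_values() maps each distinct value to its occurrence count.
    $counts = array_count_values($fingerprints);
    // Keep only the fingerprints seen at least twice.
    return array_filter($counts, function ($n) {
        return $n > 1;
    });
}
```

With the hashes in a database column, a GROUP BY on that column with HAVING COUNT(*) > 1 would do the same job without pulling every row into PHP.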

I seem to remember seeing an article a while back that advised using a
few more browser-sent fields to increase the possible uniqueness of
the fingerprint, but I can't find it again now.

Jul 25 '08 #4

C. (http://symcbean.blogspot.com/)

The UA can change over time and, as Jerry points out, the other details
are not unique identifiers either. If you need unique identifiers, a
better solution would be a validated email address - but how easy is
it to get more than one email address?

Really the most pragmatic solution is to check the browser accepts
cookies and drop one that won't expire for a long time. If you need
something more sophisticated than that then you'll need to go down the
road of authenticated identity management, and neither most governments
nor financial institutions have got that one sorted yet.
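
A sketch of the cookie approach, with several assumptions of mine: the cookie name visitor_id, the ten-year lifetime, the 32-hex-character ID format and the helper name are all made up for illustration, and random_bytes() needs PHP 7 or later.

```php
<?php
// Hypothetical helper: returns [id, isNew]. $cookies is normally $_COOKIE.
function visitor_id(array $cookies, string $name = 'visitor_id'): array
{
    $value = $cookies[$name] ?? null;
    // Reuse the ID only if it looks like one we could have issued.
    if (is_string($value) && preg_match('/^[0-9a-f]{32}$/', $value)) {
        return [$value, false];
    }
    // Otherwise mint a fresh random ID (16 random bytes as 32 hex characters).
    return [bin2hex(random_bytes(16)), true];
}

// Usage in a page, before any output is sent:
// [$id, $isNew] = visitor_id($_COOKIE);
// if ($isNew) {
//     setcookie('visitor_id', $id, time() + 10 * 365 * 24 * 3600, '/');
// }
```

You only learn that the browser accepted the cookie when it comes back on a later request, so the first submission from any visitor will always look new.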

C.

Jul 25 '08 #5

Chris Jones

You could serialize the $_SERVER array (or selected elements), and
store that in your DB.
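
Something along these lines, as a sketch only. The function name and the choice of keys are assumptions for illustration, and the array is passed in (normally $_SERVER) so it can be run outside a web request.

```php
<?php
// Hypothetical helper: keeps only the chosen keys from $server
// (normally $_SERVER) and serializes them for storage.
function serialize_request_fields(array $server, array $keys): string
{
    // array_intersect_key() drops every element whose key is not listed.
    $selected = array_intersect_key($server, array_flip($keys));
    // Sort by key so the same data always serializes to the same string.
    ksort($selected);
    return serialize($selected);
}

// Example call in a form handler:
// $blob = serialize_request_fields($_SERVER, [
//     'REMOTE_ADDR', 'HTTP_USER_AGENT', 'HTTP_ACCEPT_LANGUAGE', 'HTTP_ACCEPT',
// ]);
```

Unlike the MD5 approach, this keeps the raw values readable in the database, which may or may not be what you want.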

As for the DPA, the contents of $_SERVER (with the possible exception of
the IP address) are primarily data about the browser client and the
host OS, not the user of the browser, and hence I expect they wouldn't
fall within the remit of the Data Protection Act anyway.

Chris

Jul 25 '08 #6
