Bytes | Software Development & Data Engineering Community

How to store and compare fingerprint scans?

MMcCarthy
14,534 Expert Mod 8TB
I have an upcoming project which requires the recording of thumbprint/fingerprint scans in a database (currently proposed as sql server). Now I have a few questions I need to explore:
  • How does SQL Server handle the embedding of images? Is it better to embed the scanned images in the database, or to store just the path in the database and keep the images in a folder? My concern is that the number of files could eventually reach 7 million and the data would need to be queried.
  • What software or otherwise would I need to be able to "read" a thumbprint/fingerprint image and compare it to all records in the database for duplication?

I am open to any suggestions, from storage datatype to software, to solve these problems. I should mention that I haven't finalised the hardware method of getting the scans. I'm currently exploring whether laptops with built-in thumbprint security can be used to read and store multiple thumbprints.

All suggestions welcome.

Mary
Nov 4 '10 #1
22 replies · 34,647 views
Although I haven't touched on fingerprint comparison yet I have recently thought about it. Obviously I can't help much with that portion yet. I think you may be hitting on something with the laptop since they have that capability and may have an API you can tie into.

For the images you should take a look at the following paragraph/article. Personally for many images I generally use a pointer to the image due to size constraints.

Although this article references VB as a language the concept and database information does pertain to SQL server as well.

http://msdn.microsoft.com/en-us/libr...58(VS.60).aspx
The obvious advantage to storing images as a file pointer is that only the file path is saved. As a result, your database won't grow as dramatically as it would if you stored the image in a BLOB field. In the example described earlier, with 100 records of 50K images stored in BLOB fields, the database grew to more than 4 MB. The same database using file pointers instead was under 100K. In speed comparisons, the file pointer method is the winner, completing the test in five seconds. These advantages generally make file pointers the preferred method of saving images.
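The file-pointer approach the article describes can be sketched in a few lines. This is only an illustration: sqlite3 (Python's built-in database) stands in for SQL Server, and the table and path names are made up.

```python
import sqlite3

# Sketch of the "file pointer" approach: the database row holds only the
# path to the scan; the image bytes live on disk.  sqlite3 stands in for
# SQL Server here purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (person_id INTEGER PRIMARY KEY, image_path TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO scans (person_id, image_path) VALUES (?, ?)",
    (1, "scans/000/000/1.png"),
)

# A query returns the pointer; the application then opens the file itself.
path = conn.execute(
    "SELECT image_path FROM scans WHERE person_id = ?", (1,)
).fetchone()[0]
print(path)  # scans/000/000/1.png
```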
Nov 4 '10 #2
Mariostg
332 100+
Couple of things that come to my mind for image comparison would be md5 checksum and numpy.
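A minimal sketch of the checksum idea, with one caveat: an MD5 digest only catches byte-identical files (e.g. the same scan imported twice). Two separate scans of the same finger will almost never hash equal, so this is a cheap first pass, not a substitute for real print matching.

```python
import hashlib

def image_digest(data: bytes) -> str:
    """Return an MD5 hex digest, used here as a cheap exact-duplicate check.

    Note: two scans of the same finger will almost never be byte-identical,
    so this only catches re-imports of the same file, not matching prints.
    """
    return hashlib.md5(data).hexdigest()

seen = set()
for blob in (b"scan-A", b"scan-B", b"scan-A"):  # stand-ins for image bytes
    digest = image_digest(blob)
    if digest in seen:
        print("duplicate file detected")
    seen.add(digest)
```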
Nov 4 '10 #3
MMcCarthy
14,534 Expert Mod 8TB
From further research it seems I need to look at a middleware solution. The images of the fingerprints could be stored in some folder with only a pointer in the database. However, the comparisons depend on "minutiae" and pattern data, which would be stored in the database.

So the comparisons are not of the actual images but rather of the patterns and minutiae points unique to each image. Extracting that data and converting it to a format suitable for storage would require some kind of middleware solution. So that's what I'm concentrating on at the moment.
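The split described above (image in a folder, extracted template in the database) might look roughly like this. It's a sketch only: sqlite3 stands in for SQL Server, and `extract_template` is a dummy placeholder for whatever call the middleware actually exposes.

```python
import sqlite3

# The image stays in a folder, while the extracted minutiae template
# (whatever byte format the middleware produces) goes in the database
# for matching.  `extract_template` is a stand-in, NOT a real extractor.
def extract_template(image_bytes: bytes) -> bytes:
    return image_bytes[::-1]  # dummy placeholder for the middleware call

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE prints (
           person_id  INTEGER PRIMARY KEY,
           image_path TEXT NOT NULL,   -- pointer to the image on disk
           template   BLOB NOT NULL    -- minutiae template for matching
       )"""
)
image = b"raw scan bytes"
conn.execute(
    "INSERT INTO prints VALUES (?, ?, ?)",
    (1, "scans/1.png", extract_template(image)),
)
count = conn.execute("SELECT COUNT(*) FROM prints").fetchone()[0]
print(count)  # 1
```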
Nov 5 '10 #4
mshmyob
904 Expert 512MB
In relation to storage of the image files: I am a big believer in storing files outside the database.

If you are using SQL Server 2008, it has a new feature called FILESTREAM that is specifically optimized for working with large binary data such as images.

If you want more info on the pros and cons of the three ways I can write something up for you.

cheers.
Nov 5 '10 #5
NeoPa
32,556 Expert Mod 16PB
That sounds as if it might be interesting Mshmyob.

Bear in mind that the number of files could possibly be so large that the file system would not be an efficient way to access them (Windows does a reasonably efficient job for reasonable numbers of files, but is not designed to handle numbers in the millions well).
Nov 6 '10 #6
mshmyob
904 Expert 512MB
True Neo, but keep in mind that a database is even less efficient at handling large image (or media) files. Also, a proper SQL installation will be using Windows Server and probably RAID 10. I would also probably incorporate a separate RAID 0 for striping these large numbers of files, thereby increasing read/write performance just for those files. (Fault tolerance would have to be planned, of course.)

The file system should also be set to NTFS.

Comparison of the 3 techniques:

1. Storing the image file (BLOB) in the SQL database

SQL Server has an 8 KB page size (which limits the maximum size of each record).
Therefore SQL Server cannot store image files in a row like normal records.
Instead it is forced to break the BLOB into 8 KB chunks and store them in a B-tree structure with pointers.
Databases can become extremely large and unmanageable.
BLOB max size is 2 GB.
Advantage: the BLOBs are transactionally consistent with the data (this matters for backups, transaction logs, etc.).

2. Storing the image file (BLOB) in the file system
Just add a link in the record to where the BLOB is.
Gives storage simplicity and good performance.
Disadvantage: not transactionally consistent, i.e. the BLOB is not synchronized with the data. Not good for backups.

3. Using FILESTREAM
Combines the benefits of the two above.
Stores in the file system.
BLOB size is limited only by the file system.
Full transactional consistency exists between the BLOB and the database record to which it's attached.
BLOBs are included in backup and restore.
BLOB objects are accessible via both T-SQL and NTFS streaming APIs.
Great streaming performance is provided for large BLOB types.
The Windows system cache is used for caching the BLOB data, thus freeing up the SQL Server buffer cache required for in-database BLOB storage.
Disadvantages: database mirroring cannot be enabled;
snapshots cannot include FILESTREAM data;
FILESTREAM data cannot be encrypted (TDE).


The SQL Server Books Online (BOL) has more details. This is just a quick summary.

cheers,
Nov 6 '10 #7
MMcCarthy
14,534 Expert Mod 8TB
Very interesting information mshmyob. I had forgotten to consider how the storage method would affect backups. As you say, the database will be on Windows Server. Anyway, I will do further research on these three methods as you suggest.
Nov 6 '10 #8
NeoPa
32,556 Expert Mod 16PB
Mshmyob:
True Neo but keep in mind that a database is even less efficient at handling large image (or media files). Also a proper SQL installation will be using Windows Server and probably a RAID 10. I would also probably incorporate a separate RAID 0 for striping these large amounts of files and thereby increase read/write performance just for these files. (fault tolerance would have to be planned out of course)
First let me say thank you for the response. It's very helpful.

I would question some of the statements though. Not because they're not true generally, but because I expect the sheer number (rather than size) of the files would make this quite an unusual scenario.

In a file system, the indexing is basically linear. This works very well for smaller numbers, and with the caching involved in machines with much more RAM, even pretty darn well for quite large numbers. When humongous numbers are used though, I would expect (theoretically, as I have never had anything quite like this to deal with) that the performance would drop off sharply. If each entry must be checked until a match is found then this would suffer, particularly when the limits of the caching were reached. A database, on the other hand, far from being less efficient at this, could index the filenames and bring to bear all the optimisations that have been developed over the years to make such a search as quick as possible. Caching would also come into play here of course, but I would expect the searching capabilities of a database system to out-perform those of a file system, particularly when scaled up to the extremely large loads anticipated. Maybe my understanding of how things work is off somewhere, but I would expect to see things as I describe if my ideas are borne out.

For reasons just described, I would question the suggested benefit of performance in point #2.

All that said, your post was still very helpful and I really don't want this to sound like I'm being ungrateful. I'm actually quite interested in anything you may say to indicate where some of my basic thinking may be awry. I need to know if I'm on the wrong lines.

PS. I almost forgot the RAID comments quoted. Everything you say about it is true, but I would expect this to leave the playing field still level as the benefits would apply equally to all possible solutions. Again, let me know if I'm missing a point here. It's perfectly possible.
Nov 6 '10 #9
mshmyob
904 Expert 512MB
Neo you are right in everything you say. I have never worked with this many files either but I have worked with over 130,000 audio files and the issues you mention are valid.

I have found that creating directories for related sets of files makes for a massive performance boost. So, for instance, there could be a directory for each letter of the alphabet, and any file starting with that letter goes into the appropriate directory.

In my case I created a directory for each musical genre and then each of those directories had a directory for each artist and then came the actual audio files.

This scenario gave responses as instantaneous as a regular-sized directory.

Without researching it, I assumed that Windows works level by level through directories; so the root had about 20 directories (for genres), the next level was artists (the largest subdirectory had a couple of hundred artist directories at most), and then came the files themselves (the largest holding a few hundred files).

Therefore, by the time it got to the largest directory in terms of files, it only needed to index a few hundred files.

I could be wrong with my thinking but it worked for me (lol).

You could also spread the files over multiple physical disks so you don't have 7 million on one disk; Windows would then only need to look at subsets of the files.


About the backup: it is, to me, a very important part of a mission-critical application, and using the file system without the new FILESTREAM may generate transaction inconsistencies, which make roll-forwards and roll-backs useless; therefore, in my opinion, the backup process becomes almost useless.

cheers,
Nov 6 '10 #10
mshmyob
904 Expert 512MB
I would also like to build on the so-called disadvantages I mentioned about using FILESTREAM.

No database mirroring allowed - not an issue since you are using RAID. The data is still redundant and recoverable; you just can't mirror the individual database using SQL Server itself.

No snapshots allowed - who cares (lol)

Can't use TDE - only the FILESTREAM data is unencrypted; other data is.

cheers,
Nov 6 '10 #11
NeoPa
32,556 Expert Mod 16PB
I very much agree with what you say Mshmyob. I was considering the idea of a directory structure earlier, but couldn't think of a suitable approach; the more I think about it now, the more workable it seems. Even assuming the worst-case scenario, there would be a fairly long number associated as a PK for all this data, and a simple string of its first digits (and, for deeper levels, subsequent digits) could be used for subfolder names. Certainly it should keep the individual folders below some maximum number of files.
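The PK-digit subfolder idea can be sketched as a small path-building helper. The zero-padding width, the fan-out per level, and the file extension are arbitrary choices here, not anything prescribed:

```python
def shard_path(person_id: int, root: str = "scans",
               width: int = 9, fanout: int = 3) -> str:
    """Build a nested folder path from the leading digits of a numeric PK,
    so no single directory ever holds millions of files."""
    s = str(person_id).zfill(width)
    # e.g. id 1234567 -> scans/001/234/001234567.png
    parts = [s[i:i + fanout] for i in range(0, width - fanout, fanout)]
    return "/".join([root, *parts, s + ".png"])

print(shard_path(1234567))  # scans/001/234/001234567.png
```

With a 9-digit PK and three digits per level, each directory holds at most 1,000 entries, which keeps file-system lookups fast even at 7 million files.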

Also, whatever RAID system is used (other than simple striping of course) will, as you quite rightly say, provide the requisite level of data redundancy for the project.

Thanks for all your help :-)
Nov 6 '10 #12
mshmyob
904 Expert 512MB
We'll let Mary decide the way to go since she is the one being paid the BIG bucks to come up with the proper solution :-)

cheers,
Nov 6 '10 #13
MMcCarthy
14,534 Expert Mod 8TB
OK Guys

To be a bit more precise.

If I want to store 1 million records with 1 thumbprint and 1 photo each, what is the best way to store those images to achieve the fastest search speed on the database?

In other words, putting database size to one side, would storing the images in SQL Server rather than in files give me greater search speed?

I believe the biggest problem I have is that, as each record is added to SQL Server, the database would first have to be searched for existing fingerprints.

So all data transfer (importing) and searching would be done by SQL Server on the server. Consider the fingerprint data as a unique index.
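Treating the fingerprint data as a unique index can be sketched as below, with one big caveat: a UNIQUE constraint only rejects byte-identical templates, whereas genuinely matching prints would still need the middleware's fuzzy comparison. sqlite3 stands in for SQL Server purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE prints (
           person_id INTEGER PRIMARY KEY,
           template  BLOB NOT NULL UNIQUE  -- rejects byte-identical templates
       )"""
)
conn.execute("INSERT INTO prints VALUES (1, ?)", (b"template-A",))
try:
    # A second insert with the same template bytes violates the constraint.
    conn.execute("INSERT INTO prints VALUES (2, ?)", (b"template-A",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```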

I hope this makes sense.

Mary
Nov 8 '10 #14
mshmyob
904 Expert 512MB
How big do you think the image files will be?

New thought: is this database mostly used for queries? If so, would an OLAP database fit the bill better than a transactional database? By using Analysis Services you would probably eliminate performance problems even with the images in the database (depending on image size).

cheers,
Nov 9 '10 #15
NeoPa
32,556 Expert Mod 16PB
Let me just clarify that this work is related to a project that Mary and I are working on together.

mshmyob:
How big you think the image files will be?
We would think that each image (thumbprints and faces) would be in the region of 1-2 MB. As the unique data is only the thumbprint, that is the only important item as far as we can see. Personally, I struggle to see much relevance of the size except in the actual comparison itself. Maybe I'm missing a trick, though.

mshmyob:
New thought: Is this database mostly used for queries? If so would an OLAP database fit the bill better than a transaction database. By using Analysis Services you would probably eliminate problems of performance even with the images in the database (depending on size of image).
The biggest problem we perceive will be the addition of new records. Each time a new record is inserted, the database system (SQL Server) would need to check all the existing records to ensure none already exists with a matching thumbprint. It is this overhead that is going to be severe, as each value in this "index" will be particularly large.

The print-matching software itself will determine what is stored for each - or maybe an API call would need to be made for the comparison of each image - we're not sure which yet, but this is the situation that we see as a potential system-killer.
Nov 9 '10 #16
mshmyob
904 Expert 512MB
The reason I asked about the size is that MS recommends using FILESTREAM if the image file is larger than 1 MB.

cheers,
Nov 9 '10 #17
mshmyob
904 Expert 512MB
I guess it depends on how the print-matching software works. Are you writing that, or is it already done by a third party?


Not knowing much about prints, but a quick search shows prints are categorized into a minimum of five classes - this means not all 7 million prints have to be searched each time. With a little more research, I would assume prints have even more identifying features to look for, narrowing the search further and eliminating even more records from being compared.
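That narrowing idea can be sketched as a simple pre-filter: store each print's pattern class (the classic classes are arch, tented arch, left loop, right loop and whorl) and run the expensive comparison only within the matching class. The records here are made-up stand-ins.

```python
# Pre-filter candidates by pattern class before any expensive minutiae
# comparison.  Only records sharing the probe's class need full matching.
records = [
    {"id": 1, "cls": "whorl"},
    {"id": 2, "cls": "left_loop"},
    {"id": 3, "cls": "whorl"},
]

def candidates(all_records, probe_class):
    return [r for r in all_records if r["cls"] == probe_class]

matches_to_check = candidates(records, "whorl")
print(len(matches_to_check))  # 2 -- only 2 of 3 records need full comparison
```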

cheers,
Nov 9 '10 #18
MMcCarthy
14,534 Expert Mod 8TB
Well, my understanding is that the minimum dpi requirement for fingerprint images is 500 dpi, so I'm guessing they could go over 1 MB.
Nov 9 '10 #19
MMcCarthy
14,534 Expert Mod 8TB
I guess it depends on how the print matching software works. Are you writing that or is that already done by a third party?
We're planning on using a middleware solution for that. It's possible that they have their own requirements for image storage. We're checking that out over the next couple of days.
Nov 9 '10 #20
NeoPa
32,556 Expert Mod 16PB
That's very helpful actually. I am tasked with checking with the provider for a better understanding of how their software works. I'm inclined to guess that what I learn may well make this discussion moot (although far from uninteresting and well worth the effort to get a better understanding of some of the issues).

As far as file sizes go, I'm easily convinced that, as a general rule, better results are achieved using FILESTREAM for larger files. As I don't see our current situation as one covered by the general rule (I see it as quite far outside the ordinary), I don't feel the reasons and rationale behind the recommendation are appropriate here. That is why I'm hesitant to take such advice on board: not because I think it is generally wrong or stupid, but simply because this situation is so non-standard.

Anyway, I rather tend now towards your idea that the API for the software may save us from having to approach this as a problem at all. We let their software take the strain. Sounds good to me :-)
Nov 9 '10 #21
Frinavale
9,735 Expert Mod 8TB
Guess I'm a bit late to join the party.
The company I work for integrates with biometric readers, and we are looking at implementing a very similar solution. Currently we do not store any fingerprints (they are stored at the device level), but we are aiming to use a USB fingerprint scanner in the future (for logging into our software and for other business logic), so we are planning on storing fingerprints within our software's database.

You have already concluded what I was going to recommend: use the software that the manufacturers provide for retrieving and comparing fingerprints. Sometimes it's expensive to do this... for example, in order for us to integrate with the hardware we are considering, it will cost $1,000 to become a "member", and then the purchase of the SDK (API) is an additional $300. Every additional year, you are charged the $1,000 membership fee just to remain current.

As for storing the images in the database...well I'm interested in what you come up with because this is further than we've got.

If you're using SQL Server 2008, then you will have the added benefits of encryption and security that are built into the database. It will help to secure the data.

-Frinny
Nov 16 '10 #22
NeoPa
32,556 Expert Mod 16PB
Thanks for this Frinny.

I'm sure we'll chat about anything we discover :-)
Nov 17 '10 #23
