New to MySQL......

I have a client that desires a new application for use at several stores in
different cities. He wants "real time" access to all store data and the
ability to continue to run locally even if the internet connection goes
down.

It occurs to me that a local MySQL instance should definitely be
in each store to continue work if the net is down for any reason. But, what
is the best way to synchronize/share data among all of the stores?

From what I have read on the MySQL site, you need MySQL Cluster to do
synchronous replication, but it won't work over a network connection slower
than 100 Mbps.

Is there any hope of synchronous replication using the free version of
MySQL? And just how stable is the free edition (with its bleeding-edge
code and such)?

Any help that you give will be greatly appreciated.

Thanks!
Apr 18 '06 #1
>I have a client that desires a new application for use at several stores in
different cities. He wants "real time" access to all store data and the
ability to continue to run locally even if the internet connection goes
down.

It occurs to me that a local MySQL instance should definitely be
in each store to continue work if the net is down for any reason. But, what
is the best way to synchronize/share data among all of the stores?

From what I have read on the MySQL site, you need MySQL Cluster to do
synchronous replication, but it won't work over a network connection slower
than 100 Mbps.


That's not the only form of replication. Plain old MySQL can do
replication although it doesn't have all the features of the cluster.
You might have to manually switch masters if the net goes down.
Depending on the query volume and how up to date the slaves have to
be, a 1200bps modem connection might be sufficient.
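
For what it's worth, the conventional setup is mostly configuration. Roughly
(untested here; host names, passwords, and log positions are placeholders),
the master needs server-id and log-bin in my.cnf plus a replication account,
and each store's slave gets pointed at it:

    -- on the master (after adding server-id=1 and log-bin to my.cnf):
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'secret';

    -- on each store's slave (give every server a distinct server-id):
    CHANGE MASTER TO
        MASTER_HOST='head-office.example.com',
        MASTER_USER='repl',
        MASTER_PASSWORD='secret',
        MASTER_LOG_FILE='mysql-bin.000001',  -- copied from SHOW MASTER STATUS
        MASTER_LOG_POS=98;
    START SLAVE;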

My employer tried a mail server setup with MySQL Cluster, using
MySQL to maintain info on where the mailbox was and user options,
across several machines. It didn't work very well. We'd get
inconsistent data and storage node crashes that would essentially
take out the whole setup. Eventually we used conventional replication,
with changes going to the master and each machine using its local
copy for reads. Perhaps MySQL Cluster has improved since then.

What is your required data setup? Each store is master of its own
data and the head office also wants a real-time (read-only?) copy
of it? Or does the head office need to make changes, and stores
need to share each other's data? I don't know how to do conventional
replication with, for example, the head office machine slaving one
database off of each store master.

Gordon L. Burditt
Apr 18 '06 #2

"Gordon Burditt" <go***********@burditt.org> wrote in message
news:12*************@corp.supernews.com...
I have a client that desires a new application for use at several stores
in
different cities. He wants "real time" access to all store data and the
ability to continue to run locally even if the internet connection goes
down.

It occurs to me that a local MySQL instance should definitely be
in each store to continue work if the net is down for any reason. But,
what
is the best way to synchronize/share data among all of the stores?

From what I have read on the MySQL site, you need MySQL Cluster to do
synchronous replication, but it won't work over a network connection slower
than 100 Mbps.


That's not the only form of replication. Plain old MySQL can do
replication although it doesn't have all the features of the cluster.
You might have to manually switch masters if the net goes down.
Depending on the query volume and how up to date the slaves have to
be, a 1200bps modem connection might be sufficient.

My employer tried a mail server setup with MySQL Cluster, using
MySQL to maintain info on where the mailbox was and user options,
across several machines. It didn't work very well. We'd get
inconsistent data and storage node crashes that would essentially
take out the whole setup. Eventually we used conventional replication,
with changes going to the master and each machine using its local
copy for reads. Perhaps MySQL Cluster has improved since then.

What is your required data setup? Each store is master of its own
data and the head office also wants a real-time (read-only?) copy
of it? Or does the head office need to make changes, and stores
need to share each other's data? I don't know how to do conventional
replication with, for example, the head office machine slaving one
database off of each store master.


The concept that they came up with was that they wanted real-time data
access to do store-to-store inventory lookups and transfers and to track the
store activities from the central office.

They also want to share customer data (in case a customer visits another
store or in the event that a customer has not finished paying for
merchandise and is behind on payments).

They also want all store data backed up in case of a catastrophic event at a
store (like theft of IT resources or fire).

They also want a local store of (at least) that store's data - or (if
possible) all store data until the connection to the central database was
lost.

And, let's not forget a way to post back to a central database (if one is
used) any local transactions done at a disconnected store once the
connection to the group or central database is restored.

Simple, huh?

JH
Apr 18 '06 #3
Jim Hubbard wrote:
They also want a local store of (at least) that store's data - or (if
possible) all store data until the connection to the central database was
lost.

And, let's not forget a way to post back to a central database (if one is
used) any local transactions done at a disconnected store once the
connection to the group or central database is restored.

Simple, huh?


Here's how I would design this system:

1. They perform write operations to a database connection to the master
database, which is at the home office.

2. The local sites keep a copy of the master database, which is updated
periodically with standard replication.

3. Read operations are performed on this local database.

4. If the connection to the master database goes down, the apps need to
detect this error and instead write their changes to a local database.
I'd recommend this _not_ be their local copy of the master database, but
a separate instance. I'll call this the failover write database (sketched below).

5. Thus when they do queries against their local read-only copy of the
master, they should combine the query results with any changes stored in
their local failover write database. Hopefully this will be minimal,
but the application code has to account for it.

6. When the connection is restored, the contents of the failover write
database are sent to the master database. Then the local system deletes
the contents of the failover write database. Wait a few moments for the
master to process the new data, update keys, etc. Then re-synchronize
the read-only copy of the master. Thus the failover changes have moved
to the master database, and back again via replication. But now they
have had their auto-generated primary keys assigned, and other data
cleanup tasks. You might want to make the local app deny the user from
making any further changes during these moments. Also note that other
sites may have queued up changes, so the initial replication sync after
an outage may contain more volume of data than usual.
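
To make steps 4 through 6 concrete, the failover write database could be as
simple as a queue of pending changes that the application replays against the
master once the link returns. A minimal sketch (every name below is invented
for illustration):

    -- lives in the separate local "failover write" instance
    CREATE TABLE failover_writes (
        queue_id   INT         NOT NULL AUTO_INCREMENT,
        table_name VARCHAR(64) NOT NULL,  -- which master table the change targets
        change_sql TEXT        NOT NULL,  -- the INSERT/UPDATE to replay against the master
        queued_at  DATETIME    NOT NULL,
        PRIMARY KEY (queue_id)
    );

    -- on reconnect, application code replays the rows in queue_id order,
    -- waits for the master to settle, then clears the queue:
    TRUNCATE TABLE failover_writes;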

This still leaves the problem of what happens if two separate sites
create records for the same real-world entity (like a customer who calls
two separate sites). Resolving these cases is something which is not
100% automatable, in any RDBMS system.

You must plan for some living person to review new data and perform
cleanup at the home office. Some data cleanup is automatable, but it is
simply not possible to handle all cases without a human.

Regards,
Bill K.
Apr 18 '06 #4
"Bill Karwin" <bi**@karwin.com> wrote in message
news:e2********@enews2.newsguy.com...
Jim Hubbard wrote:
They also want a local store of (at least) that store's data - or (if
possible) all store data until the connection to the central database was
lost.

And, let's not forget a way to post back to a central database (if one is
used) any local transactions done at a disconnected store once the
connection to the group or central database is restored.

Simple, huh?
Here's how I would design this system:

1. They perform write operations to a database connection to the master
database, which is at the home office.

2. The local sites keep a copy of the master database, which is updated
periodically with standard replication.


This is where I hit my first "bump in the road". The client wants to use
the FREE version of MSDE or MySQL. Neither version offers 2-way
replication.

Same things I quit working in corporate America 3 years ago over. They want
it FREE and they want it NOW.

I'm about to tell him to get someone else. This is starting the same way
all of the pains in my ass did in corporate America. I won't do it for any
amount if it is going to be one of THOSE applications/experiences.

3. Read operations are performed on this local database.
Only in case of connection failure, right? They need to do immediate saves
and reads of that data....so writing to a central store that would not
update the local store for say 5 minutes even would not be acceptable.

4. If the connection to the master database goes down, the apps need to
detect this error and instead write their changes to a local database. I'd
recommend this _not_ be their local copy of the master database, but a
separate instance. I'll call this the failover write database.

5. Thus when they do queries against their local read-only copy of the
master, they should combine the query results with any changes stored in
their local failover write database. Hopefully this will be minimal, but
the application code has to account for it.

6. When the connection is restored, the contents of the failover write
database are sent to the master database. Then the local system deletes
the contents of the failover write database. Wait a few moments for the
master to process the new data, update keys, etc. Then re-synchronize the
read-only copy of the master. Thus the failover changes have moved to the
master database, and back again via replication. But now they have had
their auto-generated primary keys assigned, and other data cleanup tasks.
You might want to make the local app deny the user from making any further
changes during these moments. Also note that other sites may have queued
up changes, so the initial replication sync after an outage may contain
more volume of data than usual.

This still leaves the problem of what happens if two separate sites create
records for the same real-world entity (like a customer who calls two
separate sites). Resolving these cases is something which is not 100%
automatable, in any RDBMS system.

You must plan for some living person to review new data and perform
cleanup at the home office. Some data cleanup is automatable, but it is
simply not possible to handle all cases without a human.


Actually, I was thinking of a more simple scenario and I'd be grateful for
your input on it.....

What if the local app always writes and reads from the local db? Every
record change has an associated timestamp. A separate process handles
replicating the data to a central store (used to restore local databases in
case data is lost due to a catastrophic event at the store) by updating
records changed in either the central db or local db by selecting records
with timestamps greater than the last time/date that the records were
synchronized. This could be real-time, as long as a connection is
available.
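
In rough terms I'm picturing every table carrying a last-changed column, with
the sync process on each side doing something like this (illustrative only;
table and column names are made up):

    -- changes made locally since the last successful sync
    SELECT *
    FROM   customer
    WHERE  last_changed > '2006-04-18 09:30:00';  -- the stored last-sync time

    -- the same query runs against the central db to pull its changes down;
    -- whichever side has the newer last_changed value wins on both ends.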

As it will be virtually unheard of that any 2 changes will have the exact
same timestamp, the newer change always overrides the older changes, but the
older changes are recorded anyway (for historical data - in this case they
may be customer notes, etc.).

The central office can attach to any database (including the central db) to
get their data - thus the same app used by the clerks can be used by central
office personnel with the proper security clearances for different
functions.

IF any store is offline (or the central db is not online) the UI should
indicate that the data is valid as of the last synchronization.

Does my simple solution miss anything major?

Thanks!

Jim
Apr 18 '06 #5
Jim Hubbard wrote:
"Bill Karwin" <bi**@karwin.com> wrote in message
2. The local sites keep a copy of the master database, which is updated
periodically with standard replication.
This is where I hit my first "bump in the road". The client wants to use
the FREE version of MSDE or MySQL. Neither version offers 2-way
replication.


I'm not aware that there is any two-way replication solution for MySQL,
regardless of free vs. commercial. Can you cite a reference?
Same things I quit working in corporate America 3 years ago over. They want
it FREE and they want it NOW.

I'm about to tell him to get someone else. This is starting the same way
all of the pains in my ass did in corporate America. I won't do it for any
amount if it is going to be one of THOSE applications/experiences.
Indeed. What I would advise is to put together a proposal of what it
will cost in time and money, call attention to the parts of their
requirements that are the biggest cost drivers, and offer alternatives.

Example anecdote: I went to a pre-sales meeting for a company that made
custom trophy parts. They wanted to have an online storefront, and one
of the managers insisted that he wanted a guarantee of "five nines" of
availability (I assume he read it in a magazine). In other words,
99.999% uptime, not best-case, not average-case, but *guaranteed*. But
he didn't want to spend more than, say, $3000 to $5000.

My eyebrows raised, and I offered the comment as respectfully and calmly
as I could that this level of service doesn't cost a few thousands, it
costs tens of millions. Multiple co-lo facilities in different parts of
the US. Full set of redundant servers at each site. Continuous,
synchronous data mirroring between sites. Dynamic DNS failover.
Round-the-clock staff. Not even banks and stock exchanges need this
much availability.

I further offered that if his business really was the #1 vendor of
custom trophy parts, that if a high school coach wants to order online
at 11pm on Sunday night, and the website says "performing maintenance,
come back in 30 minutes," the customer is very likely to come back,
instead of going to the competitor's website, which is likely to be
non-existent anyway.

Anyway, the point is that as a consultant, you should be very agreeable
to their requirements at first, and make an accurate and well-supported
estimate for what it would take to implement as they have described.
Then offer a few alternatives for the features that were the primary
cost drivers, saving them tons of money while preserving the value to
their business. That's the way to be thought of as a very valuable
consultant.

On the other hand, if you already hate this client, you may be better
off finding some open-source package on sourceforge.net that does most
of what they need, pitch it to them, and bill them for your time. Then
let them find a cheaper consultant to do a bit of customization and
deployment.

See also:
http://sourceforge.net/search/?words...of_search=soft
http://www.daffodildb.com/replicator/index.html
3. Read operations are performed on this local database.

Only in case of connection failure, right? They need to do immediate saves
and reads of that data....so writing to a central store that would not
update the local store for say 5 minutes even would not be acceptable.


Hmm, I had in mind all reads go to the local database, for speed and to
avoid overburdening the central database. The main problem is with
things like auto-generated primary key values. The main site and the
local site aren't synchronized with respect to the next auto-gen key
value per table. So if you insert values to the local database during
an outage, and then you try to replicate to the central database when
the connection is restored, how do you prevent your records from
conflicting with the records entered simultaneously at other sites?

One answer is never to use auto-generated "pseudokeys"; always use some
combination of real attributes in the table as the key. Like firstname,
lastname, date-of-birth. Or SSN.

Another answer is to enter new data with an interim primary key value,
let the central database override this value to make sure it's unique,
and then make sure the local system knows how to reconcile the data when
the same record comes back from the central system with a different
primary key.

Yet another solution is to make the primary key "pseudokey" be a
two-column key, one using an auto-generated value per site, and another
a constant "site id".

The problem with any of these approaches is that you can get two people at
different sites entering data for the same entity. Then you need some
way to merge these records and dependent data too. This can be tricky,
since people can make spelling mistakes, or enter "Jim Brown" instead of
"James R. Brown" for example. How can you tell that these refer to the
same real-world entity? In general, you can automate some of this, but
there are always going to be more types of ambiguity than those for
which you can predict and code a resolution.

Actually, I was thinking of a more simple scenario and I'd be grateful for
your input on it.....

What if the local app always writes and reads from the local db? Every
record change has an associated timestamp. A separate process handles
replicating the data to a central store (used to restore local databases in
case data is lost due to a catastrophic event at the store) by updating
records changed in either the central db or local db by selecting records
with timestamps greater than the last time/date that the records were
synchronized. This could be real-time, as long as a connection is
available.
Do you use auto-generated primary keys? How do you tell the client site
to delete the entry they made, in favor of the newer entry? How do you
resolve multiple client sites generating the same value in the primary key?
As it will be virtually unheard of that any 2 changes will have the exact
same timestamp, the newer change always overrides the older changes, but the
older changes are recorded anyway (for historical data - in this case they
may be customer notes, etc.).
Right, or in case the earlier entry actually did have the correct
information.
The central office can attach to any database (including the central db) to
get their data - thus the same app used by the clerks can be used by central
office personnel with the proper security clearances for different
functions.

IF any store is offline (or the central db is not online) the UI should
indicate that the data is valid as of the last synchronization.

Does my simple solution miss anything major?


This seems like as good a solution as any, if you can resolve the
problems that are common in bi-directional replication.

Bi-directional replication will always have some potential for
ambiguously conflicting entries, or duplicate entries. A human being
must review these and decide on the correct resolution.

Regards,
Bill K.
Apr 19 '06 #6
Bill Karwin wrote:
I'm not aware that there is any two-way replication solution for MySQL,
regardless of free vs. commercial. Can you cite a reference?


Obviously from the links I included, there are such solutions.
Just not offered by MySQL AB.

Regards,
Bill K.
Apr 19 '06 #7
>>>2. The local sites keep a copy of the master database, which is updated
periodically with standard replication.


This is where I hit my first "bump in the road". The client wants to use
the FREE version of MSDE or MySQL. Neither version offers 2-way
replication.


I'm not aware that there is any two-way replication solution for MySQL,
regardless of free vs. commercial. Can you cite a reference?


According to the manual for MySQL 5.0.19 (NOT cluster) on the
dev.mysql.com site, a given server can have one master and multiple
slaves. You are allowed to connect them in a circular setup. There
is little provision for resolving conflicts in near-simultaneous
updates. This is probably not acceptable for what the original
poster wants. But it is arguably two-way replication, particularly
with the setup on two servers.

The server variables auto_increment_increment and auto_increment_offset
are provided to deal with the assignment of auto_increment values.
From the descriptions I see, it looks like they work with multiple
masters, but I don't know how you have multiple masters. The idea
is that each server assigns its own set of numbers, e.g. with 10
servers, one server assigns 11, 21, 31, 41, ... and another server
assigns 13, 23, 33, 43, ... .
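
If I'm reading the manual right, the settings are just server variables (set
in my.cnf or with SET GLOBAL); e.g. for the third of ten servers:

    SET GLOBAL auto_increment_increment = 10;  -- step between values on every server
    SET GLOBAL auto_increment_offset    = 3;   -- this server generates 3, 13, 23, 33, ...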

Gordon L. Burditt
Apr 19 '06 #8

"Bill Karwin" <bi**@karwin.com> wrote in message
news:e2********@enews4.newsguy.com...
Jim Hubbard wrote:
"Bill Karwin" <bi**@karwin.com> wrote in message
2. The local sites keep a copy of the master database, which is updated
periodically with standard replication.
This is where I hit my first "bump in the road". The client wants to use
the FREE version of MSDE or MySQL. Neither version offers 2-way
replication.


I'm not aware that there is any two-way replication solution for MySQL,
regardless of free vs. commercial. Can you cite a reference?


No I can't. This is what is needed for this scenario, but I can find no db
that includes it as a feature.
Same things I quit working in corporate America 3 years ago over. They
want it FREE and they want it NOW.

I'm about to tell him to get someone else. This is starting the same way
all of the pains in my ass did in corporate America. I won't do it for
any amount if it is going to be one of THOSE applications/experiences.
Indeed. What I would advise is to put together a proposal of what it will
cost in time and money, call attention to the parts of their requirements
that are the biggest cost drivers, and offer alternatives.

Example anecdote: I went to a pre-sales meeting for a company that made
custom trophy parts. They wanted to have an online storefront, and one of
the managers insisted that he wanted a guarantee of "five nines" of
availability (I assume he read it in a magazine). In other words, 99.999%
uptime, not best-case, not average-case, but *guaranteed*. But he didn't
want to spend more than, say, $3000 to $5000.

My eyebrows raised, and I offered the comment as respectfully and calmly
as I could that this level of service doesn't cost a few thousands, it
costs tens of millions. Multiple co-lo facilities in different parts of
the US. Full set of redundant servers at each site. Continuous,
synchronous data mirroring between sites. Dynamic DNS failover.
Round-the-clock staff. Not even banks and stock exchanges need this much
availability.

I further offered that if his business really was the #1 vendor of custom
trophy parts, that if a high school coach wants to order online at 11pm on
Sunday night, and the website says "performing maintenance, come back in
30 minutes," the customer is very likely to come back, instead of going to
the competitor's website, which is likely to be non-existent anyway.

Anyway, the point is that as a consultant, you should be very agreeable to
their requirements at first, and make an accurate and well-supported
estimate for what it would take to implement as they have described. Then
offer a few alternatives for the features that were the primary cost
drivers, saving them tons of money while preserving the value to their
business. That's the way to be thought of as a very valuable consultant.

On the other hand, if you already hate this client, you may be better off
finding some open-source package on sourceforge.net that does most of what
they need, pitch it to them, and bill them for your time. Then let them
find a cheaper consultant to do a bit of customization and deployment.

See also:
http://sourceforge.net/search/?words...of_search=soft
http://www.daffodildb.com/replicator/index.html


I like the client very much. But, he is getting pressure from above him to
push the project forward. I will simply give them the timeline and they can
accept it or refuse it.

3. Read operations are performed on this local database.

Only in case of connection failure, right? They need to do immediate
saves and reads of that data....so writing to a central store that would
not update the local store for say 5 minutes even would not be
acceptable.


Hmm, I had in mind all reads go to the local database, for speed and to
avoid overburdening the central database. The main problem is with things
like auto-generated primary key values. The main site and the local site
aren't synchronized with respect to the next auto-gen key value per table.
So if you insert values to the local database during an outage, and then
you try to replicate to the central database when the connection is
restored, how do you prevent your records from conflicting with the
records entered simultaneously at other sites?

One answer is never to use auto-generated "pseudokeys"; always use some
combination of real attributes in the table as the key. Like firstname,
lastname, date-of-birth. Or SSN.

Another answer is to enter new data with an interim primary key value, let
the central database override this value to make sure it's unique, and
then make sure the local system knows how to reconcile the data when the
same record comes back from the central system with a different primary
key.

Yet another solution is to make the primary key "pseudokey" be a
two-column key, one using an auto-generated value per site, and another a
constant "site id".

The problem with any of these approaches is that you can get two people at
different sites entering data for the same entity. Then you need some way
to merge these records and dependent data too. This can be tricky, since
people can make spelling mistakes, or enter "Jim Brown" instead of "James
R. Brown" for example. How can you tell that these refer to the same
real-world entity? In general, you can automate some of this, but there
are always going to be more types of ambiguity than those for which you
can predict and code a resolution.


I agree about not using auto-generated keys. I will be sure to avoid them.

As far as mis-spellings are concerned, the client uses a social security
number and/or driver's license number as a part of the identification
process. No 2 clients can have the same SS# or DL#. If the SS# or DL# is
mis-typed, there is really nothing I can do about that.

If the SS #s or DL #s are the same, but other data differs, I can either
flag the record(s) for a user to look at, or simply accept the latest data
as a "correction" to the data entered before it. When they search for user
data, they require the SS# or DL# from the customer (I think). Otherwise,
we can do a soundex search to list all possible spellings or they can search
by last name and close in from there.
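
The soundex lookup should be straightforward since MySQL has a SOUNDEX()
function; something along these lines (column names invented):

    SELECT customer_id, first_name, last_name
    FROM   customer
    WHERE  SOUNDEX(last_name) = SOUNDEX('Hubard');  -- also matches Hubbard, Hubert, etc.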

I am sure that they will want multiple lookups to be a part of the system.

Actually, I was thinking of a more simple scenario and I'd be grateful
for your input on it.....

What if the local app always writes and reads from the local db? Every
record change has an associated timestamp. A separate process handles
replicating the data to a central store (used to restore local databases
in case data is lost due to a catastrophic event at the store) by
updating records changed in either the central db or local db by
selecting records with timestamps greater than the last time/date that
the records were synchronized. This could be real-time, as long as a
connection is available.
Do you use auto-generated primary keys?


I will not.
How do you tell the client site to delete the entry they made, in favor of
the newer entry?
It won't. It will simply change the data, recording the change author and
date/time.
How do you resolve multiple client sites generating the same value in the
primary key?
I will enter the multiple records in the order in which they were entered
according to date/time. Newer data always overwrites older data (even if
the difference is only a fraction of a second). This "update assumption"
will be explained to the client.

I could (although it would increase the database size quite a bit) log all
data changes for a data history log. They would have to have the ability to
archive or delete history logs to keep the database size (and performance)
within reasonable limits.
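
The history log itself could just be a parallel table that gets a row on every
change, so archiving old entries is a dated DELETE. A sketch (names invented):

    CREATE TABLE customer_history (
        hist_id     INT         NOT NULL AUTO_INCREMENT,
        site_id     SMALLINT    NOT NULL,
        customer_id INT         NOT NULL,
        changed_at  DATETIME    NOT NULL,
        changed_by  VARCHAR(32) NOT NULL,
        old_values  TEXT,                 -- snapshot of the row before the change
        PRIMARY KEY (hist_id)
    );

    DELETE FROM customer_history
    WHERE  changed_at < '2005-01-01';     -- archive/purge old history to control size
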
As it will be virtually unheard of that any 2 changes will have the exact
same timestamp, the newer change always overrides the older changes, but
the older changes are recorded anyway (for historical data - in this case
they may be customer notes, etc.).
Right, or in case the earlier entry actually did have the correct
information.


Right.
The central office can attach to any database (including the central db)
to get their data - thus the same app used by the clerks can be used by
central office personnel with the proper security clearances for
different functions.

IF any store is offline (or the central db is not online) the UI should
indicate that the data is valid as of the last synchronization.

Does my simple solution miss anything major?


This seems like as good a solution as any, if you can resolve the problems
that are common in bi-directional replication.

Bi-directional replication will always have some potential for ambiguously
conflicting entries, or duplicate entries. A human being must review
these and decide on the correct resolution.


I agree. This cannot be completely eliminated without losing some control of
the data entered into the system.

I could generate a log of "assumed alterations" (where the business logic
made assumptions about conflicting entries) and show that as a part of the
UI, so that they can let it ride.....or tweak it themselves. This would
take a bit of work......but, this whole project will take a bit of
work......

Thanks so much for your thoughts on this db project!

Jim
Apr 20 '06 #9
>As far as mis-spellings are concerned, the client uses a social security
number and/or driver's license number as a part of the identification
process. No 2 clients can have the same SS# or DL#. If the SS# or DL# is
mis-typed, there is really nothing I can do about that.


I don't think there is any guarantee that the Social Security
Administration assigns unique SSNs to different people. Or that
they even *try* to do that. (Somewhere I remember a quote from
someone at the SSA saying there was no need for that. Plus, there
may be illegal immigrants just making one up.)

There's no guarantee that someone's DL# won't match someone else's SSN,
or that two DL#'s from different states won't match.

Gordon L. Burditt
Apr 20 '06 #10

"Gordon Burditt" <go****@hammy.burditt.org> wrote in message
news:12*************@corp.supernews.com...
As far as mis-spellings are concerned, the client uses a social security
number and/or driver's license number as a part of the identification
process. No 2 clients can have the same SS# or DL#. If the SS# or DL# is
mis-typed, there is really nothing I can do about that.
I don't think there is any guarantee that the Social Security
Administration assigns unique SSNs to different people. Or that
they even *try* to do that. (Somewhere I remember a quote from
someone at the SSA saying there was no need for that. Plus, there
may be illegal immigrants just making one up.)


Social Security numbers are supposed to be unique according to the Social
Security Administration....see http://www.gao.gov/new.items/d03941t.pdf .

The alien thing.....well, citizens can do that too. This chain installs
specialized car parts and collects an SS card photocopy, DL photocopy, car
photo, VIN, and tag number from the vehicle being worked on.

If they need it, there is a service that checks SS numbers in real time and
reports back info on the # so you can check for SS # of dead people, fakes,
and verify simple info like birthday, state of birth and name.

There's no guarantee that someone's DL# won't match someone else's SSN,
or that two DL#'s from different states won't match.


You have to use the State + DL # to be unique. SSNs are in ###-##-#### format.
I am not aware of any states with that format. GA used to allow the use of
the SSN for the license number, but only if you showed adequate proof of
identity and a SS card.
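
In table terms that just means keeping the license as two columns with a
combined unique key, separate from the SSN column (sketch; column names
invented):

    ALTER TABLE customer
        ADD COLUMN ssn       CHAR(11)    NULL,  -- stored as ###-##-####
        ADD COLUMN dl_state  CHAR(2)     NULL,
        ADD COLUMN dl_number VARCHAR(20) NULL,
        ADD UNIQUE KEY uk_drivers_license (dl_state, dl_number);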

No system is absolutely fool-proof......but we can make it unprofitable to
break it.

Thanks!

Jim
Apr 20 '06 #11
Jim Hubbard wrote:
Social Security numbers are supposed to be unique according to the Social
Security Administration....see http://www.gao.gov/new.items/d03941t.pdf .


Jim, you are doing an excellent job of designing this system, and I
respect the thought you have put into it.

For what it's worth, here's a FAQ page with some counterpoints to the
practice of using SSN as a database key:

http://www.cpsr.org/prevsite/cpsr/pr...-addendum.html

Regards,
Bill K.
Apr 20 '06 #12
Thanks for the link! That was very interesting. I never knew a lot of that
info, and I am certain that my client is also unaware of the problems with
relying solely on SS #s as identification. I will be sure to use several
fields as identification and not only the SS # and I will inform my client
as to the draw-backs.

I can also do very simple checks on SS #s entered to filter out some known
bad numbers according to the info provided in your link.
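
For instance, the first pass could be as crude as rejecting patterns that are
never issued (a sketch; it assumes the SSN is stored as ###-##-####):

    SELECT ssn
    FROM   customer
    WHERE  SUBSTRING(ssn, 1, 3) IN ('000', '666')  -- area numbers never assigned
       OR  SUBSTRING(ssn, 1, 1) =  '9'             -- 900-999 not used for SSNs
       OR  SUBSTRING(ssn, 5, 2) =  '00'            -- group number is never 00
       OR  SUBSTRING(ssn, 8, 4) =  '0000'          -- serial number is never 0000
       OR  ssn = '078-05-1120';                    -- the well-known sample-card number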

I'd like to thank you all for the thoughtful and informative posts in this
thread. It's nice to know that I am not completely off-base as I tackle
this beast.

JH

"Bill Karwin" <bi**@karwin.com> wrote in message
news:e2*********@enews2.newsguy.com...
Jim Hubbard wrote:
Social Security numbers are supposed to be unique according to the Social
Security Administration....see http://www.gao.gov/new.items/d03941t.pdf .


Jim, you are doing an excellent job of designing this system, and I
respect the thought you have put into it.

For what it's worth, here's a FAQ page with some counterpoints to the
practice of using SSN as a database key:

http://www.cpsr.org/prevsite/cpsr/pr...-addendum.html

Regards,
Bill K.

Apr 21 '06 #13
>> Social Security numbers are supposed to be unique according to the Social
Security Administration....see http://www.gao.gov/new.items/d03941t.pdf .


Jim, you are doing an excellent job of designing this system, and I
respect the thought you have put into it.

For what it's worth, here's a FAQ page with some counterpoints to the
practice of using SSN as a database key:

http://www.cpsr.org/prevsite/cpsr/pr...-addendum.html


In addition to the issues mentioned there, I believe the Social
Security Administration has managed to assign the same Social
Security Number to two unrelated people (a patient and her dentist)
with names totally different, ages quite different, and no evidence
that one or both got fooled by a sample card in a wallet. Unfortunately,
I don't have a reference for it now, and I heard of this several
decades ago. The theory at the time was that someone goofed and
two different offices (or perhaps two different clerks) each assigned
the same series of numbers. Someone from the SSA claimed that there
wasn't any need for uniqueness (and probably didn't know what they
were talking about - it would be a lot cheaper to assign a SSN of
1 to everyone if uniqueness isn't needed).

I believe there is a reason the IRS forms ask you to make sure the
names you put down on tax forms match the Social Security card (Form
1040 instructions, page 63): there are multiple names belonging to
different people referencing the same SSN, and not all of the problem
is from sample cards in wallets or illegal immigrants or identity
thieves. Some of it is administrative incompetence on the part of
the SSA. And the IRS is very familiar with the problem.

Gordon L. Burditt
Apr 21 '06 #14
