On Mon, 11 Aug 2008 22:32:49 -0700, Peter <cz****@nospam.nospam> wrote:
> Thanks Ken for your help!
> But if the socket server goes down all of the clients are down - single
> point of failure.
I understand your desire to avoid a single point of failure. But
distributing the logic to coordinate processing among all the peers could
be very costly network-wise. For example, an alternative solution would be
for each server to query the others to see whether they've processed a
given record yet. But as the number of servers goes up, the network
traffic to support that grows dramatically (every server asking every
other server means the message count grows with the square of the number
of servers).
I think that at some point, you may need a central repository for the
information related to distributing the work.
Now, that said, that doesn't mean you have to live without redundancy.
For example, one reasonably common technique is to design your servers to
nominate a central manager, either to farm out the work, or simply to
track which data has been processed by a server already. Either way, each
of the other servers would communicate with that central manager server to
figure out what work to do.
If the server acting as central manager went offline for some reason, the
remaining servers could detect that and elect a new central manager.
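To make the election idea concrete, here's a minimal sketch in Python. It assumes each server has a unique numeric ID and already has some way (heartbeats, pings) of knowing which peers are alive; the names `elect_manager` and the ID scheme are illustrative, not from any real framework.

```python
# Deterministic election: every surviving server independently picks the
# lowest alive ID, so they all agree on the same manager without any
# extra coordination traffic.

def elect_manager(alive_ids):
    """Return the ID of the server that should act as central manager."""
    return min(alive_ids)

# All four servers start out alive; server 1 is the manager.
servers = {1, 2, 3, 4}
assert elect_manager(servers) == 1

# The manager (server 1) goes offline; the survivors detect that and
# each independently elect server 2 as the new manager.
servers.discard(1)
assert elect_manager(servers) == 2
```

The point of the lowest-ID rule is that it's deterministic: as long as all survivors agree on who is alive, they agree on the manager without exchanging votes.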
So, that's a complicated suggestion. Here's a simple one: if your
database has some inherent ordering of the records, then perhaps a more
appropriate solution would be to assign each server some pre-defined
subset of records. For example, one server could handle the first third
of the database, another the second third, and the third server the last
third. Or, one server could handle the 0th, 3rd, 6th, etc., another the
1st, 4th, 7th, etc., and so on. Or you could blend the two and have each
server handle blocks of N records, dealt out round-robin (e.g. if N is 10
and there are three servers, the first handles records 0-9, 30-39, 60-69,
etc.).
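The block-round-robin variant is just one modulo expression. A minimal sketch, assuming records are addressable by a 0-based index (the function name is illustrative):

```python
def assigned_server(record_index, block_size, num_servers):
    """Return which server (0-based) owns a given record when records
    are dealt out in blocks of `block_size`, round-robin."""
    return (record_index // block_size) % num_servers

# With N = 10 and three servers: records 0-9 -> server 0,
# 10-19 -> server 1, 20-29 -> server 2, 30-39 -> server 0 again, etc.
assert assigned_server(5, 10, 3) == 0
assert assigned_server(15, 10, 3) == 1
assert assigned_server(35, 10, 3) == 0

# block_size = 1 gives the simple interleaved scheme (0th, 3rd, 6th, ...).
assert assigned_server(6, 1, 3) == 0
assert assigned_server(7, 1, 3) == 1
```

Each server evaluates this locally for its own ID, so no coordination traffic is needed at all.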
The simple approach runs the risk of one server completing early and
winding up idle for part of the time. But assuming that the records all
take about the same time to process, or at least are randomly distributed
with respect to their cost to process, any large set of data should result
in a reasonably uniformly distributed workload for your servers.
I seem to recall some work being done on clustered Windows servers. It's
possible that all of the above is pointless, and that there's already some
sort of API or framework on Windows that allows you to run multiple,
identical servers and where Windows itself handles distributing the work
evenly, in a failure-resistant manner. You might try Googling on that and
see what you come up with. I did a quick search and turned up this:
http://www.microsoft.com/windowsserv...rview/san.mspx
It wasn't clear to me at first glance whether that's something appropriate
to your needs (it's possible that all that does is provide for
functionally identical servers to be allocated dynamically to independent
client requests), but it's worth a look I think.
One last note: you mentioned in your original post that you start a new
thread for each record. I don't think that's a great approach, unless
you've got a custom thread pool and you limit its size to the number of
CPUs on the computer. You definitely don't want more threads running than
you have CPUs, and ideally you won't have fewer either. But either way,
this means you should be managing your threads according to the workload
the computer is capable of handling, not just arbitrarily creating a new
thread for each record.
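A minimal sketch of that bounded-pool idea, in Python for illustration (the original poster is presumably on .NET, where a built-in thread pool plays the same role). `process_record` is a stand-in for the real per-record work; note that in Python specifically the GIL limits CPU parallelism, so it's the pattern, not the language, that matters here:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_record(record):
    # Placeholder for the real per-record processing.
    return record * 2

records = range(100)

# One pool, sized to the number of CPUs and reused for every record -
# rather than spawning a brand-new thread per record.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = list(pool.map(process_record, records))

assert results[:3] == [0, 2, 4]
```

The pool queues the records and keeps roughly one worker per CPU busy, which is the "manage threads according to the workload the computer can handle" point above.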
Pete