
Efficient Uniqueness Check

I have affiliates submitting batches of anywhere from 10 to several
hundred orders. Each order in the batch must include an order ID,
originated by the affiliate, which must be unique across all orders in
all batches ever submitted by that affiliate. I'm trying to figure out
the most efficient way to check the uniqueness of the order ID.

Order data is being submitted to Zen Cart, and also stored in custom
tables. I have created a unique key in the custom tables, consisting of
affiliate ID and order ID, but that's not workable: the custom tables
need data from ZC, and I don't really want to submit the data to ZC
until I've checked the order IDs. Also, part of the batch might already
be stored before I hit a dup and would then have to be deleted.

There are currently about a half dozen affiliates; I have no idea how
many there will be ultimately. The client I'm developing this for thinks
1-200. Each affiliate can potentially have several thousand order IDs
that need to be checked.

The two ways I'm considering are: 1) a single query to load all the
affiliate's order IDs into an array for checking as I process the batch;
2) creating a temporary table with just the affiliate's IDs, querying it
for each order in the batch.

Any ideas which would be more efficient? Would there be any significant
difference either way?

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/
Oct 11 '06 #1
Perhaps this could work: If you keep the submissions in a set of
staging tables, then your unique key would work. In this scenario, you
wouldn't delete the submissions after they've been processed. Keep them
around to check uniqueness of future submissions. You might be able to
add a bit to the staging table (submission.committed) which would let
you select the new items only and pass them along to ZC.

Of the things you're considering, a single query would be more
scalable, but it's likely you won't notice much difference at all until
the volume goes up.

Alan Little wrote:
[snip]
Oct 11 '06 #2
Carved in mystic runes upon the very living rock, the last words of
<pe*******@gmail.com> of comp.lang.php make plain:
Alan Little wrote:
[snip]

Perhaps this could work: If you keep the submissions in a set of
staging tables, then your unique key would work.
Thanks, but that isn't really practical; all orders are maintained in the
orders table, so inserting them into an intermediate table would just be
another step. Using a single table, I would have to flag the records as
temporary as I add them, then remove the flag upon successful
completion, or delete the flagged records upon failure. With an
intermediate table, I wouldn't have the deletion problem, but I would
still then have to copy them to the permanent table upon success, which
(one would hope) would be the majority of cases.
Of the things you're considering, a single query would be more
scalable, but it's likely you won't notice much difference at all until
the volume goes up.
That's kind of what I figured: even with past records numbering in the
tens of thousands, it wouldn't make much difference either way. I just
wondered about the efficiency of having to loop over all those records
from the single query and load them into an array, but I think that
would be the most efficient approach.
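
Roughly what I have in mind for the single-query approach (just a
sketch; the old mysql_* API and the table/column names are assumptions):

// Sketch: load all of this affiliate's order IDs into a hash set.
$aid    = mysql_real_escape_string($affiliateID);
$result = mysql_query(
    "SELECT order_id FROM affiliate_orders WHERE affiliate_id = '$aid'");
$seen = array();
while ($row = mysql_fetch_assoc($result)) {
    $seen[$row['order_id']] = true;   // array keys give cheap lookups
}
// Check the incoming batch before anything is submitted to ZC.
foreach ($batch as $order) {
    if (isset($seen[$order['order_id']])) {
        die("Duplicate order ID: " . $order['order_id']);
    }
    $seen[$order['order_id']] = true; // also catches dups within the batch
}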

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/
Oct 12 '06 #3
Alan Little wrote:
[snip]
That's kind of what I figured: even with past records numbering in the
tens of thousands, it wouldn't make much difference either way. I just
wondered about the efficiency of having to loop over all those records
from the single query and load them into an array, but I think that
would be the most efficient approach.


Nope, the most efficient way is to let the database do the checking.

I would solve your problem like this: Add a new column to the orders
table, of character type, as long as you need (the max. length of
affiliate ID + max. length of order ID + 1), which will become your new
primary key.

You create your new primary keys by concatenating the affiliate ID with
the order ID, separated by a dash for better readability.

Example: You have three affiliates, "Jack", "Johnny" and "Jim", each
having three orders in the table. Let's assume that while Jack and Jim
use simple numbering for their IDs, Johnny uses some obscure letter code
for order IDs. Then you would have nine entries in the order table and
their primary keys would look similar to this:

"Jack-0001"
"Jack-0002"
"Jack-0003"
"Johnny-qwertz"
"Johnny-asdfgh"
"Johnny-yxcvbn"
"Jim-0001"
"Jim-0002"
"Jim-0003"

As you can see, every affiliate has his own 'namespace' within your new
primary key, because every affiliate has his own unique ID, so the
primary keys generated this way will never overlap between different
affiliates. However, since the part up to the dash is always constant
for any affiliate it would lead to a primary key collision if any
affiliate screws up his order ID scheme and submits an ID he has
submitted before.

So you don't need to do any checking beforehand at all. Just insert all
the orders you receive into the orders table creating your new key like
$ID = $AffiliateID . "-" . $OrderID; and insert it along with the data
into your table. Only when the insert fails will you have to call
mysql_error() or mysql_errno() to find out whether the INSERT query
failed because of a duplicate key. But you are doing some proper error
checking anyway, right? At least I hope so.

You don't need any arrays, temporary tables, or duplicate-checking
code. The database does it all for you. And it needs to check for
duplicate keys anyway, so we are not even wasting a single CPU cycle
here. :-) About 2-3 lines added to your error checking, to see whether
the reason for an insert failure was a duplicate key, would be
sufficient, plus one new database column.
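
A rough sketch of that error check (the mysql_* calls fit the era;
table and column names are made up, and MySQL reports a duplicate key
as error 1062, ER_DUP_ENTRY):

// Sketch: insert one order and let the primary key catch duplicates.
$id   = mysql_real_escape_string($AffiliateID . "-" . $OrderID);
$data = mysql_real_escape_string($OrderData);
$ok   = mysql_query("INSERT INTO orders (id, data) VALUES ('$id', '$data')");
if (!$ok) {
    if (mysql_errno() == 1062) {       // ER_DUP_ENTRY: duplicate key
        echo "Duplicate order ID: $id\n";
    } else {
        echo "Insert failed: " . mysql_error() . "\n";
    }
}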
Further optimizing the above solution:

*) You don't even need to create a new variable for $ID, you can just
concatenate the strings when you put your query together. But you
probably thought of that yourself already.

*) After creating your new column, populating it with
"AffiliateID-OrderID" for all database entries that already exist and
declaring it the primary key for the table you can even delete these two
columns from your table. They are redundant now because their combined
information is already contained within your new primary key. And it can
be easily extracted again if needed and also filtered on. If you want to
see for example only orders from Jim, you just filter for all IDs which
start with "Jim-". I don't see a reason why we would need to keep these
around. Except perhaps for comfort, because we wouldn't have to extract
the information from the combined string when we need it, but that's not
really difficult at all.
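
For example, the migration, the filtering and the extraction could look
like this (a sketch; the column names affiliate_id and order_id are
assumed, and it presumes affiliate IDs never contain a dash):

-- Populate the new key for the existing rows (one-time migration):
UPDATE orders SET id = CONCAT(affiliate_id, '-', order_id);

-- Show only Jim's orders by filtering on the prefix:
SELECT * FROM orders WHERE id LIKE 'Jim-%';

-- Extract the two parts again if ever needed:
SELECT SUBSTRING_INDEX(id, '-', 1) AS affiliate_id,
       SUBSTRING(id, LENGTH(SUBSTRING_INDEX(id, '-', 1)) + 2) AS order_id
FROM orders;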
If I understood your problem correctly, this should be the optimal way
to do it. Or did I get your problem wrong?

Bye!
Oct 12 '06 #4
Carved in mystic runes upon the very living rock, the last words of
Anonymous of comp.lang.php make plain:
Nope, the most efficient way is to let the database do the checking.

[snip]
Thanks for your detailed response. However, it doesn't work, because I
don't want to wait until I've already attempted the insert, to know if it
fails. The issue of two different affiliates having the same order ID
isn't a problem, as there is an affiliate ID as well, so what needs to be
unique is AID + OID, as you've described.

The affiliates are providing their own order IDs, since they're going to
be using whatever systems they use to collect the orders. When they
submit the orders for fulfillment, here's the sequence:

1) Affiliate submits a batch of (say) 100 orders
2) Check all the order data
3) Submit the batch as a single order to ZC
4) Store the sub-orders in custom tables

The problem with doing the checking at insert time is that the first 90
orders may be OK, with a problem on order #91. The whole batch has to be
rejected and re-submitted, so that means I have to delete the 90 sub-
orders already entered, *and* the order submitted to ZC. Thus I want to
do the ID check at step 2.

Hmmm..... I just had a thought, though. If I do a bulk insert, the whole
thing would fail if there was a dup, and then I wouldn't have to delete
the already-processed sub-orders, but I'd still have to delete the ZC
order. I'll have to think about it some more.
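
Something like this is what I'm picturing (just a sketch; the table and
column names are made up, and note a multi-row INSERT only fails as a
whole on a transactional engine like InnoDB -- on MyISAM the rows
before the dup would still go in):

// Sketch: one multi-row INSERT for the whole batch.
$values = array();
foreach ($batch as $order) {
    $id   = mysql_real_escape_string($affiliateID . "-" . $order['order_id']);
    $data = mysql_real_escape_string($order['data']);
    $values[] = "('$id', '$data')";
}
$ok = mysql_query("INSERT INTO orders (id, data) VALUES "
                   . implode(", ", $values));
if (!$ok && mysql_errno() == 1062) {
    // Duplicate key: reject the whole batch (and back out the ZC order).
}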

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/
Oct 13 '06 #5
Alan Little wrote:
[snip]

Hmmm..... I just had a thought, though. If I do a bulk insert, the whole
thing would fail if there was a dup, and then I wouldn't have to delete
the already-processed sub-orders, but I'd still have to delete the ZC
order. I'll have to think about it some more.
If all inserts have to fail when one fails, use a transaction. That's
what they're there for. :-)

If you have never worked with transactions before, read up on them in
the MySQL manual.

Bye!
Oct 13 '06 #6
