Bytes IT Community

Many data entry tables - which ones hold records?

We have an asp.net app with about 200 data entry forms. Customers may
enter data into any number of forms. Each form's data is persisted in
a corresponding sql table. When data entry is complete, it needs to be
processed. Here's where the questions start.

How can we easily determine in which tables a customer has data and how
best to select that data?

We're not opposed to putting all the data in a single table. This
table would wind up having ~15 million records and constantly have CRUD
operations performed against it by up to 5000 users simultaneously.
With sufficient hardware, is this too much to ask of the db?

Oct 23 '05 #1
10 Replies


heromull (he******@gmail.com) writes:
We have an asp.net app with about 200 data entry forms. Customers may
enter data into any number of forms. Each form's data is persisted in
a corresponding sql table. When data entry is complete, it needs to be
processed. Here's where the questions start.

How can we easily determine in which tables a customer has data and how
best to select that data?

We're not opposed to putting all the data in a single table. This
table would wind up having ~15 million records and constantly have CRUD
operations performed against it by up to 5000 users simultaneously.
With sufficient hardware, is this too much to ask of the db?


Whether the 200 tables should be 1, 10, 74, or 200 is impossible to tell
without knowledge of what's in them.

But from a performance point of view, it would not really matter whether
it's one table or two hundred, provided that the single table has proper
indexing.

There are a couple of ways to find data to process:

1) Timestamp column. A timestamp column is automatically updated by SQL
Server with a database-unique value that is monotonically increasing.
(Binary, completely unrelated to date and time). The process that looks
for data would keep track of the most recent timestamp per table, and
retrieve the rows with higher timestamp value. If the process updates the
rows itself, it needs to combine the lookup with a status column. The
drawback with this solution is that the timestamp column must be indexed,
and since it's updated each time the row is updated, there will be a lot
of shuffling around in that index.
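As a sketch of option 1, assuming a hypothetical Table1 with a timestamp column RowVer and a bookmark table ProcessBookmark (both names illustrative, not from the thread), the polling query might look like:

```sql
-- Hypothetical sketch: add a timestamp (rowversion) column and index it.
ALTER TABLE Table1 ADD RowVer timestamp NOT NULL
CREATE INDEX IX_Table1_RowVer ON Table1 (RowVer)

-- The processing job remembers the highest value it has seen per table
-- and fetches anything newer on the next run.
DECLARE @LastRowVer binary(8)
SELECT @LastRowVer = LastRowVer
FROM   ProcessBookmark
WHERE  TableName = 'Table1'

SELECT Table1Id, CustomerId, Value
FROM   Table1
WHERE  RowVer > @LastRowVer
ORDER BY RowVer
-- Afterwards, persist MAX(RowVer) of the fetched rows back to ProcessBookmark.
```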

2) IDENTITY column. All tables would have an identity column, and the
process would keep track of the most recently processed value. With
this solution you can only handle inserts, not updates to existing
data.
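A sketch of option 2, again with hypothetical names: the job reads the high-water mark it saved last time, processes everything above it, and saves the new mark.

```sql
-- Hypothetical sketch of option 2: process rows inserted since the last run.
-- @LastId is whatever value the processing job persisted after its previous run.
DECLARE @LastId int, @MaxId int
SELECT @LastId = 0  -- placeholder; in practice loaded from a bookmark table

SELECT @MaxId = MAX(Table1Id) FROM Table1

SELECT Table1Id, CustomerId, Value
FROM   Table1
WHERE  Table1Id > @LastId
  AND  Table1Id <= @MaxId
-- Afterwards, save @MaxId as the new @LastId.
```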

3) Triggers on the tables that enter data about rows to process into a
separate table. Again, you may need a mechanism to differentiate between
user-entered changes and changes from your processing.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Oct 23 '05 #2

Thanks, the data to select for processing is not time based, it's based
on whatever tables a customer has data in at the time of processing,
which is determined by the customer. So, option three would be the
best and is similar to our current process. Also, the data is not
updated by the processing, only selected. Even with a table that tells
us which tables have data, how would we build the select statement(s)
to only include those tables with data?

Oct 23 '05 #3

heromull (he******@gmail.com) writes:
Thanks, the data to select for processing is not time based, it's based
on whatever tables a customer has data in at the time of processing,
which is determined by the customer. So, option three would be the
best and is similar to our current process.
Not that I see what time has to do with it. I did say timestamp, but
the timestamp datatype has nothing to do with time. Then again, if you
already have a process similar to option three, then go with that.
Also, the data is not updated by the processing, only selected. Even
with a table that tells us which tables have data, how would we build
the select statement(s) to only include those tables with data?


I cannot say that. I don't know your tables, and I don't know in which
context this process runs, etc. Since there are 200 tables, I would assume
that you have a stored procedure or a SELECT statement per table, as I
would assume that the tables generally have different sets of columns.
If they all have the same schema, that is a strong indication that you
should have one single table.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Oct 23 '05 #4

"heromull" <he******@gmail.com> wrote in message
news:11*********************@g14g2000cwa.googlegroups.com...
We have an asp.net app with about 200 data entry forms. Customers may
enter data into any number of forms. Each form's data is persisted in
a corresponding sql table. When data entry is complete, it needs to be
processed. Here's where the questions start.

How can we easily determine in which tables a customer has data and how
best to select that data?

We're not opposed to putting all the data in a single table. This
table would wind up having ~15 million records and constantly have CRUD
operations performed against it by up to 5000 users simultaneously.
With sufficient hardware, is this too much to ask of the db?


Don't design a database schema around a user interface. Design your database
schema around your DATA and BUSINESS RULES, then build a data access layer
that supports the UI. You say "Each form's data is persisted in a
corresponding sql table". This makes little sense to me as a description of a
data model. I suspect (admittedly on the basis of too little information)
that with a better design your perceived problems would disappear. The
statement "We're not opposed to putting all the data in a single table"
also suggests a very arbitrary approach to database design. Is your current
design a normalized one? If so, I don't understand your confusion about "How
can we easily determine in which tables a customer has data". The answer is
presumably that you use the Customer key - whatever that is in your model.

--
David Portas
SQL Server MVP
--
Oct 23 '05 #5

Erland Sommarskog wrote:
...
If they all have the same schema, that is a strong indication that you
should have one single table.
...

Yes, this is the case. They share the same schema.
David Portas wrote:
Don't design a database schema around a user interface.
I think this is our first mistake.
Design your database
schema around your DATA and BUSINESS RULES, then build a data access layer
that supports the UI. You say "Each form's data is persisted in a
corresponding sql table". This make little sense to me as a description of a
data model. I suspect (admittedly on the basis of too little information)
that with a better design your perceived problems would disappear.
I think you're right. It's that "better design" that we're
researching.
The
stataement "We're not opposed to putting all the data in a single table"
also suggests a very arbitrary approach to database design.
It does and I say that only to mean that it's not too late for us to
implement a better design.
Is your current
design a normalized one? If so, I don't understand your confusion about "How
can we easily determine in which tables a customer has data". The answer is
presumably that you use the Customer key - whatever that is in your model.


Here's DDL that describes our current "model" (I hear you laughing
already). Notice that all 200 tables have the same schema. If we
determine a single-table approach would be a better design, what would
be some areas of concern? Obviously indexing is on the list.

CREATE TABLE [Customer] (
[CustomerId] [int] IDENTITY (1, 1) NOT NULL ,
[Name] [varchar] (100),
CONSTRAINT [PK_Customer] PRIMARY KEY NONCLUSTERED
(
[CustomerId]
)
)
--we have ~200 tables with this table's schema
CREATE TABLE [Table1] ( --Table1 thru Table200
[Table1Id] [int] IDENTITY (1, 1) NOT NULL ,
[CustomerId] [int] NOT NULL ,
[Value] [nchar] (1024),
CONSTRAINT [PK_Table1] PRIMARY KEY NONCLUSTERED
(
[Table1Id]
) ,
CONSTRAINT [FK_Table1_Customer] FOREIGN KEY
(
[CustomerId]
) REFERENCES [Customer] (
[CustomerId]
)
)
GO
--each of the 200 tables have a trigger
--similar to this (i'll omit the delete trigger from the DDL)
create trigger Table1CreateInterview
on Table1
for insert
as
begin
    declare @CustomerId int
    declare @RecordId int

    select @RecordId   = Table1Id,
           @CustomerId = CustomerId
    from   inserted

    insert into interview
    (
        CustomerId,
        TableId, --represents the table that caused the trigger, eg. 1 = Table1
        RecId    --the value of the primary key of the record causing the trigger
    )
    values
    (
        @CustomerId,
        1,
        @RecordId
    )
end
GO
--a record is inserted into this table each
--time a record is written to any of the 200 tables
--a record is deleted from this table each time
--a record is deleted from any of the 200 tables
CREATE TABLE [Interview] (
[InterviewId] [int] IDENTITY (1, 1) NOT NULL ,
[CustomerId] [int] NULL ,
[TableId] [int] NOT NULL ,
[RecId] [int] NULL ,
CONSTRAINT [PK_Interview] PRIMARY KEY NONCLUSTERED
(
[InterviewId]
),
CONSTRAINT [FK_Interview_Customer] FOREIGN KEY
(
[CustomerId]
) REFERENCES [Customer] (
[CustomerId]
)
)
GO
insert into customer(name) values ('some name')
insert into table1(customerid, value) values (1, 'my value in table1')

Oct 24 '05 #6

heromull (he******@gmail.com) writes:
Don't design a database schema around a user interface.
I think this is our first mistake.


An unusually candid confession!
If they all have the same schema, that is a strong
indication that you should have one single table.

...

Yes, this is the case. They share the same schema.


You should make them one table, adding one more column that specifies
the entity that is now hidden in the table name. That table would look
like:

CREATE TABLE [Answers] ( -- replaces Table1 thru Table200
   [CustomerId] [int] NOT NULL,
   TableNo int NOT NULL,
   RowNo smallint NOT NULL,
   [Value] [nchar] (1024),
   PRIMARY KEY (CustomerId, TableNo, RowNo)
)

I've added RowNo, because I don't know if one customer can add more
than one value in the same form. If he can't, RowNo should not be
there. I've very deliberately taken out the IDENTITY column, because
this table should have a composite key. My RowNo is indeed a surrogate,
but only within the sub-realm of (CustomerId, TableNo). (And TableNo is
just a name I use, in lack of knowledge about the business domain.)

I put the key on (CustomerId, TableNo), but depending on how you use the
table, you may also need an index on (TableNo, CustomerId).
CustomerId first is good for queries like "what is customer 1233 up to?",
but not for "what do we have in table 12?".
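Concretely, that extra index on the Answers table sketched above would be created like this (the index name is my own):

```sql
-- The primary key already gives an index leading on CustomerId.
-- For "what do we have in table 12?" queries, add the reverse column order:
CREATE INDEX IX_Answers_TableNo ON Answers (TableNo, CustomerId)
```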
It does and I say that only to mean that it's not too late for us to
implement a better design.
That's great to hear!
begin
declare @CustomerId int
declare @RecordId int
select
@RecordId = Table1Id,
@CustomerId = CustomerId
from
inserted
Uh-uh, classic mistake. A trigger fires once per statement, not once
per row. Yes, as long as data comes in through that form, rows will only
arrive one by one, but then suddenly there is a batch process loading lots
of data at the same time. So write your trigger as:

INSERT interview (...)
SELECT ...
FROM inserted
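Spelled out against the Table1/Interview schema posted earlier, that set-based version might look like this (a sketch, not tested against the real database):

```sql
create trigger Table1CreateInterview
on Table1
for insert
as
begin
    -- One INSERT ... SELECT handles every row in the inserted
    -- pseudo-table, whether the statement inserted 1 row or 10,000.
    insert into interview (CustomerId, TableId, RecId)
    select CustomerId, 1, Table1Id
    from   inserted
end
```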
insert into
interview


And INSERT without a column list is very poor practice in production code.
Someone adds a column to the table, and the statement blows up. That's
bad.


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Oct 24 '05 #7

Erland Sommarskog wrote:
An unusually candid confession!


Acceptance is the first step, right?

Anyway, thank you so much for the support! I think the suggestions are
going to help us out a lot. A lightbulb went off over my head when I
read your composite key explanation.

Oct 25 '05 #8

Thank you so much for the support! I think this will work for us. My
only other question would be about managing the surrogate column
(RowNo). During Inserts, is there a way to manage it within SQL Server
or would my client apps need to select max(RowNo) and increase it by 1?

Again, thanks!

Oct 26 '05 #9

On 26 Oct 2005 07:55:17 -0700, heromull wrote:
Thank you so much for the support! I think this will work for us. My
only other question would be about managing the surrogate column
(RowNo). During Inserts, is there a way to manage it within SQL Server
or would my client apps need to select max(RowNo) and increase it by 1?


Hi heromull,

A typical INSERT statement would roughly look like this:

INSERT INTO Answers (CustomerId, TableNo, RowNo, [Value])
SELECT @CustomerId, @TableNo, COALESCE(MAX(RowNo),0) + 1, @Value
FROM Answers
WHERE CustomerId = @CustomerId
AND TableNo = @TableNo
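One caveat worth noting (my addition, not from the thread): two simultaneous inserts for the same customer and table could both read the same MAX(RowNo) and then collide on the primary key. Wrapping the statement in a transaction with locking hints serializes such inserts, at some cost in blocking:

```sql
BEGIN TRANSACTION
-- UPDLOCK + HOLDLOCK keep the range read stable until commit, so two
-- concurrent callers for the same (CustomerId, TableNo) queue up rather
-- than computing the same RowNo.
INSERT INTO Answers (CustomerId, TableNo, RowNo, [Value])
SELECT @CustomerId, @TableNo, COALESCE(MAX(RowNo), 0) + 1, @Value
FROM   Answers WITH (UPDLOCK, HOLDLOCK)
WHERE  CustomerId = @CustomerId
  AND  TableNo = @TableNo
COMMIT TRANSACTION
```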

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Oct 26 '05 #10

Thanks! Works perfectly.

Oct 26 '05 #11

This discussion thread is closed; replies have been disabled.