472,986 Members | 2,871 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,986 software developers and data experts.

Dynamic data elements for a data collection application

What is the better table design for a data collection application.
1. Vertical model (pk, attributeName, AttributeValue)
2. Custom columns (pk, custom1, custom2, custom3...custom50)

Since the data elements collected may change year over year, which
model better takes of this column dynamicness

Sep 20 '05 #1
7 3878
(mi************@gmail.com) writes:
What is the better table design for a data collection application.
1. Vertical model (pk, attributeName, AttributeValue)
2. Custom columns (pk, custom1, custom2, custom3...custom50)

Since the data elements collected may change year over year, which
model better takes of this column dynamicness


The vertical model is certainly cleaner from a relational perspective.
It also requires less maintenance.

But admittedly queries can be more complex. If attributes can be of
different data types, you need some triggers to check this. A tip
is that the sql_variant data type is good in this case.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Sep 20 '05 #2

1. But the data collection and reporting is in horizontal format. If
collected data is edited vertically, won't there be a extra steps of
converting horizontally obtained data to vertical and then vertical to
horizontal reports. In custom column model data always remains
horizontal. Won;t performance not be a issue in taking care of two
extra steps required in vertical model.
2. Won;t the concurrency be a issue, considering the fact that a
logical single horizontal row is edited as say 10 rows. Two people
might be changing same Primary key's different attributes at the same
time.
Erland Sommarskog wrote:
(mi************@gmail.com) writes:
What is the better table design for a data collection application.
1. Vertical model (pk, attributeName, AttributeValue)
2. Custom columns (pk, custom1, custom2, custom3...custom50)

Since the data elements collected may change year over year, which
model better takes of this column dynamicness


The vertical model is certainly cleaner from a relational perspective.
It also requires less maintenance.

But admittedly queries can be more complex. If attributes can be of
different data types, you need some triggers to check this. A tip
is that the sql_variant data type is good in this case.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp


Sep 21 '05 #3
(mi************@gmail.com) writes:
1. But the data collection and reporting is in horizontal format. If
collected data is edited vertically, won't there be a extra steps of
converting horizontally obtained data to vertical and then vertical to
horizontal reports. In custom column model data always remains
horizontal. Won;t performance not be a issue in taking care of two
extra steps required in vertical model.
If you are to present ten of those custom values as columns in a report,
you get a 10-way self-join. Certainly bulky in code. Performance is
probably not top-notch, but I don't see that it would be absymal.
2. Won;t the concurrency be a issue, considering the fact that a
logical single horizontal row is edited as say 10 rows. Two people
might be changing same Primary key's different attributes at the same
time.


Good point. This can be handled fairly easily, but it requires more
careful programming than the horizontal method.

Overall, there certainly is a tradeoff. If the set of custom fields are
faily stable, only change once per year or so, you might be prepared to
take the extra maintenance cost. But if users asks for new fields every
week, then the horizontal method could be a nightmare.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Sep 21 '05 #4
Thanks a lot for the reply.
1. I am not able to understand why i require a 10 way join though. If i
have a mapping between custom column and actual column, all i need is a
dyanamic sql generated from the mapping.

E.g.
table
-----
pk, custom1, custom2, custom3...custom10
map
---
customColName ActualColName
custom1 ActualName1
custom2 ActualName2
....

Now I can generate dynamic sql using map.

2. As long as number of custom columns is enough to take care data
element additions which happen in a year. All that is needed is
addition of new elements to mapping table to decifer newly assigned
custom columns


Erland Sommarskog wrote:
(mi************@gmail.com) writes:
1. But the data collection and reporting is in horizontal format. If
collected data is edited vertically, won't there be a extra steps of
converting horizontally obtained data to vertical and then vertical to
horizontal reports. In custom column model data always remains
horizontal. Won;t performance not be a issue in taking care of two
extra steps required in vertical model.


If you are to present ten of those custom values as columns in a report,
you get a 10-way self-join. Certainly bulky in code. Performance is
probably not top-notch, but I don't see that it would be absymal.
2. Won;t the concurrency be a issue, considering the fact that a
logical single horizontal row is edited as say 10 rows. Two people
might be changing same Primary key's different attributes at the same
time.


Good point. This can be handled fairly easily, but it requires more
careful programming than the horizontal method.

Overall, there certainly is a tradeoff. If the set of custom fields are
faily stable, only change once per year or so, you might be prepared to
take the extra maintenance cost. But if users asks for new fields every
week, then the horizontal method could be a nightmare.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp


Sep 22 '05 #5
Neither. The design flaw you are calling a vertical model is actually
known as "EAV" or "Entity-Attribute-Value" because it is a common
newbie mistake. I have no idea what your #2 means.

As your data elements change, you need to re-design the schema --
constraints, keys, data types, etc. Learn RDBMS and do it right.

I found an old "cut & paste". Someone like you posted this:

CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);

INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');

CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be Location Event count
(headings if possible)

Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1

Also, if possible, another query would return this result set. (I think
I know how to do this one.)

Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1

Here is a From: Thomas Coleman

SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;

Is the same thing in a proper schema as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities
are all lopped into the same table. There should be separate tables for
Locations and Events.

The column names are seriously painful. Beyond the fact that I
personally hate underscores in column names, using underscores at the
end of the column name is really non-intuitive. I removed them for my
example and came across the next column name faux pas. Don't use "key"
and "value" for column names. It means that the developer *has*
surround the column name with square brackets for everything which is a
serious pain.

There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole

All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.

ry to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.

ry to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.

For those who are interested, there are couple of links to articles I
found on the net:

Generic Design of Web-Based Clinical Databases
http://www.jmir.org/2003/4/e27*/

The EAV/CR Model of Data Representation
http://ycmi.med.yale.edu/nadka*rni/eav_CR_contents.htm

An Introduction to Entity-Attribute-Value Design for Generic
Clinical Study Data Management Systems
http://ycmi.med.yale.edu/nadka*rni/I...20*systems.htm
Data Extraction and Ad Hoc Query of an Entity- Attribute- Value
Database
http://www.pubmedcentral.nih.g*ov/ar...=pub*med&pubme...
Exploring Performance Issues for a Clinical Database Organized Using
an Entity-Attribute-Value Representation
http://www.pubmedcentral.nih.g*ov/ar...=pub*med&pubme...

Sep 22 '05 #6
(mi************@gmail.com) writes:
Thanks a lot for the reply.
1. I am not able to understand why i require a 10 way join though. If i
have a mapping between custom column and actual column, all i need is a
dyanamic sql generated from the mapping.

E.g.
table
-----
pk, custom1, custom2, custom3...custom10
map
---
customColName ActualColName
custom1 ActualName1
custom2 ActualName2
....

Now I can generate dynamic sql using map.
When I said 10-way join I was thinking of the vertical solution. For the
horisontal solution it's a simple join - once you have gone through all
that SQL building. To me, this sounds more complex to implement. Then
again, if the user selects dynamically which columns he wants to see,
the horizontal solution would require dynamic SQL as well.
2. As long as number of custom columns is enough to take care data
element additions which happen in a year. All that is needed is
addition of new elements to mapping table to decifer newly assigned
custom columns


I didn't realise that you had this mapping table. One could say that
this is a kind of compromise between the horizonal model and an entirely
static vertical model.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Sep 22 '05 #7
CELKO,

If I change the data elements 10 times during the year, I cannot change
the front end 10 to accomodate the same. I need to come up with a
dyanamic solution to take care of same. Vertical approach is the best
way to get the same. Using custom columns is a compromise, as vertical
approach is harder to implement.
--CELKO-- wrote:
Neither. The design flaw you are calling a vertical model is actually
known as "EAV" or "Entity-Attribute-Value" because it is a common
newbie mistake. I have no idea what your #2 means.

As your data elements change, you need to re-design the schema --
constraints, keys, data types, etc. Learn RDBMS and do it right.

I found an old "cut & paste". Someone like you posted this:

CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);

INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');

CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be Location Event count
(headings if possible)

Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1

Also, if possible, another query would return this result set. (I think
I know how to do this one.)

Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1

Here is a From: Thomas Coleman

SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;

Is the same thing in a proper schema as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities
are all lopped into the same table. There should be separate tables for
Locations and Events.

The column names are seriously painful. Beyond the fact that I
personally hate underscores in column names, using underscores at the
end of the column name is really non-intuitive. I removed them for my
example and came across the next column name faux pas. Don't use "key"
and "value" for column names. It means that the developer *has*
surround the column name with square brackets for everything which is a
serious pain.

There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole

All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.

ry to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.

ry to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.

For those who are interested, there are couple of links to articles I
found on the net:

Generic Design of Web-Based Clinical Databases
http://www.jmir.org/2003/4/e27*/

The EAV/CR Model of Data Representation
http://ycmi.med.yale.edu/nadka*rni/eav_CR_contents.htm

An Introduction to Entity-Attribute-Value Design for Generic
Clinical Study Data Management Systems
http://ycmi.med.yale.edu/nadka*rni/I...0*systems..htm
Data Extraction and Ad Hoc Query of an Entity- Attribute- Value
Database
http://www.pubmedcentral.nih.g*ov/ar...=pub*med&pubme...
Exploring Performance Issues for a Clinical Database Organized Using
an Entity-Attribute-Value Representation
http://www.pubmedcentral.nih.g*ov/ar...=pub*med&pubme...


Sep 26 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

16
by: D Witherspoon | last post by:
I am developing a Windows Forms application in VB.NET that will use .NET remoting to access the data tier classes. A very simple way I have come up with is by creating typed (.xsd) datasets. For...
4
by: pizzy | last post by:
INTRO: I tried to clean it up for easy reading. I hope I didn't make any mistakes. PROBLEM: WOW, this is some crazy sh!t. I can't get my checkbox (see "TAGSELECTED") to print my textboxes (see...
5
by: Oberon | last post by:
I want to create a collection of records which will be permanently available at the application level in an ASP.NET application. How should I go about that? Lets say that the record structure...
1
by: sleigh | last post by:
Hello, I'm building a web application that will build a dynamic form based upon questions in a database. This form will have several different sections that consist of a panel containing one to...
4
by: mom_newbie | last post by:
Hello all, I want to perform the following task: 1. Create an array 2. Fill it up (number of elements unknown in advance) 3. Iterate through it using For Each loop (cannot do this in the in...
1
by: Russell Mangel | last post by:
Sorry about the Cross-Post, I posted my question in the wrong group. Hello, What is the simplest way to create a dynamic collection (during run-time), using basic C (Struct data types). Since...
9
by: pbd22 | last post by:
Hi. This is just a disaster management question. I am using XMLHTTP for the dynamic loading of content in a very crucial area of my web site. Same as an IFrame, but using XMLHTTP and a DIV. I...
2
by: Peted | last post by:
Can anyone suggest a best practice approach to building a dynamic winforms UI. Just as an example somehting like a billing application where you enter a customer billing data and the billing...
0
by: JamesOo | last post by:
I have the code below, but I need to make it searchable in query table, below code only allowed seach the table which in show mdb only. (i.e. have 3 table, but only can search either one only,...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
4
NeoPa
by: NeoPa | last post by:
Hello everyone. I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report). I know it can be done by selecting :...
3
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be using a very simple database which has Form (clsForm) & Report (clsReport) classes that simply handle making the calling Form invisible until the Form, or all...
1
by: Teri B | last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course. 0ne-to-many. One course many roles. Then I created a report based on the Course form and...
0
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...
0
isladogs
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
4
by: GKJR | last post by:
Does anyone have a recommendation to build a standalone application to replace an Access database? I have my bookkeeping software I developed in Access that I would like to make available to other...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.