Bytes IT Community

Database design for large numbers of attributes

I'm developing an application for medical use that will be used to
capture patient background and visit data. The application will have
approximately 50 forms, with an average of about 20 fields each-- so a
total of 1000+ fields. Almost always, the fields are unique to a
particular form--and it will be pretty unlikely that new forms and
fields will be added--although not out of the realm of possibility.
There will also be a search feature that will allow physicians to
query for certain forms based on as many attributes as they want to
use. Another relevant piece of info is that each form will have a
status, and that it will be necessary to aggregate all of the form
statuses for a patient to determine a patient status. The target
database is Oracle 9i (9.2.0.2). Total number of patients will
probably be around 10,000-- each will probably have an average of 50
forms (including multiple visit forms)-- so about 500,000 forms.
I'm trying to decide how to design the database:
Should I
1. use a "survey" type design, where every attribute will be a row in
a response table?
Pros:
-most normalized
-most flexible and easy to make "schema" changes
Cons
-Large number of rows in response table
-cumbersome to work with
-reduced performance
2. create a separate table for each form w/ its unique attributes, and
a master table that has the common attributes and the primary keys
Pros
-Easier to work with
-updates and selects will have better performance
Cons
-Not normalized?
-The "ad-hoc" search queries and reports may be difficult to write and
maintain (possibly solvable w/ a large view?)
3. Store all the data for each form as XML in an XMLType column (non
schema based)
Pros:
-Easy to work with
-updates and selects will have good performance (since I'm always
updating or selecting an entire form)-- may actually have the best
performance of all 4 options for updating and retrieving form data
-normalized
Cons
-XML bloats storage space requirements (but not really that big a
deal)
-Will make ad-hoc search and reporting queries run more slowly (and
probably use lots of RAM), since the XML will have to be parsed before
XPath queries can run
4. Use Oracle 9i's schema-based XML feature
Pros
-Faster XPath queries than option 3
Cons
-If the schema changes, all generated objects have to be dropped and
re-created-- seems like a big pain
-If I was going to do this, why not just go with option 2?
I'm most tempted by options 2 and 3-- and would really prefer 3 if
there was a way to make sure that the ad-hoc queries would perform
decently.
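For concreteness, here is a minimal sketch of how options 1 and 2 differ in shape, using SQLite as a stand-in for Oracle 9i (table and column names like `intake_form` are hypothetical, not from the actual application). It also shows the patient-status rollup, which works the same under either option since the status lives in the master table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Option 1: "survey"/EAV design -- one generic response table,
# one row per answered field, every value stored as a string.
cur.execute("""CREATE TABLE response (
    form_id   INTEGER,
    attribute TEXT,
    value     TEXT)""")

# Option 2: common attributes in a master table, plus one table
# per form holding that form's unique fields with real types.
cur.execute("""CREATE TABLE form_master (
    form_id    INTEGER PRIMARY KEY,
    patient_id INTEGER,
    status     TEXT)""")
cur.execute("""CREATE TABLE intake_form (
    form_id   INTEGER REFERENCES form_master,
    height_cm INTEGER,
    weight_kg INTEGER)""")

cur.executemany("INSERT INTO form_master VALUES (?, ?, ?)",
                [(1, 42, "COMPLETE"), (2, 42, "PENDING")])

# The patient-status rollup lives in the master table either way:
# here, a patient is COMPLETE only when every one of their forms is.
patient_status = cur.execute("""
    SELECT patient_id,
           CASE WHEN SUM(status <> 'COMPLETE') = 0
                THEN 'COMPLETE' ELSE 'PENDING' END
    FROM form_master
    GROUP BY patient_id""").fetchall()
# patient_status -> [(42, 'PENDING')]
```

The aggregation query is identical regardless of which option stores the field data, so the status requirement doesn't really favor any of the four designs.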
Jul 19 '05 #1


rj***@yahoo.com (John) wrote in message news:<63**************************@posting.google.com>...
[original post quoted in full; snipped]


John -

The right answer really depends on your specific needs. Here are some
things to think about....

Using the row-based approach really hinders your ability to query
effectively, both in performance and in the complexity of the query
syntax. A column-based approach makes querying much easier... for
instance, "select * from tab1 where col1 > 2 and col2 <= col3+col4 or
col5 is not null", etc. Try doing this with a row-based approach.
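To make that concrete: assuming a hypothetical EAV table `response(form_id, attribute, value)`, even a two-condition version of that WHERE clause needs one self-join (and a cast) per attribute tested. A sketch, using SQLite in place of Oracle:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Column-based table vs the equivalent row-based (EAV) table.
cur.execute("CREATE TABLE tab1 (form_id INTEGER, col1 INTEGER, col5 TEXT)")
cur.execute("CREATE TABLE response (form_id INTEGER, attribute TEXT, value TEXT)")

# The same form stored both ways.
cur.execute("INSERT INTO tab1 VALUES (1, 3, 'x')")
cur.executemany("INSERT INTO response VALUES (1, ?, ?)",
                [("col1", "3"), ("col5", "x")])

# Column-based: the predicate is a single WHERE clause over typed columns.
col = cur.execute(
    "SELECT form_id FROM tab1 "
    "WHERE col1 > 2 AND col5 IS NOT NULL").fetchall()

# Row-based: one self-join per attribute, plus a cast because
# every value was stored as a string.
eav = cur.execute("""
    SELECT a.form_id
    FROM response a
    JOIN response b ON b.form_id = a.form_id
    WHERE a.attribute = 'col1' AND CAST(a.value AS INTEGER) > 2
      AND b.attribute = 'col5' AND b.value IS NOT NULL""").fetchall()
# Both return [(1,)] -- but the EAV version grows a join per condition.
```

With 1000+ attributes and physicians composing ad-hoc searches, that join-per-condition pattern is exactly where the EAV design gets painful.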

In the row-based approach, you also lose type checking since you have
to store everything as a string and your model becomes less
self-describing.

With a column-based approach, you lose some of the flexibility you
mention of adding and removing attributes, but it doesn't sound like
you need it. You could separate volatile vs. fixed attributes in
separate tables if you find the need to support adding/removing
attributes. Then you wouldn't have to rebuild everything when things
change.
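A sketch of that split (again SQLite standing in for Oracle, with hypothetical names): the fixed, queryable fields stay as real columns, while adding a volatile attribute becomes an INSERT into a narrow side table rather than an ALTER TABLE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Fixed, well-known fields live as real typed columns...
cur.execute("""CREATE TABLE visit_fixed (
    form_id INTEGER PRIMARY KEY,
    pulse   INTEGER,
    bp_sys  INTEGER)""")

# ...while volatile extras go in a narrow attribute/value side table,
# so a new attribute never forces a schema rebuild of the fixed table.
cur.execute("""CREATE TABLE visit_extra (
    form_id   INTEGER REFERENCES visit_fixed,
    attribute TEXT,
    value     TEXT)""")

cur.execute("INSERT INTO visit_fixed VALUES (1, 72, 120)")
cur.execute("INSERT INTO visit_extra VALUES (1, 'new_q_2005', 'yes')")

# Reading a form joins the two, with the extras optional.
row = cur.execute("""
    SELECT f.pulse, e.value
    FROM visit_fixed f
    LEFT JOIN visit_extra e
      ON e.form_id = f.form_id AND e.attribute = 'new_q_2005'
    WHERE f.form_id = 1""").fetchone()
# row -> (72, 'yes')
```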

Nothing says you need to go with one approach only. Perhaps you can
separate columns that are likely to be queried from ones that will
not be queried and go with both a column and row-based approach. Or
put all string based attributes in a row-based table if that would
help.

If many of the attributes are simple yes/no answers, perhaps you could
look into encoding them as a bit string within one column. For
instance, Q1=Yes, Q2=No, Q3=Yes, could be represented as 101 in one
table column and could be queried using the BITAND function. If you
have 100 Yes/No answers, you've just compressed 100 columns into 1,
but you need to make sure the BITAND function will support your
queries and perform well.
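The encoding itself can be sketched in a few lines, with plain Python integers standing in for a NUMBER column queried via Oracle's BITAND (the question numbering here is illustrative):

```python
def pack_answers(answers):
    """Pack a list of yes/no answers into one integer.

    answers[i] holds question i+1; bit i of the result is set
    when that question was answered Yes.
    """
    mask = 0
    for i, yes in enumerate(answers):
        if yes:
            mask |= 1 << i
    return mask

# Q1=Yes, Q2=No, Q3=Yes -> binary 101 -> stored as the number 5.
form_mask = pack_answers([True, False, True])

# "Was Q3 answered Yes?" is BITAND(mask, 4) != 0 in Oracle SQL,
# or mask & (1 << 2) here.
q3_yes = bool(form_mask & (1 << 2))
q2_yes = bool(form_mask & (1 << 1))
# form_mask -> 5, q3_yes -> True, q2_yes -> False
```

The trade-off is the one Dave notes: the column stops being self-describing, and every query against it has to go through BITAND rather than a named column, so it's worth verifying that those predicates can still use an index or perform acceptably as full scans.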

I'm not that familiar with the XML functionality in Oracle yet, but I
suspect if you have aggregate query requirements, then relational will
perform better. But again, you may find a hybrid approach is best.
Perhaps some of your attributes are best stored as XML.

These aren't answers, but I hope they give you something to think
about.

Good luck,
Dave
Jul 19 '05 #2
