468,119 Members | 1,904 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,119 developers. It's quick & easy.

How should you represent discrete states?

Banfa
9,033 Expert Mod 8TB
Right I am actually using SQL Server 2005, however this is a general question about table design that is probably relevant to any database engine so I am going to ask it here since this is the busiest DB forum on the site.

To start with I make no bones about the fact that the area I am really expert in is C/C++ programming on a variety of micro controls and computers.

And the problem is this, what is the best (or correct) way to store a number of discrete states in a database table, for instance

DOWNLOAD_READY
DOWNLOAD_FLAGGED_TO_START
DOWNLOAD_IN_PROGRESS
DOWNLOAD_COMPLETE

The thing is that programmatically speaking use of strings is something that we (C/C++ programmers) avoid where possible, they are just not efficient so we like to use numbers. However we also like to avoid magic numbers, that is numbers in the code that appear without there meaning being clear, so in this case I would use a enum

enum download_state {
DOWNLOAD_READY
DOWNLOAD_FLAGGED_TO_START
DOWNLOAD_IN_PROGRESS
DOWNLOAD_COMPLETE
};

In C/C++ this is a way of associating the text that the programmer sees with a number, the compiler automatically assigns numbers (starting at 0) to each item in the enum so the programmer can use the symbols defined to make the code make sense and avoid magic numbers but once compiled the computer is actually using numbers which are efficient for it.

Back to database tables, suppose I wanted to store the same information in a column of a table (and I will want to get able to sort on it or at least pick out entries at the DOWNLOAD_READY state)?

Should that column be text but this seems to have the issue of probably being less efficient and also not really limiting the entries to valid values.

Or should the column be a number, I can probably constrain the number into the right range if I wish, however the table then contains a column of numbers whose meaning is not immediately obvious, i.e. it contains magic numbers.

I am aware that some SQLs actually contain an enum type solving the problem in more or less the same way as the C/C++ programmer would but I believe that is not a standard SQL type.

So there is my question, what is the best/correct standard way to store this type of data in an SQL database?
May 9 '08 #1
2 1254
Stewart Ross
2,545 Expert Mod 2GB
Hi Banfa. There are no absolute answers to your question, as what should and should not be in any particular table is a matter of who does the design more than anything else. Ask two people and get three views...

I would store the states in a small table with an integer primary key (for efficiency of access later) and a string state name. The users would not see the key - it is just a means of identifying the rows internally.

Although it can be argued that it is meaningless to introduce a separate integer primary key when the states themselves are candidate keys this ignores the very real efficiency problem of string indexing/lookups vs integer indexing. Like you, I would consider it highly inefficient to be accessing the string values as keys instead of much simpler integers.

(As an aside, enumerated types in VB are also defined using integer values underneath just as you describe for C++.)

There are occasions when I would add one more field to such a reference table - an integer sort order. I do this when I wish to provide users with a specific order for the list of values (in drop-down lists and so on) in a way that makes most sense to the users, should this be required. A query on the base table sorted on the manually-defined sort order field provides a view of the data which is independent of the integer primary key and of the alphanumeric ordering of the strings.

In my view it always makes sense to record reference data in a small table using an efficient primary key indexing method, even if this means introducing a numeric key value that is not that altogether meaningful to humans but is very efficient for machine storage (and, more importantly, for fast data retrieval).

I am sure there will be other views here, but it's the method I use all the time.

-Stewart
May 9 '08 #2
Scott Price
1,384 Expert 1GB
Correct me if Im wrong, but looking at what you have I assume you have one subject; Downloads, with various characteristics: Ready, Flagged_To_Start, etc.

That would mean one table called Downloads, with a primary key column and four boolean columns.

Each Download instance gets a separate record that records the boolean values of each characteristic (or Date-Time if you are intending this to be a logging table).

As Stewart pointed out, two people will give three opinions :)

Good luck, Ben...

Regards,
Scott
May 9 '08 #3

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

3 posts views Thread by Sim Nanda | last post: by
2 posts views Thread by Francois Vanderseypen | last post: by
10 posts views Thread by jack | last post: by
reply views Thread by Alison Givens | last post: by
reply views Thread by lrobo01 | last post: by
15 posts views Thread by didacticone | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.