Need some help with design of tables/views

I have an applicaton in which I collect data for different parameters
for a set of devices. The data are entered into a single table, each
set of name, value pairs time-stamped and associated with a device.

The definition of the table is as follows:
CREATE TABLE devicedata
(
device_id int NOT NULL REFERENCES devices(id), -- id in the device
table
datetime datetime PRIMARY KEY CLUSTERED, -- date created
name nvarchar(256) NOT NULL, -- name of the attribute
value sql_variant NOT NULL -- value
)
For example, I have 3 devices, and each is monitored for two attributes
-- temperature and pressure. Data for these are gathered at say every
20 minute and every 15 minute intervals.

The table is filled with records over a period of time, and I can
perform a variety of SQL queries.
I have another requirement which requires me to retrieve the *latest*
values of temperature and pressure for each device.

Ideally, I'd like to use the data I have collected to get this
information, and I suppose I can.

What I need is the SELECT statement to do this.
I'd appreciate it very much, if someone can help provide that.
Conceivably, I could use a SQL server View for making this easier for
some of my users.

One alternate technique I thought was to create another table which I
*update* with the latest value, each time I *insert* into the above
table. But it seems like a waste to do so, and introduces needless
referential integrity issues (minor). Maybe for fast access, that is
the best thing to do.

I have requirements to maintain this data for several months/year or
two, so I am dealing with a large number of samples.

Any help would be appreciated.

(I apologize if this post appears twice)

Jul 23 '05 #1

Subscribe Post Reply

2897

Chandra

Here is the view that you are lookig for.

Please let me know if you require anything else

CREATE VIEW Latest_Data
AS
SELECT device_id, [datetime], name, value
FROM devicedata
INNER JOIN
(
SELECT device_id, max([datetime]) [datetime], name
FROM devicedata
GROUP BY device_id, name
) as DerTab
WHERE
devicedata.device_id = DerTab.device_id AND
devicedata.[datetime] = DerTab.[datetime] AND
devicedata.name = DerTab.name
GO
best Regards,
Chandra
http://groups.msn.com/SQLResource/
http://chanduas.blogspot.com/
---------------------------------------

*** Sent via Developersdex http://www.developersdex.com ***

Jul 23 '05 #2

Thanks, I'll try that out.

Another question, performance-wise, is there a near-linear increase in
cost for accessing latest values since the max seems to be computed
each time, or is the cost 'not so bad' since the table is indexed by
datetime?

Thanks again for your help.

Jul 23 '05 #3

Erland Sommarskog

sk (sh*************@hotmail.com) writes:

Another question, performance-wise, is there a near-linear increase in
cost for accessing latest values since the max seems to be computed
each time, or is the cost 'not so bad' since the table is indexed by
datetime?

Performance-wise a clustered index on (device_id, attribute, datetime)
would be to prefer.

It could also be worth considering of defining the attributes in a
seprate table, and only include the attribute in this table to
hold it down in size, which would also improve performance.

Finally, using [datetime] as the primary key seems funny. What if you
get two values close to each other in time? (Beware that datetime in
SQL Server does only have a resolution of 3.33 milliseconds.)
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #4

Thanks, all that makes sense now. Making datetime the primary key was
sheer thoughtlessness.

Jul 23 '05 #5

--CELKO--

>> Any help would be appreciated. <<

Always a dangerous thing to say; I hope you meant it and were not
fishing for kludges.

This non-relational disaster is called a EAV design and you can Google
the details of why and how it fails. Or just try to write a grouped
query or a simple relational division with your currrent non-table.
The basic problem is that it mixes data and metadata in the table. It
means you never even got to 1NF

What are you using either reserved words or vague words for data
element names? Why did device_id suddenly change its name from table
to table? In an RDBMS, design is at the schema level, not
table-by-table; that is a 1950's file system. Why did you use the
non-relational, proprietary sql_variant data type? Is this make
absolutely sure that this is as non-relational as possible or simply to
destroy portability and all hope of normalization? SQL is a strongly
typed language, not like lower level or OO languages. Try a table
more like this:

CREATE TABLE DeviceReadings
(device_id INTEGER NOT NULL
REFERENCES Devices (device_id),
reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
temperature DECIMAL(5,2) DEFAULT 0.00 NOT NULL,
pressure DECIMAL(5,2) DEFAULT 0.00 NOT NULL),
PRIMARY KEY (device_id, reading_timestamp));

I don't know all the attributes, the scales used, reasonable defaults
and all the other basic research you did before you wrote one line of
code. But did you really have something with a NVARCHAR(256) name (I
also see that it can be in Chinese, too)? Something like this leads me
to believe that you jumped on a EAV design because you have no idea
what the problem is or what the data is like at all. The "Fire! Ready!
Aim!" or agile school of data modeling?

The data are entered into a single table, ..<<
Perhaps we can name this table the
"Automobiles_and_BritneySpears_and_Squids" table in the data model
since it can hold anything? What is the very definition of a table?
It is a set of things of the same kind entities. If two things are
logically different, they are modeled with different tables that have
the proper attributes.

In an RDBMS, we design at the schema level and not at the table level.
You connect to the entire schema; this is not like the old days when
data from a device was collected on a casette tape drive under RTOS on
DEC equipment in the lab.
.. needless referential integrity issues <<
Love that phrase! How would you define a "needful referential
integrity issue"?
I have another requirement which requires me to retrieve the *latest* values of temperature and pressure for each device <<

Put this in a VIEW, so it is always current and so that you do not have
to keep updating physical storage. Again, you are still stuck in a
1950's file system mindset.

CREATE VIEW CurrentReadings(device_id, temperature, pressure)
AS
SELECT R1.device_id, R1.temperature, R1.pressure
FROM DeviceReadings AS R1
WHERE reading_timestamp
= (SELECT MAX (reading_timestamp)
FROM DeviceReadings AS R2
WHERE R1.device_id = R2.device_id);

See how easy that is to write with a normalized schema?

Stop over, do it right and get some help if you need it. Also,
remember there are often good reasons to use files to stage and scrub
data.

Jul 23 '05 #6

--CELKO-- wrote:

Any help would be appreciated. <<
Always a dangerous thing to say; I hope you meant it and were not
fishing for kludges.

Yes, I meant it. Particularly since a response to my query does not in
any way behoove me to take it on faith.

This non-relational disaster is called a EAV design and you can Google
the details of why and how it fails. Or just try to write a grouped
query or a simple relational division with your currrent non-table.
The basic problem is that it mixes data and metadata in the table. It
means you never even got to 1NF

What are you using either reserved words or vague words for data
element names?
It did not matter in the query I was posing, and I have since modified
it to not use reserved words -- they worked fine, but I see the point.
I do not understand your comment on vague words for data element names.
Why did device_id suddenly change its name from table
to table?
I am trying to pose a question in a ng as simply as possible, so that I
can get help on a specific question -- that of creating a VIEW. I
should have named it device_id in the referenced table as well --
thanks.
In an RDBMS, design is at the schema level, not
table-by-table; that is a 1950's file system. Why did you use the
non-relational, proprietary sql_variant data type?
I have no requirements to use anything but Microsoft SQL Server 2000
and >. Ever, on this project. So the issue of portability does not
arise. It is however, a very valid point if you point out that
"sql_variant is evil" for other reasons, but portability is not a
concern for me.

Maybe I should have just posed the question in
comp.databases.ms-sqlserver

I have a host of attributes (depending on the device: 23 to 37
currently, no two devices necessarily having the same set) that I need
to collect at varying rates, often, user-specified rates. Often, the
attributes are not collected concurrently -- some are manually entered,
some are collected thru pseudo device drivers, others a combination.

One approach was to have the complete set of variables--the superset,
appear as columns. Strongly typed. That certainly is still under
consideration. However, the nature of the project is such that I need
to accomodate additional devices in the future, and each likely brings
in a different set. I did not know of a solution to handle that.
Is this make
absolutely sure that this is as non-relational as possible or simply to
destroy portability and all hope of normalization?
No, it certainly was not the intent to make it as non-relational as
possible. I do not understand your comment on portability at all. I
only need to use Mirosoft SQL Server 2000 and >.
SQL is a strongly
typed language, not like lower level or OO languages.
Try a table
more like this:

CREATE TABLE DeviceReadings
(device_id INTEGER NOT NULL
REFERENCES Devices (device_id),
reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
temperature DECIMAL(5,2) DEFAULT 0.00 NOT NULL,
pressure DECIMAL(5,2) DEFAULT 0.00 NOT NULL),
PRIMARY KEY (device_id, reading_timestamp));

I don't know all the attributes, the scales used, reasonable defaults
and all the other basic research you did before you wrote one line of
code.
I could go into that, but that would mean describing my entire project.
I need help with a VIEW which was the reason I posed my question as I
did. The list of variables in my example were temperature and
pressure. In my application, as I mentioned previously, is much larger
(23 to 37 per device, and not necessarily the same set) and collection
is at different rates, some rates 10 to 100 times faster than others.

But did you really have something with a NVARCHAR(256) name (I
also see that it can be in Chinese, too)?
Yes, it can be in Simplified Chinese, Japanese, German, and French
(only). The name of the variable is described by the data collection
entity, not up to me.

Is nvarchar not the correct sql type for that?
Something like this leads me
to believe that you jumped on a EAV design because you have no idea
what the problem is or what the data is like at all. The "Fire! Ready!
Aim!" or agile school of data modeling?

I commend your ability to form this insight from a question on creating
a view.

The data are entered into a single table, ..<<
Perhaps we can name this table the
"Automobiles_and_BritneySpears_and_Squids" table in the data model
since it can hold anything?

We could, but then we would be wrong, since it only holds device data.
Has nothing to do with BritneySpears or Squids, and in this instance
nothing with Automobiles either.
What is the very definition of a table?
This table holds time series data of a number of attributes (of type
datetime, int, double, bit, and text) of devices.
It is a set of things of the same kind entities. If two things are
logically different, they are modeled with different tables that have
the proper attributes.

In an RDBMS, we design at the schema level and not at the table level.
You connect to the entire schema; this is not like the old days when
data from a device was collected on a casette tape drive under RTOS on
DEC equipment in the lab.
.. needless referential integrity issues <<
Love that phrase! How would you define a "needful referential
integrity issue"?

One that would result from performance considerations when NOT using
normalized forms. For example, in this case, I could have created
another table which would hold only the latest values. That appears to
make additional impositions on referential integrity, which prompted my
question in the first place.

I hold no patents on that phrase, nor do I insist that it is
unambiguous. It is kinda like saying "do the right thing, it is
crucial that you do not do it wrong"

I have another requirement which requires me to retrieve the *latest* values of temperature and pressure for each device <<

Put this in a VIEW, so it is always current and so that you do not have
to keep updating physical storage. Again, you are still stuck in a
1950's file system mindset.

CREATE VIEW CurrentReadings(device_id, temperature, pressure)
AS
SELECT R1.device_id, R1.temperature, R1.pressure
FROM DeviceReadings AS R1
WHERE reading_timestamp
= (SELECT MAX (reading_timestamp)
FROM DeviceReadings AS R2
WHERE R1.device_id = R2.device_id);

See how easy that is to write with a normalized schema?

Exceptionally easy, even I had it figured out which is probably why I
did not ask that question. Looks a lot like one of the other designs I
am entertaining at the moment. (Yet another is having 23 to 37 tables
representing the attributes, but all that was not relevant when I
asked, what I thought was a simple question)

I appreciate it, thanks for the advice. I'll reconsider if this
approach fits my problem better than the ones discussed here.

Incidentally, in my application, temperature and pressure are not
necessarily recorded at the same time. The above VIEW appears to
return the latest record -- it may or may not correspond to latest
value of the pressure.

If I had to query for "What's the latest value of <one of 23 to 37
attributes> for this device, and when was it last recorded?", the above
may or may not give me the correct result.

Maybe I am missing something.

Stop over, do it right and get some help if you need it. Also,
remember there are often good reasons to use files to stage and scrub
data.

Thanks, that is very helpful.

Jul 23 '05 #7

Erland Sommarskog

sk (sh*************@hotmail.com) writes:

In an RDBMS, design is at the schema level, not
table-by-table; that is a 1950's file system. Why did you use the
non-relational, proprietary sql_variant data type?

I have no requirements to use anything but Microsoft SQL Server 2000
and >. Ever, on this project. So the issue of portability does not
arise. It is however, a very valid point if you point out that
"sql_variant is evil" for other reasons, but portability is not a
concern for me.

Maybe I should have just posed the question in
comp.databases.ms-sqlserver

Well, you did, but Joe Celko thinks he has a mission to fill here.
No, please don't ask me what that mission would be.

Anyway, from what I could guess from the table description, sql_variant
looks like a good choice to me.

If there are a whole bunch of parameters, and they are dynamcially
added all the time, you cannot have a column for everyone of them.
Instead it's better to have row for each of them. In previous versions
of SQL Server, you would then have a couple of value columns - one for
each data type. sql_variant makes it possible to have a single column.

What is a good thing is to define the attributes somewhere, and with
their definition also keep track of their data type. For an
attributes like temperature, you don't really want to find a
datetime value.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #8

--CELKO--

>> I do not understand your comment on vague words for data element names. <<

Names like "name" beg the question "name of what?" A data element
should tell you what it is at the schema level, without the need for a
local context. The ISO-11179 Standard or my SQL PROGRAMMING STYLE give
some rules for this; it is basically a "<noun><modifier>" pattern --
"client_name", "client_status", etc. where the second part tells the
reader what the scale of the attribute is.

I do not understand your comment on portability at all. <<
That is what I said the first ten years I programmed, too :) I have an
essay at DBAzine entitled "I will never port this code" that goes into
the short-sighted attitude of new programmers. One way to assure that
you will not port your code is to write it so poorly or with so many
proprietary extensions that it is cheaper to throw it out and start
over. Wre saw a lot of this with the *= extension when infixed joins
came into SQL Server.
other designs I am entertaining at the moment. (Yet another is having 23 to 37 tables representing the attributes, .. Incidentally, in my application, temperature and pressure are not necessarily recorded at the same time.<<

If they are truly independent measurement events, then you need to have
23 to 37 tables with the proper constraints, defaults and keys to model
them. I assumed that several of the measurements are made at the same
time by the same device, rather than by separate devices for each
measurement. Temperature and pressure are a common combination in lab
equipment, various chemical values can be detected with one probe, etc.
sql_variant is a proprietary way to destory First Normal Form and data
integrity. Have you tried to write a CHECK() constraint assure that
this "magical one size fits all" pseudo-column allows the right
datatype and the proper range for all of the 23 to 37 measurement?
That is, what stops me from having a pH reading of "purple", a
temperature of -15 Absolute, etc.

There is an old joke about a teacher quizzing a class of really dumb
kids:

"Billy, what is 3 times 6?"
"ahhhh, Tueday, Teacher!"
"Wrong! Sally, what is 3 times 6?"
"Red!"
"Wrong! James, what is 3 times 6?"
"18!"
"Right! Tell the class how you got the answer."
"I divided Tuesday by red."

That is how I feel about weakly typed data models.

Jul 23 '05 #9

--CELKO-- wrote:

I do not understand your comment on vague words for data element names. <<
Names like "name" beg the question "name of what?" A data element
should tell you what it is at the schema level, without the need for a
local context. The ISO-11179 Standard or my SQL PROGRAMMING STYLE give
some rules for this; it is basically a "<noun><modifier>" pattern --
"client_name", "client_status", etc. where the second part tells the
reader what the scale of the attribute is.

Thanks, that helps me understand your comment.

I do not understand your comment on portability at all. <<
That is what I said the first ten years I programmed, too :) I have an
essay at DBAzine entitled "I will never port this code" that goes into
the short-sighted attitude of new programmers.

I am not wise enough to presume short-sightedness or the relative
vintage of programmers based on a ng question or two, but have been in
business long enough that I can dare presume if the application/product
*I* am working on will ever require to be ported, and the costs in case
I was wrong.

I do understand that portability is highly desireable, but it is just
another project/application consideration as are costs, market needs,
time-to-market, i18n concerns, development time, testing time, design
and verification time, etc. In my experience, the last three listed,
influence portability more than anything else, and are often severely
underestimated.

That does not automatically translate into portability being an
overarching requirement that trumps all other considerations. If my
customers are strictly Microsoft Windows users, portability to
non-Microsoft solutions is low or more likely, non-existent in terms of
requirements.

In fact, whether I even use a database or not, is of no consequence to
them in *this* application.

Not all applications have the same requirements, which ironically, is
eerily similar to the mantra that is being pummelled: "one size does
not fit all".
One way to assure that
you will not port your code is to write it so poorly or with so many
proprietary extensions that it is cheaper to throw it out and start
over.
I see your confusion: The requirement is not to assure that the code or
the database will not be ported, instead there is simply no requirement
to port my application to use another database.
Wre saw a lot of this with the *= extension when infixed joins
came into SQL Server.
other designs I am entertaining at the moment. (Yet another is having 23 to 37 tables representing the attributes, .. Incidentally, in my application, temperature and pressure are not necessarily recorded at the same time.<<

If they are truly independent measurement events, then you need to have
23 to 37 tables with the proper constraints, defaults and keys to model
them.

Thanks, I'll devote more time and energies in considering this
approach, further. Any suggested reading on strategies for efficiently
organizing time-series data?
I assumed that several of the measurements are made at the same
time by the same device, rather than by separate devices for each
measurement. Temperature and pressure are a common combination in lab
equipment, various chemical values can be detected with one probe, etc.
sql_variant is a proprietary way to destory First Normal Form and data
integrity. Have you tried to write a CHECK() constraint assure that
this "magical one size fits all" pseudo-column allows the right
datatype and the proper range for all of the 23 to 37 measurement?
That is, what stops me from having a pH reading of "purple", a
temperature of -15 Absolute, etc.

Nothing, at declaration time. (With due respect to Sommarskog's
pointers on this aspect)

And what if I do not need that declarative constraint? In other words,
why I am bound to use that aspect of a RDBMS? What if my application
can live adequately without declarative constraint checks?

One other way I can do this application is to not use a RDBMS at all --
for the purposes of this application, that too is perfectly acceptable.
There is no mandate. One reason I am entertaining MS SQL server is
that I can see value (read time savings) in using SQL to perform the
kind of queries needed in the application. I still need to validate my
design ideas, run it through sufficient benchmarks, etc.

Why does it need to be an all-or-nothing with using an RDBMS? Why
can't I simply log all my raw data into MS SQL server, and then use SQL
for doing some analysis? If I find that constructing SQL queries for
searching data between times with simple filters is adequate for my
application, why would I care if I can write that CONSTRAINT at all?

It is true that I could have bought a nail clipper and a corkscrew
separately, instead of a swiss army knife, but if the swiss army knife
was cheaper, quicker to obtain, and was adequate for my purpose, why
must I be compelled to use the knife instrument in the swiss army knife
at all?

[Insert "use flat files, this application does not need a RDBMS"
soapbox here]

There is an old joke about a teacher quizzing a class of really dumb
kids:

"Billy, what is 3 times 6?"
"ahhhh, Tueday, Teacher!"
"Wrong! Sally, what is 3 times 6?"
"Red!"
"Wrong! James, what is 3 times 6?"
"18!"
"Right! Tell the class how you got the answer."
"I divided Tuesday by red."

That is how I feel about weakly typed data models.

:-)

Sommarskog's and Chandra's responses gave me enough pointers on what I
was looking for. So did some of yours. I remain thankful for all
responses.

Since we are indulging in experiences, here's how I feel about some Q&A
on usenet

Question: "What's the time?"

Helpful answer: "3 o'clock"

Helpful answer: "3 o'clock in the afternoon, PST"

Another helpful answer: "3 o'clock in the afternoon, PST. Say, why
don't you buy one of those digital watches, they are on sale at Dell
online!"

An occasional response: "Here's how you build a clock..."

A common response: "If you have to ask for the time in this day and
age, then you are stuck in the 50s, and making dumb mistakes by firstly
asking a stranger for the time. How do you know that his watch is
accurate, or that he even has a watch? How do you know that the
stranger is not lying to you? You'll probably be late for that client
meeting and lose your livelihood if you relied on his answer! I have
seen many friends of mine losing their jobs because they asked
strangers for the time. You rarely ever need to ask the time. You are
better off buying a nice watch -- look at the one I am wearing, for
instance. Be sure that you buy one that will simultaneously show times
in 7 countries in the world. It is important, nay, life-threateningly
important, that the watch does indeed show times in 7 different
countries, because the watch that I used for the first ten years only
kept time in one just time zone, and it is dumb to assume that you'll
never be visiting those 7 countries. And right here in the ISO standard
I am carrying for this watch, it states clearly 'The cost of a watch
showing times in 7 different countries, is nothing, or next to nothing,
provided that you use the watch in the right manner'. Also, do not buy
a digital LCD watch, LCDs were never meant to show time--they are just
an unholy by-product of the moral decay caused by the Nintendo
generation who do nothing but waste time; LCDs were primarily designed
for something else. If an LCD breaks, as it'll in a year or actual
usage, you'll be left with no choice but to replace it. Do the right
thing, and buy an analog watch today -- easy to replace parts if it
breaks. And boy, was I dumb enough in my formative years to actually
believe that just knowing the time in Walla Walla was sufficient!"

Jul 23 '05 #10

Need some help with design of tables/views

Similar topics