Connecting Tech Pros Worldwide Help | Site Map

DB Structure Advice - lots of measurements in DB or file refs?

RgeeK
Guest
 
Posts: n/a
#1: Jul 7 '08
A project I'm thinking about, could use some advice from those who
understand the dimensional limits of an SQL-like DB.

Imagine, for example, a database containing detailed data for many
cities during thunderstorms. Each time there is a storm, data
collection begins, gathering "rain-drops-per millisecond" maybe 5 times
per second (don't worry about the sanity of this, its just an analogy).

The logistical data is no concern for me (cities, dates, storm duration
etc). I anticipate a few hundred cities and but also thousands of storm
events associated with those various cities.

However the storage of rain storm data is my question.

Should "rain-drop-rate" data be fields in a rain-drop-rate measurements
table, with columns of "rate-drop-rate" and a "storm event" foreign key
for the storm to which the event belongs? That seems like it would be
a poor use of DB resources, as there would be billions of entries in
that table after many hundreds or thousands of storms are captured.

The alternatives that come to mind are

a) each storm event has a file reference, and the file contains
thousands of pairs of time and rain-drop-rate. Then I have directories
full of files.

b) each storm event is a new table in the database with the columns of
"rain-drop-rate" and "time of measurement" (The database still gets
filled with billions of entries, but spread among thousands of tables)

Is there some other option I should consider? It seems like a is the
right answer but your input is appreciated...

Ross.
Erland Sommarskog
Guest
 
Posts: n/a
#2: Jul 7 '08

re: DB Structure Advice - lots of measurements in DB or file refs?


RgeeK (Ross@no.thanks.spammers) writes:
Quote:
Imagine, for example, a database containing detailed data for many
cities during thunderstorms. Each time there is a storm, data
collection begins, gathering "rain-drops-per millisecond" maybe 5 times
per second (don't worry about the sanity of this, its just an analogy).
>
The logistical data is no concern for me (cities, dates, storm duration
etc). I anticipate a few hundred cities and but also thousands of storm
events associated with those various cities.
>
However the storage of rain storm data is my question.
>
Should "rain-drop-rate" data be fields in a rain-drop-rate measurements
table, with columns of "rate-drop-rate" and a "storm event" foreign key
for the storm to which the event belongs? That seems like it would be
a poor use of DB resources, as there would be billions of entries in
that table after many hundreds or thousands of storms are captured.
>
The alternatives that come to mind are
>
a) each storm event has a file reference, and the file contains
thousands of pairs of time and rain-drop-rate. Then I have directories
full of files.
>
b) each storm event is a new table in the database with the columns of
"rain-drop-rate" and "time of measurement" (The database still gets
filled with billions of entries, but spread among thousands of tables)
>
Is there some other option I should consider? It seems like a is the
right answer but your input is appreciated...
Since we don't know the real business problem and what you will use
the data for, it's difficult to say.

But your desciption reminds me of the problem the racing team McLaren
faced. They implemented, together with people from Conchango, a solution
where used the new Filestream feature in SQL 2008. You can read about
it on
http://www.microsoft.com/casestudies...yid=4000001476

McLaren wanted to work with their data from Excel, why the Filestream
solution suited them well. If you want to run SELECT queries against the
data, this is less practical. Your logical model should certainly be
(storm event, raim-drop-rate, time-of-measurement), but there are some
alternatives to organise it. SQL Server offers to partitioning solutions:
partition views available already in SQL 2000 and in all editions, and
partitioned tables, added in SQL 2005, and available only in Enterprise
Edition.

Partitioned tables are more solid, but both solutions permit you to
add and drop partitions as needed. Partitioned tables can have up 999
partitions if memory serves, whereas partitioned views has a limit of
256 tables.





--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
--CELKO--
Guest
 
Posts: n/a
#3: Jul 8 '08

re: DB Structure Advice - lots of measurements in DB or file refs?


>Is there some other option I should consider? <<

Kx (http://kx.com/) and Streambase (http://www.streambase.com/); they
were designed from the ground up for capturing this kind of moving
data streams and have been in use for years.
RgeeK
Guest
 
Posts: n/a
#4: Jul 10 '08

re: DB Structure Advice - lots of measurements in DB or file refs?


Thanks for the suggestions. Happy to see the words "billions of
records" in there.



--CELKO-- wrote:
Quote:
Quote:
Quote:
>>Is there some other option I should consider? <<
>
Kx (http://kx.com/) and Streambase (http://www.streambase.com/); they
were designed from the ground up for capturing this kind of moving
data streams and have been in use for years.
Closed Thread