473,326 Members | 2,196 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

Questions about building a data engine

Occasionally I try my hand at a simple data storage engine. Today I ran
across an article on the web
http://msdn2.microsoft.com/en-us/lib...51(vs.71).aspx, and it got me
thinking again. There doesn't seem to be a definative source, at least on
in layman terms, on this subject.

Here is what I believe to know so far. (As if I was to model after SQL
Server).

Data is stored in pages in blocks of 8Kb in a heap manner.

A single page of data contains a header section, rows of data, and a footer
section

In the header, you will find the page number, its previous page number and
the next page number.

This is a linked list for the owning "table" or object.

The header also contains the pointer the owning object, the pointer to
where the first row of data starts and

how much free space there is for row data.

The footer contains an array of row lengths of each row of data.

A single row of data is varying in length but has specific pieces.

Row Number

Row Type

Row Redirection Pointer

Field Count

Null Rows

Field Lengths

Field Data

As pages of data are needed, they are created in extents (8 pages at a time)
* the growth rate (chunk size) of the data file.

There are two types of indexes, clustered and non-clustered.

A clustered index is the data, meaning that the data is not stored in a heap
but rather stored as the index is specified. Because of the data storage
requirement, there can be only one clustered index per table.

Meanwhile with non-clustered indexes can be numerous, and they themselves
take up data row space. A single index may consist of up to 16 columns in
ascending or descending order and may optionally be case sensitive. Certain
columns are precluded from being part of an index such as Bit, Text, NText,
Binary and so on.

The index itself is supposedly a hybrid AVL tree. Meaning that it is self
balancing, but it can use a redirection pointer to another row of data in
the tree rather than continually compacting the data rows and pages. This
redirection is know as fragmentation.

Essentially the way the index works is by creating a node of a (key and a
pointer value) and adding it to an AVL tree. The key is comprised of some
or all of each column's value. The value is the pointer to the row of data
in the table of data.

Armed with that much information, here is a great place to get started
playing. Then I realize, I don't know where to start. I get stuck at the
same place each time.

Obviously I cannot house all the potential data in memory so some of this
data has to live on the hard drive. That said, how does one balance a tree
that is not in memory?

Since there are potentially multiple tasks going on in this data file at
once, especially if it is for multiple users, how does one perform
simultaneous reads and writes against a single file while managing the all
important locking?

Since there is potentially need for a lot of data to be available to the
users, it would be prudent to load and retain as much of the being accessed
into memory. But how does one limit the amount of resources the data engine
actually consumes?

At some point I would like to use SQL queries to actually work against the
data engine. I know that for this I should consider BNF parser. Does
anyone have ideas on how to get the SQL92 syntax in full BNF form? I have
found a parsing solution "Gold Parser", anyone know if there is a SQL92 BNF
script for Gold, or have another solution to look at?

Anyway, this is a long post and really a curiosity that I am toying with. I
appreciate an assistance any of you may offer. My goal is to stick with
VB.Net as this whole thing started in VB 5 a long, long time ago. In a
pinch, should I need to multicast delegates, I am willing to employ C#.
Jan 19 '07 #1
1 1233
I can't help you but I admire your curiosity. My approach has been to use
what there is (like sql server) and not ry to reinvent the wheel, but I
guess if everyone did that we'd still be living in caves :-).
Best of luck.
Bob
"AMDRIT" <am****@hotmail.comwrote in message
news:uv****************@TK2MSFTNGP02.phx.gbl...
Occasionally I try my hand at a simple data storage engine. Today I ran
across an article on the web
http://msdn2.microsoft.com/en-us/lib...51(vs.71).aspx, and it got
me thinking again. There doesn't seem to be a definative source, at least
on in layman terms, on this subject.

Here is what I believe to know so far. (As if I was to model after SQL
Server).

Data is stored in pages in blocks of 8Kb in a heap manner.

A single page of data contains a header section, rows of data, and a
footer section

In the header, you will find the page number, its previous page number
and the next page number.

This is a linked list for the owning "table" or object.

The header also contains the pointer the owning object, the pointer to
where the first row of data starts and

how much free space there is for row data.

The footer contains an array of row lengths of each row of data.

A single row of data is varying in length but has specific pieces.

Row Number

Row Type

Row Redirection Pointer

Field Count

Null Rows

Field Lengths

Field Data

As pages of data are needed, they are created in extents (8 pages at a
time) * the growth rate (chunk size) of the data file.

There are two types of indexes, clustered and non-clustered.

A clustered index is the data, meaning that the data is not stored in a
heap but rather stored as the index is specified. Because of the data
storage requirement, there can be only one clustered index per table.

Meanwhile with non-clustered indexes can be numerous, and they themselves
take up data row space. A single index may consist of up to 16 columns in
ascending or descending order and may optionally be case sensitive.
Certain columns are precluded from being part of an index such as Bit,
Text, NText, Binary and so on.

The index itself is supposedly a hybrid AVL tree. Meaning that it is self
balancing, but it can use a redirection pointer to another row of data in
the tree rather than continually compacting the data rows and pages. This
redirection is know as fragmentation.

Essentially the way the index works is by creating a node of a (key and a
pointer value) and adding it to an AVL tree. The key is comprised of some
or all of each column's value. The value is the pointer to the row of
data in the table of data.

Armed with that much information, here is a great place to get started
playing. Then I realize, I don't know where to start. I get stuck at the
same place each time.

Obviously I cannot house all the potential data in memory so some of this
data has to live on the hard drive. That said, how does one balance a
tree that is not in memory?

Since there are potentially multiple tasks going on in this data file at
once, especially if it is for multiple users, how does one perform
simultaneous reads and writes against a single file while managing the all
important locking?

Since there is potentially need for a lot of data to be available to the
users, it would be prudent to load and retain as much of the being
accessed into memory. But how does one limit the amount of resources the
data engine actually consumes?

At some point I would like to use SQL queries to actually work against the
data engine. I know that for this I should consider BNF parser. Does
anyone have ideas on how to get the SQL92 syntax in full BNF form? I have
found a parsing solution "Gold Parser", anyone know if there is a SQL92
BNF script for Gold, or have another solution to look at?

Anyway, this is a long post and really a curiosity that I am toying with.
I appreciate an assistance any of you may offer. My goal is to stick with
VB.Net as this whole thing started in VB 5 a long, long time ago. In a
pinch, should I need to multicast delegates, I am willing to employ C#.


Jan 19 '07 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: PatchFactory Support | last post by:
Description: Professional and easy-to-use patch building environment that can help you to create instant patch packages for software and file updating. Generated patch packages are small size...
3
by: Birt | last post by:
1. class Engine; class Car { Engine& e; .... }; How to initialize a reference member, e?
20
by: Olav.NET | last post by:
I am a .NET/C++ developer who is supposed to do some work with Access. I do not know much about it except for the DB part. Questions: *1* I am looking for INTENSIVE books to get quickly up to...
0
by: Henry | last post by:
Using ideas provided by some of you I was able to figure out how to get the names of the parameters fields of a crystal report specified at run time. The code below just basically puts the...
2
by: dbuchanan52 | last post by:
Hello, I am building an application for Windows Forms using. I am new to SQL Server 'Views'. Are the following correct understanding of their use? 1.) I believe a view can be referenced in a...
4
by: Java Challenge | last post by:
I am trying to work hard to become a programmer and eventually get a job as a programmer, I have a low paying job at the moment as technical support and a family to maintain. 1 - I started to...
13
by: royaltiger | last post by:
I am trying to copy the inventory database in Building Access Applications by John L Viescas but when i try to run the database i get an error in the orders form when i click on the allocate...
12
by: Sam Kong | last post by:
Hi, JavaScript hides its memory structure. I know that numbers, booleans, null and undefined are value types (value is directed saved in a variable). I want to know: - How JavaScript...
4
by: AzizMandar | last post by:
C++ Event Coding Questions I have done some simple programs in C++ and read a lot of good C++ books (Including The C++ Programing Language, and C++ Primer) I am trying to understand and...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.