473,569 Members | 2,676 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Questions about building a data engine

Occasionally I try my hand at a simple data storage engine. Today I ran
across an article on the web
http://msdn2.microsoft.com/en-us/lib...51(vs.71).aspx, and it got me
thinking again. There doesn't seem to be a definative source, at least on
in layman terms, on this subject.

Here is what I believe to know so far. (As if I was to model after SQL
Server).

Data is stored in pages in blocks of 8Kb in a heap manner.

A single page of data contains a header section, rows of data, and a footer
section

In the header, you will find the page number, its previous page number and
the next page number.

This is a linked list for the owning "table" or object.

The header also contains the pointer the owning object, the pointer to
where the first row of data starts and

how much free space there is for row data.

The footer contains an array of row lengths of each row of data.

A single row of data is varying in length but has specific pieces.

Row Number

Row Type

Row Redirection Pointer

Field Count

Null Rows

Field Lengths

Field Data

As pages of data are needed, they are created in extents (8 pages at a time)
* the growth rate (chunk size) of the data file.

There are two types of indexes, clustered and non-clustered.

A clustered index is the data, meaning that the data is not stored in a heap
but rather stored as the index is specified. Because of the data storage
requirement, there can be only one clustered index per table.

Meanwhile with non-clustered indexes can be numerous, and they themselves
take up data row space. A single index may consist of up to 16 columns in
ascending or descending order and may optionally be case sensitive. Certain
columns are precluded from being part of an index such as Bit, Text, NText,
Binary and so on.

The index itself is supposedly a hybrid AVL tree. Meaning that it is self
balancing, but it can use a redirection pointer to another row of data in
the tree rather than continually compacting the data rows and pages. This
redirection is know as fragmentation.

Essentially the way the index works is by creating a node of a (key and a
pointer value) and adding it to an AVL tree. The key is comprised of some
or all of each column's value. The value is the pointer to the row of data
in the table of data.

Armed with that much information, here is a great place to get started
playing. Then I realize, I don't know where to start. I get stuck at the
same place each time.

Obviously I cannot house all the potential data in memory so some of this
data has to live on the hard drive. That said, how does one balance a tree
that is not in memory?

Since there are potentially multiple tasks going on in this data file at
once, especially if it is for multiple users, how does one perform
simultaneous reads and writes against a single file while managing the all
important locking?

Since there is potentially need for a lot of data to be available to the
users, it would be prudent to load and retain as much of the being accessed
into memory. But how does one limit the amount of resources the data engine
actually consumes?

At some point I would like to use SQL queries to actually work against the
data engine. I know that for this I should consider BNF parser. Does
anyone have ideas on how to get the SQL92 syntax in full BNF form? I have
found a parsing solution "Gold Parser", anyone know if there is a SQL92 BNF
script for Gold, or have another solution to look at?

Anyway, this is a long post and really a curiosity that I am toying with. I
appreciate an assistance any of you may offer. My goal is to stick with
VB.Net as this whole thing started in VB 5 a long, long time ago. In a
pinch, should I need to multicast delegates, I am willing to employ C#.
Jan 19 '07 #1
1 1244
I can't help you but I admire your curiosity. My approach has been to use
what there is (like sql server) and not ry to reinvent the wheel, but I
guess if everyone did that we'd still be living in caves :-).
Best of luck.
Bob
"AMDRIT" <am****@hotmail .comwrote in message
news:uv******** ********@TK2MSF TNGP02.phx.gbl. ..
Occasionally I try my hand at a simple data storage engine. Today I ran
across an article on the web
http://msdn2.microsoft.com/en-us/lib...51(vs.71).aspx, and it got
me thinking again. There doesn't seem to be a definative source, at least
on in layman terms, on this subject.

Here is what I believe to know so far. (As if I was to model after SQL
Server).

Data is stored in pages in blocks of 8Kb in a heap manner.

A single page of data contains a header section, rows of data, and a
footer section

In the header, you will find the page number, its previous page number
and the next page number.

This is a linked list for the owning "table" or object.

The header also contains the pointer the owning object, the pointer to
where the first row of data starts and

how much free space there is for row data.

The footer contains an array of row lengths of each row of data.

A single row of data is varying in length but has specific pieces.

Row Number

Row Type

Row Redirection Pointer

Field Count

Null Rows

Field Lengths

Field Data

As pages of data are needed, they are created in extents (8 pages at a
time) * the growth rate (chunk size) of the data file.

There are two types of indexes, clustered and non-clustered.

A clustered index is the data, meaning that the data is not stored in a
heap but rather stored as the index is specified. Because of the data
storage requirement, there can be only one clustered index per table.

Meanwhile with non-clustered indexes can be numerous, and they themselves
take up data row space. A single index may consist of up to 16 columns in
ascending or descending order and may optionally be case sensitive.
Certain columns are precluded from being part of an index such as Bit,
Text, NText, Binary and so on.

The index itself is supposedly a hybrid AVL tree. Meaning that it is self
balancing, but it can use a redirection pointer to another row of data in
the tree rather than continually compacting the data rows and pages. This
redirection is know as fragmentation.

Essentially the way the index works is by creating a node of a (key and a
pointer value) and adding it to an AVL tree. The key is comprised of some
or all of each column's value. The value is the pointer to the row of
data in the table of data.

Armed with that much information, here is a great place to get started
playing. Then I realize, I don't know where to start. I get stuck at the
same place each time.

Obviously I cannot house all the potential data in memory so some of this
data has to live on the hard drive. That said, how does one balance a
tree that is not in memory?

Since there are potentially multiple tasks going on in this data file at
once, especially if it is for multiple users, how does one perform
simultaneous reads and writes against a single file while managing the all
important locking?

Since there is potentially need for a lot of data to be available to the
users, it would be prudent to load and retain as much of the being
accessed into memory. But how does one limit the amount of resources the
data engine actually consumes?

At some point I would like to use SQL queries to actually work against the
data engine. I know that for this I should consider BNF parser. Does
anyone have ideas on how to get the SQL92 syntax in full BNF form? I have
found a parsing solution "Gold Parser", anyone know if there is a SQL92
BNF script for Gold, or have another solution to look at?

Anyway, this is a long post and really a curiosity that I am toying with.
I appreciate an assistance any of you may offer. My goal is to stick with
VB.Net as this whole thing started in VB 5 a long, long time ago. In a
pinch, should I need to multicast delegates, I am willing to employ C#.


Jan 19 '07 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
1729
by: PatchFactory Support | last post by:
Description: Professional and easy-to-use patch building environment that can help you to create instant patch packages for software and file updating. Generated patch packages are small size self-extracting executable update programs in a famous installer style with adjustable user-friendly interface and multilingual support. Enhanced with...
3
1230
by: Birt | last post by:
1. class Engine; class Car { Engine& e; .... }; How to initialize a reference member, e?
20
3299
by: Olav.NET | last post by:
I am a .NET/C++ developer who is supposed to do some work with Access. I do not know much about it except for the DB part. Questions: *1* I am looking for INTENSIVE books to get quickly up to speed. I like books with practical exercises, and also with test questions (like cert books) *2*
0
2056
by: Henry | last post by:
Using ideas provided by some of you I was able to figure out how to get the names of the parameters fields of a crystal report specified at run time. The code below just basically puts the data into a comboBox. One thing I noticed, however, is that this method shows me both the parameters used by the main report and parameters used by...
2
1944
by: dbuchanan52 | last post by:
Hello, I am building an application for Windows Forms using. I am new to SQL Server 'Views'. Are the following correct understanding of their use? 1.) I believe a view can be referenced in a stored procedure something like this; Select * from View1 Is that true?
4
1452
by: Java Challenge | last post by:
I am trying to work hard to become a programmer and eventually get a job as a programmer, I have a low paying job at the moment as technical support and a family to maintain. 1 - I started to studying "Programming C#" by Jesse Liberty at home in the evening and "Inside C#" at work (I go slowly because I also have to work, but that's better...
13
3968
by: royaltiger | last post by:
I am trying to copy the inventory database in Building Access Applications by John L Viescas but when i try to run the database i get an error in the orders form when i click on the allocate button "Unexpected Error":3251 operation is not supported for this type of object.The demo cd has two databases, one is called inventory and the other...
12
2411
by: Sam Kong | last post by:
Hi, JavaScript hides its memory structure. I know that numbers, booleans, null and undefined are value types (value is directed saved in a variable). I want to know: - How JavaScript distinguishes value types from reference types by seeing the variable content?
4
6687
by: AzizMandar | last post by:
C++ Event Coding Questions I have done some simple programs in C++ and read a lot of good C++ books (Including The C++ Programing Language, and C++ Primer) I am trying to understand and implement an Event based program and Message system. I have a very basic event engine that I'm feeling works a bit backwards. I'm looking for documents,...
0
7612
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
7924
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
6283
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
1
5512
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5219
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3653
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
1
2113
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1212
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
937
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.