On Hash Index of AntDB-M

26 16bit

One of the key technologies to speed up the access of database is index, and the design and usage of index greatly affect the performance of database. AntDB-M supports two types of indexes, Hash and BTree. This article mainly explains the design of Hash index and gives some suggestions for using it.

1.Concepts

Bucket

A container used to locate index records. Each element in the container records the unique number of a table record, and an empty element indicates that there is no table record at the current index location.

Bucket Size

The number of unique record numbers that can be stored directly in a bucket.

Bucket Subscript

The bucket subscript is obtained by index column modulo the Hash value and the bucket size after Hash calculation.

Index Record Chaining Table

Due to the existence of Hash conflict, the bucket subscripts obtained from different table records after Hash calculation may be the same. The same bucket subscripts of index records are stored in a two-way chain table, that is, the index record chain table, the element on the bucket is the head of index record chain table.

2.Memory Structure

AntDB-M's memory structure of Hash index is designed to be very simple and efficient. Each Hash index has two parts: bucket, index record chain table.

Bucket

Bucket is a one-dimensional array, each element in the array is a unique number of table record, and the unique number of index record is the same as the unique number of table record.

Figure 1: Bucket

Index Record Chaining Table

AntDB-M index record chain table is used to record index records with the same hash value. The memory structure is a three-level structure: 1) primary address; 2) secondary address; 3) chain table nodes; the organization of this structure is similar to that of table data (refer to "AntDB-M’s Memory Structure"), and the records in the chain table can be quickly located by the unique number of the records.
The nodes of the index record chain table have three parts: the previous record number, the next record number, and the transaction (only the unique index has it). The record number of the record in the bucket is the record number of the head node of the chain table.

Figure 2: Index record chain table

As you can see above, the entire structure of the Hash index involves only the record number and transaction information, which is a very lightweight structure that scales memory on demand and does not consume too much memory.

3.Index Processing

3.1 Persistence

AntDB-M index includes two parts of data: index metadata, index records. When doing the persistence of data, only the index metadata will be persisted.

3.2 Initialization

When AntDB-M database service starts, first load the table data records, then load the index metadata, and finally initialize the index records in memory according to the index metadata.

3.3 Insert Index Records

When a table inserts a record, the table record will be inserted first, and then the index record will be inserted. The new record is added to the head of the index chain table to improve the insertion speed.

General Index

Figure 3: insert general index

For general indexes, because there is no need to determine the uniqueness of records, concurrent inserts do not affect each other and can be inserted directly. The newly inserted index record is effective immediately, i.e., it can be accessed immediately when other transactions query it, but the visibility of the table record is controlled by the transaction information on the table record as well as the lock. On transaction commit, the general index does not require any further processing. The transaction removes the index record itself from the index record chain table when rolling back.

Unique index

Figure 4: insert unique index

For unique indexes, it is necessary to detect the presence of a unique key conflict. When the same record already exists in the chain table of index records with the same bucket subscript and the transaction has not been committed (the transaction field on the index record has a value), it blocks and waits for other transactions to commit or roll back to detect the unique key conflict. If the transaction is committed, the transaction information on the index record is cleared. If the transaction is rolled back, the index record is deleted. Transaction commit and rollback both notify the transaction that is blocking and waiting.

3.4 Index Query

When querying by hash index, the hash bucket size is first modulo the hash value calculated by the index column to get the bucket subscript, then get the list of index records under the subscript, get the record number of the matching record, and put the number into the query context allocated for the current transaction for subsequent query traversal. The index record has the same record number as the data record, and the data record can be located immediately based on the record number. The main overhead of the whole process is the calculation of the hash value and the comparison of the indexed record items, and the performance is very high.

3.5 Deleting Index Records

Figure 5-General index deletion

Figure 6 - Unique Index Deletion

When a table deletes a record, it first deletes the table data record (marks the delete mark and updates the transaction information), and then updates the transaction on the unique index record to the current transaction. Since ordinary indexes do not have transaction information, no operation is done. The transaction on the updated index record is for conflict detection when the data record uniqueness is concurrent in multiple transactions. Since the current transaction has already acquired the mutex lock on the record, updating the index record does not affect the consistency of the index record and can be updated directly.

When the transaction commits, the index record is not immediately deleted, and the index record is still found by the index during lookup. The visibility of the table record is still controlled by the transaction information and lock information on the table record (i.e. MVCC). A separate cleanup thread will later perform the actual deletion of the data record and index record when the data record is no longer accessed by the transaction.

When the transaction is rolled back, only the transaction information on the index record needs to be cleared for unique indexes.

In the case of unique indexes, the transaction commit and rollback will notify the transaction that is blocking and waiting.

3.6 Update Index Entries
Updates to table records update the indexes only when index column updates are involved. When an index update is involved, the modification of table records is converted to deleting the old records first and inserting the new ones later. The operations for indexes are similarly converted to inserting index entries and deleting index entries as described above. The commit and rollback of transactions are also the same as inserting and deleting indexes as described above.

3.7 Node-Chain Table Extensions
The space size (number of records) of the index node chain table is the same as the space size of the table records. When the new record in the table exceeds the table space size and the table space needs to be extended, the index node chain table is extended at the same time.

3.8 Bucket reconstruction (rehash)
When creating an index, the initial value of the index bucket size can be specified by the index attribute block_size, or by the current number of table records if not specified, with a minimum value of 100000. If the number of records exceeds the bucket size, the bucket will be expanded and the hash index will be rebuilt.

New Bucket Size

The new bucket size is the smallest prime number larger than the current bucket size.

Bucket access lock

For each thread, the first access to the index is assigned a bucket access node with the current bucket address and a mutex lock. When a thread starts accessing the index, it first requests the lock on the access node and releases the lock at the end of the access. The lock ensures that the bucket resources are not released due to rehash during the index access.

Asynchronous Rebuilding

Bucket rebuild is an asynchronous process, where the bucket rebuild process does not affect index queries or updates. The process affects whether the old bucket is accessed or the new bucket by recording the current migration location. When data migration is complete and there is no access to the old bucket (no thread holds a bucket access lock on the bucket), the old bucket resources are released.

3.9 Indexes and Locks

To avoid too many locks taking up too many system resources and to maintain high concurrency, each Hash index of AntDB-M database is assigned 131 locks, and the lock of the current index record is obtained by bucket subscript with 131 taking modulo. Locks are added first during accessing indexes (query or modification).

4.Index Features

4.1 Prefixed Indexes

A prefix index is one that applies a specified column, or a specified prefix length of a specified column, as an index.
This characteristic is applicable in the following two cases:
1.Index column data is too long
2.The data of the index column part is applicable to the scenario of specific business query.

AntDB-M can also specify the prefix length of each column, not just the prefix length of the last column, which provides a very flexible indexing capability for business and facilitates customers to achieve more efficient and responsive business landing.

4.2 Data Distribution

For each Hash index, the number of records with the same Hash value under the current position is counted.
This statistic is mainly used to determine whether there is a large Hash conflict in the current data. It is convenient for the operation and maintenance managers of AntDB-M to quickly locate whether it is suitable to build a Hash index and whether it is necessary to further optimize the Hash index.

5.Restriction Constraints

5.1 Do not support range query

For Hash indexes, range queries are not supported because the indexed records themselves are not sorted.
Note: Range queries on indexed column fields are performed with full table traversal.

5.2 Fuzzy query is not supported

AntDB-M supports fixed-length left prefix matching. However, fuzzy matching is not supported because the Hash calculation itself requires exact values.

5.3 Index columns should not have too many identical values

If there are more columns with the same value, these records will be located in the same index record chain table, and each query will iterate through all the records in this list for filtering, the more data, the lower the performance.

6.Conclusion

It is the simplicity of Hash indexing that allows AntDB-M database to provide efficient indexing capability with less memory and computation, which improves database access performance while significantly reducing the system resource overhead. However, there are some limitations on the use of Hash indexing algorithm, mainly due to its own characteristics.

Understanding these characteristics can help business model designers to choose the right index type to maximize the performance of the database.

Jun 27 '23 #1

Subscribe Post Reply

281

by: Alban Hertroys | last post by:

Hello all, I have a table with about 400,000 records and a btree index (numeric). A simple SELECT * FROM table WHERE id = ... takes more than a second for every query and I need to query each...

PostgreSQL Database

how can I change a btree index into a hash index?

by: Mark Harrison | last post by:

I create a table like so: create table types ( typeid integer unique not null, typename varchar(255) unique not null ); and I get the expected messages: NOTICE: CREATE TABLE / UNIQUE...

PostgreSQL Database

Index on TEXT versus CHAR(32)... fast exact TEXT matching

by: Jon Lapham | last post by:

I have a table that stores TEXT information. I need query this table to find *exact* matches to the TEXT... no regular expressions, no LIKE queries, etc. The TEXT could be from 1 to 10000+...

PostgreSQL Database

My hash table is in need of peer review

by: Johan Tibell | last post by:

I would be grateful if someone had a minute or two to review my hash table implementation. It's not yet commented but hopefully it's short and idiomatic enough to be readable. Some of the code...

C / C++

Hash index creation problem

by: Spacen Jasset | last post by:

I am running postgresql at version 8.1 on a windows xp desktop machine. I then loaded 40 million rows into a table of 6 varchar fields ( The sum total length of a row is on average about 100...

PostgreSQL Database

139

How to implement a Hash Table in C

by: ravi | last post by:

Hi can anybody tell me that which ds will be best suited to implement a hash table in C/C++ thanx. in advanced

C / C++

if db2 udb can create hash index?

by: db2db2db2 | last post by:

hello all, we know that db2 create index using B-Tree,but if i want to create index using hash structure,how can i do it? in my opinion, db2 usually create index using B-tree,my question is if db2...

DB2 Database

Hash Function

by: boolat | last post by:

hi,im doing my assignment,,but sadly i dnt knw how to do...help me please b4 19april2010 Code a hash function, which computes the index number based on the MMU ID, which is a 10-digit key. The...

C / C++

'hash index' in DB2 V10.

by: srilata83 | last post by:

Hi All, Could anyone please explain me about 'hash index' in DB2 V10 with the examples.Any amerials or materilas can be shared with me will be highly appreciated. Also, someone has expalined...

DB2 Database

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

On Hash Index of AntDB-M

Similar topics