473,327 Members | 2,012 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes and contribute your articles to a community of 473,327 developers and data experts.

AntDB’s implementation of DML data flashback based on WAL logs

26 16bit
1.Overview
AntDB database is a home-grown distributed database with MPP architecture, highly compatible with Oracle syntax, which is widely used in many industries such as communication, finance and transportation. In the process of using AntDB database, users often delete data or update data by mistake due to misoperation or application bugs, which affects the normal use of business. Almost all databases will encounter similar problems, and most databases will provide a “Data Flashback” tool, which can be used to quickly recover misoperated data.
According to the type of misoperation, data flashback can be divided into DML data flashback and DDL data flashback. DML data flashback is mainly for scenarios like insert, update and delete , while DDL data flashback is mainly for scenarios like drop table and truncate table, and AntDB can support both DML and DDL data flashback. This article will mainly introduce the implementation details of DML data flashback.
2.Solution Selection
There are two mainstream ways to implement DML data flashback in AntDB. The first one is WAL log-based data flashback, which has two prerequisites: the table structure has not been changed and enough WAL logs are kept, and the WAL logs are parsed for data recovery with the table structure metadata. The second one is implemented based on MVCC mechanism, AntDB’s update and delete records are not really deleted, but generate new records and mark the old ones, so that the old records can be retrieved by using the multi-version information of the records and transaction submission information before vacuum execution. Both approaches have their own applicable scenarios and drawbacks.
WAL log-based data flashback requires keeping a large number of WAL log files, which can take up a lot of storage space. When the number of WAL files is large, it needs to traverse all WAL logs, which is time consuming, and the table cannot do DDL operations. It has the advantage of simple logic, easier in technical implementation, and there are readily available open source implementations in the community for reference.
The MVCC-based mechanism, on the other hand, needs to ensure that vacuum does not clear unused multi-version tuples, and vacuum-related parameters need to be set carefully. Since the multi-version tuple cannot be cleared and released in time, it will also bring the problem of taking up a lot of space. However, the MVCC-based implementation is more complex and requires deeper kernel code changes, so after investigating the advantages and disadvantages of the two solutions, we choose the WAL log-based DML data flashback implementation.
3.Implementation Principle
3.1 WAL Log Composition
The WAL log consists of a number of Records, which is divided into different types, as shown in Figure 1:


Figure 1: types of WAL Record
The underlying storage of AntDB is a Heap table. From the perspective of data flashback, only the Heap table and transaction-related types need to be concerned, other Record types do not affect the result of data flashback. The transaction-related Record records the status of the transaction, commit or abort, and the data of the transaction in abort state does not need to be flashed back, so only the transaction in commit state needs to be concerned.
Heap type Record records the transaction data generated by insert, update, and delete operations, which can be further subdivided according to the DML type, as shown in Figure 2:


Figure 2: subtypes of WAL Record

3.2 Parsing Process
DML data flashback supports flashback by table name, time interval, transaction number XID and LSN interval. From the internal implementation, all these flashback methods are eventually converted into an LSN interval containing the starting LSN and ending LSN, and if a table name is specified, it will be filtered by table name when parsing WAL Record records. In Figure 3, lsn3~lsn4 are the LSN interval to be parsed.


Figure 3: Diagram of LSN interval
The WAL log-based data flashback requires two important parameters: wal_level and full_page_writes.The first parameter wal_level which needs to be set to replica or logical, minimal level saves less information and is not enough to recover data.The second parameter full_page_writes needs to be set to on. AntDB will write the full page for the first update to the WAL log after each checkpoint. This mechanism can ensure no data loss under abnormal scenarios such as downtime, for example, if a crash occurs during a dirty page swipe and the original page is found to be corrupted during recovery, then the data can be recovered through full page inside the WAL. The Full Page Write (FPW) page holds the raw data of the page, which is essential for parsing the WAL data. some Record recording its own data is not sufficient for parsing out the complete forward and reverse SQL, and needs to rely on the raw data provided by the FPW. The FPW writes after one checkpoint, and when flashing back data, the LSN interval selected should be before an available checkpoint. In the code implementation, even if the LSN interval is after a checkpoint, it will search forward until an available checkpoint is searched.
After the LSN interval of WAL is determined, the WAL Record within this interval can be parsed. According to the different Record types, the original SQL and the flashback SQL are stitched together with the metadata and FPW and recorded into the result table. The overall parsing process is shown in Figure 4:


Figure 4: WAL parsing process
In the process of parsing WAL logs, the parser uses transactions as the basic unit for parsing. A transaction corresponds to a TransactionEntry object, and a transaction can contain multiple SQL statements, each of which corresponds to an element of the change chain table. On transaction submission, all elements of the change chain table are stitched together to produce the forward SQL text, as well as the reverse SQL text.
3.3 Toast Table Parsing
A difficult part of the WAL parsing process is the toast table. AntDB uses the toast mechanism to store tables with large fields. Toast is called The OverSized Attribute Storage Technique. The main idea of the toast mechanism is to use additional toast tables to store large field data, avoiding a record spanning multiple pages, and storing the oid of the corresponding toast table and the chunk_id of its data rows in the original field storage area.
The reltoastrelid field of the pg_class table records the oid of its corresponding toast table, and the table name of the toast table is suffixed with the oid of its parent table. For example, a regular table t with oid 10022077 and reltoastrelid 10022080, the corresponding toast table for table t is: pg_toast_10022077 and oid 10022080. The regular table is distinguished from the toast table in pg_class by the field relkind, which is 'r' for the regular table and 't' for the toast table.
In terms of a DML operation performed on a regular table involving large fields, the order of wal written to that table and its corresponding toast table is as follows:
  • Insert operation- insert toast table records first, and then write to the regular table
  • Update operation- insert toast table records first, then delete toast table records, and finally update the regular table
  • Delete operation- delete the regular table records first, then delete toast table records
As you can see above, both insert and update operate on the toast table first, and then operate on the regular table, while delete is the opposite, which is crucial for handling the flashback of toast table data in the delete scenario.
3.4 Flashback Data Storage
The SQL parsed from the WAL logs will be stored in a table, the structure of which is shown in Table 1:
Table 1: parsing result table


op_text is the original SQL statement and undo_text is the flashback SQL statement. If you want to recover the data quickly, you need to export undo_text and execute it once in reverse chronological order to recover the data for this period.
4.Usage Example
Create a table t, insert 3 rows of records, and then delete all the records of table t, as shown in Figure 5:


Figure 5: operation of deleting data by mistake
Execute the flashback function wal2sql of DML data, specifying the table name t as the parameter. It is able to see the results of the DML data flashback from the wal2sql_contents table after the execution is completed. op_text indicates the original SQL, and undo_text indicates the reverse flashback SQL, as shown in Figure 6:


Figure 6: operation of data flashback

Improvements
AntDB’s DML data flashback tool solves a series of scenario-related problems during the development process, mainly including toast table parsing, subtransaction parsing, Oracle compatibility issues; it also solves scenario-related performance problems, such as performance problems in massive table scenarios and performance problems in large data volume scenarios. AntDB’s DML data flashback tool not only supports standalone version, but also supports data flashback in AntDB distributed cluster scenario.
  • [1]Toast table parsing problem is mainly due to the processing order of the original table and toast table in the delete scenario is different from that in the insert and update scenarios, which requires special treatment, otherwise it will lead to incorrect parsing results of toast table in the delete scenario.
  • [2]Sub-transaction is a scenario that AntDB uses more often. The parsing of sub-transaction needs to strictly consider the relationship between parent and child transactions and the transaction commit status, and the accuracy of sub-transaction should be ensured when parsing sub-transaction, and no part of sub-transaction should be lost.
  • [3]Compatibility with Oracle is a major feature of AntDB. AntDB is highly compatible with Oracle syntax and storage types, such as rowid of Oracle type. Data flashback needs to consider compatibility with Oracle to ensure that tables created and data generated under Oracle syntax can be flashed back correctly.
  • [4]Data flashback needs to support distributed cluster.With AntDB’s distributed architecture, data is stored in different data nodes and DML data flashback can support data flashback of all data nodes. In the use method, you only need to connect to any CN node to execute the data flashback command, which is exactly the same as the standalone version in the use method, and it is easy to use.
  • [5]The performance improvement of data flashback mainly includes the performance improvement of execution in massive table scenario, for instance, the database contains more than 1 million tables with huge amount of metadata, and this scenario needs to ensure that the parsing performance will not have a big drop. In addition, the performance of execution is optimized in large transaction scenario. 1 transaction containing a large number of SQL and large fields involving a large number of toast table parsing, is required to ensure that the performance will not be decreased too much.

Conclusion
This article introduces a way of AntDB to implement DML data flashback based on WAL logs, discusses its implementation principle, and makes a lot of improvements and enhancements based on the community open source implementation. Data flashback is a very important tool, although usually used infrequently, but in an emergency can save data. Data flashback is not limited to one way, AntDB will explore MVCC-based multi-version control data flashback in the future to provide more implementation paths for data recovery.
About AntDB database
AntDB database was started in 2008. On the core system of operators, AntDB provides online services for more than 1 billion users in 24 provinces across the country. With product features such as high performance, elastic expansion and high reliability, AntDB database can process one million core communications transactions per second at peak, ensuring the continuous and stable operation of the system for nearly ten years, and is successfully implemented for commercial purpose in communication, finance, transportation, energy, Internet of Things and other industries.
Jan 6 '23 #1
0 484

Sign in to post your reply or Sign up for a free account.

Similar topics

2
by: Domenico Discepola | last post by:
Hello all. Before my arrival at my current employer, our consultants physically set up our MSSQL 7 server as follows: drive c: contains the mssql engine drive d: contains the transaction log...
0
by: DB2 DBA wanted | last post by:
Maines Paper & Food Service is a $2 billion/yr food distribution company with 9 distribution centers in the US. We are currently interviewing for the position detailed below. Relocation to...
5
by: Brian Henry | last post by:
Hello, I am working on a simple email website that is kinda like hotmail... but the listing of messages is in a data grid data bound to a table called messages... now one of the columns in the...
18
by: jpolaski | last post by:
I'd like to change the color of text in a table data field based on the content... For example, I'd like all <td>pos</td> to show the "pos" in red. All <td>pending</td> should show as the normal...
0
by: Piotrek | last post by:
Hi. I am developing a web app. On the main page I have two user controls: searchControl and detailsControl. In the searchControl I have a GridView. On this GridView DataBound handler I am...
1
by: Maxwell2006 | last post by:
Hi, I am working with strongly typed datatables. What is the most efficient way to build a new DataTAble based on the result of DataTable.Select? At this point I use a foreach loop to do the...
1
FOE2272
by: FOE2272 | last post by:
I am working on an Access DB that has a form with 8 radio buttons. The customers are those who bid on a project. each customer has the following information: Company Contact Adress City
12
by: betabrain.honshu | last post by:
Hi Folks, for those of you who are familiar with the micropledge.com project, here is a good opportunity to spend or earn something: http://micropledge.com/projects/pysalsa20 I know that the...
0
by: John Brainard | last post by:
Background: i have a table with data in it, field names & from Service1 through Service8. i have textboxes on my form named: txtDetailName1, txtDetailData 1 Here is an example of WRONG...
0
by: ema agasta | last post by:
PLEASE,HELP ME how to update the data db2 based on the last digit number 2 ?? for example: EMPNO NAME SALARY 6093 ANA 2300 1139 budi 4500 2393 ...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.