Learning significance
Whether on the front end or the back end, a system's data is ultimately read from and written to a database. As shown in the figure below, registration data eventually enters the database through the back-end server, so the database can be considered the core of the system. Once high concurrency is involved in particular, the database often becomes the system's bottleneck, which makes its internals worth studying. This article briefly summarizes the underlying implementation of a basic database, laying a foundation for deeper study later.
Background knowledge
Storage & Database
The storage system includes:

- Block storage: the storage system at the bottom of the software stack, but its interface is too simple
- File storage: the most widely used storage system, with friendly interfaces and many implementations
- Object storage: a flagship product on the public cloud, strengthened by immutable semantics
- Key-value storage: the most flexible form, with a large number of open-source and black-box products
The database system includes:

- Relational: based on relations and relational algebra; generally supports transactions and SQL access; mature products with a friendly user experience
- Non-relational: flexible structures and access modes, with different products targeting different scenarios
The distributed architecture includes:

- Data distribution strategy: determines how data is distributed across the physical nodes of the cluster, whether the distribution is uniform, and whether high performance can be achieved
- Data replication protocol: affects the performance of the I/O path and how machine-failure scenarios are handled
- Distributed transaction algorithm: the algorithm by which multiple database nodes cooperate to guarantee a transaction's ACID properties, usually designed around the idea of two-phase commit (2PC)
Database structure
The execution process of a SQL statement in the database is shown in the figure below:

- The database receives the SQL statement text from the client
- Lexical analysis splits the text into a sequence of tokens
- Syntactic parsing turns the tokens into an abstract syntax tree (AST)
- Semantic analysis validates the AST and produces an expression
- Rule-Based Optimization (RBO) applies query rewriting, expression simplification, predicate pushdown, etc.
- Cost-Based Optimization (CBO) finds the optimal query expression: enumerate the candidate paths, compute the cost of each, and pick the cheapest
- The logical plan is built, and then the physical plan
- The plan is executed at execution time
- Results are returned
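As a rough illustration of how these stages chain together, here is a minimal driver sketch using the Parser, Optimizer, and Executor classes that appear later in this article; the runQuery wrapper and the getResult() accessor are hypothetical, not part of the project:

```cpp
// Hypothetical driver loop: parse, plan, and execute each statement.
// Parser, Optimizer, Executor, Plan, and BaseOperator are the project's
// classes shown later; runQuery and getResult() are illustrative only.
void runQuery(Parser& parser, Optimizer& optimizer, Executor& executor,
              const std::string& sql) {
  // In this code base, parseStatement returning true means failure.
  if (parser.parseStatement(sql)) {
    return;
  }
  for (size_t i = 0; i < parser.getResult()->size(); ++i) {
    const SQLStatement* stmt = parser.getResult()->getStatement(i);
    Plan* plan = optimizer.createPlanTree(stmt);         // build the plan tree
    BaseOperator* op = executor.generateOperator(plan);  // build the operator tree
    if (op != nullptr) {
      op->exec();  // volcano-style execution
    }
  }
}
```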
The database includes three engines:
SQL Engine
- Parser: parses the query, generates a syntax tree, and verifies validity (lexical, syntactic, semantic)
- Optimizer: selects the optimal execution path based on the syntax tree
- Executor: drives the query execution process and actually processes the data
Take the following SQL statement as an example:

```sql
SELECT a FROM test WHERE a > 4;
```
After lexical analysis, the SQL statement is cut into tokens:

SELECT | a | FROM | test | WHERE | a | > | 4
Syntactic parsing combines the token sequence into grammar phrases and matches them against the established grammar rules. If matching succeeds, the corresponding abstract syntax tree is generated; otherwise a syntax error is reported.
Syntactic parsing works by shifting one token at a time and attempting a reduction; if no rule matches, it keeps shifting, until all tokens have been consumed and the reductions have built the corresponding syntax tree. Taking the tokens above as an example:

- Shift SELECT; no rule applies yet, so keep shifting
- Shift a, which is reduced to target_list, replacing the token a
- Shift FROM, then shift test, which is reduced to from_list; FROM and from_list are then reduced to from_clause
- Shift WHERE, then shift a, which is reduced to expr; shift > and 4, after which expr > expr is reduced to a_expr; WHERE and a_expr are reduced to where_clause
- Finally, SELECT, target_list, from_clause, and where_clause are reduced to simple_select_clause, and parsing yields the corresponding syntax tree
Semantic analysis examines the validity of the syntax tree (AST): tables, columns, column types, functions, expressions, and so on. For the reductions above, three places are checked:
- from_clause: check whether the table test in the statement exists
- target_list: check whether column a is an attribute of a relation or view in the FROM clause
- where_clause: check whether column a is an attribute of a relation or view in the FROM clause, and whether column a's type can be compared with > 4
After semantic analysis, the corresponding expression is generated for use by the optimizer.
Transaction engine
Implements the four ACID properties of transactions.

The concept of a transaction: a transaction is a sequence of database operations that either all execute or none execute. It is an indivisible logical unit, and data consistency is guaranteed through transaction integrity.
The ACID properties of a transaction:

- Atomicity: a transaction cannot be further divided; the operations in it either all happen or none happen
- Consistency: the integrity constraints of the database are not violated before or after the transaction
- Isolation: concurrent transactions are independent of one another and do not depend on or affect one another
- Durability: once a transaction completes, its changes to the database are saved permanently and will not be rolled back
Four anomalies between concurrent transactions:

- Dirty read: one transaction reads data that another transaction has not committed, and that data may later be rolled back
- Non-repeatable read: two identical queries within one transaction return different data, caused by another transaction committing a modification between the queries
- Phantom read: one transaction modifies all rows in a table while another transaction concurrently inserts a new row; the first transaction then finds an unmodified row in the table, as if seeing a phantom
- Lost update: two transactions read the same record; A modifies it first, then B modifies it as well (unaware of A's change), and when B commits, B's result overwrites A's
Four isolation levels between transactions:

| Isolation level | Description | Effect |
|---|---|---|
| Read uncommitted | Can read uncommitted data | Does not prevent dirty reads |
| Read committed | Can only read committed data | Prevents dirty reads; the default isolation level of Oracle and SQL Server |
| Repeatable read | Repeated reads return the same data | Prevents dirty reads and non-repeatable reads; the default isolation level of MySQL |
| Serializable | Serialized execution | Equivalent to locking the table: every read and write takes a table-level shared lock, and they block one another |
Transaction control statements:

- Start a transaction: BEGIN or START TRANSACTION explicitly starts a transaction
- Commit a transaction: COMMIT or COMMIT WORK makes all modifications to the database permanent; SET autocommit = 0 or 1 controls whether commits happen automatically
- Roll back: ROLLBACK or ROLLBACK WORK ends the user's transaction and undoes all uncommitted modifications in progress
- Create a savepoint: SAVEPOINT s1 creates a savepoint named s1 within a transaction; a transaction can have multiple savepoints
- Roll back to a savepoint: ROLLBACK TO [SAVEPOINT] s1 rolls the transaction back to savepoint s1
Storage engine
Stores data, indexes, and logs.

The storage engine is the way, or format, in which the database stores data in the file system. Each storage engine uses different storage mechanisms and indexing techniques, and ultimately provides different capabilities. The storage engine sits on top of the file system: data passes through the storage engine before being saved to data files, and is laid out according to each storage engine's storage format.
System design
Project breakdown
The following figure shows the interaction between the modules of the whole system, mainly the Parser, Optimizer, Executor, and so on. Specifically, the main functions include:

- SQL Engine:
  - Parser: parses the query, produces the syntax tree, and verifies validity
  - Optimizer: selects the optimal execution path from the syntax tree
  - Executor: query execution based on the volcano model
- Transaction engine: supports the transaction commit and rollback mechanism
- Storage engine:
  - Data structure design
  - Index structure design
Project construction
Write a CMakeLists.txt file with the CMake tool for cross-platform compilation (refer to the link); the cmake file is given directly here:
```cmake
cmake_minimum_required(VERSION 3.8)
project(MyDB)

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_COMPILER "g++")
set(CMAKE_CXX_FLAGS "-g -Wall -Werror -std=c++11")
set(CMAKE_CXX_FLAGS_DEBUG "-O0")
set(CMAKE_CXX_FLAGS_RELEASE "-O2 -DNDEBUG")
set(CMAKE_INSTALL_PREFIX "install")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)

include_directories(${CMAKE_SOURCE_DIR}/sql-parser/include)

add_subdirectory(src/main)
add_subdirectory(src/sql-parser-test)
```
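With this file in place, a standard out-of-source CMake build (configure into a separate build directory, then compile) should put libraries under build/lib and executables under build/bin, following the output directories set above.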
SQL Engine Design
Parser
SQL parsing is very tedious, so the open-source SQL parser is used directly; you only need to compile it into a library and include its header files.
The SQL parser library only provides lexical and syntactic analysis, generating the different query trees shown in the figure below; it cannot perform semantic analysis, that is, validity verification. The SQL parser library is therefore wrapped, and a semantic analysis function is added.
The core logic is to call the SQLParser::parse interface to parse the SQL statement and verify that it is legal, then call checkStmtsMeta to perform semantic analysis on the syntax tree; that function calls checkMeta, which performs a different check for each syntax tree category.
```cpp
// Semantic analysis entry point. Note: in this code base, a return value
// of true means failure.
bool Parser::parseStatement(std::string query) {
  result_ = new SQLParserResult;
  SQLParser::parse(query, result_);  // the sql-parser library's interface

  // Check validity of the parsed result.
  if (result_->isValid()) {
    return checkStmtsMeta();
  } else {
    std::cout << "[BYDB-Error] Failed to parse sql statement." << std::endl;
  }
  return true;
}

bool Parser::checkStmtsMeta() {
  for (size_t i = 0; i < result_->size(); ++i) {
    const SQLStatement* stmt = result_->getStatement(i);
    if (checkMeta(stmt)) {
      return true;
    }
  }
  return false;
}

// Perform a different check for each statement category.
bool Parser::checkMeta(const SQLStatement* stmt) {
  switch (stmt->type()) {
    case kStmtSelect:
      return checkSelectStmt(static_cast<const SelectStatement*>(stmt));
    case kStmtInsert:
      return checkInsertStmt(static_cast<const InsertStatement*>(stmt));
    case kStmtUpdate:
      return checkUpdateStmt(static_cast<const UpdateStatement*>(stmt));
    case kStmtDelete:
      return checkDeleteStmt(static_cast<const DeleteStatement*>(stmt));
    case kStmtCreate:
      return checkCreateStmt(static_cast<const CreateStatement*>(stmt));
    case kStmtDrop:
      return checkDropStmt(static_cast<const DropStatement*>(stmt));
    case kStmtTransaction:
    case kStmtShow:
      return false;
    default:
      std::cout << "[BYDB-Error] Statement type "
                << StmtTypeToString(stmt->type()) << " is not supported now."
                << std::endl;
  }
  return true;
}
```
Here kStmtSelect is taken as an example; the other statement types follow the same pattern, and the details can be found in the code. If the statement type is SELECT, first obtain the table name and check whether the table exists; then check whether the GROUP BY clause of the SELECT statement is empty, whether unsupported operations such as UNION appear, and so on. For details, refer to the code; they are not repeated here.
```cpp
bool Parser::checkSelectStmt(const SelectStatement* stmt) {
  TableRef* table_ref = stmt->fromTable;
  Table* table = getTable(table_ref);
  if (table == nullptr) {
    std::cout << "[BYDB-Error] Can not find table "
              << TableNameToString(table_ref->schema, table_ref->name)
              << std::endl;
    return true;
  }
  if (stmt->groupBy != nullptr) {
    std::cout << "[BYDB-Error] Do not support 'Group By' clause" << std::endl;
    return true;
  }
  if (stmt->setOperations != nullptr) {
    std::cout << "[BYDB-Error] Do not support Set Operation like 'UNION', "
                 "'Intersect', etc." << std::endl;
    return true;
  }
  if (stmt->withDescriptions != nullptr) {
    std::cout << "[BYDB-Error] Do not support 'with' clause." << std::endl;
    return true;
  }
  if (stmt->lockings != nullptr) {
    std::cout << "[BYDB-Error] Do not support 'lock' clause." << std::endl;
    return true;
  }
  if (stmt->selectList != nullptr) {
    for (auto expr : *stmt->selectList) {
      if (checkExpr(table, expr)) {
        return true;
      }
    }
  }
  if (stmt->whereClause != nullptr) {
    if (checkExpr(table, stmt->whereClause)) {
      return true;
    }
  }
  if (stmt->order != nullptr) {
    for (auto order : *stmt->order) {
      if (checkExpr(table, order->expr)) {
        return true;
      }
    }
  }
  if (stmt->limit != nullptr) {
    if (checkExpr(table, stmt->limit->limit)) {
      return true;
    }
    if (checkExpr(table, stmt->limit->offset)) {
      return true;
    }
  }
  return false;
}
```
Optimizer
According to the generated query tree, the corresponding plan tree is generated. The plan tree is composed of basic operators; for the scenarios required by this project, the basic operators sketched below are constructed.
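As a stand-in for the operator list, here is a sketch of the plan-node kinds inferred from the plan types the Optimizer and Executor dispatch on below; the exact definitions in the project may differ:

```cpp
// Plan-node kinds inferred from the Executor's switch below; a sketch,
// not necessarily the project's exact enum.
enum PlanType {
  kCreate, kDrop, kInsert, kUpdate, kDelete,
  kSelect, kScan, kFilter, kTrx, kShow
};

// Each plan node links to its child via next, so a plan tree is a chain
// built from the leaf (the scan) upward.
struct Plan {
  PlanType planType;
  Plan* next;
};
```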
Similarly, different plan trees are generated according to the different statement types:
```cpp
Plan* Optimizer::createPlanTree(const SQLStatement* stmt) {
  switch (stmt->type()) {
    case kStmtSelect:
      return createSelectPlanTree(static_cast<const SelectStatement*>(stmt));
    case kStmtInsert:
      return createInsertPlanTree(static_cast<const InsertStatement*>(stmt));
    case kStmtUpdate:
      return createUpdatePlanTree(static_cast<const UpdateStatement*>(stmt));
    case kStmtDelete:
      return createDeletePlanTree(static_cast<const DeleteStatement*>(stmt));
    case kStmtCreate:
      return createCreatePlanTree(static_cast<const CreateStatement*>(stmt));
    case kStmtDrop:
      return createDropPlanTree(static_cast<const DropStatement*>(stmt));
    case kStmtTransaction:
      return createTrxPlanTree(static_cast<const TransactionStatement*>(stmt));
    case kStmtShow:
      return createShowPlanTree(static_cast<const ShowStatement*>(stmt));
    default:
      std::cout << "[BYDB-Error] Statement type "
                << StmtTypeToString(stmt->type()) << " is not supported now."
                << std::endl;
  }
  return nullptr;
}
```
For example, for an UPDATE query, the corresponding UpdatePlan plan tree is as follows:
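Judging from createUpdatePlanTree below, the chain for an UPDATE with a WHERE clause is UpdatePlan → FilterPlan → ScanPlan: a sequential scan sits at the leaf, the filter above it evaluates the WHERE predicate, and the update node at the top consumes the filtered rows.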
The UpdatePlan plan tree is created by calling createUpdatePlanTree. The code is as follows:
```cpp
Plan* Optimizer::createUpdatePlanTree(const UpdateStatement* stmt) {
  Table* table = g_meta_data.getTable(stmt->table->schema, stmt->table->name);
  Plan* plan;

  // The sequential scan is the leaf of the plan tree.
  ScanPlan* scan = new ScanPlan();
  scan->type = kSeqScan;
  scan->table = table;
  plan = scan;

  // If there is a WHERE clause, place a filter above the scan.
  if (stmt->where != nullptr) {
    Plan* filter = createFilterPlan(table->columns(), stmt->where);
    filter->next = plan;
    plan = filter;
  }

  // The update node sits at the top of the tree.
  UpdatePlan* update = new UpdatePlan();
  update->table = table;
  update->next = plan;

  // For each SET column, record the new value and the column's index.
  for (auto upd : *stmt->updates) {
    size_t idx = 0;
    update->values.push_back(upd->value);
    for (auto col : *table->columns()) {
      if (strcmp(upd->column, col->name) == 0) {
        update->idxs.push_back(idx);
        break;
      }
      idx++;
    }
  }
  return update;
}
```
Executor
Based on the plan tree, the corresponding execution tree is generated: each Plan node produces a corresponding Operator, as shown in the figure below. The code is as follows:
```cpp
BaseOperator* Executor::generateOperator(Plan* plan) {
  BaseOperator* op = nullptr;
  BaseOperator* next = nullptr;

  /* Build Operator tree from the leaf. */
  if (plan->next != nullptr) {
    next = generateOperator(plan->next);
  }

  switch (plan->planType) {
    case kCreate:
      op = new CreateOperator(plan, next);
      break;
    case kDrop:
      op = new DropOperator(plan, next);
      break;
    case kInsert:
      op = new InsertOperator(plan, next);
      break;
    case kUpdate:
      op = new UpdateOperator(plan, next);
      break;
    case kDelete:
      op = new DeleteOperator(plan, next);
      break;
    case kSelect:
      op = new SelectOperator(plan, next);
      break;
    case kScan: {
      ScanPlan* scan_plan = static_cast<ScanPlan*>(plan);
      if (scan_plan->type == kSeqScan) {
        op = new SeqScanOperator(plan, next);
      }
      break;
    }
    case kFilter:
      op = new FilterOperator(plan, next);
      break;
    case kTrx:
      op = new TrxOperator(plan, next);
      break;
    case kShow:
      op = new ShowOperator(plan, next);
      break;
    default:
      std::cout << "[BYDB-Error] Not support plan node "
                << PlanTypeToString(plan->planType);
      break;
  }
  return op;
}
```
Each Operator calls next_->exec() to drive the lower-level Operator to produce data:
```cpp
class BaseOperator {
 public:
  BaseOperator(Plan* plan, BaseOperator* next) : plan_(plan), next_(next) {}
  // Virtual destructor, since operators are deleted through base pointers.
  virtual ~BaseOperator() {}

  virtual bool exec() = 0;

  Plan* plan_;
  BaseOperator* next_;
};
```
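As an example of how the chain works, here is a minimal sketch of a filter-style operator; the class name, the body, and the assumption that exec() follows the same true-means-error convention as the Parser are all illustrative, not the project's actual FilterOperator:

```cpp
// Sketch of a volcano-style operator: drive the child first, then process
// its output. Names and the bool convention here are assumptions.
class FilterOperatorSketch : public BaseOperator {
 public:
  FilterOperatorSketch(Plan* plan, BaseOperator* next)
      : BaseOperator(plan, next) {}

  bool exec() override {
    // Drive the lower-level operator (e.g. a sequential scan) first.
    if (next_ != nullptr && next_->exec()) {
      return true;  // propagate the child's failure
    }
    // ... apply this node's FilterPlan predicate to the scanned tuples ...
    return false;  // false meaning success, as in the Parser's convention
  }
};
```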
Transaction engine
Since concurrency is not considered and the data does not need to be persisted, the design of the transaction engine becomes simple: there is no need to implement an MVCC mechanism, only the transaction Commit and Rollback functions.
An undo stack mechanism is implemented here: each time a row is updated, the old version of the row is pushed onto the undo stack. If the transaction rolls back, the old versions are popped from the undo stack one by one and restored over the current data.
The transaction is defined as follows. There are three undo types: kInsertUndo, kDeleteUndo, and kUpdateUndo:
```cpp
enum UndoType { kInsertUndo, kDeleteUndo, kUpdateUndo };

struct Undo {
  Undo(UndoType t)
      : type(t), tableStore(nullptr), curTup(nullptr), oldTup(nullptr) {}
  ~Undo() {
    if (type == kUpdateUndo) {
      free(oldTup);
    }
  }

  UndoType type;
  TableStore* tableStore;
  Tuple* curTup;
  Tuple* oldTup;
};

class Transaction {
 public:
  Transaction() : inTransaction_(false) {}
  ~Transaction() {}
  ......
  void begin();
  void rollback();
  void commit();
  bool inTransaction() { return inTransaction_; }

 private:
  bool inTransaction_;
  std::stack<Undo*> undoStack_;
};

extern Transaction g_transaction;
```
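For illustration, here is a sketch of how an update could record its undo information before modifying a row; recordUpdateUndo and addUndo are hypothetical names, not the project's actual interface:

```cpp
// Hypothetical helper: snapshot the old row image before an in-place update
// so that rollback() can memcpy it back. addUndo is assumed to push the
// record onto the transaction's undoStack_.
void recordUpdateUndo(TableStore* store, Tuple* cur, size_t data_size) {
  Undo* undo = new Undo(kUpdateUndo);
  undo->tableStore = store;
  undo->curTup = cur;
  undo->oldTup = static_cast<Tuple*>(malloc(sizeof(Tuple) + data_size));
  memcpy(undo->oldTup->data, cur->data, data_size);  // save the old version
  g_transaction.addUndo(undo);  // hypothetical accessor to undoStack_
}
```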
The commit and rollback code is as follows:
```cpp
void Transaction::rollback() {
  while (!undoStack_.empty()) {
    auto undo = undoStack_.top();
    TableStore* table_store = undo->tableStore;
    undoStack_.pop();
    switch (undo->type) {
      case kInsertUndo:
        // Undo an insert by removing the inserted tuple.
        table_store->removeTuple(undo->curTup);
        break;
      case kDeleteUndo:
        // Undo a delete by putting the old tuple back.
        table_store->recoverTuple(undo->oldTup);
        break;
      case kUpdateUndo:
        // Undo an update by copying the old row image back in place.
        memcpy(undo->curTup->data, undo->oldTup->data,
               table_store->tupleSize() - TUPLE_HEADER_SIZE);
        break;
      default:
        break;
    }
    delete undo;
  }
  inTransaction_ = false;
}

void Transaction::commit() {
  // On commit the undo records are simply discarded.
  // Note: the loop must run while the stack is NOT empty.
  while (!undoStack_.empty()) {
    auto undo = undoStack_.top();
    TableStore* table_store = undo->tableStore;
    undoStack_.pop();
    if (undo->type == kDeleteUndo) {
      // The deleted tuple's memory can now be reclaimed.
      table_store->freeTuple(undo->oldTup);
    }
    delete undo;
  }
  inTransaction_ = false;
}
```
Storage engine
Data structure
Because this is an in-memory database, the data structure design is simple. A batch of record memory is allocated at a time to reduce memory fragmentation and improve memory efficiency, and the batch of records is placed in a free list (FreeList). When inserting data, a piece of memory is taken from the FreeList, written, and put into the DataList; when deleting data, the record is returned from the DataList to the FreeList. For convenience, a doubly linked list is used for both.
```cpp
struct Tuple {
  Tuple* prev;
  Tuple* next;
  uchar data[];  // flexible array member: the row payload follows the header
};

class TupleList {
 public:
  TupleList() {
    // Sentinel head and tail nodes simplify insertion and removal.
    head_ = static_cast<Tuple*>(malloc(sizeof(Tuple)));
    tail_ = static_cast<Tuple*>(malloc(sizeof(Tuple)));
    head_->next = tail_;
    tail_->prev = head_;
    head_->prev = nullptr;
    tail_->next = nullptr;
  }
  ......

 private:
  Tuple* head_;
  Tuple* tail_;
};

class TableStore {
 public:
  TableStore(std::vector<ColumnDefinition*>* columns);
  ~TableStore();
  ......

  TupleList freeList_;
  TupleList dataList_;
};
```
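As a sketch of how insert and delete move tuples between the two lists; the TupleList method names popFront, pushBack, and remove are assumptions for illustration, not the project's actual interface:

```cpp
// Sketch: insertion takes a slot from the free list, fills it, and links it
// into the data list; deletion does the reverse.
Tuple* insertTupleSketch(TupleList& freeList, TupleList& dataList,
                         const uchar* row, size_t size) {
  Tuple* tup = freeList.popFront();  // take a preallocated slot (hypothetical)
  if (tup == nullptr) {
    return nullptr;  // batch exhausted; a real version allocates a new batch
  }
  memcpy(tup->data, row, size);  // write the row payload
  dataList.pushBack(tup);        // link into the live data (hypothetical)
  return tup;
}

void deleteTupleSketch(TupleList& freeList, TupleList& dataList, Tuple* tup) {
  dataList.remove(tup);    // unlink from the live data (hypothetical)
  freeList.pushBack(tup);  // recycle the memory for future inserts
}
```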
Index design
Because only equality matching is required here, the simplest hash index is used.
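A minimal sketch of such an index, assuming the key is serialized to a string; the class and method names are illustrative, not the project's actual index:

```cpp
#include <string>
#include <unordered_map>

// Minimal hash-index sketch: equality lookups only, mapping a serialized
// key to the tuple that holds the row.
class HashIndexSketch {
 public:
  void insert(const std::string& key, Tuple* tup) { map_[key] = tup; }

  Tuple* find(const std::string& key) const {
    auto it = map_.find(key);
    return it == map_.end() ? nullptr : it->second;
  }

  void remove(const std::string& key) { map_.erase(key); }

 private:
  std::unordered_map<std::string, Tuple*> map_;
};
```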
Summary
A simple database prototype like this is still very rough, but it is quite helpful for beginners studying the underlying principles of databases, as it gives an overall picture of how a database works. For the actual code, refer to the link.