Author: Jin Chongrong
This article walks through the analysis of a crash case in detail. Along the way it reviews C++ polymorphism and class memory layout, the pc register and CPU exception handling, and memory barriers.
1. A crash that doesn't play by the rules
1.1 View the crash call stack
A customer reported a crash and provided a core dump file. The crash call stack is as follows:
(gdb) bt
#0  0x0000000078432d68 in asl::LooperObserverMan::notifyIdle (this=<optimized out>, looper=0x160eebd40, delay_queue_size=0) at ../../../../src/asl_message_framework/src/BaseMessageLooper.cpp:371
#1  0x00000000784928e4 in asl::MessageQueue::fetchNext (this=this@entry=0x160eedfc0, timing=@0xf4e9f60: 0) at ../../../../src/asl_message_framework/src/MessageQueue.cpp:83
#2  0x0000000078492b24 in asl::MessageQueue::next (this=0x160eedfc0, timing=@0xf4e9f60: 0) at ../../../../src/asl_message_framework/src/MessageQueue.cpp:60
#3  0x000000007832036c in asl::Looper::loop (this=0x160eebd40) at ../../../../src/asl_message_framework/src/Looper.cpp:107
#4  0x0000000078495ee0 in asl::MessageThread::run (this=0x7998e678) at ../../../../src/asl_message_framework/src/MessageThread.cpp:56
#5  0x000000007851cc70 in asl::Thread::runCallback (param=0x7998e678) at ../../../../src/asl_message_framework/src/Thread.cpp:183
#6  0x00000000010314e0 in ?? ()
Clearly, the crash occurred at line 371 of BaseMessageLooper.cpp, inside asl::LooperObserverMan::notifyIdle(), a short function that walks the observer linked list in a while(node) loop and calls node->observer->onLooperIdle() on each node.
1.2 Segmentation fault location is not as expected
A segmentation fault is reported at the time of the crash, which usually means an illegal address was accessed. Given the source code, it is reasonable to suspect that an invalid node->observer pointer (null or wild) caused the crash on this line, or that node itself, although non-null, is a wild pointer. Inspect node and node->observer:
(gdb) p node
$8 = (asl::LooperObserverMan::ObserverNode *) 0x17bb988e0
(gdb) p node->observer
$10 = (asl::IMessageLooper::Observer *) 0x7998e758
The result is unexpected: both pointers can be accessed normally.
At this point the analysis hit a dead end. The crash seems unreasonable and does not play by the rules: a perfectly legal memory access somehow caused a segmentation fault.
2. In the assembly, every detail is revealed
It is said that "there are no secrets in front of the source code". The source code is right in front of us, and notifyIdle has only 7 lines in total, yet the computer seems to have performed a magic trick. In fairness to the computer, the "source code" that human eyes see is not what the machine sees: the machine sees binary. And in fairness to humans, the 0s and 1s that the machine sees are hard for a human brain to process. So let each side take a step back: assembly sits right between the high-level language and the machine's binary code.
2.1 "Zoom in" source code with assembly
A line of C++ code can be converted into multiple assembly instructions, and the assembly code is an enlarged version of the high-level language source code. So let's take a look at the assembly at the time of the crash.
(gdb) disas
Dump of assembler code for function asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int):
   0x0000000078432d30 <+0>:     stp     x19, x20, [sp,#-48]!
   0x0000000078432d34 <+4>:     stp     x21, x22, [sp,#16]
   0x0000000078432d38 <+8>:     str     x30, [sp,#32]
   0x0000000078432d3c <+12>:    ldr     x19, [x0]
   0x0000000078432d40 <+16>:    cbz     x19, 0x78432d8c <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+92>
   0x0000000078432d44 <+20>:    mov     x22, x1
   0x0000000078432d48 <+24>:    mov     w21, w2
   0x0000000078432d4c <+28>:    adrp    x20, 0x786b0000
   0x0000000078432d50 <+32>:    b       0x78432d60 <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+48>
   0x0000000078432d54 <+36>:    nop
   0x0000000078432d58 <+40>:    ldr     x19, [x19,#8]
   0x0000000078432d5c <+44>:    cbz     x19, 0x78432d8c <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+92>
   0x0000000078432d60 <+48>:    ldr     x0, [x19]
   0x0000000078432d64 <+52>:    ldr     x1, [x20,#1160]
=> 0x0000000078432d68 <+56>:    ldr     x2, [x0]
   0x0000000078432d6c <+60>:    ldr     x3, [x2,#56]
   0x0000000078432d70 <+64>:    cmp     x3, x1
   0x0000000078432d74 <+68>:    b.eq    0x78432d58 <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+40>
   0x0000000078432d78 <+72>:    mov     w2, w21
   0x0000000078432d7c <+76>:    mov     x1, x22
   0x0000000078432d80 <+80>:    blr     x3
   0x0000000078432d84 <+84>:    ldr     x19, [x19,#8]
   0x0000000078432d88 <+88>:    cbnz    x19, 0x78432d60 <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+48>
   0x0000000078432d8c <+92>:    ldp     x21, x22, [sp,#16]
   0x0000000078432d90 <+96>:    ldr     x30, [sp,#32]
   0x0000000078432d94 <+100>:   ldp     x19, x20, [sp],#48
   0x0000000078432d98 <+104>:   ret
End of assembler dump.
gdb's disas command shows the disassembly of the function in the current (topmost) stack frame. The 7 lines of C++ in notifyIdle become 27 assembly instructions, which lets us see more detail.
2.2 Finding the immediate cause
Pay attention to the instruction marked by the arrow in the listing above, namely:
=> 0x0000000078432d68 <+56>: ldr x2, [x0]
This 0x0000000078432d68 is the value of the current pc register, and the crash occurred in this ldr instruction. The meaning of this instruction is to use the value stored in the x0 register as the memory address, and load the value stored at the address in the memory into the x2 register:
(gdb) i register x0
x0             0x2e002e            3014702
(gdb) x 0x2e002e
0x2e002e:       Cannot access memory at address 0x2e002e
The x0 register holds 0x2e002e (3014702 is the same value in decimal), and reading memory at that address fails.
So the direct cause of the crash has been found, and the machine has finally "confessed": it really did hit inaccessible memory, which triggered the segmentation fault.
3. Peeling back the layers: detailed analysis
3.1 Analyze the assembly and find clues
Look at the three assembly instructions before the crash:
   0x0000000078432d60 <+48>:    ldr     x0, [x19]
   0x0000000078432d64 <+52>:    ldr     x1, [x20,#1160]
=> 0x0000000078432d68 <+56>:    ldr     x2, [x0]
These three instructions execute sequentially, with no jump instructions between them. The value of x0 is loaded from the memory that x19 points to. Check the relevant registers and memory:
(gdb) i register x19
x19            0x17bb988e0         6370724064
(gdb) x 0x17bb988e0
0x17bb988e0:    0x7998e758
(gdb) x 0x7998e758
0x7998e758:     0x79989f40
x19 holds 0x17bb988e0, and the content at that address is 0x7998e758. Normally this is the value that should end up in x0, but x0 actually holds the illegal address 0x2e002e. By contrast, 0x7998e758 is a legal address that can be read normally; it contains 0x79989f40.
3.2 Suspected cause 1: memory corruption
The problem lies somewhere within these three assembly instructions. The first suspicion is memory corruption, that is, something overwrote the memory.
x0 is loaded from the address held in x19, and the core dump only shows that memory in its final state at the time of the crash. That raises two questions. First, although the memory x19 points to is readable now, could it still have held 0x2e002e at the moment ldr x0, [x19] executed? Second, being readable does not mean the content is what we expect: could this memory have been overwritten with garbage?
3.2.1 Linked list node pointer memory meets expectations
First, let's confirm the second question and see what memory corresponds to the address stored in x19 when it finally crashes:
(gdb) i register x19
x19            0x17bb988e0         6370724064
(gdb) x 0x17bb988e0
0x17bb988e0:    0x7998e758
(gdb) p node
$2 = (asl::LooperObserverMan::ObserverNode *) 0x17bb988e0
(gdb) p node->observer
$3 = (asl::IMessageLooper::Observer *) 0x7998e758
x19 holds the address of node, and the value stored at that address is exactly node->observer. This is as expected, because observer is the first member of ObserverNode:
struct ObserverNode
{
    IMessageLooper::Observer * observer;
    ObserverNode * next;
};
3.2.2 Class memory layout as expected
Further view observer content:
(gdb) p *(node->observer) $4 = {_vptr.Observer = 0x79989f40}
The object's vptr (_vptr.Observer) is 0x79989f40. Next, check whether the content of the virtual table meets expectations:
(gdb) x /16a 0x79989f40
0x79989f40:     0x7990c9e0      0x7990c9f0
0x79989f50:     0x78411698 <asl::IMessageLooper::Observer::onLooperStart(asl::IMessageLooper*, int, int)>       0x79909598
0x79989f60:     0x799097d8      0x799099d0
0x79989f70:     0x784116b8 <asl::IMessageLooper::Observer::onLooperBusy(asl::IMessageLooper*)>  0x79909bd8
0x79989f80:     0x784116c8 <asl::IMessageLooper::Observer::onLooperQuit(asl::IMessageLooper*)>  0x784116d0 <asl::IMessageLooper::Observer::onLooperDestroy(asl::IMessageLooper*)>
0x79989f90:     0x784116d8 <asl::IMessageLooper::Observer::onLooperCancelMsg(asl::IMessageLooper*, asl::Message*, unsigned long, unsigned long)>        0x7990c988
0x79989fa0:     0x7990c990      0x7990c998
0x79989fb0:     0x7990c9a0      0x7990c9a8
The virtual table contains the expected function pointers, so the memory pointed to by node and node->observer meets expectations.
3.2.3 Rule out memory corruption
Now back to the first question. The core dump shows the memory x19 points to only in its final state. Although that memory is valid now, could it still have held 0x2e002e when ldr x0, [x19] executed, so that x0 differs from the final value of [x19] because the memory was corrupted at that moment?
Looking back at the three lines of instructions before the crash:
   0x0000000078432d60 <+48>:    ldr     x0, [x19]
   0x0000000078432d64 <+52>:    ldr     x1, [x20,#1160]
=> 0x0000000078432d68 <+56>:    ldr     x2, [x0]
We have just confirmed that the memory x19 points to is normal at the time of the crash, while the content of x0 is abnormal. For memory corruption to explain this, the memory x19 points to would have to be corrupted exactly when ldr x0, [x19] executed and then be restored to the correct value before the crash two instructions later, so the first hypothesis is unlikely.
3.3 Suspected cause 2: uninitialized variable access
Hypothesis: the memory x19 points to initially held a wild pointer (0x2e002e), and this value was loaded into x0. Another thread then wrote the correct value. As a result, at the time of the crash the memory x19 points to looks normal, but x0 still holds the wild pointer, and dereferencing it triggers the crash.
3.3.1 Business source code analysis
To test this hypothesis we need to look further at the source code. The three instructions are already inside the while loop of asl::LooperObserverMan::notifyIdle(), so node is non-null. Is there a time window in which node is non-null but node->observer is still a wild pointer, so that right after entering while(node), ldr x0, [x19] loads a node->observer that has not been initialized yet?
View the source code for node->observer assignment:
bool LooperObserverMan::addObserver(IMessageLooper::Observer * observer)
{
    if(observer == NULL)
        return false;
    ...
    ObserverNode * new_node = new ObserverNode();
    new_node->next = NULL;
    new_node->observer = observer;
    if(node == NULL)
        _observers = new_node;
    else
        node->next = new_node;
    return true;
}
As can be seen, the new node's observer member is assigned (new_node->observer = observer;) before the node itself is linked into the list.
3.3.2 Exclude uninitialized variable access
If notifyIdle() is called before addObserver(), then ObserverNode * node = _observers; reads the initial value of _observers, which is NULL, as set in the class constructor:
LooperObserverMan::LooperObserverMan() : _observers(NULL) { }
The source code in 3.3.1 shows that new_node->observer is assigned before new_node is assigned to _observers.
Therefore, the hypothesis that x0 was loaded from a non-null node whose node->observer was not yet initialized does not hold either.
3.4 Preliminary analysis conclusion
In summary, it is more likely that the final x0 content does not meet expectations due to system-level stability issues.
For example, an interrupt or process preemption suspends the current task right after ldr x0, [x19], and the x0 register is not restored correctly when the context is restored and the task resumes, resulting in a crash.
Of course, the current evidence for whether this is really the case is insufficient, and a dump of the whole machine is needed for further analysis.
3.5 The problem recurs: struck by lightning twice?
3.5.1 Recurrence of the same crash stack
Not long afterwards, another customer reported the same problem with the same crash call stack. If this were a hardware or system-level problem, it would be like being struck by lightning twice, so a hardware or system-level cause can basically be ruled out. We should look harder at why this piece of (userland) code goes wrong.
3.5.2 Rediscussion on the cause of the crash
Revisit the previous analysis. It is found that an important basis for us to eliminate the second doubt in 3.3 is that the initial value of the variable _observers is NULL, and the subsequent assignment order is:
new_node->observer = xxxx; _observers = new_node;
That is, when another thread reads, _observers is either NULL, or new_node whose member variable (new_node->observer) has been assigned a value.
This conclusion rests on two assumptions:
1) The pointer _observers assignment is atomic, and the reading thread either reads NULL or reads good _observers;
2) The assignment of new_node->observer is performed before the assignment of _observers.
3.5.3 Discussion on atomicity of pointer assignment
Everyone was divided on this.
One point of view is that the assignment of basic types such as pointers and ints is not atomic, otherwise why does C++ use std::atomic to ensure the atomicity of reading and writing of basic types.
Another point of view is that operations within a single cache line are atomic (the Intel manual has a statement to this effect; an equivalent for Arm has not been found). The pointer in this example has no special alignment restrictions, so it is naturally aligned (8 bytes on a 64-bit system), never straddles a cache line, and its reads and writes are therefore atomic.
3.5.4 Discussion of assignment order
Look at these two assignment statements again:
new_node->observer = xxxx; _observers = new_node;
These two assignments do not depend on each other: within a single thread, the result is the same if their order is swapped. That makes them candidates for reordering by the compiler or the CPU, and no memory barrier is in place here to enforce the memory order.
So the following is possible: because of reordering, the writing thread executes _observers = new_node first. At that moment the reading thread passes the null check and loads the still-uninitialized _observers->observer into register x0. The writing thread then completes the assignment of _observers->observer, and when the reading thread dereferences x0, it crashes.
3.6 Show me the code: demo verification
3.6.1 demo structure
First copy the addObserver code intact from the base library:
bool LooperObserverMan::addObserver(Observer * observer)
{
    if(observer == NULL)
        return false;

    // Walk to the tail, rejecting duplicates.
    ObserverNode * node = _observers;
    while(node)
    {
        if(node->observer == observer)
            return false;
        if(node->next == NULL)
            break;
        node = node->next;
    }

    ObserverNode * new_node = new ObserverNode();
    new_node->next = NULL;
    new_node->observer = observer;
    if(node == NULL)
        _observers = new_node;   // first node: publish the list head
    else
        node->next = new_node;
    return true;
}
Then slightly modify the notifyIdle function called by the reading thread, removing the deeper call chain to make debugging easier:
bool LooperObserverMan::notifyIdle(Observer * observer)
{
    ObserverNode * node = _observers;
    while(node)
    {
        // A non-null node is expected to carry the observer passed in by the test.
        if (observer != node->observer)
        {
            std::cout << "error: observer not match!!!" << std::endl;
            std::cout << "observer: " << observer << ", node->observer: " << node->observer << std::endl;
        }
        node->observer->onLooperIdle();
        node = node->next;
        return true;   // the demo stops after the first node
    }
    return false;
}
In the constructor of LooperObserverMan, the initial value of the member variable _observers is guaranteed to be NULL:
LooperObserverMan::LooperObserverMan() : _observers(NULL) { }
The contents of the header file are as follows:
#include <iostream>

class Observer
{
public:
    virtual ~Observer() {}
    virtual void onLooperIdle() { std::cout << "onLooperIdle()" << std::endl; }
};

class LooperObserverMan
{
public:
    struct ObserverNode
    {
        Observer * observer;
        ObserverNode * next;
    };

    LooperObserverMan();
    ~LooperObserverMan();
    bool addObserver(Observer * observer);
    bool notifyIdle(Observer * observer);

private:
    ObserverNode * _observers;
};
Do the following test in the main function to construct a scenario similar to that in HD SDK, which only adds one observer:
#include <thread>
#include "LooperObserverMan.h"

int main()
{
    Observer ob;
    LooperObserverMan* looper = new LooperObserverMan();

    // Writer thread: adds the single observer.
    std::thread t = std::thread([&]() {
        looper->addObserver(&ob);
    });

    // Reader (main thread): spin until the observer becomes visible.
    while (1)
    {
        if (looper->notifyIdle(&ob))
        {
            break;
        }
    }

    t.join();
    delete looper;
    return 0;
}
Here a thread is started to call addObserver with the address of the variable Observer ob as the argument, while the main thread calls notifyIdle(). notifyIdle() returns false if node is null; if node is non-null, it compares node->observer with its input parameter and calls node->observer->onLooperIdle(). As soon as notifyIdle() returns true once, main ends. The input parameter of notifyIdle() is also the address of ob. Under the expected memory order, a non-null node implies that node->observer has already been assigned and equals &ob; if that is not the case, the error log is printed.
Use scripts for stress testing, simulate the scenario where only one observer is added each time, and start the test process test_reorder repeatedly. The shell script is as follows,
num=0; while true; do sleep 1; date; ./test_reorder; num=`expr $num + 1`; echo $num; done
3.6.2 Stress test results
In the customer environment, 217,258 stress-test runs were performed and the error log appeared 10 times, as shown below:
Sun Feb 15 09:20:29 GMT 1970
error: observer not match!!!
observer: 100c7878, node->observer: 100c7878
onLooperIdle()
191229
That is a probability of roughly 0.5 in 10,000 (10 out of 217,258 runs) that node is non-null while node->observer != &ob. Note that by the time the log line is printed, node->observer has already become equal to the address of ob.
3.6.3 Analysis of demo pressure test results
These results overturn the second assumption from 3.5.2: in practice, when node is non-null, reordering can leave node->observer not yet assigned.
Reordering happens at two levels, hardware and software. We focus on the software level, that is, compiler optimization. As analyzed in 3.5.4, the assignments to node and node->observer do not depend on each other, so they are eligible for reordering, and we only need to look at the generated assembly to see whether the compiler actually reordered them. View the assembly of the demo code as follows.
To reduce the cost of reading assembly, we can look directly at the decompiled code that a reverse-engineering tool generates from it, shown in the right-hand pane of the figure.
In it, pOVar1 = this->_observers; assigns the member variable _observers of LooperObserverMan to pOVar1. Because the stress test only covers inserting the first node, we only need to look at the branch where pOVar1 is null:
pOVar1 = (ObserverNode *)operator.new(0x10);  // new_node = new ObserverNode();
this->_observers = pOVar1;                    // _observers = new_node;
pOVar1->observer = observer;                  // new_node->observer = observer;
pOVar1->next = (ObserverNode *)0x0;           // new_node->next = NULL;
The address returned by operator new is assigned to pOVar1, which corresponds to ObserverNode * new_node = new ObserverNode(); in the source. But pOVar1 is then immediately assigned to the member variable _observers (this->_observers = pOVar1), and only afterwards is the member pOVar1->observer assigned. Compare with the source code:
bool LooperObserverMan::addObserver(Observer * observer)
{
    ...
    ObserverNode * new_node = new ObserverNode();
    new_node->next = NULL;
    new_node->observer = observer;
    if(node == NULL)
        _observers = new_node;
    else
        node->next = new_node;
    ...
}
The store corresponding to _observers = new_node has been moved ahead of new_node->observer = observer, so reordering has indeed taken place. When the reading thread sees that _observers is non-null and immediately uses _observers->observer, that member may not have been initialized yet, resulting in a crash.
3.6.4 Comparison of compilation results on other platforms
Compile executable programs for other platforms with the same code, and compare the assembly content.
Android platform
On the Android platform, the compiled result does not reorder the assignments to _observers and _observers->observer (it only swaps the two assignments to new_node->next and new_node->observer). The few core lines of decompiled code are as follows:
ppOVar3 = (Observer **)operator_new(8);  // new_node = new ObserverNode();
*ppOVar3 = param_1;                      // new_node->observer = observer;
ppOVar3[1] = (Observer *)0x0;            // new_node->next = NULL;
if (bVar1) {
    *(Observer ***)this = ppOVar3;       // _observers = new_node;
}
The address returned by new is assigned to ppOVar3. *ppOVar3 is the first member of struct ObserverNode, observer, so *ppOVar3 = param_1 assigns the input parameter &ob to ppOVar3->observer. ppOVar3[1] is the second member, the next pointer, so ppOVar3[1] = (Observer *)0x0 means ppOVar3->next = NULL. The variable ppOVar3 is therefore new_node in the addObserver source. Finally, *(Observer ***)this = ppOVar3 assigns ppOVar3 to the member variable _observers. So on the Android platform the node is still fully initialized before it is published.
Mac platform
The mac platform is also not optimized. The decompiled variable pauVar3 is the new_node variable in the source code.
Remark:
1) Different compilation options have different results even on the same platform, such as -O3 and -O0
2) struct ObserverNode is defined as follows:
struct ObserverNode
{
    Observer * observer;
    ObserverNode * next;
};
3.6.5 Adding memory barriers
Since the compiler performs the reordering, we can use a memory barrier to forbid it. For testing, insert the line __asm__ __volatile__("":::"memory"), a compiler-level memory barrier, into addObserver:
bool LooperObserverMan::addObserver(Observer * observer)
{
    ...
    ObserverNode * new_node = new ObserverNode();
    new_node->next = NULL;
    new_node->observer = observer;
    __asm__ __volatile__("":::"memory");  // insert memory barrier
    if(node == NULL)
        _observers = new_node;
    else
        node->next = new_node;
    return true;
}
Check the decompiled result after adding the memory barrier:
pOVar1 = (ObserverNode *)operator.new(0x10);
pOVar1->observer = observer;
pOVar1->next = (ObserverNode *)0x0;
if (pOVar2 == (ObserverNode *)0x0) {
    this->_observers = pOVar1;
}
After adding the memory barrier, the compiler no longer reorders the stores. The address returned by new is assigned to pOVar1, and pOVar1 is assigned to this->_observers only after pOVar1->observer has been assigned. The order of the assignments in the generated code is preserved.
4. The truth comes to light: final conclusion
At this point the truth has come to light. The direct cause of the crash is an illegal memory access, and the illegal address came from a member of the structure variable node: node->observer. Two threads read and write this variable. The reading thread uses node->observer after checking that node is non-null, on the assumption that node->observer must be valid whenever node is non-null. The writing thread allocates the temporary new_node, assigns its member new_node->observer, and only then assigns new_node to node (new_node->observer = xxx; node = new_node;), with the intent that the reading thread always sees a valid node->observer once it sees a non-null node. However, the compiled result on the QNX platform shows that the compiler reordered these two assignment statements and broke that assumption. When the reading thread sees a non-null node and calls node->observer->onLooperIdle(), it crashes because node->observer has not been initialized yet.
One-sentence summary: the compiler's reordering optimization changed the order of the stores, so the reading thread, running concurrently, used an uninitialized variable and crashed.
Fixes:
Solution 1: Add a memory barrier in the basic library addObserver.
Solution 2: The TimerCtrl encapsulated by the business binds the addObserver operation to the thread of the message queue callback function (notifyIdle) to avoid asynchronous reading and writing.
5. Review of Knowledge Points
In this analysis of the crash problem, a lot of knowledge learned in previous books was used. For example, when we look at the memory of the virtual table, it is actually the knowledge related to the C++ polymorphism implementation mechanism and class memory layout. These knowledge points allow us to see the interior of the code more accurately, and also help us confirm some inferences.
5.1 C++ polymorphism implementation & class memory layout
5.1.1 C++ virtual function polymorphism principle
The polymorphism mentioned here refers specifically to the dynamic polymorphism and virtual functions of C++. The polymorphic implementation of virtual functions is inseparable from the virtual function table (hereinafter referred to as the virtual table). The virtual table does not belong to the object of the class. It belongs to the entire class. It is a global variable and a table generated in the data segment during compilation. Inside the table are the function pointers of each virtual function, and these pointers point to the code segment of each function.
When the class object is constructed, the compiler generates a vptr pointer to point to the virtual table (all objects of the same class point to the globally unique virtual table). The content of the virtual table is the function pointer of each virtual function. The subclass will copy a virtual table and replace its override interface with a pointer to the function after override. This is the key to polymorphic implementation. When we take a Base pointer to a subclass object:
Base *p = new Driver();
new Driver() constructs a subclass object, so the generated vptr points to the virtual table of the subclass, so that when the pointer p is used to call the function of the subclass override, the function pointer after the override can be found from the virtual table.
5.1.2 Reasons why polymorphism must use pointers or references
When we use C++ polymorphism, we usually use the parent class pointer to point to the subclass object, or the parent class reference (Base&) the subclass object, but direct object assignment cannot call the subclass method, for example:
Base b; Driver d; b = static_cast<Base>(d);
The reason is that the vptr pointer will not be copied during this forced assignment, so after the assignment, the vptr in object b still points to the virtual table of the Base class, so the subclass method cannot be called, that is, the effect of polymorphism cannot be achieved.
There are many related materials about C++ polymorphic implementation, so I won't repeat them here.
5.1.3 Compilation optimization of subclass virtual table
When analyzing the problem this time, we check the virtual table content of the observer class as follows:
(gdb) x /16a 0x79989f40
0x79989f40:     0x7990c9e0      0x7990c9f0
0x79989f50:     0x78411698 <asl::IMessageLooper::Observer::onLooperStart(asl::IMessageLooper*, int, int)>       0x79909598
0x79989f60:     0x799097d8      0x799099d0
0x79989f70:     0x784116b8 <asl::IMessageLooper::Observer::onLooperBusy(asl::IMessageLooper*)>  0x79909bd8
0x79989f80:     0x784116c8 <asl::IMessageLooper::Observer::onLooperQuit(asl::IMessageLooper*)>  0x784116d0 <asl::IMessageLooper::Observer::onLooperDestroy(asl::IMessageLooper*)>
0x79989f90:     0x784116d8 <asl::IMessageLooper::Observer::onLooperCancelMsg(asl::IMessageLooper*, asl::Message*, unsigned long, unsigned long)>        0x7990c988
0x79989fa0:     0x7990c990      0x7990c998
0x79989fb0:     0x7990c9a0      0x7990c9a8
But the source code of class Observer declares more virtual functions than the five whose Observer:: symbols appear in the virtual table dump above:
class Observer
{
public:
    virtual ~Observer() {}
    virtual void onLooperStart(IMessageLooper * looper, int queue_size, int delay_queue_size) {};
    virtual void onLooperPostMsg(IMessageLooper * looper, Message * msg, uint32_t delay) {};
    virtual void onLooperStartMsg(IMessageLooper * looper, Message * msg, uint64_t timing, uint64_t now) {};
    virtual void onLooperEndMsg(IMessageLooper * looper, Message * msg, uint64_t timing, uint64_t now, uint32_t duration) {};
    virtual void onLooperBusy(IMessageLooper * looper) {};
    virtual void onLooperIdle(IMessageLooper * looper, int delay_queue_size) {};
    virtual void onLooperQuit(IMessageLooper * looper) {};
    virtual void onLooperDestroy(IMessageLooper * looper) {};
    virtual void onLooperCancelMsg(IMessageLooper * looper, Message * msg, uint64_t timing, uint64_t now) {}
};
In fact, the node->observer we printed points to a class TimerMessageObserver object, a subclass of asl::IMessageLooper::Observer. The entries that gdb resolves to Observer:: symbols in the virtual table are the functions this subclass does not override; the remaining entries point to the subclass's own implementations. There may also be a compiler optimization at work here, which can be seen from the assembly of the notifyIdle function.
0x0000000078432d40 <+16>: cbz x19, 0x78432d8c <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+92> 0x0000000078432d44 <+20>: mov x22, x1 0x0000000078432d48 <+24>: mov w21, w2 0x0000000078432d4c <+28>: adrp x20, 0x786b0000 0x0000000078432d50 <+32>: b 0x78432d60 <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+48> 0x0000000078432d54 <+36>: nop 0x0000000078432d58 <+40>: ldr x19, [x19,#8] 0x0000000078432d5c <+44>: cbz x19, 0x78432d8c <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+92> 0x0000000078432d60 <+48>: ldr x0, [x19] 0x0000000078432d64 <+52>: ldr x1, [x20,#1160] => 0x0000000078432d68 <+56>: ldr x2, [x0] 0x0000000078432d6c <+60>: ldr x3, [x2,#56] 0x0000000078432d70 <+64>: cmp x3, x1 0x0000000078432d74 <+68>: b.eq 0x78432d58 <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+40> 0x0000000078432d78 <+72>: mov w2, w21 0x0000000078432d7c <+76>: mov x1, x22 0x0000000078432d80 <+80>: blr x3 0x0000000078432d84 <+84>: ldr x19, [x19,#8] 0x0000000078432d88 <+88>: cbnz x19, 0x78432d60 <asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int)+48> 0x0000000078432d8c <+92>: ldp x21, x22, [sp,#16] 0x0000000078432d90 <+96>: ldr x30, [sp,#32] 0x0000000078432d94 <+100>: ldp x19, x20, [sp],#48 0x0000000078432d98 <+104>: ret
The instruction cbz x19, 0x78432d8c at 0x0000000078432d40 <+16> branches to 0x78432d8c if x19 is zero. x19 holds the address of node, so this is the while(node) test: when node is null, execution jumps to 0x0000000078432d8c <+92>, which restores the saved registers from the function's stack frame and then returns. In other words, the while loop ends and the function returns.
adrp x20, 0x786b0000 stores in x20 the base address of the 4KB memory page containing 0x786b0000, and execution then branches to 0x0000000078432d60 <+48>. At <+52>, ldr x1, [x20,#1160] loads x1 from that page. Note that the content of x1 is as follows:
(gdb) i register x1
x1 0x784116c0 2017531584
(gdb) x 0x784116c0
0x784116c0 <asl::IMessageLooper::Observer::onLooperIdle(asl::IMessageLooper*, int)>: 0xd503201fd65f03c0
(gdb) info symbol 0x784116c0
asl::IMessageLooper::Observer::onLooperIdle(asl::IMessageLooper*, int) in section .text of libbase_utils.so
That is, x1 is the Observer::onLooperIdle function pointer loaded via x20. This function is a symbol in libbase_utils.so, so it is the parent class's implementation of the virtual function (the subclass, class TimerMessageObserver, is defined in libGAdasUtils.so).
0x0000000078432d68 <+56>: ldr x2, [x0]
At this point x0 already holds node->observer, that is, the this pointer of the subclass object (loaded at <+48> by ldr x0, [x19]). This instruction dereferences it and loads into x2 the virtual table pointer stored at the start of the object, which points to the virtual table of the subclass:
(gdb) i register x19
x19 0x17bb988e0 6370724064
(gdb) x 0x17bb988e0
0x17bb988e0: 0x7998e758
(gdb) x 0x7998e758
0x7998e758: 0x79989f40 // vtable address
(gdb) p *node->observer
$6 = {_vptr.Observer = 0x79989f40}
After that, 0x0000000078432d6c <+60>: ldr x3, [x2,#56] loads into x3 the entry at offset 56 bytes from the virtual table pointer in x2. The virtual table address plus 56 bytes is 0x79989f40 + 0x38 = 0x79989f78:
(gdb) x /16a 0x79989f40
0x79989f40: 0x7990c9e0 0x7990c9f0
0x79989f50: 0x78411698 <asl::IMessageLooper::Observer::onLooperStart(asl::IMessageLooper*, int, int)> 0x79909598
0x79989f60: 0x799097d8 0x799099d0
0x79989f70: 0x784116b8 <asl::IMessageLooper::Observer::onLooperBusy(asl::IMessageLooper*)> 0x79909bd8
0x79989f80: 0x784116c8 <asl::IMessageLooper::Observer::onLooperQuit(asl::IMessageLooper*)> 0x784116d0 <asl::IMessageLooper::Observer::onLooperDestroy(asl::IMessageLooper*)>
0x79989f90: 0x784116d8 <asl::IMessageLooper::Observer::onLooperCancelMsg(asl::IMessageLooper*, asl::Message*, unsigned long, unsigned long)> 0x7990c988
0x79989fa0: 0x7990c990 0x7990c998
0x79989fb0: 0x7990c9a0 0x7990c9a8
Although gdb does not print a symbol for the function pointer at this address (0x79909bd8), it must be the virtual function declared right after onLooperBusy, namely onLooperIdle(), which matches the source code of notifyIdle. Next, the assembly executes cmp x3, x1. If x3 and x1 are equal, b.eq branches to <+40>, where ldr x19, [x19,#8] fetches node->next directly without calling any function, apparently because the base-class onLooperIdle is empty. Here the observer points to a subclass object that overrides onLooperIdle, so the comparison is not equal and the branch is not taken. Execution continues to 0x0000000078432d80 <+80>: blr x3, which calls the function that x3 points to. After that function returns, 0x0000000078432d84 <+84>: ldr x19, [x19,#8] performs node = node->next and the loop continues.
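The control flow of the assembly above can be sketched in C++ with a hand-written virtual table. This is only an illustrative model, not the library's source: the types VTable, Observer, ObserverNode and the functions base_onLooperIdle and derived_onLooperIdle are hypothetical stand-ins, and the table holds just the one slot of interest instead of the real 56-byte offset.

```cpp
#include <cassert>

// Hypothetical, hand-written stand-ins for the compiler-generated layout.
struct Observer;
using IdleFn = void (*)(Observer* self, int delay_queue_size);

struct VTable { IdleFn onLooperIdle; };              // only the slot of interest
struct Observer { const VTable* vptr; int idle_calls; };
struct ObserverNode { Observer* observer; ObserverNode* next; };

void base_onLooperIdle(Observer*, int) {}            // empty base implementation
void derived_onLooperIdle(Observer* self, int) { ++self->idle_calls; }

const VTable kBaseVTable{base_onLooperIdle};
const VTable kDerivedVTable{derived_onLooperIdle};

// Returns how many nodes were skipped because their slot held the base function.
int notifyIdle(ObserverNode* node, int delay_queue_size) {
    int skipped = 0;
    while (node) {                                   // cbz x19, <+92>
        Observer* self = node->observer;             // ldr x0, [x19]
        IdleFn fn = self->vptr->onLooperIdle;        // ldr x2, [x0]; ldr x3, [x2,#56]
        if (fn == base_onLooperIdle) {               // cmp x3, x1; b.eq <+40>
            ++skipped;                               // empty function: no call needed
        } else {
            fn(self, delay_queue_size);              // blr x3
        }
        node = node->next;                           // ldr x19, [x19,#8]
    }
    return skipped;
}
```

In this model, the instruction marked => in the disassembly (ldr x2, [x0]) corresponds to reading self->vptr, the first load that goes through the observer pointer itself.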
5.1.4 Distribution of function pointers in the virtual table
The function pointers in the virtual table are arranged in the order in which the virtual functions are declared, but this raises a small question. Counting the destructor, onLooperIdle() is the seventh declared virtual function, so its offset in the virtual table should be 6*8 = 48 bytes. Why is it 56 here? Let's use a demo built without compiler optimization to look at the memory layout of the virtual table:
(gdb) p *pa
$1 = {_vptr.A = 0x400d30 <vtable for A+16>}
(gdb) x /16a 0x400d30
0x400d30 <_ZTV1A+16>: 0x400ab6 <A::~A()> 0x400ae4 <A::~A()>
0x400d40 <_ZTV1A+32>: 0x400b0a <A::func1()> 0x400b34 <A::func2()>
0x400d50 <_ZTV1A+48>: 0x400b5e <A::func3(int, int)> 0x4231
You can see that there are two A::~A() destructor entries in this virtual table. This is because gcc emits two virtual destructors (msvc has only one). Many compilers generate two different destructors for a class: one for destroying dynamically allocated objects (the deleting destructor), and one for destroying non-dynamic objects such as static objects, local objects, base subobjects, or member subobjects (the complete object destructor). The former calls operator delete internally; the latter does not. Some compilers implement this by adding a hidden parameter to a single destructor (older versions of GCC and MSVC++ do this), while others simply generate two separate destructors (newer versions of GCC do this).
This explains the extra 8 bytes of offset: the destructor occupies two slots, so onLooperIdle() sits at 7*8 = 56 bytes.
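The difference between the two destructor flavours can be observed from plain C++ without inspecting the virtual table. The following is a minimal sketch: the class A mirrors the demo's name, but its counters, its class-specific operator new/delete, and the helper functions destroy_local and destroy_heap are assumptions added purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct A {
    static int dtor_calls;     // how often the destructor body ran
    static int delete_calls;   // how often operator delete ran

    virtual ~A() { ++dtor_calls; }
    virtual void func1() {}

    // Class-specific allocation functions, only so the calls can be counted.
    static void* operator new(std::size_t n) {
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) {
        ++delete_calls;
        std::free(p);
    }
};
int A::dtor_calls = 0;
int A::delete_calls = 0;

// Local object: only the destructor body runs, operator delete is not involved.
void destroy_local() { A a; }

// Heap object: `delete` runs the destructor body and then operator delete,
// which is the job of the deleting destructor.
void destroy_heap() {
    A* p = new A;
    delete p;
}
```

Calling destroy_local() increases only dtor_calls, while destroy_heap() increases both counters, which matches the description of the two entries above.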
5.2 Understand pc pointer and chip exception handling
During the group discussion of this problem, some colleagues raised a question: the pc is the program counter and points to the next instruction to be executed, but ARM uses a three-stage pipeline, so the pc points to the instruction being fetched, not the one being decoded or executed. Does that mean the crash is not at the pc shown in the disassembly, but at pc - 4 or pc - 8?
Although this time we could confirm, by printing the registers and the related memory contents, that no segmentation fault could occur at pc - 8 or pc - 4, the question still caught me off guard. In all my previous analysis of kernel dumps and user-mode process dumps, I had assumed that the pc in the call stack is where the crash occurred, and I had never thought about it seriously. Suddenly I began to doubt everything: were my earlier dump analyses wrong? Should I have been looking at the code before the pc? But that contradicts experience. If the pc seen during gdb single-step debugging is not the code being executed, why can I see the result of an assignment as soon as the pc passes that line?
Does the pc point to a not-yet-executed instruction during normal execution, but get handled differently when an exception occurs? The answer is given in the ARM development documentation.
The ELR_ELn register is used to store the return address from an exception. The value in this register is automatically written on entry to an exception and is written to the PC as one of the effects of executing the ERET instruction that is used to return from exceptions.
When an exception occurs, the ELR_ELn register will store the address of the instruction executed after the exception returns, and fill it into the PC when the exception returns.
ELR_ELn contains the return address, which depends upon the specific exception type. Typically, this is the address of the instruction after the one that generated the exception.
For example, when an SVC (system call) instruction is executed, you want to return to the following instruction in the application. In other cases, however, you might want to re-execute the instruction that generated the exception.
Whether it stores the address of the instruction that triggered the exception or of the next instruction to be executed depends on the exception type.
The usual rules are:
- For an asynchronous exception, it is the next instruction at the time the interrupt occurred, that is, the first instruction that has not been executed;
- For a synchronous exception that is not a system call, it is the instruction that triggered the synchronous exception;
- For a system call, it is the instruction following the svc instruction.
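These three rules can be written down as a small function. This is only a model of the rules as stated above, not an ARM-defined interface: the enum ExceptionKind and the function preferred_return_address are hypothetical names, and the fixed 4-byte instruction width assumes AArch64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification for illustration; not an ARM-defined enum.
enum class ExceptionKind { Asynchronous, SynchronousAbort, SystemCall };

// AArch64 instructions are a fixed 4 bytes wide.
constexpr std::uint64_t kInsnBytes = 4;

// `insn` is the address of the instruction that raised the exception or,
// for an asynchronous exception, the first instruction not yet executed.
constexpr std::uint64_t preferred_return_address(ExceptionKind kind,
                                                 std::uint64_t insn) {
    // Only a system call returns to the instruction after the one that trapped.
    return kind == ExceptionKind::SystemCall ? insn + kInsnBytes : insn;
}
```

For the crash in this article, the faulting ldr x2, [x0] is at 0x78432d68, and preferred_return_address(ExceptionKind::SynchronousAbort, 0x78432d68) yields that same address, which is the pc shown in frame #0.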
For synchronous and asynchronous exceptions, please refer to "ARM Exception Handling". Common synchronous exceptions are:
- Attempting to access a register from an inappropriate exception level;
- Attempting to execute a disabled or undefined (UNDEFINED) instruction;
- Using a misaligned SP;
- Attempting to execute an instruction with a misaligned PC;
- Exceptions generated by software, such as executing the system call (SVC), HVC or SMC instructions;
- Data aborts caused by address translation or permissions, etc.;
- Instruction aborts caused by address translation or permissions;
- Exceptions caused by debugging, such as breakpoint exceptions, watchpoint exceptions, software single-step exceptions, etc.
The segfault we commonly see is a "data abort caused by address translation or permissions", which is a synchronous exception. It is similar to a page fault; the difference is that for a page fault the handler repairs the address, that is, on-demand page allocation makes the address valid. Therefore, when this type of exception returns, the pc points to the instruction that triggered the exception, and that instruction is either re-executed or the process exits.
So when we analyze a segmentation fault, we can look directly at the code at the pc of frame 0, which is where the problem was triggered. Similarly, the pc shown by the bt command during gdb single-step debugging is the instruction at which the program is paused and which has not yet been executed, not an instruction further along the pipeline.
More reference: How to use ARM's data-abort exception
https://www.embedded.com/how-to-use-arms-data-abort-exception/
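The "repair and re-execute" behaviour described above can also be observed from user space. The sketch below assumes Linux with POSIX signals. The function faults_for_one_store and the globals are hypothetical names. Calling mprotect inside a signal handler is not on the POSIX async-signal-safe list and is tolerated here only for demonstration. A SIGSEGV handler makes a protected page writable and simply returns, after which the kernel resumes at the faulting store.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace {
volatile sig_atomic_t g_fault_count = 0;
void* g_page = nullptr;
std::size_t g_page_size = 0;

// SIGSEGV handler: "repair" the faulting page, then return to the faulting pc.
void on_segv(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    char* page = static_cast<char*>(g_page);
    if (addr < page || addr >= page + g_page_size) {
        _exit(1);  // unrelated fault: give up instead of looping forever
    }
    g_fault_count = g_fault_count + 1;
    mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);
}
}  // namespace

// Returns the number of faults taken for one store to a protected page,
// or -1 if the setup failed or the store did not take effect.
int faults_for_one_store() {
    g_fault_count = 0;
    g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    g_page = mmap(nullptr, g_page_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_page == MAP_FAILED) return -1;

    struct sigaction sa = {};
    struct sigaction old_sa = {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0) return -1;

    volatile int* p = static_cast<volatile int*>(g_page);
    *p = 42;                       // faults once, then the same store is retried
    bool ok = (*p == 42);

    sigaction(SIGSEGV, &old_sa, nullptr);
    munmap(g_page, g_page_size);
    return ok ? static_cast<int>(g_fault_count) : -1;
}
```

If the handler returned to the instruction after the store instead, the value 42 would never reach memory. The store does succeed after a single fault, which is consistent with the pc of a synchronous abort pointing at the faulting instruction itself.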
5.3 Memory disorder and memory barrier
The essence of this problem is memory reordering introduced by compiler optimization. Our assignment statements did nothing to forbid the compiler from reordering them, so the compiler was free to prioritize performance and reorder the stores as long as the language rules were satisfied. The demo code above is the classic store-store reordering. There are many well-written articles on this topic, so I won't repeat them here.
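As a minimal sketch of how store-store reordering can be ruled out in C++11 and later, the writer below initializes the data first and then publishes a pointer with a release store, and the reader uses an acquire load. This is not the fix applied in the case above. Payload, g_payload, g_published, publish and try_consume are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>

struct Payload { int value = 0; };

Payload g_payload;                            // plain data filled by the writer
std::atomic<Payload*> g_published{nullptr};   // pointer the reader polls

// Writer side: (1) initialize the data, (2) publish the pointer.
// The release store forbids moving store (1) after store (2).
void publish(int v) {
    g_payload.value = v;
    g_published.store(&g_payload, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store, so a reader
// that sees the pointer also sees the initialized value.
bool try_consume(int& out) {
    Payload* p = g_published.load(std::memory_order_acquire);
    if (p == nullptr) return false;
    out = p->value;
    return true;
}
```

In real use, publish() and try_consume() would run on different threads. With plain non-atomic stores, the compiler or the CPU would be allowed to make the pointer visible before the data.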
6. Summary
- This article replays the analysis process of a crash case in detail.
- Reviewed C++ polymorphism and class memory layout. Once the principles are understood, inspecting memory reveals more of the details inside the code.
- Reviewed the meaning of the pc pointer and learned more about the ARM exception handling mechanism, explaining the principles behind some conclusions that are taken for granted every day.
- Reviewed the relevant knowledge of memory barriers, and constructed a demo to verify the theoretical analysis in practice.
6.1 Inspiration
This case is a classic and offers some lessons for our subsequent analysis and coding design.
Analyzing the problem
- Assembly code is a magnified version of the high-level language source code. When you can't see the problem at the high-level language level, try viewing the assembly, because it is closer to the "source code" the machine actually executes and has a higher "resolution".
Coding design
- In lock-free code, especially code whose "careful" design depends on the order of assignments, don't forget that memory reordering optimizations exist.
- Beyond the code itself, the design must also work with the compiler: understand the compiler's behavior, make sure the final compiled product meets the design expectations, and leave the compiler no room for "free play".
6.2 Comprehension
"To learn and to practice what you have learned at the right time, is that not a pleasure?" There are two interpretations of this saying. One is that it is a pleasure to review frequently after learning. I prefer the other: after learning, putting it into practice at the right time is a joy. The emphasis is on applying what you have learned. What is so enjoyable about studying itself? The real pleasure lies in being able to apply what you learn from books in practice.