foreword
Good or bad coding style is like good or bad handwriting. A company hiring a secretary would never take someone whose handwriting is ugly. Likewise, a programmer with a poor coding style is not a competent programmer. The compiler does not care how ugly your code is: as long as it is syntactically correct, it compiles. But the other programmers on your team will not be able to stand it, and neither will you. Come back to the code a few days later and you will not be able to tell what you wrote. Code is written mainly for people to read, and only incidentally for machines to execute. If it were written for machines, you could simply write machine instructions and would have no need for a high-level language. Code, like natural language, expresses ideas and records information, so it must be written clearly and neatly to communicate effectively. This is why a software project usually lays down its coding style in a document, and everyone on the project must follow that one style regardless of their own habits. The Linux kernel has such a document (Documentation/CodingStyle in the kernel source tree). In this chapter we use the kernel coding style as the basis for explaining what a good coding style prescribes and the rationale behind each rule. The kernel is only an example for explaining the concept of coding style. I am not claiming that the kernel style is the best one, but the success of the Linux kernel project is enough to show that its style is one of the best C coding styles.
indentation and whitespace
C syntax places no requirements on indentation or whitespace: spaces, tabs, and newlines can be used freely, so code that does the same thing can be written either neatly or horribly. For example, the rock-paper-scissors program from the previous chapter looks ugly when written like this:
Code with missing indentation and whitespace
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void)
{
char gesture[3][10]={"scissor","stone","cloth"};
int man,computer,result, ret;
srand(time(NULL));
while(1){
computer=rand()%3;
printf("\nInput your gesture (0-scissor 1-stone 2-cloth):\n");
ret=scanf("%d",&man);
if(ret!=1||man<0||man>2){
printf("Invalid input! Please input 0, 1 or 2.\n");
continue;
}
printf("Your gesture: %s\tComputer's gesture: %s\n",gesture[man],gesture[computer]);
result=(man-computer+4)%3-1;
if(result>0)printf("You win!\n");
else if(result==0)printf("Draw!\n");
else printf("You lose!\n");
}
return 0;
}
This code has two problems. First, it lacks whitespace, so the code is too dense and hard on the eyes. Second, it has no indentation, so you cannot tell which { pairs with which }. Code this short can still be read, but once it grows beyond one screen it becomes completely unreadable. The kernel coding style has no special rules about whitespace, because practically all C coding styles have similar whitespace rules. The main ones are:
1. Put a space between the keywords if, while, and for and the parenthesized control expression that follows, but keep the expression inside the parentheses tight against them, for example:
while (1);
2. Put a space on each side of a binary operator, and no space between a unary operator and its operand, for example:
i = i + 1, ++i, !(i < 1), -x
3. Put no space between a postfix operator and its operand, for example in structure member access s.a, the function call foo(arg1), and array subscripting a[i].
4. Put a space after , and ;, as is customary in written English, for example:
for (i = 1; i < 10; i++), foo(arg1, arg2)
5. The rules above for binary and postfix operators are not strict. Sometimes you may write more compactly to emphasize precedence, for example:
for (i=1; i<10; i++), distance = sqrt(x*x + y*y)
But the omitted spaces must not mislead people who read the code, for example:
a||b && c
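is easily misread as having the wrong precedence: && binds more tightly than ||, so the expression means a || (b && c), while the spacing suggests (a||b) && c.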
6. Since the standard character terminal on UNIX systems is 24 lines by 80 columns, statements approaching or exceeding 80 characters should be broken across lines, with each continuation line aligned with spaces to the expression or argument above it, for example:
if (sqrt(x*x + y*y) > 5.0
    && x < 0.0
    && y > 0.0)
Another example:
foo(sqrt(x*x + y*y),
    a[i-1] + b[i-1] + c[i-1])
7. A long string can be split into several strings written on separate lines, for example:
printf("This is such a long sentence that "
       "it cannot be held within a line\n");
The C compiler automatically concatenates adjacent string literals, so the two strings above are equivalent to a single string.
8. Some people like to use Tab characters in variable definitions to align the variable names, which looks neat (here and in the examples below, → marks a Tab character):
→int    →a, b;
→double →c;
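The whitespace rules above can be seen together in a small compilable sketch. The names sum, either_or_both, in_target_area, and long_msg are hypothetical, made up for illustration. One assumption differs from the text: in_target_area compares the squared distance with 5.0 * 5.0 instead of calling sqrt, only so that the sketch needs no math library.

```c
/* Hypothetical helpers written according to the whitespace rules above. */

/* A space after "for" and after each ";", spaces around binary operators. */
int sum(const int a[], int n)
{
	int i, total = 0;

	for (i = 0; i < n; i++)
		total += a[i];
	return total;
}

/*
 * && binds more tightly than ||, so "a||b && c" really means a || (b && c).
 * Writing the grouping explicitly leaves nothing for the reader to guess.
 */
int either_or_both(int a, int b, int c)
{
	return a || (b && c);
}

/* A long condition is broken across lines, continuations aligned with spaces. */
int in_target_area(double x, double y)
{
	return x*x + y*y > 5.0 * 5.0
	       && x < 0.0
	       && y > 0.0;
}

/* Adjacent string literals are joined into a single string at compile time. */
const char long_msg[] = "This is such a long sentence that "
                        "it cannot be held within a line\n";
```

With gcc, -Wall enables -Wparentheses, which warns about an unparenthesized && inside ||, one more reason to make the grouping explicit rather than rely on spacing.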
The kernel code style has the following rules about indentation.
1. Use indentation to show the nesting of statement blocks. Indent with Tab characters, never with spaces. On a standard character terminal a Tab is displayed 8 spaces wide; if your text editor lets you set the display width of a Tab, set it to 8 as well. Such wide indentation makes the structure of the code very clear. If some lines are indented with spaces and others with Tabs, or spaces and Tabs are mixed on the same line, the code turns into a mess as soon as the editor's Tab width changes. That is why the kernel coding style requires Tabs for indentation and forbids replacing them with spaces.
2. if/else, while, do/while, for, and switch can all take a statement block. Put the { and } of the block on the same line as the keyword, separated by a space, rather than on a line of their own. For example, write it like this:
if (...) {
→statement list
} else if (...) {
→statement list
}
But many people are used to writing like this:
if (...)
{
→statement list
}
else if (...)
{
→statement list
}
The advantage of the kernel style is that it takes fewer lines, so more code fits on one screen. Both styles are widely used; what matters is that a project uses one of them consistently.
3. In a function definition, the { and } each occupy a line of their own, unlike the rule for statement blocks, for example:
int foo(int a, int b)
{
→statement list
}
4. Align the case and default labels with their switch: the labels inside the block are not indented relative to switch, but the statements under each label are, for example:
→switch (c) {
→case 'A':
→→statement list
→case 'B':
→→statement list
→default:
→→statement list
→}
Labels used as goto targets are written at the very start of the line, without indentation, regardless of how deeply the statement under the label is indented.
5. Separate each logical section of the code with a blank line. For instance, put a blank line between function definitions, and between the #include lines, the global variable definitions, and the function definitions:
#include <stdio.h>
#include <stdlib.h>

int g;
double h;

int foo(void)
{
→statement list
}

int bar(int a)
{
→statement list
}

int main(void)
{
→statement list
}
6. If the statement list of a function is long, it can also be split into groups of related statements separated by blank lines. This rule is not strict. Usually the variable definitions form one group followed by a blank line, and a blank line is also placed before the return statement, for example:
int main(void)
{
→int    →a, b;
→double →c;

→Statement group 1

→Statement group 2

→return 0;
}
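The indentation rules above can be combined in one compilable sketch: K&R braces for statement blocks, function braces on their own lines, case labels aligned with switch, goto labels in column 0, and blank lines between statement groups. The functions char_kind and dup_pair are hypothetical examples, not kernel code; the goto-based cleanup they show is, however, a common kernel idiom for releasing resources on an error path.

```c
#include <stdlib.h>
#include <string.h>

/* case/default line up with switch; the statements under them are indented. */
int char_kind(char c)
{
	switch (c) {
	case 'A':
		return 1;
	case 'B':
		return 2;
	default:
		return 0;
	}
}

/*
 * Copy s1 and s2 into newly allocated strings.
 * Returns 0 on success, or -1 if memory runs out (nothing is leaked).
 */
int dup_pair(const char *s1, const char *s2, char **out1, char **out2)
{
	char *p1, *p2;

	p1 = malloc(strlen(s1) + 1);
	if (p1 == NULL)
		goto fail;
	p2 = malloc(strlen(s2) + 1);
	if (p2 == NULL)
		goto free_p1;

	strcpy(p1, s1);
	strcpy(p2, s2);
	*out1 = p1;
	*out2 = p2;

	return 0;

free_p1:		/* goto labels start in column 0 */
	free(p1);
fail:
	return -1;
}
```

Each error label undoes exactly the steps that succeeded before the jump, so all failure paths share one exit instead of repeating free calls.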
comments
A single-line comment should take the form /* comment */, with a space between each delimiter and the text. The most common form of multi-line comment is:
/*
 * Multi-line
 * comment
 */
There is also a fancier form:
/*************\
* Multi-line  *
* comment     *
\*************/
Comments are used in the following places:
1. A comment at the top of the source file describing the module, such as the file name, author, and version history, written without indentation. For example, the beginning of kernel/sched.c in the kernel source tree:
/*
 * kernel/sched.c
 *
 * Kernel scheduler and related syscalls
 *
 * Copyright (C) 1991-2002 Linus Torvalds
 *
 * 1996-12-23 Modified by Dave Grothe to fix bugs in semaphores and
 *            make semaphores SMP safe
 * 1998-11-19 Implemented schedule_timeout() and related stuff
 *            by Andrea Arcangeli
 * 2002-01-04 New ultra-scalable O(1) scheduler by Ingo Molnar:
 *            hybrid priority-list and round-robin design with
 *            an array-switch method of distributing timeslices
 *            and per-CPU runqueues. Cleanups and useful suggestions
 *            by Davide Libenzi, preemptible kernel bits by Robert Love.
 * 2003-09-03 Interactivity tuning by Con Kolivas.
 * 2004-04-02 Scheduler domains code by Nick Piggin
 */
2. Function comments: explain what the function does, its parameters, return value, error codes, and so on. Write the comment directly above the function definition, with no blank line between them, starting at the left margin without indentation.
3. Comments on a relatively self-contained group of statements: explain something special about the group. Write the comment directly above the group, with no blank line between them, at the same indentation as the group.
4. Short comments to the right of a line of code: a special note about the current line, usually a single-line comment, separated from the code by at least one space. Ideally all right-hand comments in a source file line up vertically. Although a comment can be placed in the middle of a line of code, this is not recommended. The following function from lib/radix-tree.c in the kernel source tree contains all three of the kinds of comment just described (items 2 to 4):
/**
 * radix_tree_insert - insert into a radix tree
 * @root: radix tree root
 * @index: index key
 * @item: item to insert
 *
 * Insert an item into the radix tree at position @index.
 */
int radix_tree_insert(struct radix_tree_root *root,
			unsigned long index, void *item)
{
	struct radix_tree_node *node = NULL, *slot;
	unsigned int height, shift;
	int offset;
	int error;

	/* Make sure the tree is high enough. */
	if ((!index && !root->rnode) ||
			index > radix_tree_maxindex(root->height)) {
		error = radix_tree_extend(root, index);
		if (error)
			return error;
	}

	slot = root->rnode;
	height = root->height;
	shift = (height-1) * RADIX_TREE_MAP_SHIFT;

	offset = 0;			/* uninitialised var warning */
	do {
		if (slot == NULL) {
			/* Have to add a child node. */
			if (!(slot = radix_tree_node_alloc(root)))
				return -ENOMEM;
			if (node) {
				node->slots[offset] = slot;
				node->count++;
			} else
				root->rnode = slot;
		}

		/* Go a level down */
		offset = (index >> shift) & RADIX_TREE_MAP_MASK;
		node = slot;
		slot = node->slots[offset];
		shift -= RADIX_TREE_MAP_SHIFT;
		height--;
	} while (height > 0);

	if (slot != NULL)
		return -EEXIST;

	BUG_ON(!node);
	node->count++;
	node->slots[offset] = item;
	BUG_ON(tag_get(node, 0, offset));
	BUG_ON(tag_get(node, 1, offset));

	return 0;
}
Note in particular that comments inside functions should be kept to a minimum. Comments mainly explain what your code does (for example, a function's interface), not how it does it. If the code is written clearly enough, how it works is obvious at a glance; if you need comments to explain it, your code is not readable enough. Use comments inside a function only for special places that need a warning.
5. Complex structure definitions need even more comments than functions. For example, kernel/sched.c in the kernel source tree defines this structure:
/*
 * This is the main, per-CPU runqueue data structure.
 *
 * Locking rule: those places that want to lock multiple runqueues
 * (such as the load balancing or the thread migration code), lock
 * acquire operations must be ordered by ascending &runqueue.
 */
struct runqueue {
	spinlock_t lock;

	/*
	 * nr_running and cpu_load should be in the same cacheline because
	 * remote CPUs use both these fields when doing load calculation.
	 */
	unsigned long nr_running;
#ifdef CONFIG_SMP
	unsigned long cpu_load[3];
#endif
	unsigned long long nr_switches;

	/*
	 * This is part of a global counter where only the total sum
	 * over all CPUs matters. A task can increase this counter on
	 * one CPU and if it got migrated afterwards it may decrease
	 * it on another CPU. Always updated under the runqueue lock:
	 */
	unsigned long nr_uninterruptible;

	unsigned long expired_timestamp;
	unsigned long long timestamp_last_tick;
	task_t *curr, *idle;
	struct mm_struct *prev_mm;
	prio_array_t *active, *expired, arrays[2];
	int best_expired_prio;
	atomic_t nr_iowait;

#ifdef CONFIG_SMP
	struct sched_domain *sd;

	/* For active balancing */
	int active_balance;
	int push_cpu;

	task_t *migration_thread;
	struct list_head migration_queue;
	int cpu;
#endif

#ifdef CONFIG_SCHEDSTATS
	/* latency stats */
	struct sched_info rq_sched_info;

	/* sys_sched_yield() stats */
	unsigned long yld_exp_empty;
	unsigned long yld_act_empty;
	unsigned long yld_both_empty;
	unsigned long yld_cnt;

	/* schedule() stats */
	unsigned long sched_switch;
	unsigned long sched_cnt;
	unsigned long sched_goidle;

	/* try_to_wake_up() stats */
	unsigned long ttwu_cnt;
	unsigned long ttwu_local;
#endif
};
6. Complex macro definitions and variable declarations also need comments, for example these definitions in include/linux/jiffies.h in the kernel source tree:
/* TICK_USEC_TO_NSEC is the time between ticks in nsec assuming real ACTHZ and */
/* a value TUSEC for TICK_USEC (can be set bij adjtimex)                       */
#define TICK_USEC_TO_NSEC(TUSEC) (SH_DIV (TUSEC * USER_HZ * 1000, ACTHZ, 8))

/* some arch's have a small-data section that can be accessed register-relative
 * but that can only take up to, say, 4-byte variables. jiffies being part of
 * an 8-byte variable may not be correctly accessed unless we force the issue
 */
#define __jiffy_data  __attribute__((section(".data")))

/*
 * The 64-bit value is not volatile - you MUST NOT read it
 * without sampling the sequence number in xtime_lock.
 * get_jiffies_64() will do this for you as appropriate.
 */
extern u64 __jiffy_data jiffies_64;
extern unsigned long volatile __jiffy_data jiffies;
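The kinds of comments listed above can be combined in a short sketch: a comment on a macro, a comment on a structure with right-hand comments on its members, function comments placed directly above the definitions, and a comment on a statement group. All names (ARRAY_LEN, STACK_CAP, struct int_stack, stack_push, stack_sum) are hypothetical; the kernel itself defines a similar macro named ARRAY_SIZE.

```c
/*
 * ARRAY_LEN - number of elements in an array.
 * Only valid for a real array, not for a pointer to its first element.
 */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

#define STACK_CAP 16	/* maximum number of elements in a stack */

/*
 * A fixed-capacity stack of ints.
 * Invariant: 0 <= top <= STACK_CAP; data[0..top-1] hold the elements.
 */
struct int_stack {
	int top;		/* index of the next free slot */
	int data[STACK_CAP];	/* bottom of the stack is data[0] */
};

/**
 * stack_push - push a value onto a stack
 * @s: the stack
 * @v: the value to push
 *
 * Returns 0 on success, or -1 if the stack is already full.
 */
int stack_push(struct int_stack *s, int v)
{
	if (s->top >= (int)ARRAY_LEN(s->data))
		return -1;
	s->data[s->top++] = v;
	return 0;
}

/**
 * stack_sum - add up the elements of a stack
 * @s: the stack
 *
 * Returns the sum of all elements currently on the stack.
 */
int stack_sum(const struct int_stack *s)
{
	int i, total = 0;

	/* Walk from the bottom of the stack to the top. */
	for (i = 0; i < s->top; i++)
		total += s->data[i];

	return total;
}
```

The /** opening marks a comment in the kernel-doc format, which the kernel's documentation tools extract; the layout stays readable even without those tools.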
identifier naming
Identifiers should be named according to these principles:
1. Identifiers should be clear and unambiguous, using complete words or easily understood abbreviations. Short words can be abbreviated by dropping vowels, and longer words by keeping their first few letters. Reading other people's code, you will pick up common abbreviation conventions: count becomes cnt, block becomes blk, length becomes len, and the root trans is often abbreviated to x, so transmit becomes xmt. I will not list more; pay attention to such conventions and collect them as you read code.
2. The kernel coding style names variables, functions, and types in lowercase with underscores, and constants (such as macros and enumeration constants) in uppercase with underscores, for example the function name radix_tree_insert, the type name struct radix_tree_root, and the constant name RADIX_TREE_MAP_SHIFT in the example of the previous section.
3. Names of global variables and global functions must be descriptive. Don't hesitate to use a few more words and underscores, as in the function name radix_tree_insert: such names are used in many source files throughout the project, so their users must be able to tell what each variable or function is for. Local variables and internal functions called only within one source file can have shorter names, but not too short. Avoid single-letter variable names, with one exception: i, j, and k are fine as loop variables.
4. A rule for Chinese programmers: never use Chinese pinyin for identifiers; its readability is extremely poor.
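A short sketch of the naming rules: a lowercase type name with underscores, uppercase enumeration constants, a descriptive name for a function that other files may call, a shorter name for a static helper, a conventional abbreviation (cnt), and a single-letter loop variable. All names here are hypothetical, chosen only to illustrate the rules.

```c
#include <stddef.h>

/* Type names: lowercase with underscores. Enumeration constants: uppercase. */
enum char_class {
	CHAR_CLASS_BLANK,
	CHAR_CLASS_OTHER,
};

/* Internal helper used only in this file: a shorter name is fine. */
static enum char_class classify(char c)
{
	return (c == ' ' || c == '\t') ? CHAR_CLASS_BLANK : CHAR_CLASS_OTHER;
}

/*
 * Global function visible to the whole project: the name says exactly
 * what it does. Counts the blanks (spaces and tabs) in buf[0..len-1].
 */
size_t text_count_blanks(const char *buf, size_t len)
{
	size_t cnt = 0;		/* "cnt" is the usual abbreviation of "count" */
	size_t i;		/* single letters only for loop variables */

	for (i = 0; i < len; i++)
		if (classify(buf[i]) == CHAR_CLASS_BLANK)
			cnt++;
	return cnt;
}
```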
function
Each function should be designed to be as simple as possible. Simple functions are easy to maintain. The following principles should be followed:
1. A function should do one thing and do it well. Don't try to make a function do everything; such a function is bound to be too long, and is usually neither reusable nor easy to maintain.
2. Keep the nesting inside a function shallow, generally fewer than 4 levels of indentation. Too many levels mean the design is too complicated, and you should consider splitting the function into smaller ones that it calls.
3. Don't make functions too long: preferably no more than two screens on a standard 24-line terminal, since longer functions are hard to read. If a function exceeds two screens, consider splitting it. A function that is long but conceptually simple is fine, however. For example, a function consisting of one large switch with many cases is acceptable: the case branches do not affect one another, so the complexity of the whole function equals that of a single case. This pattern is very common, for instance in implementations of the TCP protocol state machine.
4. Calling a function performs an action, so a function name should usually contain a verb, such as get_current and radix_tree_insert.
5. Put a comment above each of the more important function definitions, explaining what the function does, its parameters, return value, error codes, and so on.
6. Another measure of a function's complexity is the number of its local variables. 5 to 10 local variables is already a lot; beyond that the function becomes hard to maintain, and you should consider splitting it into several functions.
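A minimal sketch of these rules, assuming a hypothetical task: checking whether a string is an optionally signed decimal integer. Each function does one thing, every name contains a verb, nesting stays under four levels, and the state machine is a switch whose cases do not affect one another (the same shape as the TCP state machine mentioned above, on a much smaller scale). The names and the four states are invented for illustration.

```c
enum int_state {
	ST_START,	/* nothing read yet */
	ST_SIGN,	/* read a leading '+' or '-' */
	ST_DIGITS,	/* read at least one digit */
	ST_ERROR,	/* input cannot be an integer */
};

/* One small job: decide whether c is a decimal digit. */
static int is_digit(char c)
{
	return c >= '0' && c <= '9';
}

/* Each case handles one state and does not depend on the others. */
static enum int_state step(enum int_state st, char c)
{
	switch (st) {
	case ST_START:
		if (c == '+' || c == '-')
			return ST_SIGN;
		return is_digit(c) ? ST_DIGITS : ST_ERROR;
	case ST_SIGN:
	case ST_DIGITS:
		return is_digit(c) ? ST_DIGITS : ST_ERROR;
	default:
		return ST_ERROR;
	}
}

/**
 * check_integer - check whether s is an optionally signed decimal integer
 * @s: NUL-terminated string to check
 *
 * Returns 1 if it is, 0 otherwise.
 */
int check_integer(const char *s)
{
	enum int_state st = ST_START;

	for (; *s != '\0'; s++)
		st = step(st, *s);

	return st == ST_DIGITS;
}
```

For example, check_integer("-42") returns 1 and check_integer("1-2") returns 0. Adding a state means adding a case to step, which makes it longer but not harder to follow.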
indent tool
The indent tool can reformat code into a particular style, for example turning code that lacks indentation and whitespace into the kernel coding style:
indent -kr -i8 main.c
The -kr option selects the K&R style, and -i8 sets the indentation to 8 spaces. Unless the -nut option is given, every 8 spaces of indentation are replaced by a Tab. Note that indent modifies the original file in place instead of printing to the screen or writing to another file, unlike many UNIX commands. Code formatted with the two options -kr -i8 already conforms to the coding style introduced in this chapter: the necessary indentation and spaces are added, and long lines are wrapped automatically. The one flaw is that no blank lines are added, because indent cannot know which lines belong together logically, so you still have to add blank lines yourself. Existing blank lines, of course, are not deleted by indent.
If you adopt the kernel coding style introduced in this chapter, the two options -kr -i8 are basically all you need. indent also has options for other coding styles; see its man page for details.