A process can contain multiple threads, and a thread can contain multiple coroutines
CPU-intensive computing (CPU-bound)
I/O completes in a very short time while the CPU performs a large amount of computation; characterized by high CPU usage
eg: compression and decompression, encryption and decryption, regular expression search
IO-intensive computing (I/O-bound)
Most of the time is spent waiting on I/O (hard disk / memory) reads and writes rather than computing, so CPU usage is low
eg: file processing, web crawlers, database reads and writes
Advantages: can take advantage of multiple CPU cores
Disadvantages: occupies the most resources; fewer processes can be started than threads
Applicable to: CPU-intensive computing
from multiprocessing import Process
import os

def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())  # os.getpid() returns the current process ID
    # Process() takes target, the function the child process will run (without it,
    # the run() method executes by default), and args, a tuple of positional
    # arguments for that function. You can also pass name for the child process's
    # name, and kwargs, a dictionary of keyword arguments, e.g.:
    # p = Process(target=proc_fun, args=(18,), name="subprocess 1")
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()  # start the Process instance
    p.join()   # wait for the child process to finish; used to synchronize processes
    print('Child process end.')
from multiprocessing import Pool
import os, time, random

def task(name):
    print("run task %s (%s)" % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print("task %s ran %0.2f seconds" % (name, end - start))

if __name__ == '__main__':
    print("Parent process %s." % os.getpid())
    p = Pool(4)  # pool size: at most 4 worker processes run at the same time
    for i in range(5):
        p.apply_async(task, args=(i,))  # submit tasks to the pool asynchronously
    print("waiting......")
    p.close()  # no new tasks can be submitted after close()
    p.join()   # wait for all submitted tasks to finish
    print("all processes end")
When we want to start an external program as a child process and control its input and output, we use the subprocess module: it lets us launch a child process and read from and write to its standard streams.
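A minimal sketch of subprocess usage; the Python one-liners run as child processes here are illustrative stand-ins for any external command:

```python
import subprocess, sys

# run() starts a child process, waits for it, and captures its output
r = subprocess.run([sys.executable, '-c', 'print("hello")'],
                   capture_output=True, text=True)
print(r.stdout.strip())  # hello

# Popen gives explicit control over the child's stdin and stdout
p = subprocess.Popen([sys.executable, '-c', 'print(input().upper())'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
out, _ = p.communicate('abc\n')  # write to stdin, read stdout, wait for exit
print(out.strip())  # ABC
```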
Processes communicate through Queue and Pipe. The example below creates two child processes in the parent: one writes data to a Queue and the other reads data from it
from multiprocessing import Process, Queue
import os, time, random

def write(q):
    print("start to write: %s" % os.getpid())
    for a in ['a', 'b', 'c']:
        print("put %s into queue" % a)
        q.put(a)
        time.sleep(random.random())

def read(q):
    print("start to read: %s" % os.getpid())
    while True:
        a = q.get(True)  # block until an item is available
        print("get %s from queue" % a)

if __name__ == '__main__':
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    pw.start()
    pr.start()
    pw.join()        # wait for the writer to finish
    pr.terminate()   # the reader loops forever, so terminate it
Advantages: lighter weight than processes; occupies fewer resources
Disadvantages: compared with processes, multiple threads can only execute concurrently, not in parallel, and can only use a single CPU; compared with coroutines, the number of threads that can be started is limited, threads occupy memory, and thread switching has overhead
Applicable to: IO-intensive computing with a moderate number of tasks running at the same time
Multiple tasks can be completed by multiple processes, or by multiple threads within a single process; every process has at least one thread. Threads are also the unit of execution scheduled directly by the operating system
import threading, time, random

def loop():
    print("thread %s is running" % threading.current_thread().name)  # child thread
    # current_thread() returns the Thread instance running the current code
    n = 0
    while n < 5:
        start = time.time()
        n = n + 1
        print("%s >> %s" % (threading.current_thread().name, n))
        time.sleep(random.random())
        end = time.time()
        print("this iteration took %0.2f seconds" % (end - start))
    print("the thread is ending")

print("thread %s is running" % threading.current_thread().name)  # main thread
t = threading.Thread(target=loop, name="LoopThread")
t.start()
t.join()
print("thread %s is ending" % threading.current_thread().name)
With multiple processes, each process holds its own copy of every variable, so processes do not affect each other. With multiple threads, all variables are shared among all threads.
import threading

balance = 0
lock = threading.Lock()

def change(m):
    global balance
    # Each of the two statements below takes multiple bytecode instructions,
    # so a thread may be interrupted in the middle of them
    balance = balance + m
    balance = balance - m

def run_thread(m):
    for i in range(2000000):
        # Thread scheduling is decided by the operating system: t1 and t2
        # execute alternately, and with enough iterations the unprotected
        # result may be wrong.
        # Acquiring the lock ensures only one thread at a time runs change(),
        # avoiding interleaved execution. Note that calling
        # threading.Lock().acquire() directly creates a brand-new lock each
        # time, so it protects nothing: all threads must share one lock object.
        lock.acquire()
        try:
            change(m)
        finally:
            lock.release()

t1 = threading.Thread(target=run_thread, args=(3,))
t2 = threading.Thread(target=run_thread, args=(5,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)
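The acquire()/try/finally/release() pattern above is usually written with a with statement, which releases the lock automatically even on exceptions. A small sketch using a hypothetical shared counter:

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # 'with lock:' acquires on entry and releases on exit,
        # even if an exception occurs inside the block
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000
```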
With multiple threads, each thread sometimes needs its own private copy of certain data. threading.local() gives each thread independent attribute storage for exactly this purpose.
import threading

NAME = threading.local()  # a global object whose attributes are local to each thread

def process_student():
    # Get the st bound to the current thread
    stn = NAME.st
    print("%s %s" % (stn, threading.current_thread().name))  # prints the current thread's name

def process_thread(ne):
    # Bind st for the current thread
    NAME.st = ne
    process_student()

t1 = threading.Thread(target=process_thread, args=('ghkghkfgh',), name='kdkgk')
t1.start()
t1.join()
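The single-thread example above does not show the isolation itself. A small sketch with two threads (names and values are illustrative), each seeing only its own binding:

```python
import threading

local = threading.local()
results = {}

def worker(value):
    local.st = value  # bound only for the thread that sets it
    results[threading.current_thread().name] = local.st

t1 = threading.Thread(target=worker, args=('alpha',), name='T1')
t2 = threading.Thread(target=worker, args=('beta',), name='T2')
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results.items()))  # [('T1', 'alpha'), ('T2', 'beta')]
```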
Coroutine (asyncio)
Advantages: minimal memory overhead; the largest number of concurrent tasks can be started
Disadvantages: limited library support; the code is more complex
eg: a crawler that uses coroutines must use aiohttp instead of the requests library
Applicable to: IO-intensive computing with very many tasks, but only in scenarios supported by existing async libraries
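A minimal sketch of coroutines using only the standard asyncio library; asyncio.sleep stands in for the network IO an aiohttp request would perform:

```python
import asyncio, time

async def fetch(i):
    # Simulated IO wait; while one coroutine sleeps,
    # the event loop runs the others
    await asyncio.sleep(0.1)
    return i * 2

async def main():
    # gather() schedules all five coroutines concurrently on a single thread
    return await asyncio.gather(*(fetch(i) for i in range(5)))

start = time.time()
results = asyncio.run(main())
print(results)                           # [0, 2, 4, 6, 8]
print("%.2f s" % (time.time() - start))  # roughly 0.1 s: the five waits overlap
```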
The GIL (Global Interpreter Lock) is a mechanism the CPython interpreter uses to synchronize threads: it allows only one thread to execute Python bytecode at any moment, even on multi-core processors.
During IO a thread releases the GIL, so IO and CPU work can overlap; for CPU-intensive work, however, the GIL prevents parallelism and slows execution down.
In Python, CPU-intensive work can instead be parallelized with the multiprocessing module.