12. Processes, threads and coroutines


A process can contain multiple threads, and a thread can run multiple coroutines.

CPU intensive computing (CPU bound)

CPU bound means that I/O completes in a very short time while the CPU has a large amount of calculation and processing to do. It is characterized by high CPU utilization.
e.g. compression and decompression, encryption and decryption, regular expression search

IO intensive computing (I/O bound)

I/O bound means that the system spends most of its time waiting for I/O reads and writes (hard disk / memory / network) rather than computing, so CPU utilization is low.
e.g. file handling programs, web crawlers, programs that read from a database
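
The difference between the two kinds of workload shows up when wall-clock time is compared with CPU time. The following is only a minimal sketch, not a benchmark: cpu_bound, io_bound and measure are hypothetical helper names, and time.sleep stands in for a real I/O wait.

```python
import time

def cpu_bound(n):
    # Pure computation: the CPU is busy the whole time
    return sum(i * i for i in range(n))

def io_bound(seconds):
    # time.sleep stands in for waiting on disk or network (an assumption for the demo)
    time.sleep(seconds)

def measure(func, *args):
    # Return (wall-clock seconds, CPU seconds) spent in func
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

if __name__ == '__main__':
    for name, func, arg in [('cpu_bound', cpu_bound, 1_000_000), ('io_bound', io_bound, 0.3)]:
        wall, cpu = measure(func, arg)
        print("%s: wall %.2fs, cpu %.2fs" % (name, wall, cpu))
```

For the CPU bound call the two numbers should be close to each other; for the I/O bound call the CPU time should be close to zero while the wall-clock time is about the length of the wait.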

multiprocessing

Advantages: can use multiple CPU cores
Disadvantages: processes consume the most resources, and fewer of them can be started than threads
Applicable to: CPU intensive computing

Process

from multiprocessing import Process
import os


def run_proc(name):
    # Runs in the child process, so os.getpid() returns the child's ID here
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())  # os.getpid() returns the ID of the current process (here, the parent)
    # Process() takes target, the function the child process will run. If no target is given, the run() method is executed by default.
    # args is a tuple of positional arguments passed to the target function.
    # You can also optionally pass name to name the new child process, and kwargs to pass a dictionary of keyword arguments:
    # p = Process(target=proc_fun, args=(18,), name = "haha, subprocess 1")

    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()  # Start the child process
    p.join()  # Wait for the child process to finish before continuing. It is used for synchronization between processes
    print('Child process end.')

Pool

from multiprocessing import Pool
import os,time,random

def task(name):
    print("run task %s,(%s)"%(name, os.getpid()))
    start = time.time()
    time.sleep(random.random()*3)
    end = time.time()
    print("process %s run %0.2fseconds"%(name,(end - start)))

if __name__ == '__main__':
    print("Parent process %s,"%os.getpid())
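    p = Pool(4)  # Size of the process pool: at most 4 worker processes run at the same time
    for i in range(5):
        p.apply_async(task,args=(i,))  # Submit the task to the pool without blocking. The pool creates the workers, so Process is not used here
    print("waiting......")
    p.close()  # No more tasks can be submitted after close()
    p.join()  # Wait for all workers to finish. close() must be called before join()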
    print("all process end")
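
The example above discards whatever task returns. When the results are needed, Pool.map collects them in input order. A minimal sketch, where square is a hypothetical stand-in for a real CPU intensive function and run_pool is a helper name made up for this example:

```python
from multiprocessing import Pool

def square(n):
    # Hypothetical CPU bound work; defined at module level so worker processes can find it
    return n * n

def run_pool(numbers, workers=4):
    # The with-block shuts the pool down on exit; map() has already returned every result by then
    with Pool(workers) as pool:
        return pool.map(square, numbers)

if __name__ == '__main__':
    print(run_pool(range(5)))  # [0, 1, 4, 9, 16]
```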

subprocess

When we need to start an external program as a child process and control its input and output, we use the subprocess module. It lets us start the child process, feed it input and read its output.
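
A minimal sketch of controlling a child's input and output with subprocess.run. To avoid depending on any external program, the child here is simply another Python interpreter started through sys.executable; run_child is a hypothetical helper name.

```python
import subprocess
import sys

def run_child(text):
    # Start another Python interpreter as the child process,
    # send text to its stdin and capture what it writes to stdout
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
        input=text,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()

if __name__ == '__main__':
    print(run_child('hello'))  # HELLO
```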

queue

Processes communicate with each other through Queue and Pipe. The example below creates two child processes in the parent process: one writes data to the Queue and the other reads data from the Queue.

from multiprocessing import Process,Queue
import os,time,random

def write(q):
    print("start to write %s"%os.getpid())
    for a in ['a','b','c']:
        print("put %s into queue"%a)
        q.put(a)
        time.sleep(random.random())
def read(q):
    print("start to read: %s" %os.getpid())
    while True :
        a = q.get(True)
        print("get %s from queue" %a)

if __name__ =='__main__':
    q = Queue()
    pw = Process(target=write,args=(q,))
    pr = Process(target=read, args=(q,))
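    pw.start()
    pr.start()
    pw.join()  # Wait for the writer to finish
    pr.terminate()  # read() loops forever, so the reader has to be terminated explicitly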
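
Pipe is the other communication channel named above. A Pipe connects exactly two endpoints, each of which can send and receive. The following is a minimal sketch; sender, receive_all and the use of None as an end marker are conventions chosen for this example, not part of the multiprocessing API.

```python
from multiprocessing import Process, Pipe

def sender(conn, items):
    # Send every item, then None as an end marker (a convention of this sketch)
    for item in items:
        conn.send(item)
    conn.send(None)
    conn.close()

def receive_all(conn):
    # Read from the connection until the end marker arrives
    received = []
    while True:
        item = conn.recv()
        if item is None:
            break
        received.append(item)
    return received

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=sender, args=(child_conn, ['a', 'b', 'c']))
    p.start()
    print(receive_all(parent_conn))  # ['a', 'b', 'c']
    p.join()
```

Because the reader stops by itself when it sees the end marker, no terminate() call is needed here.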

Multithreading (threading)

Advantages: compared with processes, threads are more lightweight and use fewer resources
Disadvantages:
Compared with processes: because of the GIL, multiple threads can only run concurrently, not in parallel, so they can only use a single CPU core
Compared with coroutines: the number of threads that can be started is limited, each thread occupies memory, and switching between threads has overhead
Applicable to: I/O intensive computing with a relatively small number of tasks running at the same time
Multiple tasks can be completed by multiple processes, or by multiple threads within a single process, and every process has at least one thread. Threads are also execution units directly supported by the operating system.

import threading, time,random

def loop():
    print("thread %s is running" % threading.current_thread().name)  # Child thread
    # current_thread() returns the Thread object of the thread that is currently running
    n = 0
    while n < 5:
        start = time.time()
        n = n+1
        print("thread %s >> %s" %(threading.current_thread().name,n))
        time.sleep(random.random())
        end = time.time()
        print("this iteration took %0.2f seconds" %(end - start))
    print("the thread is ending")

print("thread %s is running" % threading.current_thread().name)#Main thread
t = threading.Thread(target=loop, name="jghdfjfg")
t.start()
t.join()
print("thread %s is ending" % threading.current_thread().name)

Lock

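With multiple processes, each process has its own copy of a variable, and the copies do not affect each other. With multiple threads, all variables are shared by all threads, so any thread can modify any variable. A Lock keeps several threads from modifying the same variable at the same time.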

import threading
balance = 0
lock = threading.Lock()
def change(m):
    global balance
    # Each of the two statements below takes several steps (read, compute, write back), and the thread may be interrupted between them
    balance = balance + m
    balance = balance - m
def run_thread(m):
    for i in range(2000000):  # Thread scheduling is decided by the operating system, so t1 and t2 alternate. Without a lock, a large enough number of iterations can leave balance with a wrong value
        # A lock solves the problem. Note that all threads must share the same Lock object: calling threading.Lock() at each use creates a new lock every time, which protects nothing, and releasing such a fresh lock raises RuntimeError
        # While one thread holds the lock, the other thread waits at acquire(), so only one thread runs change() at a time
        lock.acquire()
        try:
            change(m)
        finally:
            lock.release()
t1 = threading.Thread(target=run_thread,args=(3,))
t2 = threading.Thread(target=run_thread,args=(5,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)
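
The acquire / try / finally / release pattern can be written more compactly with a with statement, which is the usual idiom. A minimal sketch of the same balance example, with a smaller loop count and a hypothetical run_demo helper added so the result can be returned:

```python
import threading

balance = 0
lock = threading.Lock()

def run_thread(m, rounds):
    global balance
    for _ in range(rounds):
        # "with lock" acquires the lock and always releases it, even if an exception is raised
        with lock:
            balance = balance + m
            balance = balance - m

def run_demo(rounds=100000):
    # Two threads modify the shared balance under the same lock
    threads = [threading.Thread(target=run_thread, args=(m, rounds)) for m in (3, 5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance

if __name__ == '__main__':
    print(run_demo())  # 0
```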

threading.local()

In a multithreaded program, each thread often needs its own data. threading.local() provides this directly: attributes set on the object in one thread are visible only to that thread.

import threading

NAME = threading.local()  # Global object, but each thread sees only its own attributes on it

def process_student():
    # Get the st bound to the current thread
    stn = NAME.st
    print("%s %s"%(stn,threading.current_thread().name))  # Print st and the name of the current thread
def process_thread(ne):
    # Bind st for the current thread
    NAME.st = ne
    process_student()
    process_student()
t1 = threading.Thread(target=process_thread,args=('ghkghkfgh',),name='kdkgk')
t1.start()
t1.join()
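
With a single thread the isolation is not visible. The sketch below runs two threads and uses a Barrier so that both have written their value before either reads it back. The names local_data, results, worker and run_demo are hypothetical and chosen only for this example.

```python
import threading

local_data = threading.local()
results = {}
barrier = threading.Barrier(2)

def worker(value):
    local_data.st = value
    barrier.wait()  # make sure both threads have written before either one reads
    # Each thread still sees its own value, not the one the other thread wrote
    results[threading.current_thread().name] = local_data.st

def run_demo():
    threads = [threading.Thread(target=worker, args=(v,), name=n)
               for n, v in (('Thread-A', 'Alice'), ('Thread-B', 'Bob'))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread never set st, so the attribute does not exist for it
    return results, hasattr(local_data, 'st')

if __name__ == '__main__':
    print(run_demo())
```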

Coroutines (asyncio)

Advantages: lowest memory overhead and the largest number of tasks that can be started
Disadvantages: library support is limited and the code is more complex
e.g. to use coroutines in a crawler, you must use aiohttp instead of the requests library
Applicable to: I/O intensive computing with a very large number of tasks, but only in scenarios where a ready-made asynchronous library exists
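
A minimal asyncio sketch of running many I/O style tasks at once in a single thread. The fetch coroutine is hypothetical, and asyncio.sleep stands in for a non-blocking network request such as one made with aiohttp.

```python
import asyncio
import time

async def fetch(name, delay):
    # asyncio.sleep stands in for a non-blocking network request (an assumption for the demo)
    await asyncio.sleep(delay)
    return "%s done" % name

async def main():
    # gather() runs the coroutines concurrently in one thread and keeps the input order
    return await asyncio.gather(*(fetch("task%d" % i, 0.2) for i in range(100)))

if __name__ == '__main__':
    start = time.perf_counter()
    finished = asyncio.run(main())
    print(len(finished), "tasks in %.2fs" % (time.perf_counter() - start))
```

The 100 waits overlap, so the total time should be close to a single 0.2 second wait rather than 100 times that.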

GIL

The GIL (Global Interpreter Lock) is a mechanism the interpreter uses to synchronize threads so that only one thread executes at any time. Even on multi-core processors, an interpreter with a GIL allows only one thread to execute at a time.
During I/O a thread releases the GIL, so I/O and CPU work can overlap. For CPU intensive work, however, the threads take turns holding the GIL, so multithreading gives no speedup and can even be slower than a single thread.
In Python, CPU intensive operations can be parallelized with multiprocessing instead.
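
That a waiting thread gives up the GIL can be observed directly: several threads that each block for the same time finish in about the time of one wait. A minimal sketch, where time.sleep stands in for blocking I/O and wait_for_io and timed_threads are hypothetical helper names:

```python
import threading
import time

def wait_for_io(seconds):
    # time.sleep releases the GIL while waiting, as a blocking read or network call would
    time.sleep(seconds)

def timed_threads(count, seconds):
    # Start count threads that each wait, and return the total wall-clock time
    threads = [threading.Thread(target=wait_for_io, args=(seconds,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    print("4 threads waiting 0.5s each took %.2fs" % timed_threads(4, 0.5))
```

If the wait were replaced by a pure Python calculation, the threads would have to take turns holding the GIL, and the total time would be expected to grow roughly with the number of threads.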

Tags: Python

Posted by Think Pink on Tue, 21 Sep 2021 07:39:30 +0530