Pod affinity in Kubernetes scheduling

abstract.png
Pod affinity
Node affinity schedules Pods based on node labels. Pod affinity, by contrast, constrains the nodes that new Pods can be scheduled to based on the labels of Pods already running on those nodes. Specifically: if one or more Pods satisfying rule Y are already running on X, then the new Pod should also run on X. Here X is a topology domain, which can be a node, a rack, an availability zone, a geographic region, and so on; it is defined through the topologyKey field, whose value is the key of a node label. Y is the rule that Kubernetes tries to satisfy, defined in the form of a label selector. Pod affinity is configured through the podAffinity field, under which two kinds of rules are supported (a minimal skeleton of both follows the list):
requiredDuringSchedulingIgnoredDuringExecution: requiredDuringScheduling means the rule is mandatory, so the scheduler only places the Pod on a node that satisfies it; IgnoredDuringExecution means the rule does not affect Pods already running on the node
preferredDuringSchedulingIgnoredDuringExecution: preferredDuringScheduling means the rule is a preference, so the scheduler favors nodes that satisfy it but will still pick another node if no matching node is found; IgnoredDuringExecution again means the rule does not affect Pods already running on the node
Mandatory scheduling
Here is a K8s cluster with 5 worker nodes; we use a Deployment to deploy the front-end service of the application.
# Application front-end service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-frontend
spec:
  # Number of Pod replicas
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  # Pod template
  template:
    metadata:
      # Label information: application front-end service
      labels:
        app: frontend
    spec:
      # Container information
      containers:
      - name: my-app-frontend
        image: jocatalin/kubernetes-bootcamp:v1
The effect is as follows: the front-end service runs 2 Pods, one on the my-k8s-cluster-multi-node-worker2 node and one on the my-k8s-cluster-multi-node-worker3 node.

figure 1.jpeg
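For reference, the placement shown in the figure can be checked with kubectl using the app=frontend label from the Deployment above; the -o wide output includes a NODE column:
kubectl get pods -l app=frontend -o wide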
Now let's deploy the back-end service of the application. We expect the back-end Pods to run on the same nodes as the front-end Pods, which can be achieved through Pod affinity. Specifically:
First, use podAffinity to define the Pod affinity rule, and use requiredDuringSchedulingIgnoredDuringExecution to make it a mandatory scheduling rule
Then, use the label selector to determine which Pods the rule refers to; here we obviously select the Pods of the front-end service
Finally, use topologyKey to define the topology domain; here we use the node hostname. The new Pod to be scheduled and the Pods matched by the label selector must then run on nodes with the same hostname. That is, a back-end Pod must be scheduled to a node whose hostname matches that of a node already running a front-end Pod
In this way, a front-end Pod is guaranteed to exist on every node where a back-end Pod runs.
# Application back-end service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-backend
spec:
  # Number of Pod replicas
  replicas: 4
  selector:
    matchLabels:
      app: backend
  # Pod template
  template:
    metadata:
      # Label information: application back-end service
      labels:
        app: backend
    spec:
      # Affinity
      affinity:
        # Pod affinity rules
        podAffinity:
          # Mandatory scheduling rule; does not affect Pods already running
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            # Label selector
            labelSelector:
              matchLabels:
                app: frontend
      # Container information
      containers:
      - name: my-app-backend
        image: tutum/dnsutils
        command:
        - sleep
        - infinity
The effect is as follows. The four Pods of the back-end service all run on the my-k8s-cluster-multi-node-worker2 and my-k8s-cluster-multi-node-worker3 nodes.

figure 2.jpeg
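To see the co-location at a glance, the Pods of both services can be listed together with a set-based label selector; again, -o wide adds the NODE column:
kubectl get pods -l 'app in (frontend,backend)' -o wide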
Preferred scheduling
Here is a K8s cluster with 5 worker nodes; we use a Deployment to deploy the back-end service of the application.
# Application back-end service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-backend
spec:
  # Number of Pod replicas
  replicas: 2
  selector:
    matchLabels:
      app: backend
  # Pod template
  template:
    metadata:
      # Label information: application back-end service
      labels:
        app: backend
    spec:
      # Container information
      containers:
      - name: my-app-backend
        image: tutum/dnsutils
        command:
        - sleep
        - infinity
The effect is as follows: the back-end service runs 2 Pods, one on the my-k8s-cluster-multi-node-worker node and one on the my-k8s-cluster-multi-node-worker3 node. According to the node labels, both nodes are located in Shanghai.

figure 3.jpeg
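The Region label used in the rest of this article is a custom node label, not one of Kubernetes' built-in topology labels. Assuming the worker nodes were labeled by hand, the label can be inspected and applied with commands along these lines (the label command is illustrative; the cluster in this article is already labeled):
# Show the Region label of every node as an extra column
kubectl get nodes -L Region
# Label a node with its region (illustrative)
kubectl label node my-k8s-cluster-multi-node-worker Region=ShangHai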
Now let's deploy the cache service of the application. We expect the nodes running the cache service Pods to be in the same geographic region as the nodes running the back-end service Pods as far as possible, so as to reduce network latency. Specifically:
First, use podAffinity to define the Pod affinity rule, and use preferredDuringSchedulingIgnoredDuringExecution to make it a preferred scheduling rule. The strength of the preference is defined by the weight value: the greater the weight, the higher the priority, with weights ranging from 1 to 100
Then, use the label selector to determine which Pods the rule refers to; here we select the Pods of the back-end service
Finally, use topologyKey to define the topology domain; here we use the node's Region label, which records its geographic region. The new Pod will then prefer nodes whose Region label value matches that of the nodes running the Pods matched by the label selector. That is, the cache service Pods will, as far as possible, end up in the same geographic region (the same Region label value) as the back-end service Pods
# Application cache service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-redis
spec:
  # Number of Pod replicas
  replicas: 4
  selector:
    matchLabels:
      app: redis
  # Pod template
  template:
    metadata:
      # Label information: application cache service
      labels:
        app: redis
    spec:
      # Affinity
      affinity:
        # Pod affinity rules
        podAffinity:
          # Preferred scheduling rule; does not affect Pods already running
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 70  # Weight range: 1~100; the greater the weight, the higher the priority
            podAffinityTerm:
              topologyKey: Region
              # Label selector
              labelSelector:
                matchLabels:
                  app: backend
      # Container information
      containers:
      - name: my-app-redis
        image: redis:3.2-alpine
The effect is as follows. The four Pods of the cache service all run on the same node, located in Shanghai. Note that because this is a preferred scheduling rule, if for some reason no node in Shanghai could accept these Pods, it would be legal and permissible for the scheduler to place them on nodes in other regions, for example on a node in Guangzhou.

figure 4.jpeg
Pod anti-affinity
Pod anti-affinity likewise constrains the nodes that a new Pod can be scheduled to based on the labels of Pods already running on those nodes, but with the opposite effect of Pod affinity: if one or more Pods satisfying rule Y are already running on X, then the new Pod should not run on X. Pod anti-affinity is configured through the podAntiAffinity field, which supports the same two kinds of rules: requiredDuringSchedulingIgnoredDuringExecution (mandatory scheduling) and preferredDuringSchedulingIgnoredDuringExecution (preferred scheduling).
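The configuration mirrors the podAffinity skeleton shown earlier; only the field name changes. A minimal sketch with a single mandatory rule, where the selector label app: some-app is again a placeholder:
spec:
  affinity:
    podAntiAffinity:
      # Mandatory rule: do not place the new Pod where a matching Pod already runs
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: some-app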
Mandatory scheduling
Here is a K8s cluster with 5 worker nodes; we use a Deployment to deploy the back-end service of the application. For the convenience of the later demonstration, we require that the back-end Pods run only on nodes in Shanghai.
# Application back-end service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-backend
spec:
  # Number of Pod replicas
  replicas: 5
  selector:
    matchLabels:
      app: backend
  # Pod template
  template:
    metadata:
      # Label information: application back-end service
      labels:
        app: backend
    spec:
      # Node selector: require K8s to place these Pods only on nodes labeled Region=ShangHai
      nodeSelector:
        Region: ShangHai
      # Container information
      containers:
      - name: my-app-backend
        image: tutum/dnsutils
        command:
        - sleep
        - infinity
The effect is as follows: the back-end service runs 5 Pods, spread across the my-k8s-cluster-multi-node-worker, my-k8s-cluster-multi-node-worker3, and my-k8s-cluster-multi-node-worker5 nodes, all located in Shanghai.

figure 5.jpeg
Now let's deploy the front-end service of the application. For special reasons, we expect the front-end and back-end Pods to run in different regions: the back-end Pods already run on nodes in Shanghai, so the front-end Pods should run on nodes outside Shanghai. This can be achieved through Pod anti-affinity. Specifically:
First, use podAntiAffinity to define the Pod anti-affinity rule, and use requiredDuringSchedulingIgnoredDuringExecution to make it a mandatory scheduling rule
Then, use the label selector to determine which Pods the rule refers to; here we obviously select the Pods of the back-end service
Finally, use topologyKey to define the topology domain; here we use the node's Region label, which records its geographic region. The new Pod to be scheduled and the Pods matched by the label selector must then run on nodes with different Region values. That is, a front-end Pod must be scheduled to a node in a different geographic region from the nodes where the back-end Pods reside
# Application front-end service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-frontend
spec:
  # Number of Pod replicas
  replicas: 5
  selector:
    matchLabels:
      app: frontend
  # Pod template
  template:
    metadata:
      # Label information: application front-end service
      labels:
        app: frontend
    spec:
      # Affinity
      affinity:
        # Pod anti-affinity rules
        podAntiAffinity:
          # Mandatory scheduling rule; does not affect Pods already running
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: Region
            # Label selector
            labelSelector:
              matchLabels:
                app: backend
      # Container information
      containers:
      - name: my-app-frontend
        image: jocatalin/kubernetes-bootcamp:v1
The effect is as follows. The five Pods of the front-end service all run on the my-k8s-cluster-multi-node-worker2 and my-k8s-cluster-multi-node-worker4 nodes, which are located in Guangzhou.

figure 6.jpeg
Preferred scheduling
Here is a K8s cluster with 5 worker nodes; we use a Deployment to deploy the back-end service of the application.
# Application back-end service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-backend
spec:
  # Number of Pod replicas
  replicas: 2
  selector:
    matchLabels:
      app: backend
  # Pod template
  template:
    metadata:
      # Label information: application back-end service
      labels:
        app: backend
    spec:
      # Container information
      containers:
      - name: my-app-backend
        image: tutum/dnsutils
        command:
        - sleep
        - infinity
The effect is as follows: the back-end service runs 2 Pods, one on the my-k8s-cluster-multi-node-worker node and one on the my-k8s-cluster-multi-node-worker4 node.

figure 7.jpeg
Now let's deploy the cache service of the application. Both the back-end service and the cache service are very resource-intensive, so we have the following scheduling requirements:
The cache service Pods should, as far as possible, run on different nodes from the back-end service Pods
The cache service Pods should, as far as possible, run in a different geographic region from the back-end service Pods
Both of the above are preferences, and point 1 has the higher priority.
First, use podAntiAffinity to define the Pod anti-affinity rules, and use preferredDuringSchedulingIgnoredDuringExecution to make them preferred scheduling rules. The strength of each preference is defined by its weight value: the greater the weight, the higher the priority, with weights ranging from 1 to 100
Then, use the label selector to determine which Pods the rules refer to; here we select the Pods of the back-end service
Finally, use topologyKey to define the topology domain: the node hostname for the first rule and the Region label for the second. The new Pod will then prefer, most strongly, nodes other than those running the Pods matched by the label selector, and secondarily nodes in a different geographic region from them
# Application cache service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-redis
spec:
  # Number of Pod replicas
  replicas: 4
  selector:
    matchLabels:
      app: redis
  # Pod template
  template:
    metadata:
      # Label information: application cache service
      labels:
        app: redis
    spec:
      # Affinity
      affinity:
        # Pod anti-affinity rules
        podAntiAffinity:
          # Preferred scheduling rules; do not affect Pods already running
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100  # Weight range: 1~100; the greater the weight, the higher the priority
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              # Label selector
              labelSelector:
                matchLabels:
                  app: backend
          - weight: 1  # Weight range: 1~100; the greater the weight, the higher the priority
            podAffinityTerm:
              topologyKey: Region
              # Label selector
              labelSelector:
                matchLabels:
                  app: backend
      # Container information
      containers:
      - name: my-app-redis
        image: redis:3.2-alpine
The effect is as follows. The four Pods of the cache service all run on nodes different from those where the back-end Pods reside. However, because these are preference rules, it is legal and permissible for the scheduler to pick nodes in the same geographic region when the result cannot also satisfy running the two services in different regions.

figure 8.jpeg