Reporter: Hello, dear readers of Alibaba Cloud, it's nice to meet you again. Today our old friend, the Alibaba Cloud Container Service for Kubernetes distribution (ACK Distro), joins us for the final installment of our interview series exploring its story. The previous interviews brought us wonderful explanations, and interested readers are welcome to revisit them. We know that since its launch last December, ACK Distro has received a lot of attention and support and has achieved a good number of downloads. What is your view on this?
Alibaba Cloud Container Service ACK Distro: Yes, I have been fortunate to receive 400+ downloads in the three months since launch, and I have also exchanged technical ideas with many of you through different channels. Thank you all for your attention, and I hope you get a better container service experience.
Reporter: OK, let's get to the point. I learned earlier that sealer helps you build and deploy quickly, and hybridnet helps you build a unified hybrid-cloud network plane. So who is the versatile partner being introduced to us today?
ACK Distro: As we all know, stateful applications in the cloud-native context need a storage scheme for data persistence. Compared with distributed storage, local storage is superior in cost, ease of use, maintainability, and IO performance. So today I will explain Alibaba's open-source local storage management system, open-local, and how I use it to handle container local storage. Let's first look at why open-local was born. Although local storage has the advantages over distributed storage just mentioned, as a low-cost option for Kubernetes clusters it still has many problems:
• Kubernetes lacks awareness of local storage resources: as a "non-standard" resource, local storage receives far less support in Kubernetes than standard resources such as CPU and memory. Using it requires manual effort, such as restricting Pod scheduling by labeling nodes, managing disks of different models by hand, and attaching specific disks to containers through hostPath. There are also on-site delivery problems with privatized software, such as binding the wrong host path so that faults are not discovered in time; these seriously affect the delivery efficiency of Kubernetes and the runtime stability of applications;
• Lack of local storage space isolation: mounting an application to an inappropriate host directory (such as the host's root path) can lead to host failures, for example containers becoming unresponsive because application data fills the disk, Pod evictions being triggered, or IO interference between Pods;
• Insufficient Kubernetes support for stateful applications on local storage: node affinity cannot be guaranteed through hostPath, so application data is lost after a Pod drifts to another node; semi-automatic static Local PVs can keep a Pod on its node but cannot be fully automated, still requiring human intervention (creating folder paths, labeling nodes, etc.); and advanced storage capabilities such as snapshots cannot be used.
open-local avoids these problems to the greatest extent, giving everyone a better experience and making local storage on Kubernetes as simple to use as centralized storage.
Architecture of open-local
Reporter: Can you explain the components of the open-local architecture in more detail?
ACK Distro: Of course. open-local consists of four components:
1. Scheduler extender: an extension of the Kubernetes scheduler implemented through the Extender mechanism. It adds awareness of local storage resources to the native scheduler, so that scheduling decisions can take into account disk capacity, multiple disks, and disk media (SSD or HDD), achieving mixed scheduling of storage resources;
2. CSI plugin: local disk management capabilities conforming to the CSI (Container Storage Interface) standard, including creating/deleting/expanding storage volumes, creating/deleting snapshots, and exposing storage volume metrics;
3. Agent: runs on every node in the cluster, initializes storage devices according to the configuration list, and reports the node's local storage device information for the scheduler extender to use in scheduling decisions;
4. Controller: obtains the cluster's storage initialization configuration and delivers a detailed resource configuration list to the agent running on each node.
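After installation you can quickly sanity-check that these components are running; a minimal sketch, assuming open-local is deployed into the kube-system namespace with the component names embedded in the pod names (adjust to your actual deployment):

```
# List the open-local pods (namespace and pod-name prefix are assumptions)
kubectl get pods -n kube-system | grep open-local
```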
At the same time, open-local includes two CRDs:
- NodeLocalStorage: open-local reports the storage device information of each node through the NodeLocalStorage resource, which is created by the controller and updated by the agent component on each node. This CRD is a cluster-scoped (global) resource.
- NodeLocalStorageInitConfig: the open-local controller creates each NodeLocalStorage resource according to the NodeLocalStorageInitConfig resource, which contains a global default node configuration and node-specific configurations. If a node's labels satisfy a node-specific expression, the node-specific configuration is used; otherwise, the default configuration applies.
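To see what the agents are reporting, you can inspect the NodeLocalStorage resources directly. A minimal sketch; the `nls` shorthand is an assumption modeled on the `nlsc` shorthand used for NodeLocalStorageInitConfig later in this article:

```
# One NodeLocalStorage object exists per node; list them all
kubectl get nodelocalstorage
# Inspect the devices and VGs reported for a specific node
kubectl get nls <node-name> -o yaml
```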
Its architecture is shown in the diagram below:
Usage scenarios of open-local
Reporter: So in what kinds of scenarios would open-local be used?
ACK Distro: I have summarized the following use cases; you can see which of them matches your own situation.
- The application expects data volumes with capacity isolation, to avoid situations such as logs filling up the system disk;
- The application needs a large amount of local storage and relies on node affinity, e.g. HBase, etcd, ZooKeeper, Elasticsearch;
- The cluster has a large number of local disks, and stateful applications should be deployed automatically through the scheduler;
- Database applications want to back up instantaneous data through storage snapshot capabilities.
How to use open-local in ACK Distro
Reporter: Next comes the usual question: how do you bring out the advantages of open-local? In other words, how do you use open-local to achieve best practice?
ACK Distro: Let me walk you through it category by category~
1. Initialization settings
First, make sure that the lvm tools are installed in the environment. open-local is installed by default when I am deployed; edit the NodeLocalStorageInitConfig resource to configure storage initialization:
```
# kubectl edit nlsc open-local
```
Using open-local requires a VG (VolumeGroup) in the environment. If a VG already exists in your environment and has space left, it can be added to the whitelist; if there is no VG, you need to provide a block device name for open-local to create one.
```yaml
apiVersion: csi.aliyun.com/v1alpha1
kind: NodeLocalStorageInitConfig
metadata:
  name: open-local
spec:
  globalConfig: # global default node configuration, populated into each NodeLocalStorage Spec at creation
    listConfig:
      vgs:
        include: # VolumeGroup whitelist, regular expressions supported
        - open-local-pool-[0-9]+
        - your-vg-name # if a VG already exists in the environment, add it to the whitelist so open-local manages it
    resourceToBeInited:
      vgs:
      - devices:
        - /dev/vdc # if there is no VG in the environment, the user must provide a block device
        name: open-local-pool-0 # the block device /dev/vdc will be initialized as a VG named open-local-pool-0
```
After the NodeLocalStorageInitConfig resource is edited, the controller and agent will update the NodeLocalStorage resources of all nodes.
2. Storage volume dynamic provisioning
open-local deploys several StorageClass templates in the cluster by default. Take open-local-lvm, open-local-lvm-xfs, and open-local-lvm-io-throttling as examples:
```
# kubectl get sc
NAME                           PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
open-local-lvm                 local.csi.aliyun.com   Delete          WaitForFirstConsumer   true                   8d
open-local-lvm-xfs             local.csi.aliyun.com   Delete          WaitForFirstConsumer   true                   6h56m
open-local-lvm-io-throttling   local.csi.aliyun.com   Delete          WaitForFirstConsumer   true
```
Create a StatefulSet that uses the open-local-lvm StorageClass. The storage volumes created this way use the ext4 filesystem; if the open-local-lvm-xfs StorageClass is specified instead, the volumes use xfs.
```
# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-nginx.yaml
```
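The manifest above is fetched from the open-local repository. For orientation, here is a minimal sketch of the shape such a StatefulSet likely takes; the names match the output below, but the repository file is authoritative:

```yaml
# Sketch only: a StatefulSet whose volumeClaimTemplate requests an open-local volume
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-lvm
spec:
  serviceName: nginx-lvm
  replicas: 1
  selector:
    matchLabels:
      app: nginx-lvm
  template:
    metadata:
      labels:
        app: nginx-lvm
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: html
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: html
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: open-local-lvm   # use open-local-lvm-xfs for an xfs filesystem
      resources:
        requests:
          storage: 5Gi
```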
Check the status of the Pod/PVC/PV; you can see that the storage volume has been created successfully:
```
# kubectl get pod
NAME          READY   STATUS    RESTARTS   AGE
nginx-lvm-0   1/1     Running   0          3m5s
# kubectl get pvc
NAME               STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-0   Bound    local-52f1bab4-d39b-4cde-abad-6c5963b47761   5Gi        RWO            open-local-lvm   104s
# kubectl get pv
NAME                                         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS     AGE
local-52f1bab4-d39b-4cde-abad-6c5963b47761   5Gi        RWO            Delete           Bound    default/html-nginx-lvm-0   open-local-lvm   2m4s
# kubectl describe pvc html-nginx-lvm-0
```
3. Storage volume expansion
Edit the spec.resources.requests.storage field of the corresponding PVC to expand the storage size declared by the PVC from 5Gi to 20Gi:
```
# kubectl patch pvc html-nginx-lvm-0 -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```
Check PVC/PV status:
```
# kubectl get pvc
NAME               STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-0   Bound    local-52f1bab4-d39b-4cde-abad-6c5963b47761   20Gi       RWO            open-local-lvm   7h4m
# kubectl get pv
NAME                                         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS     REASON   AGE
local-52f1bab4-d39b-4cde-abad-6c5963b47761   20Gi       RWO            Delete           Bound    default/html-nginx-lvm-0   open-local-lvm            7h4m
```
4. Storage volume snapshot
open-local provides the following snapshot classes:
```
# kubectl get volumesnapshotclass
NAME             DRIVER                 DELETIONPOLICY   AGE
open-local-lvm   local.csi.aliyun.com   Delete           20m
```
To create a VolumeSnapshot resource:
```
# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/new-snapshot-test created
# kubectl get volumesnapshot
NAME                READYTOUSE   SOURCEPVC          SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
new-snapshot-test   true         html-nginx-lvm-0                           1863          open-local-lvm   snapcontent-815def28-8979-408e-86de-1e408033de65   19s            19s
# kubectl get volumesnapshotcontent
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                 VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT      AGE
snapcontent-815def28-8979-408e-86de-1e408033de65   true         1863          Delete           local.csi.aliyun.com   open-local-lvm        new-snapshot-test   48s
```
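The snapshot.yaml applied above is fetched from the repository; a minimal sketch of what such a manifest looks like, assuming the snapshot.storage.k8s.io/v1 API (older clusters may need v1beta1), with names taken from the output above:

```yaml
# Sketch only: snapshot an existing PVC with the open-local-lvm snapshot class
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
spec:
  volumeSnapshotClassName: open-local-lvm
  source:
    persistentVolumeClaimName: html-nginx-lvm-0
```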
Create a new Pod; the data in the Pod's storage volume is consistent with the data at the moment the snapshot was taken:
```
# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-nginx-snap.yaml
service/nginx-lvm-snap created
statefulset.apps/nginx-lvm-snap created
# kubectl get po -l app=nginx-lvm-snap
NAME               READY   STATUS    RESTARTS   AGE
nginx-lvm-snap-0   1/1     Running   0          46s
# kubectl get pvc -l app=nginx-lvm-snap
NAME                    STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-snap-0   Bound    local-1c69455d-c50b-422d-a5c0-2eb5c7d0d21b   4Gi        RWO            open-local-lvm   2m11s
```
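The restore works through the standard Kubernetes mechanism of declaring the snapshot as the dataSource of a new claim. A minimal sketch of that claim, with values matching the output above (the repository's sts-nginx-snap.yaml is authoritative):

```yaml
# Sketch only: a PVC whose contents are restored from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: html-nginx-lvm-snap-0
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: open-local-lvm
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: 4Gi   # matches the 4Gi capacity shown above
```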
5. Raw block device
open-local supports creating storage volumes that are mounted into the container as raw block devices (in this example, the block device appears in the container at the /dev/sdd path):
```
# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-block.yaml
```
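The key to raw block volumes is setting volumeMode: Block on the PVC and referencing the volume with volumeDevices instead of volumeMounts in the Pod spec. A minimal sketch under those assumptions (the repository manifest is authoritative):

```yaml
# Sketch only: a PVC provisioned as a raw block device
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: html-nginx-lvm-block-0
spec:
  accessModes: ["ReadWriteOnce"]
  volumeMode: Block                 # raw block device instead of a filesystem
  storageClassName: open-local-lvm
  resources:
    requests:
      storage: 5Gi
```

In the Pod template, the container then declares something like:

```yaml
# Sketch only: expose the block PVC inside the container at /dev/sdd
volumeDevices:
- name: html
  devicePath: /dev/sdd
```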
Check Pod/PVC/PV status:
```
# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
nginx-lvm-block-0   1/1     Running   0          25s
# kubectl get pvc
NAME                     STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-block-0   Bound    local-b048c19a-fe0b-455d-9f25-b23fdef03d8c   5Gi        RWO            open-local-lvm   36s
# kubectl describe pvc html-nginx-lvm-block-0
Name:          html-nginx-lvm-block-0
Namespace:     default
StorageClass:  open-local-lvm
...
Access Modes:  RWO
VolumeMode:    Block    # mounted into the container as a block device
Mounted By:    nginx-lvm-block-0
...
```
6. IO throttling
open-local supports setting IO throttling on PVs. The StorageClass template that supports IO throttling looks like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: open-local-lvm-io-throttling
provisioner: local.csi.aliyun.com
parameters:
  csi.storage.k8s.io/fstype: ext4
  volumeType: "LVM"
  bps: "1048576" # read/write throughput limited to 1024KiB/s
  iops: "1024"   # IOPS limited to 1024
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
Create a StatefulSet that uses the open-local-lvm-io-throttling StorageClass:
```
# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-io-throttling.yaml
```
After the Pod is in Running status, enter the Pod container:
```
# kubectl exec -it test-io-throttling-0 sh
```
At this point, the storage volume is mounted at /dev/sdd as a raw block device. Execute the fio command:
```
# fio -name=test -filename=/dev/sdd -ioengine=psync -direct=1 -iodepth=1 -thread -bs=16k -rw=readwrite -numjobs=32 -size=1G -runtime=60 -time_based -group_reporting
```
The results are as follows; you can see that the read/write throughput is limited to about 1024KiB/s:
```
......
Run status group 0 (all jobs):
   READ: bw=1024KiB/s (1049kB/s), 1024KiB/s-1024KiB/s (1049kB/s-1049kB/s), io=60.4MiB (63.3MB), run=60406-60406msec
  WRITE: bw=993KiB/s (1017kB/s), 993KiB/s-993KiB/s (1017kB/s-1017kB/s), io=58.6MiB (61.4MB), run=60406-60406msec

Disk stats (read/write):
    dm-1: ios=3869/3749, merge=0/0, ticks=4848/17833, in_queue=22681, util=6.68%, aggrios=3112/3221, aggrmerge=774/631, aggrticks=3921/13598, aggrin_queue=17396, aggrutil=6.75%
  vdb: ios=3112/3221, merge=774/631, ticks=3921/13598, in_queue=17396, util=6.75%
```
7. Ephemeral volume
open-local supports creating ephemeral volumes for Pods. The lifecycle of an ephemeral volume is the same as the Pod's: when the Pod is deleted, the ephemeral volume is deleted as well. It can be thought of as the open-local version of emptyDir.
```
# kubectl apply -f ./example/lvm/ephemeral.yaml
```
The results are as follows:
```
# kubectl describe po file-server
Name:         file-server
Namespace:    default
......
Containers:
  file-server:
    ......
    Mounts:
      /srv from webroot (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dns4c (ro)
Volumes:
  webroot:    # this is a CSI ephemeral volume
    Type:               CSI (a Container Storage Interface (CSI) volume source)
    Driver:             local.csi.aliyun.com
    FSType:
    ReadOnly:           false
    VolumeAttributes:   size=2Gi
                        vgName=open-local-pool-0
  default-token-dns4c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dns4c
    Optional:    false
```
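For reference, a minimal sketch of the inline CSI volume that would produce the output above; the volume attributes match what kubectl describe shows, while the container image is a placeholder assumption:

```yaml
# Sketch only: a Pod with an inline (ephemeral) CSI volume served by open-local
apiVersion: v1
kind: Pod
metadata:
  name: file-server
spec:
  containers:
  - name: file-server
    image: nginx   # placeholder image; the example manifest may use a different one
    volumeMounts:
    - name: webroot
      mountPath: /srv
  volumes:
  - name: webroot
    csi:
      driver: local.csi.aliyun.com
      volumeAttributes:
        size: 2Gi
        vgName: open-local-pool-0   # the VG created during initialization
```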
8. Monitoring dashboard
open-local ships with its own monitoring dashboard. Through Grafana, users can view the cluster's local storage information, including information about storage devices and storage volumes, as shown in the figure below:
ACK Distro: In a word, operationally, open-local reduces labor costs in operations and maintenance and improves the stability of cluster operation. Functionally, it maximizes the advantages of local storage: users not only get the high performance of local disks, but can also enrich their application scenarios with various advanced storage features, letting developers enjoy the dividends of cloud native and taking a key step toward cloud-native deployment of applications, especially stateful ones.
Reporter: Thanks to ACK Distro for the wonderful explanation. These three interviews have given us a deeper understanding of ACK Distro and its partners. I hope the content has been of some help to you, the readers of this article.
ACK Distro: Yes! The project team members and I welcome everyone to come "pester" us in the GitHub community and our other community channels!
Related links
[1] open-local open-source repository:
https://github.com/alibaba/op...
[2] ACK Distro official website:
https://www.aliyun.com/produc...
[3] ACK Distro official GitHub:
https://github.com/AliyunCont...
[4] "Making innovation within reach": Alibaba Cloud Container Service ACK distribution open for free download:
https://mp.weixin.qq.com/s/Lc...
[5] First in-depth interview:
https://mp.weixin.qq.com/s/wB...
[6] Second in-depth interview:
https://mp.weixin.qq.com/s/O0...