The in-depth dialogue with the container service ACK distribution finally played: how to play with the container local storage through open local

Reporter: Hello, dear readers of Alibaba cloud. I'd like to meet you again. Today is our old friend Alibaba cloud container service ACK release's last guest to explore the mystery of life series. In previous interviews, it has brought us wonderful explanations. Interested friends are welcome to review. We have learned that since its launch in December last year, the container service ACK distribution has received everyone's attention and support, and has achieved good downloads. Do you have any views on this?

Alibaba cloud container service ACK Distro: Yes, I've been lucky to get 400 + downloads since it was launched three months ago. I also exchange technology with you through different channels. Thank you for your attention and hope you can get a better container service experience.

Reporter: OK, let's get to the point. I learned earlier that sealer can help you build & deploy quickly, and hybridnet can help build a hybrid cloud unified network plane. So who is the versatile partner introduced to us today?

ACK Distro: we all know that stateful applications in the cloud native context need to use a set of storage schemes for data persistence. Compared with distributed storage, local storage is superior in cost, ease of use, maintainability and IO performance. Therefore, today I will explain to you Alibaba's open source local storage management system and how I use it to play with container local storage. Let's first explain the opportunity of the birth of open local. Although we just mentioned the advantages of local storage over distributed storage, local storage, as a low-cost Kubernetes cluster, still has many problems:

• Kubernetes lacks the perception of storage resources: as a "non-standard" resource, the support of local storage in Kubernetes is much lower than that of standard resources (cpu, memory, etc.). Using local storage requires a certain labor cost, such as limiting Pod scheduling by marking nodes, manually managing disks of different models, manually attaching specified disks to containers through Hostpath, etc; At the same time, there are some on-site delivery problems of privatized software, such as binding the wrong host path so that the fault can not be found in time, which seriously affect the delivery efficiency of Kubernetes and the stability of application runtime;

• lack of local storage space isolation capability: improper application mounting to the host directory (such as mounting to the root path of the host) leads to host failure, such as no response during container operation due to application data filling the disk, triggering Pod expulsion, IO interaction between pods, etc;

• Kubernetes has insufficient support for stateful applications using local storage: node retention cannot be achieved through Hostpath, resulting in application data loss after Pod drift; The use of semi-automatic static Local PV can ensure the maintenance of nodes, but it cannot achieve full automation, and human participation is still required (such as creating folder paths, marking nodes, etc.); Some advanced storage capabilities (such as snapshots) cannot be used.

And open local can avoid these problems to the greatest extent and make everyone get a better experience. Using local storage on Kubernetes is as simple as using centralized storage.

Architecture Composition of open local

Reporter: can you further explain the components of open local architecture for us?

ACK Distro: of course, open local contains four components:

1. Scheduler Extender: as an extension component of Kube scheduler, it is realized through Extender, which extends the native scheduler's perception of local storage resources, so as to realize the scheduling decision of information including disk capacity, multi disk perception, disk media (ssd or hdd) and make mixed scheduling of storage resources;

2. CSI plugin: local disk management capability conforming to CSI(Container Storage Interface) standard, including the ability to create / delete / expand storage volumes, create / delete snapshots, expose storage volume metrics, etc;

3. agent: each node running in the cluster initializes the storage device according to the configuration list, and reports the local storage device information in the cluster for scheduler extender decision-making and scheduling;

4. controller: obtain the cluster storage initialization configuration and issue a detailed resource configuration list to the agent running on each node.

At the same time, open local contains two CRD s:

  1. Nodelocal storage: open local reports the storage device information on each node through the nodelocal storage resource, which is created by the controller and updated by the agent component of each node. The CRD is a global resource.
  2. Nodelocal storageinitconfig: the open local controller can create each nodelocal storage resource through the nodelocal storageinitconfig resource. The nodelocal storageinitconfig resource contains the global default node configuration and specific node configuration. If the node label of the node satisfies the expression, the specific node configuration is used; otherwise, the default configuration is used.

Its architecture diagram can refer to the following:

Usage scenario of open local

Reporter: so what kind of demand scenario will you use open local?

ACK Distro: I have summarized the following use cases. You can take your seats according to your own situation.

  1. The application expects that the data volume has the capacity isolation ability to avoid the situation that the log fills the system disk;
  2. Applications need a lot of local storage and rely on nodes to maintain, such as Hbase, etcd, ZooKeeper, ElasticSearch, etc;
  3. The cluster has a large number of local disks, and it is hoped to realize the automatic deployment of stateful applications through the scheduler;
  4. Backup instantaneous data for database applications through the ability to store snapshots.

How to use open local in ACK Distro

Reporter: next comes the old question. How can you embody the advantages of open local? Or how can you use open local to achieve best practices?

ACK Distro: don't explain my classification to you~

1. Initialization settings

First, make sure that the lvm tool has been installed in the environment. When installing and deploying, I will install open local by default, edit nodelocal storageinitconfig resources, and configure storage initialization.

# kubectl edit nlsc open-local

Using open local requires VG (VolumeGroup) in the environment. If VG already exists in your environment and there is space left, it can be configured in the white list; If there is no VG in the environment, you need to provide a block device name for open local to create VG.

apiVersion: csi.aliyun.com/v1alpha1
kind: NodeLocalStorageInitConfig
metadata:
  name: open-local
spec:
  globalConfig: # The global default node configuration will be populated into its Spec when nodelocal storage is created during initialization
    listConfig:
      vgs:
        include: # VolumeGroup whitelist, support regular expressions
        - open-local-pool-[0-9]+
        - your-vg-name # If there is VG in the environment, it can be written into the white list and managed by open local
    resourceToBeInited:
      vgs:
      - devices:
        - /dev/vdc  # If there is no VG in the environment, the user needs to provide a block device
        name: open-local-pool-0 # Initialize the block device / dev/vdc to a VG named open-local-pool-0

After the nodelocal storageinitconfig resource is edited, the controller and agent will update the nodelocal storage resources of all nodes.

#### 2. Storage volume dynamic provisioning
Open local deploys some storage class templates in the cluster by default. I take open local LVM, open local LVM XFS and open local LVM IO throttling as examples:

# kubectl get sc
NAME                           PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
open-local-lvm                 local.csi.aliyun.com   Delete          WaitForFirstConsumer   true                   8d
open-local-lvm-xfs             local.csi.aliyun.com        Delete          WaitForFirstConsumer   true                   6h56m
open-local-lvm-io-throttling   local.csi.aliyun.com   Delete          WaitForFirstConsumer   true

Create a stateful set that uses open local LVM to store class templates. The storage volume file system created at this time is ext4. If the user specifies the open local LVM xfs storage template, the storage volume file system is xfs.

# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-nginx.yaml

Check the status of Pod/PVC/PV, and you can see that the storage volume is created successfully:

# kubectl get pod
NAME          READY   STATUS    RESTARTS   AGE
nginx-lvm-0   1/1     Running   0          3m5s
# kubectl get pvc
NAME               STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-0   Bound    local-52f1bab4-d39b-4cde-abad-6c5963b47761   5Gi        RWO            open-local-lvm   104s
# kubectl get pv
NAME                                         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS    AGE
local-52f1bab4-d39b-4cde-abad-6c5963b47761   5Gi        RWO            Delete           Bound    default/html-nginx-lvm-0   open-local-lvm  2m4s
kubectl describe pvc html-nginx-lvm-0

3. Storage volume expansion

Edit the spec.resources.of the corresponding PVC requests. In the storage field, expand the storage size declared by PVC from 5Gi to 20Gi.

# kubectl patch pvc html-nginx-lvm-0 -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

Check PVC/PV status:

# kubectl get pvc
NAME                    STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-0        Bound    local-52f1bab4-d39b-4cde-abad-6c5963b47761   20Gi       RWO            open-local-lvm   7h4m
# kubectl get pv
NAME                                         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                           STORAGECLASS     REASON   AGE
local-52f1bab4-d39b-4cde-abad-6c5963b47761   20Gi       RWO            Delete           Bound    default/html-nginx-lvm-0        open-local-lvm            7h4m

4. Storage volume snapshot

Open local has the following snapshot classes:

# kubectl get volumesnapshotclass
NAME             DRIVER                DELETIONPOLICY   AGE
open-local-lvm   local.csi.aliyun.com   Delete           20m

To create a VolumeSnapshot resource:

# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/new-snapshot-test created
# kubectl get volumesnapshot
NAME                READYTOUSE   SOURCEPVC          SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
new-snapshot-test   true         html-nginx-lvm-0                           1863          open-local-lvm   snapcontent-815def28-8979-408e-86de-1e408033de65   19s            19s
# kubectl get volumesnapshotcontent
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT      AGE
snapcontent-815def28-8979-408e-86de-1e408033de65   true         1863          Delete           local.csi.aliyun.com   open-local-lvm        new-snapshot-test   48s

Create a new Pod. The storage volume data corresponding to the Pod is consistent with the data at the time when the snapshot point was applied:

# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-nginx-snap.yaml
service/nginx-lvm-snap created
statefulset.apps/nginx-lvm-snap created
# kubectl get po -l app=nginx-lvm-snap
NAME               READY   STATUS    RESTARTS   AGE
nginx-lvm-snap-0   1/1     Running   0          46s
# kubectl get pvc -l app=nginx-lvm-snap
NAME                    STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-snap-0   Bound    local-1c69455d-c50b-422d-a5c0-2eb5c7d0d21b   4Gi        RWO            open-local-lvm   2m11s

5. Native block equipment

The storage volume created by open local support will be mounted in the container as a block device (in this case, the block device is in the container / dev/sdd path):

# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-block.yaml

Check Pod/PVC/PV status:

# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
nginx-lvm-block-0   1/1     Running   0          25s
# kubectl get pvc
NAME                     STATUS   VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
html-nginx-lvm-block-0   Bound    local-b048c19a-fe0b-455d-9f25-b23fdef03d8c   5Gi        RWO            open-local-lvm   36s
# kubectl describe pvc html-nginx-lvm-block-0
Name:          html-nginx-lvm-block-0
Namespace:     default
StorageClass:  open-local-lvm
...
Access Modes:  RWO
VolumeMode:    Block # Mount into the container as a piece of equipment
Mounted By:    nginx-lvm-block-0
...

6.IO current limiting

open-local Support for PV set up IO Current limiting, support IO The current limiting storage class template is as follows:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: open-local-lvm-io-throttling
provisioner: local.csi.aliyun.com
parameters:
  csi.storage.k8s.io/fstype: ext4
  volumeType: "LVM"
  bps: "1048576" # Read / write throughput limit at 1024b / S
  iops: "1024"   # IOPS is limited to 1024
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Create a stateful set that uses open local LVM IO throttling to store class templates.

# kubectl apply -f https://raw.githubusercontent.com/alibaba/open-local/main/example/lvm/sts-io-throttling.yaml

After the Pod is in Running status, enter the Pod container:

# kubectl exec -it test-io-throttling-0 sh

At this time, the storage volume is mounted on / dev/sdd as a native block device. Execute the fio command:

# fio -name=test -filename=/dev/sdd -ioengine=psync -direct=1 -iodepth=1 -thread -bs=16k -rw=readwrite -numjobs=32 -size=1G -runtime=60 -time_based -group_reporting

The results are as follows. It can be seen that the read / write throughput is limited to 1024KiB/s:

......
Run status group 0 (all jobs):
   READ: bw=1024KiB/s (1049kB/s), 1024KiB/s-1024KiB/s (1049kB/s-1049kB/s), io=60.4MiB (63.3MB), run=60406-60406msec
  WRITE: bw=993KiB/s (1017kB/s), 993KiB/s-993KiB/s (1017kB/s-1017kB/s), io=58.6MiB (61.4MB), run=60406-60406msec
Disk stats (read/write):
    dm-1: ios=3869/3749, merge=0/0, ticks=4848/17833, in_queue=22681, util=6.68%, aggrios=3112/3221, aggrmerge=774/631, aggrticks=3921/13598, aggrin_queue=17396, aggrutil=6.75%
  vdb: ios=3112/3221, merge=774/631, ticks=3921/13598, in_queue=17396, util=6.75%

7. Temporary volume

Open local supports the creation of temporary volumes for the Pod. The life cycle of the temporary volumes is the same as that of the Pod. That is, after the Pod is deleted, the temporary volumes are also deleted. It can be understood here as emptydir in the open local version.

# kubectl apply -f ./example/lvm/ephemeral.yaml

The results are as follows:

# kubectl describe po file-server
Name:         file-server
Namespace:    default
......
Containers:
  file-server:
    ......
    Mounts:
      /srv from webroot (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dns4c (ro)
Volumes:
  webroot:   # This is a CSI temporary volume
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            local.csi.aliyun.com
    FSType:
    ReadOnly:          false
    VolumeAttributes:      size=2Gi
                           vgName=open-local-pool-0
  default-token-dns4c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dns4c
    Optional:    false

8. Monitor the market

Open local has its own monitoring disk. Users can view the local storage information of the cluster through Grafana, including the information of storage devices and storage volumes. As shown in the figure below:

ACK Distro: in a word, with the help of open local, the labor cost can be reduced in operation and maintenance and the stability of cluster operation can be improved; In terms of functions, maximize the advantages of local storage, so that users can not only experience the high performance of local disk, but also enrich the application scenarios with various advanced storage features, so that developers can experience the dividends brought by cloud nativity, and realize the key step of cloud application, especially the original deployment of stateful application cloud.

Reporter: thanks for the wonderful explanation of ACK Distro. These three visits have given us a deeper understanding of ACK Distro and its partners. I hope the interview content can provide some help for you who are reading the article.

ACK Distro: Yes, members of the project team and I welcome everyone's "harassment" in GitHub community and community!

Related links

[1] Open local open source warehouse address:
https://github.com/alibaba/op...
[2]ACK Distro official website:
https://www.aliyun.com/produc...
[3]ACK Distro official GitHub:
https://github.com/AliyunCont...
[4] Making innovation within reach, Alibaba cloud container service ACK distribution is open for free download: https://mp.weixin.qq.com/s/Lc...
[5] First bullet in-depth interview:
https://mp.weixin.qq.com/s/wB...
[6] The second in-depth interview:
https://mp.weixin.qq.com/s/O0...

Tags: Alibaba Cloud

Posted by cottonbuds2005 on Wed, 23 Mar 2022 07:52:36 +0530