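Based on these last findings, I decided to dig deeper into:
how Pods are implemented under the hood
what the actual difference between a Pod and a Container is
how to create a Pod using Docker
Along the way, I hope this will help me solidify my Linux, Docker, and Kubernetes skills.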
$ cat > Vagrantfile <<EOF
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
  config.vm.hostname = "docker-host"
  config.vm.define "docker-host"
  config.vagrant.plugins = ['vagrant-vbguest']

  config.vm.provider "virtualbox" do |vb|
    vb.cpus = 2
    vb.memory = "2048"
  end

  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y curl vim
  SHELL

  config.vm.provision "docker"
end
EOF

$ vagrant up
$ vagrant ssh
Finally, let's start a container:
$ docker run --name foo --rm -d --memory='512MB' --cpus='0.5' nginx
# Look up the container in the process tree.
$ ps auxf
USER       PID  ... COMMAND
...
root      4707  /usr/bin/containerd-shim-runc-v2 -namespace moby -id cc9466b3e...
root      4727   \_ nginx: master process nginx -g daemon off;
systemd+  4781       \_ nginx: worker process
systemd+  4782       \_ nginx: worker process

# Find the namespaces used by 4727 process.
$ sudo lsns
        NS TYPE  NPROCS   PID USER COMMAND
...
4026532157 mnt        3  4727 root nginx: master process nginx -g daemon off;
4026532158 uts        3  4727 root nginx: master process nginx -g daemon off;
4026532159 ipc        3  4727 root nginx: master process nginx -g daemon off;
4026532160 pid        3  4727 root nginx: master process nginx -g daemon off;
4026532162 net        3  4727 root nginx: master process nginx -g daemon off;
$ PID=$(docker inspect --format '{{.State.Pid}}' foo)

# Check cgroupfs node for the container main process (4727).
$ cat /proc/${PID}/cgroup
11:freezer:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
10:blkio:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
9:rdma:/
8:pids:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
7:devices:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
6:cpuset:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
5:cpu,cpuacct:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
4:memory:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
3:net_cls,net_prio:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
2:perf_event:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
1:name=systemd:/docker/cc9466b3eb67ca374c925794776aad2fd45a34343ab66097a44594b35183dba0
0::/system.slice/containerd.service
$ ID=$(docker inspect --format '{{.Id}}' foo)

# Check the memory limit.
$ cat /sys/fs/cgroup/memory/docker/${ID}/memory.limit_in_bytes
536870912 # Yay! It's the 512MB we requested!

# See the CPU limits.
$ ls /sys/fs/cgroup/cpu/docker/${ID}
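The raw numbers in these cgroup files are easier to compare with the `docker run` flags after a small conversion. The sketch below is only an illustration: `cpus_from_cfs` and `bytes_to_mib` are hypothetical helper names, and it assumes the cgroup v1 layout shown above, where the CPU limit is expressed through the `cpu.cfs_quota_us` and `cpu.cfs_period_us` files and a quota of `-1` means no limit.

```shell
# Hypothetical helper: turn a CFS quota/period pair (microseconds) into a CPU count.
cpus_from_cfs() {
  quota="$1"
  period="$2"
  if [ "$quota" -lt 0 ]; then
    # A negative quota means the CFS bandwidth limit is not set.
    echo "unlimited"
  else
    awk -v q="$quota" -v p="$period" 'BEGIN { printf "%.2f\n", q / p }'
  fi
}

# Hypothetical helper: turn a byte count into whole MiB.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# On the Docker host the inputs would come from the cgroup files, e.g.:
#   cpus_from_cfs "$(cat /sys/fs/cgroup/cpu/docker/${ID}/cpu.cfs_quota_us)" \
#                 "$(cat /sys/fs/cgroup/cpu/docker/${ID}/cpu.cfs_period_us)"
# Here the helpers are fed sample values instead.
cpus_from_cfs 50000 100000   # prints 0.50
bytes_to_mib 536870912       # prints 512
```

With `--cpus='0.5'` I would expect the quota to be half of the period, which is what the sample values above model.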
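Interestingly, a container started without any explicit resource limits still gets a cgroup configured. I haven't checked it in practice, but my guess is that CPU and RAM consumption is unlimited by default, and cgroups are probably used to restrict access to certain devices from inside the container.
This is how a container looks in my mind after this investigation:
Now, let's take a look at Kubernetes Pods. As with containers, the implementation of Pods can vary between CRI runtimes. For instance, when Kata Containers is used as one of the supported runtime classes, some Pods can be actual virtual machines! And as expected, VM-based Pods differ in implementation and capabilities from Pods implemented with traditional Linux containers.
To keep the comparison between containers and Pods fair, we'll explore a Kubernetes cluster that uses the ContainerD/Runc runtime. This is also the mechanism Docker uses under the hood to run containers.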
# Install arkade
$ curl -sLS https://get.arkade.dev | sh

$ arkade get kubectl minikube

$ minikube start --driver virtualbox --container-runtime containerd
$ kubectl --context=minikube apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
    - name: app
      image: docker.io/kennethreitz/httpbin
      ports:
        - containerPort: 80
      resources:
        limits:
          memory: "256Mi"
    - name: sidecar
      image: curlimages/curl
      command: ["/bin/sleep", "3650d"]
      resources:
        limits:
          memory: "128Mi"
EOF
$ minikube ssh

$ ps auxf
USER       PID  ... COMMAND
...
root      4947  \_ containerd-shim -namespace k8s.io -workdir /mnt/sda1/var/lib/containerd/...
root      4966      \_ /pause
root      4981  \_ containerd-shim -namespace k8s.io -workdir /mnt/sda1/var/lib/containerd/...
root      5001      \_ /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
root      5016          \_ /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
root      5018  \_ containerd-shim -namespace k8s.io -workdir /mnt/sda1/var/lib/containerd/...
100       5035      \_ /bin/sleep 3650d
$ sudo ctr --namespace=k8s.io containers ls
CONTAINER        IMAGE                                     RUNTIME
...
097d4fe8a7002    docker.io/curlimages/curl@sha256:1a220    io.containerd.runtime.v1.linux
...
dfb1cd29ab750    docker.io/kennethreitz/httpbin:latest     io.containerd.runtime.v1.linux
...
f0e87a9330466    k8s.gcr.io/pause:3.1                      io.containerd.runtime.v1.linux
$ sudo crictl ps
CONTAINER        IMAGE            CREATED              STATE      NAME       ATTEMPT    POD ID
097d4fe8a7002    bcb0c26a91c90    About an hour ago    Running    sidecar    0          f0e87a9330466
dfb1cd29ab750    b138b9264903f    About an hour ago    Running    app        0          f0e87a9330466
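Note, however, that the POD ID field above matches the ID of the pause:3.1 container in the ctr output. So it looks like the Pod itself is backed by an auxiliary container. What is it for?
I haven't noticed anything in the OCI Runtime Spec that corresponds to a Pod. So, when the information in the Kubernetes API spec doesn't satisfy me, I usually go straight to the Kubernetes Container Runtime Interface (CRI) Protobuf file: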
// kubelet expects any compatible container runtime
// to implement the following gRPC methods:

service RuntimeService {
    ...
    rpc RunPodSandbox(RunPodSandboxRequest) returns (RunPodSandboxResponse) {}
    rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}
    rpc RemovePodSandbox(RemovePodSandboxRequest) returns (RemovePodSandboxResponse) {}
    rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}
    rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}

    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
    rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
    rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
    rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
    rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
    rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
    rpc UpdateContainerResources(UpdateContainerResourcesRequest) returns (UpdateContainerResourcesResponse) {}
    rpc ReopenContainerLog(ReopenContainerLogRequest) returns (ReopenContainerLogResponse) {}

    // ...
}

message CreateContainerRequest {
    // ID of the PodSandbox in which the container should be created.
    string pod_sandbox_id = 1;
    // Config of the container.
    ContainerConfig config = 2;
    // Config of the PodSandbox. This is the same config that was passed
    // to RunPodSandboxRequest to create the PodSandbox. It is passed again
    // here just for easy reference. The PodSandboxConfig is immutable and
    // remains the same throughout the lifetime of the pod.
    PodSandboxConfig sandbox_config = 3;
}
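To make the order of these calls more tangible, here is a dry-run sketch that only prints the `crictl` commands roughly mirroring `RunPodSandbox`, `CreateContainer`, and `StartContainer`. It is an illustration rather than what kubelet literally executes: `plan_pod` is a hypothetical helper, and `pod.json`, `app.json`, and `sidecar.json` are assumed config files that are not part of this walkthrough.

```shell
# Hypothetical dry-run helper: print (not execute) the crictl commands that
# correspond to the CRI calls needed to bring up a sandbox and its containers.
plan_pod() {
  sandbox_cfg="$1"
  shift
  # RunPodSandbox: the sandbox is created first and yields a pod sandbox ID.
  echo "POD_ID=\$(crictl runp $sandbox_cfg)"
  for ctr_cfg in "$@"; do
    # CreateContainer: every container is created inside that sandbox,
    # and the sandbox config is passed again alongside the container config.
    echo "CTR_ID=\$(crictl create \$POD_ID $ctr_cfg $sandbox_cfg)"
    # StartContainer: the created container is started by its ID.
    echo "crictl start \$CTR_ID"
  done
}

plan_pod pod.json app.json sidecar.json
```

The point of the sketch is the ordering: one sandbox first, then each container created with a reference to that sandbox's ID, which matches the `pod_sandbox_id` field in `CreateContainerRequest`.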
$ sudo lsns
        NS TYPE  NPROCS   PID USER COMMAND
4026532614 net        4  4966 root /pause
4026532715 mnt        1  4966 root /pause
4026532716 uts        4  4966 root /pause
4026532717 ipc        4  4966 root /pause
4026532718 pid        1  4966 root /pause
4026532719 mnt        2  5001 root /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
4026532720 pid        2  5001 root /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
4026532721 mnt        1  5035 100  /bin/sleep 3650d
4026532722 pid        1  5035 100  /bin/sleep 3650d
# httpbin container
$ sudo ls -l /proc/5001/ns
...
lrwxrwxrwx 1 root root 0 Oct 24 14:05 ipc -> 'ipc:[4026532717]'
lrwxrwxrwx 1 root root 0 Oct 24 14:05 mnt -> 'mnt:[4026532719]'
lrwxrwxrwx 1 root root 0 Oct 24 14:05 net -> 'net:[4026532614]'
lrwxrwxrwx 1 root root 0 Oct 24 14:05 pid -> 'pid:[4026532720]'
lrwxrwxrwx 1 root root 0 Oct 24 14:05 uts -> 'uts:[4026532716]'

# sleep container
$ sudo ls -l /proc/5035/ns
...
lrwxrwxrwx 1 100 101 0 Oct 24 14:05 ipc -> 'ipc:[4026532717]'
lrwxrwxrwx 1 100 101 0 Oct 24 14:05 mnt -> 'mnt:[4026532721]'
lrwxrwxrwx 1 100 101 0 Oct 24 14:05 net -> 'net:[4026532614]'
lrwxrwxrwx 1 100 101 0 Oct 24 14:05 pid -> 'pid:[4026532722]'
lrwxrwxrwx 1 100 101 0 Oct 24 14:05 uts -> 'uts:[4026532716]'
# Inspect httpbin container.
$ sudo crictl inspect dfb1cd29ab750
{
  ...
  "namespaces": [
    {
      "type": "pid"
    },
    {
      "type": "ipc",
      "path": "/proc/4966/ns/ipc"
    },
    {
      "type": "uts",
      "path": "/proc/4966/ns/uts"
    },
    {
      "type": "mount"
    },
    {
      "type": "network",
      "path": "/proc/4966/ns/net"
    }
  ],
  ...
}

# Inspect sleep container.
$ sudo crictl inspect 097d4fe8a7002
...
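I think the findings above perfectly explain the capabilities of containers in the same Pod:
they can communicate with each other
  via localhost and/or
  using IPC (shared memory, message queues, etc.)
they share a domain name and a hostname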
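Comparing the namespace links of two processes by eye gets tedious, so the check can be scripted. This is a minimal sketch, assuming a Linux host with `/proc` mounted; `shared_namespaces` is a hypothetical helper name, and reading another user's `/proc/<pid>/ns` entries typically requires root.

```shell
# Hypothetical helper: print the namespace types that two processes have in common.
shared_namespaces() {
  for ns in ipc mnt net pid uts; do
    # Each link resolves to something like 'net:[4026532614]'.
    a=$(readlink "/proc/$1/ns/$ns" 2>/dev/null) || continue
    b=$(readlink "/proc/$2/ns/$ns" 2>/dev/null) || continue
    if [ "$a" = "$b" ]; then
      echo "$ns"
    fi
  done
  return 0
}

# Self-check: a process trivially shares every namespace with itself.
shared_namespaces "$$" "$$"

# On the minikube node, the two Pod containers could be compared as root with:
#   shared_namespaces 5001 5035
# Judging by the listings above, that should report ipc, net and uts only.
```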
$ sudo systemd-cgls
Control group /:
-.slice
├─kubepods
│ ├─burstable
│ │ ├─pod4a8d5c3e-3821-4727-9d20-965febbccfbb
│ │ │ ├─f0e87a93304666766ab139d52f10ff2b8d4a1e6060fc18f74f28e2cb000da8b2
│ │ │ │ └─4966 /pause
│ │ │ ├─dfb1cd29ab750064ae89613cb28963353c3360c2df913995af582aebcc4e85d8
│ │ │ │ ├─5001 /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
│ │ │ │ └─5016 /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
│ │ │ └─097d4fe8a7002d69d6c78899dcf6731d313ce8067ae3f736f252f387582e55ad
│ │ │   └─5035 /bin/sleep 3650d
...
$ sudo apt-get install cgroup-tools
$ sudo cgcreate -g cpu,memory:/pod-foo

# Check if the corresponding folders were created:
$ ls -l /sys/fs/cgroup/cpu/pod-foo/
$ ls -l /sys/fs/cgroup/memory/pod-foo/
$ docker run -d --rm \
  --name foo_sandbox \
  --cgroup-parent /pod-foo \
  --ipc 'shareable' \
  alpine sleep infinity
# app (httpbin)
$ docker run -d --rm \
  --name app \
  --cgroup-parent /pod-foo \
  --network container:foo_sandbox \
  --ipc container:foo_sandbox \
  kennethreitz/httpbin

# sidecar (sleep)
$ docker run -d --rm \
  --name sidecar \
  --cgroup-parent /pod-foo \
  --network container:foo_sandbox \
  --ipc container:foo_sandbox \
  curlimages/curl sleep 365d
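Both commands repeat the same three flags, and those flags are what actually ties a container to the "Pod". As a small convenience sketch, they can be produced by one function; `pod_member_args` is a hypothetical helper, and the example only prints the resulting command instead of running it.

```shell
# Hypothetical helper: print the docker run flags that place a container into
# the pod-level cgroup and into the sandbox's network and IPC namespaces.
pod_member_args() {
  sandbox="$1"   # name of the sandbox container, e.g. foo_sandbox
  cgroup="$2"    # parent cgroup, e.g. /pod-foo
  echo "--cgroup-parent $cgroup --network container:$sandbox --ipc container:$sandbox"
}

# Dry run: show the command for the app container without executing it.
echo docker run -d --rm --name app $(pod_member_args foo_sandbox /pod-foo) kennethreitz/httpbin
```

Dropping the leading `echo` would run the container for real, with the same flags as in the commands above.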
$ sudo systemd-cgls memory
Controller memory; Control group /:
├─pod-foo
│ ├─488d76cade5422b57ab59116f422d8483d435a8449ceda0c9a1888ea774acac7
│ │ ├─27865 /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
│ │ └─27880 /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
│ ├─9166a87f9a96a954b10ec012104366da9f1f6680387ef423ee197c61d37f39d7
│ │ └─27977 sleep 365d
│ └─c7b0ec46b16b52c5e1c447b77d67d44d16d78f9a3f93eaeb3a86aa95e08e28b6
│   └─27743 sleep infinity
$ sudo lsns
        NS TYPE  NPROCS   PID USER COMMAND
...
4026532157 mnt        1 27743 root sleep infinity
4026532158 uts        1 27743 root sleep infinity
4026532159 ipc        4 27743 root sleep infinity
4026532160 pid        1 27743 root sleep infinity
4026532162 net        4 27743 root sleep infinity
4026532218 mnt        2 27865 root /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
4026532219 uts        2 27865 root /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
4026532220 pid        2 27865 root /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent
4026532221 mnt        1 27977 _apt sleep 365d
4026532222 uts        1 27977 _apt sleep 365d
4026532223 pid        1 27977 _apt sleep 365d
# app container
$ sudo ls -l /proc/27865/ns
lrwxrwxrwx 1 root root 0 Oct 28 07:56 ipc -> 'ipc:[4026532159]'
lrwxrwxrwx 1 root root 0 Oct 28 07:56 mnt -> 'mnt:[4026532218]'
lrwxrwxrwx 1 root root 0 Oct 28 07:56 net -> 'net:[4026532162]'
lrwxrwxrwx 1 root root 0 Oct 28 07:56 pid -> 'pid:[4026532220]'
lrwxrwxrwx 1 root root 0 Oct 28 07:56 uts -> 'uts:[4026532219]'

# sidecar container
$ sudo ls -l /proc/27977/ns
lrwxrwxrwx 1 _apt systemd-journal 0 Oct 28 07:56 ipc -> 'ipc:[4026532159]'
lrwxrwxrwx 1 _apt systemd-journal 0 Oct 28 07:56 mnt -> 'mnt:[4026532221]'
lrwxrwxrwx 1 _apt systemd-journal 0 Oct 28 07:56 net -> 'net:[4026532162]'
lrwxrwxrwx 1 _apt systemd-journal 0 Oct 28 07:56 pid -> 'pid:[4026532223]'
lrwxrwxrwx 1 _apt systemd-journal 0 Oct 28 07:56 uts -> 'uts:[4026532222]'