Affinity and Anti-Affinity in Kubernetes
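The default Kubernetes scheduler binds each new Pod to a target node in three steps: filtering (predicates), scoring (priorities), and selection. It is only the default scheduler for Pod objects: out of the box it checks that a node has enough resources and tries to spread load as evenly as possible.
In practice, users can also run a custom scheduler and select it for a Pod by setting spec.schedulerName in the Pod manifest.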
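As a minimal sketch of the spec.schedulerName field mentioned above, the manifest below asks for a non-default scheduler. The Pod name and the scheduler name my-custom-scheduler are hypothetical; the value has to match the name of a scheduler that is actually running in the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod            # hypothetical Pod name
spec:
  # Hypothetical scheduler name. If this field is omitted,
  # the Pod is handled by "default-scheduler".
  schedulerName: my-custom-scheduler
  containers:
  - name: nginx
    image: nginx:latest
```

If no scheduler with that name is running, nothing picks the Pod up and it stays Pending.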
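1. Node Affinity
nodeAffinity is the node-affinity scheduling policy, introduced as a more expressive successor to nodeSelector.
Node affinity rules come in two types: hard affinity (required) and soft affinity (preferred). Hard affinity is a mandatory rule that must be satisfied when a Pod is scheduled; if no node satisfies it, the Pod stays in the Pending state. Soft affinity is a flexible constraint: it expresses a preference for running the Pod on a certain class of nodes, and the scheduler tries to honor it, but when it cannot, it falls back to a node that does not match the rule.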
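1.1 nodeSelector
The original way to pin a Pod to specific nodes in Kubernetes is nodeSelector, which selects nodes by matching node labels against the labels listed in the Pod spec. For example: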
Add a label to the node:

    [root@node1 ~]# kubectl label node node2 app=web
    node/node2 labeled

Define a Deployment that starts the Pods:

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deploy
      labels:
        app: web
    spec:
      replicas: 13
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          nodeSelector:
            app: web    # node label to select on
          containers:
          - name: nginx-deploy
            image: nginx:latest
            imagePullPolicy: IfNotPresent

    [root@node1 ~]# kubectl apply -f deploy-pod.yaml
    deployment.apps/deploy created

Check which node the Pods were started on:

    [root@node1 ~]# kubectl get pod -o wide
    NAME                      READY   STATUS    RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
    deploy-6cb97b569b-292kw   1/1     Running   0          11s   172.25.104.62   node2   <none>           <none>
    deploy-6cb97b569b-2qbfm   1/1     Running   0          11s   172.25.104.52   node2   <none>           <none>
    deploy-6cb97b569b-58px4   1/1     Running   0          11s   172.25.104.54   node2   <none>           <none>
    deploy-6cb97b569b-7cmqv   1/1     Running   0          11s   172.25.104.56   node2   <none>           <none>
    deploy-6cb97b569b-cmq74   1/1     Running   0          11s   172.25.104.57   node2   <none>           <none>
    deploy-6cb97b569b-cpv8x   1/1     Running   0          11s   172.25.104.59   node2   <none>           <none>
    deploy-6cb97b569b-d9hwz   1/1     Running   0          11s   172.25.104.63   node2   <none>           <none>
    deploy-6cb97b569b-f2zwf   1/1     Running   0          11s   172.25.104.60   node2   <none>           <none>
    deploy-6cb97b569b-f6hbl   1/1     Running   0          11s   172.25.104.61   node2   <none>           <none>
    deploy-6cb97b569b-kz46f   1/1     Running   0          11s   172.25.104.58   node2   <none>           <none>
    deploy-6cb97b569b-mjmnv   1/1     Running   0          11s   172.25.104.55   node2   <none>           <none>
    deploy-6cb97b569b-nkdwm   1/1     Running   0          11s   172.25.104.51   node2   <none>           <none>
    deploy-6cb97b569b-tg7qc   1/1     Running   0          11s   172.25.104.53   node2   <none>           <none>
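1.2 Node hard affinity
The nodeSelector field forces a Pod onto a class of nodes by matching node labels, but it only supports simple equality-based label selectors. nodeAffinity supports matchExpressions, which allows more complex label selection.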
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deploy
      labels:
        app: web
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: nginx-deploy
            image: nginx:latest
            imagePullPolicy: IfNotPresent
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:   # hard rule
                nodeSelectorTerms:
                - matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - web

Label the node:

    [root@node1 ~]# kubectl label node node2 app=web
    node/node2 labeled

Start the Pods; all of them land on node2:

    [root@node1 ~]# kubectl apply -f deploy-pod.yaml
    deployment.apps/deploy created
    [root@node1 ~]# kubectl get pod -o wide
    NAME                      READY   STATUS    RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
    deploy-66747445f7-25pwb   1/1     Running   0          14s   172.25.104.5    node2   <none>           <none>
    deploy-66747445f7-4qdf7   1/1     Running   0          14s   172.25.104.10   node2   <none>           <none>
    deploy-66747445f7-f24bq   1/1     Running   0          14s   172.25.104.20   node2   <none>           <none>
    deploy-66747445f7-f9vbq   1/1     Running   0          14s   172.25.104.34   node2   <none>           <none>
    deploy-66747445f7-gx4mq   1/1     Running   0          14s   172.25.104.28   node2   <none>           <none>
    deploy-66747445f7-zwtc8   1/1     Running   0          14s   172.25.104.33   node2   <none>           <none>
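In a node affinity definition, the requiredDuringSchedulingIgnoredDuringExecution field defines hard node affinity. It holds a nodeSelectorTerms list made up of one or more nodeSelectorTerm objects. These terms are combined with logical OR: when nodes are checked, a node passes if it satisfies any one of the nodeSelectorTerm entries.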
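The IgnoredDuringExecution suffix shared by preferredDuringSchedulingIgnoredDuringExecution and requiredDuringSchedulingIgnoredDuringExecution means that if a node's labels change after a Pod has been scheduled onto it, so that the node no longer satisfies the affinity rule, the scheduler does not move the Pod off that node. The rule only applies to Pods that are being newly scheduled.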
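A nodeSelectorTerm defines one node selector entry. It can contain one or more matching rules defined under matchExpressions, and these rules are combined with logical AND: a node's labels must satisfy every matchExpressions rule under the same nodeSelectorTerm to pass that entry. matchExpressions itself is a list of one or more label selector requirements, all of which must hold.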
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deploy
      labels:
        app: web
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: nginx-deploy
            image: nginx:latest
            imagePullPolicy: IfNotPresent
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:   # hard rule
                nodeSelectorTerms:
                - matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - server
                    - web

Label node2 and node3:

    [root@node1 ~]# kubectl label node node2 app=web
    node/node2 labeled
    [root@node1 ~]# kubectl label node node3 app=server
    node/node3 labeled

Start the Pods; they are scheduled onto both node2 and node3:

    [root@node1 ~]# kubectl get pod -o wide
    NAME                     READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
    deploy-d78b4d4d9-5hb4k   1/1     Running   0          2m17s   172.25.104.47   node2   <none>           <none>
    deploy-d78b4d4d9-l8tjk   1/1     Running   0          2m17s   172.25.135.61   node3   <none>           <none>
    deploy-d78b4d4d9-mcvsk   1/1     Running   0          2m17s   172.25.135.60   node3   <none>           <none>
    deploy-d78b4d4d9-mj7gk   1/1     Running   0          2m17s   172.25.104.43   node2   <none>           <none>
    deploy-d78b4d4d9-r5xqn   1/1     Running   0          2m17s   172.25.104.45   node2   <none>           <none>
    deploy-d78b4d4d9-zl684   1/1     Running   0          2m17s   172.25.104.46   node2   <none>           <none>
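To make the OR/AND rules concrete, here is a sketch of an affinity fragment (it would sit under the Pod template's spec) with two nodeSelectorTerms entries. The disktype label is an assumption made for illustration and does not appear in the examples above.

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Term 1: both expressions must hold (AND):
      # app=web AND disktype=ssd
      - matchExpressions:
        - key: app
          operator: In
          values: ["web"]
        - key: disktype          # hypothetical label
          operator: In
          values: ["ssd"]
      # Term 2: ORed with term 1, so a node with app=server
      # qualifies regardless of its disktype label
      - matchExpressions:
        - key: app
          operator: In
          values: ["server"]
```

With this fragment, a node labeled app=web but without disktype=ssd fails term 1 and term 2, so it is filtered out, while a node labeled app=server passes through term 2 alone.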
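Label selector expressions support the operators In, NotIn, Exists, DoesNotExist, Gt, and Lt:
In: the label's value is in the given list
NotIn: the label's value is not in the given list
Gt: the label's value is greater than the given value
Lt: the label's value is less than the given value
Exists: the label is present, whatever its value (values must be left empty)
DoesNotExist: the label is not present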
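The operators other than In can be sketched as follows. The gpu and cpu-cores labels are assumptions made for illustration; only app appears in the examples in this article.

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # Exists: only the key is checked, so no values are given
        - key: gpu               # hypothetical label
          operator: Exists
        # Gt: exactly one value, written as a string and
        # interpreted as an integer
        - key: cpu-cores         # hypothetical label
          operator: Gt
          values: ["8"]
        # NotIn: the label value must not be one of these
        - key: app
          operator: NotIn
          values: ["server"]
```

All three expressions sit in the same term, so a node has to satisfy every one of them.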
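1.3 Node soft affinity
Node soft affinity gives node selection a flexible control logic: the Pod being scheduled no longer "must" but "should" be placed on certain nodes, and when the condition cannot be met it can still be placed on nodes that do not match. Each preference also has a weight field that sets its priority. The valid range is 1 to 100, and a larger number means a higher priority.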
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deploy
      labels:
        app: web
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: nginx-deploy
            image: nginx:latest
            imagePullPolicy: IfNotPresent
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 60    # weight 60 for app=web
                preference:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - web
              - weight: 40    # weight 40 for app=server
                preference:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - server

Start the Pods; most of them end up on the node labeled app=web:

    [root@node1 ~]# kubectl get pod -o wide
    NAME                      READY   STATUS    RESTARTS   AGE    IP              NODE    NOMINATED NODE   READINESS GATES
    deploy-55bf777f76-5466z   1/1     Running   0          9m9s   172.25.104.50   node2   <none>           <none>
    deploy-55bf777f76-62rrz   1/1     Running   0          9m9s   172.25.104.49   node2   <none>           <none>
    deploy-55bf777f76-bf9bn   1/1     Running   0          9m9s   172.25.104.48   node2   <none>           <none>
    deploy-55bf777f76-lx5pz   1/1     Running   0          9m9s   172.25.104.53   node2   <none>           <none>
    deploy-55bf777f76-s78v5   1/1     Running   0          9m9s   172.25.135.62   node3   <none>           <none>
    deploy-55bf777f76-t9cw9   1/1     Running   0          9m9s   172.25.104.63   node2   <none>           <none>
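Hard and soft node affinity can also be combined in one Pod spec. The fragment below is a sketch that reuses the app=web and app=server labels from the examples above: the required rule restricts the candidates to those two kinds of node, and the preferred rule then favors app=web among them.

```yaml
affinity:
  nodeAffinity:
    # Hard rule: only nodes labeled app=web or app=server are candidates
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: app
          operator: In
          values: ["web", "server"]
    # Soft rule: among the candidates, prefer app=web
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 60
      preference:
        matchExpressions:
        - key: app
          operator: In
          values: ["web"]
```

If no node carries either label, the Pods stay Pending, because the soft rule never relaxes the hard one.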
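2. Pod Affinity
Pod affinity means that Pods meeting certain conditions run on the same node (more precisely, in the same topology domain identified by topologyKey), while anti-affinity requires that they do not run on the same node.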
2.1 Pod hard affinity
Mandatory Pod affinity is also defined with the requiredDuringSchedulingIgnoredDuringExecution field.
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deploy
      labels:
        app: web
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: nginx-deploy
            image: nginx:latest
            imagePullPolicy: IfNotPresent
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - web
                topologyKey: kubernetes.io/hostname

Start the Pods:

    [root@node1 ~]# kubectl apply -f deploy-pod-Affinity.yaml
    deployment.apps/deploy created

As shown below, all Pods start on a single node:

    [root@node1 ~]# kubectl get po -o wide
    NAME                      READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
    deploy-5b9d5b8b48-2h6qb   1/1     Running   0          6s    172.25.166.150   node1   <none>           <none>
    deploy-5b9d5b8b48-8dbtf   1/1     Running   0          6s    172.25.166.151   node1   <none>           <none>
    deploy-5b9d5b8b48-9b895   1/1     Running   0          6s    172.25.166.154   node1   <none>           <none>
    deploy-5b9d5b8b48-cngp7   1/1     Running   0          6s    172.25.166.149   node1   <none>           <none>
    deploy-5b9d5b8b48-qpp9n   1/1     Running   0          6s    172.25.166.152   node1   <none>           <none>
    deploy-5b9d5b8b48-ww7jk   1/1     Running   0          6s    172.25.166.153   node1   <none>           <none>
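When the Pods created by the Deployment in this example are scheduled, the scheduler first uses the label selector to find all Pods labeled app=web, then reads the value of the topologyKey label (here kubernetes.io/hostname) on the nodes running those Pods, and finally looks up every node whose label has one of those values. This completes node filtering. The scoring functions then rank the remaining nodes, and the scheduler picks the node that will run the new Pod. Because kubernetes.io/hostname is unique per node, all replicas end up on one node. The first replica can still be scheduled even though no matching Pod exists yet, because it matches its own selector.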
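The same hard rule can co-locate Pods at zone granularity rather than on a single node. The fragment below is a sketch under two assumptions: a Pod labeled app=db already runs in the cluster, and the nodes carry the well-known topology.kubernetes.io/zone label.

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values: ["db"]         # assumes a Pod labeled app=db exists
      # Any node in the same zone as a matching Pod is acceptable,
      # not just the node that Pod runs on
      topologyKey: topology.kubernetes.io/zone
```

A coarser topologyKey leaves the scheduler more candidate nodes, which helps when a single node cannot hold every replica.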
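2.2 Pod soft affinity
As with node affinity, Pods can use the preferredDuringSchedulingIgnoredDuringExecution field to define flexible affinity. The scheduler does its best to satisfy the affinity constraint, but when the constraint cannot be met it still allows the Pod to be scheduled onto other nodes. Below is an example manifest that uses Pod soft affinity: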
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: app-affinity
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          name: myapp
          labels:
            app: myapp
        spec:
          affinity:
            podAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 80
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - {key: app, operator: In, values: ["nginx"]}
                  topologyKey: zone
              - weight: 20
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - {key: app, operator: In, values: ["apach"]}
                  topologyKey: zone
          containers:
          - name: nginx
            image: nginx

Start the Pods:

    [root@node1 ~]# kubectl apply -f pod-soft.yaml
    deployment.apps/app-affinity created

The result:

    [root@node1 ~]# kubectl get po -o wide
    NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
    app-affinity-66fbb677c7-2kwjf   1/1     Running   0          3s    172.25.166.158   node1   <none>           <none>
    app-affinity-66fbb677c7-5thfw   1/1     Running   0          3s    172.25.166.155   node1   <none>           <none>
    app-affinity-66fbb677c7-drdml   1/1     Running   0          3s    172.25.166.159   node1   <none>           <none>
    app-affinity-66fbb677c7-qq9fn   1/1     Running   0          3s    172.25.166.156   node1   <none>           <none>
    app-affinity-66fbb677c7-vq4jg   1/1     Running   0          3s    172.25.166.157   node1   <none>           <none>
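The manifest defines two affinity preferences. One selects the zone label of the nodes running nginx Pods and gives it the higher weight of 80; the other selects the zone label of the nodes running apach Pods and gives it the lower weight of 20. The scheduler therefore sorts candidate nodes into four classes: zones that contain both nginx and apach Pods (both preferences match, for a combined weight of 100), zones that contain only nginx Pods (80), zones that contain only apach Pods (20), and all other zones (0).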
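2.3 Pod anti-affinity
podAffinity defines affinity constraints for Pods; replacing it with podAntiAffinity defines anti-affinity constraints instead. Anti-affinity is typically used to spread Pods of the same application apart, or to place Pods of different security levels in different zones, racks, or nodes. The manifest below defines Pods that are created by the same Deployment but are mutually exclusive at the node level: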
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 4
      template:
        metadata:
          labels:
            app: nginx
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - nginx
                topologyKey: "kubernetes.io/hostname"
          containers:
          - name: nginx-server
            image: nginx:latest

Four replicas were requested but the cluster has only three nodes, so one Pod cannot be scheduled; the others each run on a different node:

    [root@node1 yaml]# kubectl get pod -o wide
    NAME                     READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
    nginx-86d6477c48-7mmt2   1/1     Running   0          2m18s   172.25.104.61    node2    <none>           <none>
    nginx-86d6477c48-f2z2c   1/1     Running   0          2m18s   172.25.135.38    node3    <none>           <none>
    nginx-86d6477c48-nv5x2   1/1     Running   0          2m18s   172.25.166.143   node1    <none>           <none>
    nginx-86d6477c48-wsw4x   0/1     Pending   0          2m18s   <none>           <none>   <none>           <none>
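If a Pending replica is not acceptable, the same constraint can be written as a soft rule. The fragment below is a sketch of that variant, not part of the original example: it swaps required for preferred, so the scheduler still spreads the nginx Pods across nodes where it can but may place two on one node once every node already has one.

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                # strongest preference allowed (1 to 100)
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["nginx"]
        topologyKey: kubernetes.io/hostname
```

The trade-off is that the one-Pod-per-node guarantee becomes best effort.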