How can we distribute pods more evenly across nodes? After some quick research I found that this example deployment should do the job:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: "app"
                      operator: In
                      values:
                      - nginx
                topologyKey: "kubernetes.io/hostname"

In this example we instruct Kubernetes not to schedule a pod onto a node where another pod with app=nginx already exists. This is a soft constraint because of preferredDuringSchedulingIgnoredDuringExecution: when there is no way to fulfill the rule, the scheduler is allowed to break it.
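
For comparison, the hard variant uses requiredDuringSchedulingIgnoredDuringExecution. Here is a minimal sketch with the same labels as above; with this rule, a fourth replica on a 3-node cluster would stay Pending instead of landing next to another nginx pod:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - nginx
              topologyKey: "kubernetes.io/hostname"

So let's check the soft rule out. I will create a 3-node cluster using kind with this config: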

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
$ kind create cluster --config kind.yaml
$ kubectl get nodes
NAME                 STATUS   ROLES                  AGE   VERSION
kind-control-plane   Ready    control-plane,master   85m   v1.20.2
kind-worker          Ready    <none>                 85m   v1.20.2
kind-worker2         Ready    <none>                 85m   v1.20.2
# remove taints from kind-control-plane
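# e.g. (assuming the default control-plane taint for this version):
$ kubectl taint nodes kind-control-plane node-role.kubernetes.io/master-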
$ for i in $(seq 0 4); do kubectl apply -f test.yaml && kubectl wait --for=condition=available --timeout=600s deployment/nginx && kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName -l app=nginx && kubectl delete -f test.yaml && kubectl wait --for=delete --timeout=600s pod -l app=nginx; done
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-677d446dfc-6grxk   Running   kind-worker2
nginx-677d446dfc-hdmr2   Running   kind-worker
nginx-677d446dfc-q6g4d   Running   kind-control-plane
deployment.apps "nginx" deleted
pod/nginx-677d446dfc-6grxk condition met
pod/nginx-677d446dfc-hdmr2 condition met
pod/nginx-677d446dfc-q6g4d condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-677d446dfc-5v9jc   Running   kind-worker2
nginx-677d446dfc-gx8r4   Running   kind-control-plane
nginx-677d446dfc-lf579   Running   kind-worker
deployment.apps "nginx" deleted
pod/nginx-677d446dfc-5v9jc condition met
pod/nginx-677d446dfc-gx8r4 condition met
pod/nginx-677d446dfc-lf579 condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-677d446dfc-8rj4t   Running   kind-worker
nginx-677d446dfc-9rxc7   Running   kind-worker2
nginx-677d446dfc-m5xsh   Running   kind-control-plane
deployment.apps "nginx" deleted
pod/nginx-677d446dfc-8rj4t condition met
pod/nginx-677d446dfc-9rxc7 condition met
pod/nginx-677d446dfc-m5xsh condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-677d446dfc-8xtwx   Running   kind-worker
nginx-677d446dfc-jsn86   Running   kind-control-plane
nginx-677d446dfc-mqhkx   Running   kind-worker2
deployment.apps "nginx" deleted
pod/nginx-677d446dfc-8xtwx condition met
pod/nginx-677d446dfc-jsn86 condition met
pod/nginx-677d446dfc-mqhkx condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-677d446dfc-2hzcw   Running   kind-control-plane
nginx-677d446dfc-mdsgg   Running   kind-worker2
nginx-677d446dfc-rf6kx   Running   kind-worker
deployment.apps "nginx" deleted
pod/nginx-677d446dfc-2hzcw condition met
pod/nginx-677d446dfc-mdsgg condition met
pod/nginx-677d446dfc-rf6kx condition met
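
To summarize where the replicas landed, a one-liner like this counts pods per node (a convenience helper, not part of the transcript above):

$ kubectl get pod -l app=nginx -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c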

It works pretty well compared to a deployment without affinity:

$ for i in $(seq 0 4); do kubectl apply -f test.yaml && kubectl wait --for=condition=available --timeout=600s deployment/nginx && kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName -l app=nginx && kubectl delete -f test.yaml && kubectl wait --for=delete --timeout=600s pod -l app=nginx; done
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-55649fd747-gxdc7   Running   kind-worker
nginx-55649fd747-h7b8k   Running   kind-worker2
nginx-55649fd747-z4jrk   Running   kind-worker
deployment.apps "nginx" deleted
pod/nginx-55649fd747-gxdc7 condition met
pod/nginx-55649fd747-h7b8k condition met
pod/nginx-55649fd747-z4jrk condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-55649fd747-cf4dx   Running   kind-worker2
nginx-55649fd747-jlc7t   Running   kind-worker
nginx-55649fd747-rz7xh   Running   kind-worker
deployment.apps "nginx" deleted
pod/nginx-55649fd747-cf4dx condition met
pod/nginx-55649fd747-jlc7t condition met
pod/nginx-55649fd747-rz7xh condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-55649fd747-4znvz   Running   kind-worker
nginx-55649fd747-67vm6   Running   kind-worker2
nginx-55649fd747-wn97t   Running   kind-worker
deployment.apps "nginx" deleted
pod/nginx-55649fd747-4znvz condition met
pod/nginx-55649fd747-67vm6 condition met
pod/nginx-55649fd747-wn97t condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-55649fd747-vxpq6   Running   kind-worker2
nginx-55649fd747-xq7cr   Running   kind-worker
nginx-55649fd747-xs7qr   Running   kind-worker
deployment.apps "nginx" deleted
pod/nginx-55649fd747-vxpq6 condition met
pod/nginx-55649fd747-xq7cr condition met
pod/nginx-55649fd747-xs7qr condition met
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                     STATUS    NODE
nginx-55649fd747-6hnn8   Running   kind-worker
nginx-55649fd747-9kwgk   Running   kind-worker
nginx-55649fd747-djjqr   Running   kind-worker2
deployment.apps "nginx" deleted
pod/nginx-55649fd747-6hnn8 condition met
pod/nginx-55649fd747-9kwgk condition met
pod/nginx-55649fd747-djjqr condition met

So what happens under the hood? Kubernetes uses kube-scheduler to assign pods to nodes, and its behaviour can be customized with plugins and policies. In this particular example we take a look at plugins. At each extension point of the pod scheduling cycle kube-scheduler runs a set of default plugins. We can disable or enable plugins at individual extension points and group such configurations into profiles, which a pod selects via the schedulerName field in its spec. First of all, let's take the example from Multiple profiles and apply it to the current kind cluster:

# copy current kube-scheduler pod manifest
$ docker cp kind-control-plane:/etc/kubernetes/manifests/kube-scheduler.yaml .
$ cat kube-scheduler-config.yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
  - schedulerName: no-scoring-scheduler
    plugins:
      preScore:
        disabled:
        - name: '*'
      score:
        disabled:
        - name: '*'
# modify kube-scheduler pod manifest
@@ -15,6 +15,7 @@
     - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
     - --bind-address=127.0.0.1
     - --kubeconfig=/etc/kubernetes/scheduler.conf
+    - --config=/usr/local/etc/kube-scheduler-config.yaml
     - --leader-elect=true
     - --port=0
     image: k8s.gcr.io/kube-scheduler:v1.20.2
@@ -47,6 +48,9 @@
     - mountPath: /etc/kubernetes/scheduler.conf
       name: kubeconfig
       readOnly: true
+    - mountPath: /usr/local/etc/kube-scheduler-config.yaml
+      name: kube-scheduler-config
+      readOnly: true
   hostNetwork: true
   priorityClassName: system-node-critical
   volumes:
@@ -54,4 +58,8 @@
       path: /etc/kubernetes/scheduler.conf
       type: FileOrCreate
     name: kubeconfig
+  - hostPath:
+      path: /etc/kubernetes/kube-scheduler-config.yaml
+      type: FileOrCreate
+    name: kube-scheduler-config
 status: {}
# upload config and manifest
$ docker cp kube-scheduler-config.yaml kind-control-plane:/etc/kubernetes/kube-scheduler-config.yaml
$ docker cp kube-scheduler.yaml kind-control-plane:/etc/kubernetes/manifests/kube-scheduler.yaml
# verify that kubelet restarted the kube-scheduler pod with the new flag
$ docker exec -ti kind-control-plane bash -c "ps ax | grep -i kube-scheduler"
   7556 ?        Ssl    0:44 kube-scheduler --authentication-kubeconfig=/etc/kubernetes/scheduler.conf --authorization-kubeconfig=/etc/kubernetes/scheduler.conf --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --config=/usr/local/etc/kube-scheduler-config.yaml --leader-elect=true --port=0
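# optionally, inspect the scheduler logs for startup errors (the static pod is named kube-scheduler-<node-name>)
$ kubectl -n kube-system logs kube-scheduler-kind-control-plane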

Now that we have a new profile named no-scoring-scheduler, we can reference it via schedulerName in our test deployment:

# diff
       labels:
         app: nginx
     spec:
+      schedulerName: no-scoring-scheduler
       containers:
       - name: nginx
         image: nginx:latest

and the (preferred) anti-affinity rule defined above will no longer be considered:

$ kubectl apply -f test.yaml && kubectl wait --for=condition=available --timeout=600s deployment/nginx && kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName -l app=nginx && kubectl delete -f test.yaml && kubectl wait --for=delete --timeout=600s pod -l app=nginx
deployment.apps/nginx created
deployment.apps/nginx condition met
NAME                   STATUS    NODE
nginx-dc7fc7d7-6xppv   Running   kind-worker
nginx-dc7fc7d7-cflgc   Running   kind-worker
nginx-dc7fc7d7-d89hh   Running   kind-control-plane
deployment.apps "nginx" deleted
pod/nginx-dc7fc7d7-6xppv condition met
pod/nginx-dc7fc7d7-cflgc condition met
pod/nginx-dc7fc7d7-d89hh condition met

because we disabled all PreScore and Score plugins in that profile. Which plugin deals with affinity? Let's take a quick look at the list of default plugins:

InterPodAffinity: Implements inter-Pod affinity and anti-affinity. Extension points: PreFilter, Filter, PreScore, Score
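
Instead of disabling every scoring plugin, a more targeted profile could disable only InterPodAffinity at the PreScore and Score extension points. This is a sketch I have not tested here, and the profile name is made up:

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: no-pod-affinity-scoring
    plugins:
      preScore:
        disabled:
        - name: InterPodAffinity
      score:
        disabled:
        - name: InterPodAffinity

Required (hard) anti-affinity would still be enforced under such a profile, because it is handled at the Filter extension point.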

At the Score extension point each plugin returns its computed score for a node. Digging into the source code of this particular plugin: when pod anti-affinity is set and the evaluated node already runs pods matching the anti-affinity selector, the plugin counts the score for that node as:

weight(100) * -1 = -100
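
To see how this plays out in our cluster, here is a simplified illustration, ignoring other scoring plugins and assuming the plugin's usual normalization of node scores to the 0-100 range:

kind-worker     already runs an app=nginx pod  ->  raw score: 100 * -1 = -100
kind-worker2    no matching pod                ->  raw score: 0
after normalization: kind-worker gets 0, kind-worker2 gets 100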

A score of -100 is the lowest in our cluster, so this node will most likely not be picked for the pod. Next time I will try to dig into kube-scheduler policies.