Kubernetes | Technology Radar

Adopt

Since Jul 2024

Kubernetes

Kubernetes remains our standard container orchestration platform. Recent improvements include migration to Cilium for eBPF-based networking, implementation of Crossplane for infrastructure-as-code within K8s, and adoption of Kyverno for policy enforcement. Our platform team now supports 50+ production services across all teams, with observability powered by OpenTelemetry and Dash0. We are actively consolidating AWS Lambda workloads onto the platform.

Platform Overview

We run Kubernetes on Amazon EKS across three environments (dev, staging, production) with the following topology:

Production: 3 node groups (general, compute-optimized, GPU) across 3 availability zones
Staging: mirrors production topology at reduced scale
Dev: single node group with spot instances for cost efficiency

Each team operates in dedicated namespaces with resource quotas and network policies enforced by Kyverno. The platform team maintains shared infrastructure (ingress, cert-manager, external-dns, monitoring stack) via Helm charts in a GitOps repository.
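
As an illustration of the per-namespace quotas described above, a ResourceQuota might look like the following. This is a minimal sketch: the namespace name, quota name, and all numeric values are hypothetical and not taken from our actual configuration.

```yaml
# Hypothetical quota for one team namespace; names and numbers are illustrative only.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-example
spec:
  hard:
    # Upper bound on the sum of requests across all pods in the namespace.
    requests.cpu: "20"
    requests.memory: 40Gi
    # Upper bound on the sum of limits across all pods in the namespace.
    limits.cpu: "40"
    limits.memory: 80Gi
    # Maximum number of pods that may exist in the namespace.
    pods: "100"
```

Once a quota covers CPU or memory, pods that omit the corresponding requests or limits are rejected in that namespace, which pairs naturally with a policy that requires resource limits.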

Key Infrastructure Decisions

Networking: Cilium

We migrated from AWS VPC CNI to Cilium in Q1 2025. The primary drivers were:

eBPF-based observability — Hubble provides network flow visibility without sidecar overhead
Network policies at L7 — HTTP-aware policies replace our previous Istio service mesh for east-west traffic control
Performance — eBPF dataplane reduced pod-to-pod latency by ~15% compared to iptables-based routing
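
To make the L7 policy point concrete, here is a sketch of an HTTP-aware CiliumNetworkPolicy. The service labels, namespace, port, and path are hypothetical examples, not our real services.

```yaml
# Hypothetical policy: only the "checkout" workload may issue GET requests
# under /orders/ to the "orders" workload on port 8080.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-checkout-get-orders
  namespace: team-example
spec:
  # Pods this policy applies to.
  endpointSelector:
    matchLabels:
      app: orders
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: checkout
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            # L7 rules: requests that match no rule are rejected.
            http:
              - method: "GET"
                path: "/orders/.*"
```

The path field is a regular expression, so the rule above admits any GET under /orders/ while other methods and paths from the same caller are denied.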

Policy Enforcement: Kyverno

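Kyverno replaced OPA Gatekeeper for policy enforcement. The following ClusterPolicy, for example, rejects any Pod whose containers omit CPU or memory limits: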

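apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  # Enforce blocks non-compliant resources at admission; Audit would only report them.
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              # The pattern is applied to every container in the Pod.
              - resources:
                  limits:
                    # "?*" matches any value of at least one character.
                    memory: "?*"
                    cpu: "?*"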

Our policy library enforces resource limits, image provenance (only our ECR registries), label standards, PodDisruptionBudget presence for production workloads, and a ban on the latest image tag.
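
The ban on latest tags can be sketched in the same style. The policy below is modeled on the commonly used Kyverno sample for this rule and is not a copy of our actual library; the policy and rule names are illustrative.

```yaml
# Illustrative policy: require an explicit image tag and reject ":latest".
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "An image tag is required."
        pattern:
          spec:
            containers:
              # "*:*" requires a colon-separated tag on every container image.
              - image: "*:*"
    - name: disallow-latest
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "The 'latest' image tag is not allowed."
        pattern:
          spec:
            containers:
              # "!" negates the pattern, so any image ending in ":latest" fails.
              - image: "!*:latest"
```

Two rules are needed because an untagged image implicitly resolves to latest; rejecting only the literal ":latest" suffix would let untagged images through.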

GitOps: Flux

All cluster state is managed via Flux CD pointing at our infrastructure monorepo. The reconciliation loop detects and corrects drift: any manual kubectl apply is reverted automatically within 5 minutes.
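
A Flux Kustomization along the following lines would produce the behaviour described. It is a sketch under assumptions: the resource names, the GitRepository name, and the repository path are hypothetical, and the 5m interval is chosen simply to match the revert window mentioned above.

```yaml
# Hypothetical Flux Kustomization; names and path are illustrative.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: platform-infrastructure
  namespace: flux-system
spec:
  # How often Flux re-applies the desired state, which bounds how long drift can persist.
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: infrastructure-monorepo
  path: ./clusters/production
  # Delete cluster objects that were removed from Git.
  prune: true
  # Give up on a reconciliation attempt that takes longer than this.
  timeout: 3m
```

Each reconciliation re-applies the manifests from Git, so manual changes to fields that Flux manages are overwritten on the next run.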

Scaling Strategy

Dimension	Approach	Tool
Horizontal pod scaling	Request-based + custom metrics	KEDA
Vertical pod sizing	Right-sizing recommendations	VPA (recommendation-only mode)
Cluster nodes	Bin-packing-aware provisioning	Karpenter
Cost	Spot instances for non-critical workloads	Karpenter + spot interruption handling
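
The horizontal scaling row could be expressed as a KEDA ScaledObject roughly like this. It is only a sketch: the workload name, namespace, Prometheus address, query, and threshold are all hypothetical, and the choice of the Prometheus scaler is an assumption, since the scaler and metrics backend in use are not stated here.

```yaml
# Hypothetical ScaledObject scaling a Deployment on request rate.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-scaler
  namespace: team-example
spec:
  # The Deployment to scale (Deployment is the default target kind).
  scaleTargetRef:
    name: orders
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        # Assumed in-cluster Prometheus-compatible endpoint.
        serverAddress: http://prometheus.monitoring.svc:9090
        # Requests per second across all replicas of the workload.
        query: sum(rate(http_requests_total{app="orders"}[2m]))
        # Target value per replica; KEDA adds replicas when the query exceeds it.
        threshold: "100"
```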

Karpenter replaced Cluster Autoscaler in Q2 2025, reducing node provisioning time from ~90s to ~30s and improving bin-packing efficiency by 25%.
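
For readers unfamiliar with Karpenter, a NodePool that mixes spot and on-demand capacity and consolidates underused nodes might be sketched as follows. The API version assumes a Karpenter v1 release, and the pool name, EC2NodeClass name, and CPU limit are hypothetical.

```yaml
# Hypothetical NodePool; assumes Karpenter v1 APIs and an existing EC2NodeClass named "default".
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Allow both capacity types; Karpenter generally favours spot when it is available.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    # Repack workloads onto fewer or cheaper nodes when nodes are empty or underused.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
  limits:
    # Cap on the total CPU this pool may provision.
    cpu: "200"
```

The consolidation policy is the setting most directly tied to bin-packing: it lets Karpenter replace or remove nodes when pods would fit on less capacity.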

Observability Integration

Every pod gets automatic instrumentation via an OpenTelemetry Collector running as a DaemonSet. Telemetry is routed as follows:

Metrics → Dash0 (via OTLP)
Traces → Dash0 (via OTLP)
Logs → Dash0 (via FluentBit → OTLP)
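
The routing above maps onto an OpenTelemetry Collector configuration of roughly this shape. This is a sketch, not our deployed config: the exporter endpoint is a placeholder and the DASH0_AUTH_TOKEN environment variable name is hypothetical.

```yaml
# Sketch of a Collector config: receive OTLP, batch, and forward all three signals over OTLP.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Batch telemetry to reduce the number of outbound requests.
  batch: {}

exporters:
  otlp:
    # Placeholder endpoint; substitute the backend's real OTLP ingress address.
    endpoint: otlp-ingress.example.invalid:4317
    headers:
      # Hypothetical environment variable holding the backend auth token.
      Authorization: "Bearer ${env:DASH0_AUTH_TOKEN}"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      # Logs arrive here as OTLP after FluentBit has collected them.
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```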

Kubernetes events and audit logs feed into our security monitoring pipeline for anomaly detection.

What's Next

We are evaluating Gateway API to replace our Ingress-based routing, since it is more expressive and has better multi-team support. We are also exploring vCluster for ephemeral preview environments, which would replace our current branch-based namespace approach.
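
To show what the multi-team benefit of Gateway API looks like in practice, here is a sketch of an HTTPRoute owned by an application team and attached to a Gateway owned by the platform team. All names, namespaces, the hostname, and the port are hypothetical.

```yaml
# Hypothetical route: the team owns the HTTPRoute, the platform team owns the Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders
  namespace: team-example
spec:
  parentRefs:
    # Shared Gateway living in a platform-owned namespace.
    - name: shared-gateway
      namespace: platform-ingress
  hostnames:
    - orders.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /orders
      backendRefs:
        # Service in the same namespace as the route.
        - name: orders
          port: 8080
```

A route in one namespace can only attach to a Gateway in another if that Gateway's listener allows it through allowedRoutes, which is how the platform team keeps control of who may publish routes.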

Jul 2025

Kubernetes remains our standard container orchestration platform. Recent improvements include migrat…

Jan 2025

Kubernetes adoption is now organization-wide. All four teams run production workloads on our multi-c…

taycan

Jul 2024

Trial→Adopt

Kubernetes has moved to adopt. All production workloads now run on managed EKS clusters with Istio s…

cayenne

Jan 2024

Trial

Kubernetes is being trialed as our container orchestration platform. The 911 team has migrated two m…

911, macan