Production-Ready Kubernetes CI/CD with Security, Stateful Workloads & Workload Isolation

Modern Kubernetes platforms require more than just deployments—they demand secure CI/CD pipelines, workload isolation, and reliable handling of stateful applications.

In real enterprise environments, teams use GitOps-based CI/CD, automated security gates, and progressive delivery to release applications safely without downtime. Workload isolation using namespaces, RBAC, and network policies reduces blast radius, while StatefulSets, PVCs, and backup strategies ensure data integrity for databases and queues.

This architecture is widely adopted by SaaS and fintech companies to accelerate releases, meet compliance requirements, and protect production systems—even as teams and workloads scale rapidly.

Below is a real, end-to-end production-style example you can copy and adapt.

It includes:

  • CI/CD with GitHub Actions
  • GitOps CD with Argo CD
  • Progressive delivery (Canary) with Argo Rollouts
  • Workload isolation (Namespace + RBAC + NetworkPolicy)
  • Stateful workload (PostgreSQL StatefulSet + PVC)
  • Security guardrails (Kyverno policies)

Automated CI/CD: build → test → scan → deploy → verify → rollback

Secure delivery: image scanning, policy controls, least privilege

Stateful workloads: PVC, backup/restore, safe upgrades

Isolation: namespaces, network policies, admission policies, RBAC

Phase 1 — Platform Foundation (One-time setup)

Step 1: Create Cluster Baseline

  • Provision EKS/AKS/GKE (or self-managed) with:
    • multiple node pools (system, apps, data if needed)
    • cluster autoscaler + metrics-server
    • central logging/monitoring stack (or plan it)

Deliverable: Stable cluster + node groups + basic add-ons.

Step 2: Define Environments

Minimum:

  • dev, stage, prod
    Use separate namespaces per environment (or separate clusters for strong isolation).

Deliverable: Namespace structure + environment standards.
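
As a sketch, the per-environment namespaces can be declared declaratively (the names demo-dev/demo-prod and labels are illustrative, not prescribed):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo-dev
  labels:
    environment: dev
    team: platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: demo-prod
  labels:
    environment: prod
    team: platform
```

Labels like `environment` make it easy to target namespaces later with policies and network rules.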

Phase 2 — GitOps-First CI/CD (Recommended production model)

Step 3: Split Repos (Best Practice)

  • app-repo: source code + Dockerfile + tests + CI workflows
  • gitops-repo: Helm/Kustomize manifests per environment

Deliverable: Repo layout that supports auditability and safe promotions.

1) app-repo (source + Dockerfile + tests + CI)

app-repo/
  src/
    index.js
  test/
    app.test.js
  package.json
  package-lock.json
  Dockerfile
  .dockerignore
  .github/
    workflows/
      ci-build-push-and-update-gitops.yml

src/index.js (example Node API)

const express = require("express");
const app = express();

app.get("/healthz", (_req, res) => res.status(200).send("ok"));
app.get("/", (_req, res) => res.json({ service: "demo-api", status: "running" }));

const port = process.env.PORT || 3000;
// Start the server only when run directly; tests import the app without listening
if (require.main === module) {
  app.listen(port, () => console.log(`listening on ${port}`));
}
module.exports = app;

test/app.test.js (Jest + Supertest)

const request = require("supertest");
const app = require("../src/index");

describe("health", () => {
  it("GET /healthz => 200", async () => {
    const res = await request(app).get("/healthz");
    expect(res.statusCode).toBe(200);
    expect(res.text).toBe("ok");
  });
});

package.json

{
  "name": "demo-api",
  "version": "1.0.0",
  "main": "src/index.js",
  "scripts": {
    "test": "jest --runInBand",
    "start": "node src/index.js"
  },
  "dependencies": {
    "express": "^4.19.2"
  },
  "devDependencies": {
    "jest": "^29.7.0",
    "supertest": "^7.0.0"
  }
}

Dockerfile (multi-stage, non-root)

# ---- build deps ----
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# ---- runtime ----
FROM node:20-alpine
WORKDIR /app

# Create non-root user
RUN addgroup -S app && adduser -S app -G app

COPY --from=deps /app/node_modules ./node_modules
COPY src ./src
COPY package.json ./

USER app
ENV NODE_ENV=production
EXPOSE 3000
CMD ["node", "src/index.js"]

gitops-repo (Helm/Kustomize per environment)

gitops-repo/
  apps/
    demo-api/
      chart/
        Chart.yaml
        values.yaml
        templates/
          deployment.yaml
          service.yaml
          ingress.yaml
      envs/
        dev/
          values.yaml
          kustomization.yaml
        stage/
          values.yaml
          kustomization.yaml
        prod/
          values.yaml
          kustomization.yaml
  clusters/
    dev/
      demo-api-argocd-app.yaml
    stage/
      demo-api-argocd-app.yaml
    prod/
      demo-api-argocd-app.yaml

Step 4: Container Registry

  • ECR/GCR/ACR
  • Enforce immutable tags (commit SHA)
  • Enable retention policies and optional replication

Deliverable: Secure registry + naming conventions.

ecr.tf (Terraform)

resource "aws_ecr_repository" "app" {
  name                 = "demo-api"
  image_tag_mutability = "IMMUTABLE" # enforce immutable tags

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }

  tags = {
    Application = "demo-api"
    Environment = "shared"
    ManagedBy   = "terraform"
  }
}

Phase 3 — CI Pipeline (Build, Test, Scan, Sign)

Step 5: Build & Unit Test

In CI (GitHub Actions/GitLab/Jenkins):

  • lint + unit tests
  • build Docker image
  • tag with sha-<commit> or ${GITHUB_SHA}

Create: .github/workflows/ci-build-push-and-update-gitops.yml (matching the app-repo layout above)

name: CI - Build Test Scan Push + GitOps Update

on:
  push:
    branches: ["main"]

env:
  AWS_REGION: ap-south-1
  ECR_REPO: demo-api
  GITOPS_REPO: jagan-rajagopal/gitops-repo
  GITOPS_PATH: apps/demo-api/envs/prod

jobs:
  build_and_release:
    runs-on: ubuntu-latest
    permissions:
      contents: read

    steps:
      - name: Checkout app repo
        uses: actions/checkout@v4

      # If you use AWS OIDC, configure credentials here with
      # aws-actions/configure-aws-credentials; static-credential setup is omitted for brevity.

      - name: Login to ECR
        run: |
          aws ecr get-login-password --region "${AWS_REGION}" \
          | docker login --username AWS --password-stdin "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com"

      - name: Build image
        run: |
          IMAGE_TAG="${GITHUB_SHA}"
          docker build -t "${ECR_REPO}:${IMAGE_TAG}" .
          docker tag "${ECR_REPO}:${IMAGE_TAG}" "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${IMAGE_TAG}"

      - name: Run tests
        run: |
          # Replace with your real test command
          echo "Running unit tests..."
          # npm ci && npm test

      - name: Trivy scan (fail on HIGH/CRITICAL)
        uses: aquasecurity/trivy-action@0.24.0
        with:
          image-ref: "<ACCOUNT_ID>.dkr.ecr.${{ env.AWS_REGION }}.amazonaws.com/${{ env.ECR_REPO }}:${{ github.sha }}"
          format: table
          severity: HIGH,CRITICAL
          exit-code: 1

      - name: Push image
        run: |
          docker push "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${GITHUB_SHA}"

      # ---- GitOps update (image tag bump) ----
      - name: Checkout GitOps repo
        uses: actions/checkout@v4
        with:
          repository: ${{ env.GITOPS_REPO }}
          token: ${{ secrets.GITOPS_TOKEN }}
          path: gitops

      - name: Update image tag in GitOps (Helm values)
        run: |
          cd "gitops/${GITOPS_PATH}"
          # Bump the image tag in the environment values file
          sed -i "s|tag: .*|tag: \"${GITHUB_SHA}\"|" values.yaml
          git config user.email "ci-bot@awstrainingwithjagan.com"
          git config user.name "ci-bot"
          git add values.yaml
          git commit -m "Release ${ECR_REPO} ${GITHUB_SHA}" || echo "No changes to commit"
          git push

Notes:

  • Replace <ACCOUNT_ID> and ensure AWS creds are configured (OIDC recommended).
  • GITOPS_TOKEN should be a PAT with access to the GitOps repo.

Deliverable: Reproducible builds.

Step 6: Security Gates (Mandatory)

Add automated checks:

  • SAST (code scanning)
  • dependency scan (SBOM)
  • container scan (Trivy/Grype)
  • fail build on High/Critical (policy-based)

Static Application Security Testing (SAST – Semgrep)

.semgrep.yml

rules:
  - id: no-hardcoded-secrets
    # Illustrative rule: "..." matches any string literal, so tune the
    # pattern (e.g. to variable names like $PASSWORD) before enforcing.
    pattern: |
      $KEY = "..."
    message: "Possible hardcoded secret"
    severity: ERROR
    languages: [javascript]

CI Step

- name: Run Semgrep (SAST)
  uses: returntocorp/semgrep-action@v1
  with:
    config: .semgrep.yml

Deliverable: “No scan, no release” rule.

Step 7: Push Image + Generate SBOM

  • Push image to registry
  • Store SBOM as build artifact
  • Optional: sign image (Cosign)

Deliverable: Trusted artifact pipeline.
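
As a sketch, SBOM generation and signing can be added as CI steps. This assumes the anchore/sbom-action and sigstore/cosign-installer actions, keyless signing via GitHub OIDC (which requires `id-token: write` on the job), and that the image is already pushed:

```yaml
# Illustrative CI steps; the job needs `id-token: write` for keyless signing.
- name: Generate SBOM (Syft)
  uses: anchore/sbom-action@v0
  with:
    image: "<ACCOUNT_ID>.dkr.ecr.${{ env.AWS_REGION }}.amazonaws.com/${{ env.ECR_REPO }}:${{ github.sha }}"
    format: spdx-json
    output-file: sbom.spdx.json

- name: Upload SBOM artifact
  uses: actions/upload-artifact@v4
  with:
    name: sbom
    path: sbom.spdx.json

- name: Install Cosign
  uses: sigstore/cosign-installer@v3

- name: Sign image (keyless)
  run: |
    cosign sign --yes "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${GITHUB_SHA}"
```

Signing by digest rather than tag is stricter, and pairs well with an admission policy that verifies signatures (Step 13).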


Phase 4 — CD Pipeline (Deploy with GitOps + Progressive Delivery)

Step 8: Update GitOps Manifests (Promotion model)

CI updates gitops-repo with the new image tag:

  • dev auto-deploy on merge
  • stage deploy on release candidate tag
  • prod deploy after approval

Deliverable: Controlled promotion from dev → prod.

Step 9: Argo CD (or Flux) Deployment

  • Argo CD watches the gitops-repo and applies the environment overlay path
  • syncs manifests automatically
  • supports rollback to the last known good version

clusters/dev/demo-api-argocd-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-api-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/YOUR_ORG/gitops-repo.git
    targetRevision: main
    path: apps/demo-api/envs/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: demo-dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Deliverable: Git is the source of truth for deployments.

Create similar files for stage/prod pointing to:

  • apps/demo-api/envs/stage
  • apps/demo-api/envs/prod

Step 10: Progressive Delivery

Choose one:

  • RollingUpdate for baseline
  • Blue-Green or Canary using Argo Rollouts (recommended)

Add automated analysis:

  • if error rate/latency rises → rollback

Deliverable: Safer releases, reduced incidents.
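
A minimal Argo Rollouts canary sketch for demo-api (replica count, weights, and pause durations are illustrative and should be tuned to your traffic):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-api
  namespace: demo-prod
spec:
  replicas: 4
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
        - name: demo-api
          # Tag is bumped by CI via the GitOps repo
          image: <ACCOUNT_ID>.dkr.ecr.ap-south-1.amazonaws.com/demo-api:REPLACED_BY_CI
          ports:
            - containerPort: 3000
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause:
            duration: 2m
        - setWeight: 50
        - pause:
            duration: 5m
        - setWeight: 100
```

Automated rollback comes from attaching an AnalysisTemplate (e.g. an error-rate query against Prometheus) to the canary steps.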


Phase 5 — Workload Isolation (Multi-tenant safety)

Step 11: Namespace Isolation + RBAC

  • Separate namespaces per team/app/env
  • Restrict access by RBAC roles
  • Use ServiceAccounts per workload (no default SA)

Deliverable: Least privilege and safer multi-team operations.
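
As a sketch, a dedicated ServiceAccount with a narrowly scoped Role (resource and verb lists here are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-api
  namespace: demo-prod
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: demo-api-readonly
  namespace: demo-prod
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: demo-api-readonly
  namespace: demo-prod
subjects:
  - kind: ServiceAccount
    name: demo-api
    namespace: demo-prod
roleRef:
  kind: Role
  name: demo-api-readonly
  apiGroup: rbac.authorization.k8s.io
```

The workload then sets `serviceAccountName: demo-api` in its pod spec instead of relying on the default ServiceAccount.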

Step 12: Network Policies (Zero-trust inside cluster)

  • Default deny ingress/egress per namespace
  • Allow only required service-to-service paths
  • Restrict access to database namespaces

Deliverable: Lateral movement prevention.
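
A minimal sketch of the default-deny plus explicit-allow pattern, assuming a postgres workload labeled `app: postgres` in the same namespace:

```yaml
# Deny all traffic by default for every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: demo-prod
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# Then allow only demo-api to reach the database port
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-postgres
  namespace: demo-prod
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: demo-api
      ports:
        - protocol: TCP
          port: 5432
```

Note that NetworkPolicies only take effect with a CNI that enforces them (e.g. Calico or Cilium).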

Step 13: Admission Policies (Guardrails)

Implement one: Kyverno (policy as code) or Gatekeeper (OPA).

Policies to enforce:

  • no privileged pods
  • no hostPath
  • require resource limits
  • require readOnlyRootFilesystem
  • disallow the latest tag
  • enforce signed images (optional, advanced)

Deliverable: Cluster-wide security enforcement.
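
One of the guardrails above, disallowing the `latest` tag, can be sketched as a Kyverno ClusterPolicy (based on Kyverno's validate-pattern style; the policy name is illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-pinned-image-tag
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Using the ':latest' tag is not allowed; pin images by tag or digest."
        pattern:
          spec:
            containers:
              # Negated wildcard: any image ending in :latest is rejected
              - image: "!*:latest"
```

Start with `validationFailureAction: Audit` in existing clusters, then switch to `Enforce` once violations are cleared.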


Phase 6 — Stateful Workloads (Databases, queues, durable storage)

Step 14: Storage Architecture

  • Use StorageClasses (EBS gp3 / managed storage)
  • PVC + StatefulSets for stateful services
  • Enable zone-aware scheduling if multi-AZ

Deliverable: Reliable persistent storage model.
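
A gp3 StorageClass sketch for EKS, assuming the AWS EBS CSI driver is installed; `WaitForFirstConsumer` gives the zone-aware scheduling mentioned above:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
# Delay volume binding until a pod is scheduled, so the volume
# is created in the same AZ as the consuming pod
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
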

Step 15: Stateful Deploy Patterns

For DB/queues in Kubernetes:

  • StatefulSet + PVC
  • PodDisruptionBudget
  • Anti-affinity (spread pods)
  • Backup schedules

Deliverable: Stateful workloads that survive node disruptions.
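
Putting these patterns together, a minimal PostgreSQL StatefulSet sketch (the demo-data namespace, sizes, and image tag are illustrative; a postgres-credentials Secret is assumed to exist):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: demo-data
spec:
  serviceName: postgres
  replicas: 2
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      # Spread replicas across nodes
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: postgres
              topologyKey: kubernetes.io/hostname
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  # One PVC per replica, retained across pod rescheduling
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 20Gi
---
# Keep at least one replica up during voluntary disruptions (e.g. node drains)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: demo-data
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: postgres
```

For real production databases, an operator (e.g. CloudNativePG or Zalando's postgres-operator) handles replication and failover on top of these primitives.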

Step 16: Backup & Restore (Non-negotiable)

Use:

  • Velero for K8s objects
  • CSI snapshots for volumes
  • Database-native backups (logical + PITR)

Test restores regularly in staging.

Deliverable: Verified recovery capability.
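
A Velero backup Schedule sketch for the database namespace (schedule, TTL, and names are illustrative; Velero and a snapshot-capable storage provider are assumed to be installed):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: demo-data-daily
  namespace: velero
spec:
  # Daily at 02:00 cluster time (cron syntax)
  schedule: "0 2 * * *"
  template:
    includedNamespaces: ["demo-data"]
    snapshotVolumes: true
    # Keep backups for 30 days
    ttl: 720h0m0s
```

A restore is then a `velero restore create --from-backup <name>` away, which is exactly what should be rehearsed in staging.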


Phase 7 — Observability + Security Operations

Step 17: Monitoring & Alerting

  • Prometheus + Grafana
  • Alerts: latency, error rate, CPU/memory, pod restarts
  • SLO dashboards (prod readiness)

Deliverable: Visibility + proactive incident detection.
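
With the Prometheus Operator, an error-rate alert can be sketched as a PrometheusRule (the metric name `http_requests_total` and the 5% threshold are assumptions to adapt to your instrumentation):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: demo-api-alerts
  namespace: monitoring
spec:
  groups:
    - name: demo-api
      rules:
        - alert: HighErrorRate
          # Ratio of 5xx responses to all responses over 5 minutes
          expr: |
            sum(rate(http_requests_total{job="demo-api",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{job="demo-api"}[5m])) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "demo-api 5xx error rate above 5% for 10 minutes"
```

The same query can back an Argo Rollouts AnalysisTemplate, so the canary step and the alerting share one definition of "unhealthy".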

Step 18: Runtime Security (Optional but strong)

  • Falco (runtime threat detection)
  • Audit logs to SIEM
  • Image drift detection

Deliverable: Security beyond “deploy-time”.


Phase 8 — Production Runbooks & DR

Step 19: Runbooks & Incident Playbooks

Document:

  • rollback steps (Argo Rollouts/Argo CD)
  • database failover steps
  • node failure handling
  • restore procedures

Deliverable: Reduced MTTR.

Step 20: DR Testing

  • game days (simulate outage)
  • validate RTO/RPO
  • validate restore + redeploy

Deliverable: Proven recovery within RTO/RPO targets.
