Production-Ready Kubernetes CI/CD with Security, Stateful Workloads & Workload Isolation

Modern Kubernetes platforms require more than just deployments—they demand secure CI/CD pipelines, workload isolation, and reliable handling of stateful applications.
In real enterprise environments, teams use GitOps-based CI/CD, automated security gates, and progressive delivery to release applications safely without downtime. Workload isolation using namespaces, RBAC, and network policies reduces blast radius, while StatefulSets, PVCs, and backup strategies ensure data integrity for databases and queues.
This architecture is widely adopted by SaaS and fintech companies to accelerate releases, meet compliance requirements, and protect production systems—even as teams and workloads scale rapidly.
Below is a real, end-to-end production-style example you can copy and adapt.
It includes:
- CI/CD with GitHub Actions
- GitOps CD with Argo CD
- Progressive delivery (Canary) with Argo Rollouts
- Workload isolation (Namespace + RBAC + NetworkPolicy)
- Stateful workload (PostgreSQL StatefulSet + PVC)
- Security guardrails (Kyverno policies)
- Automated CI/CD: build → test → scan → deploy → verify → rollback
- Secure delivery: image scanning, policy controls, least privilege
- Stateful workloads: PVCs, backup/restore, safe upgrades
- Isolation: namespaces, network policies, admission policies, RBAC
Phase 1 — Platform Foundation (One-time setup)
Step 1: Create Cluster Baseline
- Provision EKS/AKS/GKE (or self-managed) with:
  - multiple node pools (system, apps, data if needed)
  - cluster autoscaler + metrics-server
  - central logging/monitoring stack (or a plan for one)
Deliverable: Stable cluster + node groups + basic add-ons.
Step 2: Define Environments
Minimum:
dev, stage, prod
Use separate namespaces per environment (or separate clusters for strong isolation).
Deliverable: Namespace structure + environment standards.
Phase 2 — GitOps-First CI/CD (Recommended production model)
Step 3: Split Repos (Best Practice)
- app-repo: source code + Dockerfile + tests + CI workflows
- gitops-repo: Helm/Kustomize manifests per environment
Deliverable: Repo layout that supports auditability and safe promotions.
1) app-repo (source + Dockerfile + tests + CI)
app-repo/
  src/
    index.js
  test/
    app.test.js
  package.json
  package-lock.json
  Dockerfile
  .dockerignore
  .github/
    workflows/
      ci-build-push-and-update-gitops.yml
src/index.js (example Node API)
const express = require("express");
const app = express();

app.get("/healthz", (_req, res) => res.status(200).send("ok"));
app.get("/", (_req, res) => res.json({ service: "demo-api", status: "running" }));

const port = process.env.PORT || 3000;

// Only start the listener when run directly, so tests can require()
// the app without binding a port.
if (require.main === module) {
  app.listen(port, () => console.log(`listening on ${port}`));
}

module.exports = app;
test/app.test.js (Jest + Supertest)
const request = require("supertest");
const app = require("../src/index");

describe("health", () => {
  it("GET /healthz => 200", async () => {
    const res = await request(app).get("/healthz");
    expect(res.statusCode).toBe(200);
    expect(res.text).toBe("ok");
  });
});
package.json
{
  "name": "demo-api",
  "version": "1.0.0",
  "main": "src/index.js",
  "scripts": {
    "test": "jest --runInBand",
    "start": "node src/index.js"
  },
  "dependencies": {
    "express": "^4.19.2"
  },
  "devDependencies": {
    "jest": "^29.7.0",
    "supertest": "^7.0.0"
  }
}
Dockerfile (multi-stage, non-root)
# ---- build deps ----
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# ---- runtime ----
FROM node:20-alpine
WORKDIR /app
# Create non-root user
RUN addgroup -S app && adduser -S app -G app
COPY --from=deps /app/node_modules ./node_modules
COPY src ./src
COPY package.json ./
USER app
ENV NODE_ENV=production
EXPOSE 3000
CMD ["node", "src/index.js"]
2) gitops-repo (Helm/Kustomize per environment)
gitops-repo/
  apps/
    demo-api/
      chart/
        Chart.yaml
        values.yaml
        templates/
          deployment.yaml
          service.yaml
          ingress.yaml
      envs/
        dev/
          values.yaml
          kustomization.yaml
        stage/
          values.yaml
          kustomization.yaml
        prod/
          values.yaml
          kustomization.yaml
  clusters/
    dev/
      demo-api-argocd-app.yaml
    stage/
      demo-api-argocd-app.yaml
    prod/
      demo-api-argocd-app.yaml
Step 4: Container Registry
- ECR/GCR/ACR
- Enforce immutable tags (commit SHA)
- Enable retention policies and optional replication
Deliverable: Secure registry + naming conventions.
ecr.tf (Terraform)
resource "aws_ecr_repository" "app" {
  name                 = "demo-api"
  image_tag_mutability = "IMMUTABLE" # enforce immutable tags

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }

  tags = {
    Application = "demo-api"
    Environment = "shared"
    ManagedBy   = "terraform"
  }
}
Phase 3 — CI Pipeline (Build, Test, Scan, Sign)
Step 5: Build & Unit Test
In CI (GitHub Actions/GitLab/Jenkins):
- lint + unit tests
- build Docker image
- tag with the commit SHA (sha-<commit> or ${GITHUB_SHA})
Create: .github/workflows/ci-build-push-and-update-gitops.yml
name: CI - Build Test Scan Push + GitOps Update

on:
  push:
    branches: ["main"]

env:
  AWS_REGION: ap-south-1
  ECR_REPO: orders-api
  GITOPS_REPO: jagan-rajagopal/orders-gitops
  GITOPS_PATH: apps/orders-api/envs/prod

jobs:
  build_and_release:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - name: Checkout app repo
        uses: actions/checkout@v4

      # If you use AWS OIDC, configure credentials here.
      # Keeping it simple; the static-credentials pattern is omitted.
      - name: Login to ECR
        run: |
          aws ecr get-login-password --region "${AWS_REGION}" \
            | docker login --username AWS --password-stdin "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com"

      - name: Build image
        run: |
          IMAGE_TAG="${GITHUB_SHA}"
          docker build -t "${ECR_REPO}:${IMAGE_TAG}" .
          docker tag "${ECR_REPO}:${IMAGE_TAG}" "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${IMAGE_TAG}"

      - name: Run tests
        run: |
          # Replace with your real test command
          echo "Running unit tests..."
          # npm ci && npm test

      - name: Trivy scan (fail on HIGH/CRITICAL)
        uses: aquasecurity/trivy-action@0.24.0
        with:
          image-ref: "<ACCOUNT_ID>.dkr.ecr.${{ env.AWS_REGION }}.amazonaws.com/${{ env.ECR_REPO }}:${{ github.sha }}"
          format: table
          severity: HIGH,CRITICAL
          exit-code: 1

      - name: Push image
        run: |
          docker push "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${GITHUB_SHA}"

      # ---- GitOps update (image tag bump) ----
      - name: Checkout GitOps repo
        uses: actions/checkout@v4
        with:
          repository: ${{ env.GITOPS_REPO }}
          token: ${{ secrets.GITOPS_TOKEN }}
          path: gitops

      - name: Update image tag in GitOps (Kustomize)
        run: |
          cd "gitops/${GITOPS_PATH}"
          # Replace image tag in rollout manifest (simple approach)
          sed -i "s|image: .*orders-api:.*|image: <ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${GITHUB_SHA}|g" ../../base/rollout-canary.yaml
          git config user.email "ci-bot@awstrainingwithjagan.com"
          git config user.name "ci-bot"
          git add ../../base/rollout-canary.yaml
          git commit -m "Release orders-api ${GITHUB_SHA}" || echo "No changes to commit"
          git push
Notes:
- Replace <ACCOUNT_ID> and ensure AWS credentials are configured (OIDC recommended).
- GITOPS_TOKEN should be a PAT with access to the GitOps repo.
Deliverable: Reproducible builds.
Step 6: Security Gates (Mandatory)
Add automated checks:
- SAST (code scanning)
- dependency scan (SBOM)
- container scan (Trivy/Grype)
- fail build on High/Critical (policy-based)
Static Application Security Testing (SAST – Semgrep)
.semgrep.yml
rules:
  - id: no-hardcoded-secrets
    pattern: |
      $KEY = "..."
    message: "Hardcoded secret detected"
    severity: ERROR
    languages: [javascript]
CI Step
- name: Run Semgrep (SAST)
  uses: returntocorp/semgrep-action@v1
  with:
    config: .semgrep.yml
Deliverable: “No scan, no release” rule.
Step 7: Push Image + Generate SBOM
- Push image to registry
- Store SBOM as build artifact
- Optional: sign image (Cosign)
Deliverable: Trusted artifact pipeline.
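As a sketch, the CI job above could be extended with SBOM generation (Syft) and keyless signing (Cosign). This assumes Syft and Cosign are installed on the runner and that the job has the `id-token: write` permission for OIDC-based signing; verify the exact flags against the tools' docs.

```yaml
# Illustrative extra steps for the build_and_release job (assumptions noted above)
- name: Generate SBOM (Syft)
  run: |
    syft "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${GITHUB_SHA}" \
      -o spdx-json > sbom.spdx.json

- name: Store SBOM as build artifact
  uses: actions/upload-artifact@v4
  with:
    name: sbom
    path: sbom.spdx.json

- name: Sign image (Cosign, keyless)
  run: |
    cosign sign --yes "<ACCOUNT_ID>.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:${GITHUB_SHA}"
```

Signing the image digest (rather than the tag) is stricter; resolve the digest after push if you want that guarantee.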
Phase 4 — CD Pipeline (Deploy with GitOps + Progressive Delivery)
Step 8: Update GitOps Manifests (Promotion model)
CI updates gitops-repo with the new image tag:
- dev: auto-deploy on merge
- stage: deploy on release-candidate tag
- prod: deploy after approval
Deliverable: Controlled promotion from dev → prod.
Step 9: Argo CD (or Flux) Deployment
- Argo CD watches gitops-repo and applies the environment overlay path
- syncs manifests automatically
- supports rollback to the last known-good version
clusters/dev/demo-api-argocd-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-api-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/YOUR_ORG/gitops-repo.git
    targetRevision: main
    path: apps/demo-api/envs/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: demo-dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Deliverable: Git is the source of truth for deployments.
Create similar files for stage/prod pointing to:
- apps/demo-api/envs/stage
- apps/demo-api/envs/prod
Step 10: Progressive Delivery
Choose one:
- RollingUpdate for baseline
- Blue-Green or Canary using Argo Rollouts (recommended)
Add automated analysis:
- if error rate/latency rises → rollback
Deliverable: Safer releases, reduced incidents.
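A canary release with Argo Rollouts could look like the following sketch. The name, namespace, image tag, and step timings are illustrative; a real setup would also attach an AnalysisTemplate so the rollout aborts automatically when error rate or latency rises.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-api
  namespace: demo-prod
spec:
  replicas: 4
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
        - name: demo-api
          image: <ACCOUNT_ID>.dkr.ecr.ap-south-1.amazonaws.com/demo-api:<GIT_SHA>
          ports:
            - containerPort: 3000
  strategy:
    canary:
      steps:
        - setWeight: 20        # send 20% of traffic to the new version
        - pause: {duration: 2m}
        - setWeight: 50
        - pause: {duration: 5m}
        # full promotion happens after the last step
```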
Phase 5 — Workload Isolation (Multi-tenant safety)
Step 11: Namespace Isolation + RBAC
- Separate namespaces per team/app/env
- Restrict access by RBAC roles
- Use ServiceAccounts per workload (no default SA)
Deliverable: Least privilege and safer multi-team operations.
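The points above can be made concrete with a dedicated ServiceAccount plus a namespace-scoped, read-only Role. Names and the exact rule set are illustrative; grant only the verbs each workload or team actually needs.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-api            # per-workload SA instead of "default"
  namespace: demo-dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: demo-api-viewer
  namespace: demo-dev
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]   # read-only; no create/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: demo-api-viewer-binding
  namespace: demo-dev
subjects:
  - kind: ServiceAccount
    name: demo-api
    namespace: demo-dev
roleRef:
  kind: Role
  name: demo-api-viewer
  apiGroup: rbac.authorization.k8s.io
```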
Step 12: Network Policies (Zero-trust inside cluster)
- Default deny ingress/egress per namespace
- Allow only required service-to-service paths
- Restrict access to database namespaces
Deliverable: Lateral movement prevention.
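A minimal sketch of this zero-trust posture: a default-deny policy per namespace, plus one explicit allow for the API-to-database path. Namespaces and labels (demo-prod, data-prod, app: demo-api) are assumptions for illustration; a CNI that enforces NetworkPolicy (Calico, Cilium, etc.) is required.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: demo-prod
spec:
  podSelector: {}              # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-postgres
  namespace: data-prod
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: demo-prod
          podSelector:
            matchLabels:
              app: demo-api
      ports:
        - protocol: TCP
          port: 5432           # only Postgres traffic, only from demo-api
```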
Step 13: Admission Policies (Guardrails)
Implement one:
- Kyverno (policy as code) OR Gatekeeper (OPA)
Policies to enforce:
- no privileged pods
- no hostPath
- require resource limits
- require readOnlyRootFilesystem
- disallow latest tag
- enforce signed images (optional advanced)
Deliverable: Cluster-wide security enforcement.
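One of the policies above, "disallow latest tag", as a Kyverno sketch (closely following Kyverno's published sample policies; adjust `validationFailureAction` to `Audit` while rolling it out):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce   # reject non-compliant pods at admission
  rules:
    - name: require-non-latest-tag
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Using the ':latest' image tag is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"   # any image except one tagged :latest
```

The other guardrails (privileged pods, hostPath, resource limits, readOnlyRootFilesystem) follow the same ClusterPolicy shape with different validate patterns.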
Phase 6 — Stateful Workloads (Databases, queues, durable storage)
Step 14: Storage Architecture
- Use StorageClasses (EBS gp3 / managed storage)
- PVC + StatefulSets for stateful services
- Enable zone-aware scheduling if multi-AZ
Deliverable: Reliable persistent storage model.
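On EKS with the EBS CSI driver, the storage model above might start from a StorageClass like this sketch (class name and reclaim policy are choices, not requirements):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain               # keep the volume if the PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer   # zone-aware: bind where the pod lands
```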
Step 15: Stateful Deploy Patterns
For DB/queues in Kubernetes:
- StatefulSet + PVC
- PodDisruptionBudget
- Anti-affinity (spread pods)
- Backup schedules
Deliverable: Stateful workloads that survive node disruptions.
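Tying the pattern together, here is a minimal PostgreSQL StatefulSet with a PVC template plus a PodDisruptionBudget. Namespace, secret name, and the `gp3-retain` StorageClass are illustrative; a production database would add resources, probes, anti-affinity, and likely more replicas (note that `minAvailable: 1` with a single replica blocks voluntary evictions entirely).

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: data-prod
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials   # assumed pre-created Secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3-retain
        resources:
          requests:
            storage: 20Gi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: data-prod
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: postgres
```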
Step 16: Backup & Restore (Non-negotiable)
Use:
- Velero for K8s objects
- CSI snapshots for volumes
- Database-native backups (logical + PITR)
Test restores regularly in staging.
Deliverable: Verified recovery capability.
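A nightly Velero backup of the database namespace could be scheduled like this sketch (schedule name, namespace, and retention are assumptions; `snapshotVolumes` relies on a configured volume-snapshot provider):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: data-prod-nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"          # every day at 02:00 cluster time
  template:
    includedNamespaces: ["data-prod"]
    snapshotVolumes: true        # CSI/cloud snapshots of the PVCs
    ttl: 720h0m0s                # retain backups for 30 days
```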
Phase 7 — Observability + Security Operations
Step 17: Monitoring & Alerting
- Prometheus + Grafana
- Alerts: latency, error rate, CPU/memory, pod restarts
- SLO dashboards (prod readiness)
Deliverable: Visibility + proactive incident detection.
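With the Prometheus Operator, an error-rate alert of the kind listed above could be expressed as follows. The metric name `http_requests_total` and its labels are assumptions about the app's instrumentation; adapt the expression to whatever your service actually exports.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: demo-api-alerts
  namespace: demo-prod
spec:
  groups:
    - name: demo-api
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{job="demo-api",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="demo-api"}[5m])) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "demo-api 5xx error rate above 5% for 10 minutes"
```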
Step 18: Runtime Security (Optional but strong)
- Falco (runtime threat detection)
- Audit logs to SIEM
- Image drift detection
Deliverable: Security beyond “deploy-time”.
Phase 8 — Production Runbooks & DR
Step 19: Runbooks & Incident Playbooks
Document:
- rollback steps (Argo Rollouts/Argo CD)
- database failover steps
- node failure handling
- restore procedures
Deliverable: Reduced MTTR.
Step 20: DR Testing
- game days (simulate outage)
- validate RTO/RPO
- validate restore + redeploy
Deliverable: Proven RTO/RPO and a rehearsed recovery path.
