Production-Ready DevSecOps - Part 1: Architecture Design
A DevSecOps architecture blueprint for building enterprise-grade production applications with strong security, fast delivery, and low costs using GitOps, Kubernetes, and AWS.
Most Kubernetes tutorials cover basic deployment, but production systems require more: secrets management, cost optimization, high availability, and comprehensive security.
This guide presents a battle-tested architecture blueprint demonstrating how to build enterprise-grade production applications with Kubernetes.
Using Easy Upload, a file processing application, as an example, we’ll explore real-world DevSecOps patterns from local development through production deployment that you can adapt for your own applications.
Part 2 will provide the complete implementation guide with code, configurations, and deployment instructions.
What This Architecture Delivers
- Cost Optimization: 50-60% infrastructure cost reduction through GitOps and optimized resource management
- Security: 70-80% fewer vulnerabilities reaching production via shift-left security practices
- Speed: Faster delivery with automated pipelines and ephemeral preview environments
- Reliability: 99.9%+ uptime with multi-AZ deployments and automated rollbacks
- Developer Experience: Faster feedback loops with preview environments and automated testing
Simpler Alternatives to Kubernetes
While this guide uses Kubernetes for its rich ecosystem and portability, the same architectural blueprint (GitOps workflows, shift-left security, CI/CD pipelines, observability patterns) applies to simpler orchestration platforms.
Alternative orchestration options:
- Managed Container Services: AWS ECS/Fargate, AWS App Runner, Google Cloud Run, Azure Container Apps for simpler container deployments
- Serverless Compute: AWS Lambda, Azure Functions, Google Cloud Functions for event-driven architectures
The platform choice depends on your team’s needs and operational complexity, but the DevSecOps blueprint stays the same.
Why GitOps?
GitOps fundamentally transformed the workflow of teams I worked with, delivering greater efficiency, faster shipping, enhanced security, and significant cost savings.
Traditional vs. GitOps Transformation
Traditional CI/CD pipelines rely on four always-on (24/7) environments: dev, test, staging, and production.
GitOps replaces them with local dev, ephemeral preview environments per PR, and on-demand staging. Production remains the only 24/7 environment, using reserved instances for cost optimization.
Additionally, GitOps tools like ArgoCD automate deployments with Git as the source of truth, eliminating drift, enabling instant rollbacks, and providing a complete audit trail through Git history.
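As an illustration, a minimal Argo CD `Application` manifest shows how Git becomes the source of truth. The repo URL, paths, and namespaces below are placeholders, not values from this project:

```yaml
# Hypothetical Argo CD Application: watch a path in the config repo and
# keep the cluster continuously synced to it.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: easy-upload-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/config-repo.git
    targetRevision: main
    path: production
  destination:
    server: https://kubernetes.default.svc
    namespace: easy-upload
  syncPolicy:
    automated:
      prune: true    # delete resources removed from Git (prevents drift)
      selfHeal: true # revert manual cluster changes back to the Git state
```

With `prune` and `selfHeal` enabled, any change made outside Git is automatically reverted, which is what makes the drift elimination and rollback guarantees possible.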
Results
- 50-60% cost reduction across teams
- Improved security through automated scanning and GitOps practices
- Faster code delivery with streamlined, automated processes
- Improved reliability through automated rollbacks, drift prevention, and disaster recovery in minutes
- Complete audit trail for compliance through Git history of all infrastructure changes
Why DevSecOps?
Traditionally, security testing happens only at the end of the delivery cycle, which creates bottlenecks and expensive fixes.
DevSecOps shifts security left: continuous security from development through production, with SAST/DAST scans and dependency checks integrated from the start.
Results & Benefits
- 70-80% reduction in vulnerabilities reaching production
- Reduced remediation costs by catching vulnerabilities early in development
- Faster time to market with security integrated from day one
- Compliance-ready with automated scans and audit trails for regulatory requirements
Application Architecture Overview
Easy Upload is a web application that handles file uploads and processing. This design demonstrates real-world complexity, perfect for showcasing DevSecOps architecture patterns.
Demo of App Working
How it Works
- Frontend → API: A static Astro frontend uploads files to the API Service.
- API → S3 & Queue: The API stores files in a temporary S3 bucket (`raw-uploads`) and enqueues jobs in BullMQ for background processing.
- Worker → Processed Storage & Database: The Worker Service consumes BullMQ jobs, processes files, uploads results to the `processed-uploads` S3 bucket, and updates metadata in PostgreSQL.
- Real-time Updates → Frontend: Progress updates are pushed back via Server-Sent Events (SSE) using Redis Pub/Sub channels.
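The flow above can be sketched in TypeScript. The types and helper below are illustrative assumptions, not the actual Easy Upload code; they show the shape a BullMQ job payload might take and how worker progress could map to the SSE events relayed over Redis Pub/Sub:

```typescript
// Illustrative sketch (not the actual Easy Upload code): the shape of a
// BullMQ job payload and a pure helper that maps worker progress to the
// SSE event relayed to the browser via Redis Pub/Sub.

interface FileJob {
  fileId: string;
  rawKey: string; // object key in the raw-uploads bucket, e.g. "raw/<id>.png"
}

interface SseEvent {
  event: "progress" | "done";
  data: { fileId: string; percent: number; processedKey?: string };
}

// Clamp progress to 0-100 and, once complete, point the client at the
// corresponding key in the processed-uploads bucket.
function toSseEvent(job: FileJob, percent: number): SseEvent {
  const clamped = Math.min(100, Math.max(0, percent));
  const done = clamped >= 100;
  return {
    event: done ? "done" : "progress",
    data: {
      fileId: job.fileId,
      percent: clamped,
      ...(done ? { processedKey: job.rawKey.replace(/^raw\//, "processed/") } : {}),
    },
  };
}

// Worker-side usage: publish each update on a per-file channel that the
// API subscribes to and streams out over SSE, e.g.:
// redis.publish(`file:${job.fileId}`, JSON.stringify(toSseEvent(job, 42)));
```

Keeping the mapping pure makes it trivial to unit-test independently of Redis or BullMQ.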
Architecture Benefits
- Cost Optimization:
- Backend for Frontend Pattern: API handles all business logic, keeping frontend static for CDN caching and minimal hosting costs
- S3 Storage Optimization: Automated lifecycle policies transition files to cheaper storage tiers
- Microservices Architecture:
- Independent Services: Frontend, API, and Workers scale based on demand with isolated failures and security boundaries
- Async Processing: Fast user-facing API responses by offloading heavy processing to background workers
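The S3 lifecycle policy mentioned above might look like this in Terraform. The bucket reference and day thresholds are illustrative assumptions, not the project's actual values:

```hcl
# Hypothetical lifecycle rule for the raw-uploads bucket: move objects to
# cheaper storage tiers as they age, then expire them entirely.
resource "aws_s3_bucket_lifecycle_configuration" "raw_uploads" {
  bucket = aws_s3_bucket.raw_uploads.id

  rule {
    id     = "tier-down-and-expire"
    status = "Enabled"

    transition {
      days          = 30
      storage_class = "STANDARD_IA" # infrequent access after 30 days
    }

    transition {
      days          = 90
      storage_class = "GLACIER" # archive after 90 days
    }

    expiration {
      days = 365 # delete raw uploads after a year
    }
  }
}
```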
Infrastructure Stack
The infrastructure is heavily automated, using managed cloud services balanced with Kubernetes workloads for simplicity, security, and flexibility.
Cloud Infrastructure Diagram
Cloud & Infrastructure
Using vendor-agnostic managed services (like AWS RDS with PostgreSQL) for flexibility without lock-in:
- Cloud Provider: AWS
- Infrastructure as Code: Terraform
- Container Registry: AWS ECR
- Kubernetes Cluster: AWS EKS
- CI: GitHub Actions
- CD / GitOps: Argo CD
- Database: AWS RDS PostgreSQL (Multi-AZ)
- Cache / Queue: AWS ElastiCache Redis with BullMQ (Multi-AZ)
- Note: BullMQ chosen over RabbitMQ to reduce infrastructure overhead and provide better Node.js integration for producer-consumer patterns.
- DNS & TLS: AWS Route 53 & AWS Certificate Manager
- CDN / Entry Point: AWS CloudFront
- File Storage (Buckets): AWS S3
- Static Frontend & Assets: S3 storage (private origin and cached) served exclusively through CloudFront
- Load Balancing: AWS Network Load Balancer → Nginx Ingress
- Note: NLB serves as the AWS-required entry point to the EKS cluster, while Nginx Ingress provides the core routing logic. This architecture reduces costs when scaling (one NLB vs multiple ALBs per service) and enables cloud portability (Nginx works on any Kubernetes cluster).
Scaling & High Availability
- Horizontal Pod Autoscaler: API & Worker pods scale based on CPU/memory
- Karpenter: Rapid EKS node provisioning with On-Demand + Spot instances
- Multi-AZ Deployments: RDS & ElastiCache
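A minimal HPA for the API workload might look like the following; the deployment name, replica bounds, and CPU threshold are assumptions for illustration:

```yaml
# Hypothetical HPA: scale the API deployment between 2 and 10 pods
# based on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out when average CPU exceeds 70%
```

Karpenter then provisions nodes when the scheduler cannot place the new pods, completing the two-level autoscaling loop.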
Observability
- Logs: Pino JSON logs → Grafana Alloy → Loki
- Metrics: Prometheus → Mimir (CPU/Memory, error rates)
- Traces: Tempo for distributed tracing
- Profiling: Pyroscope for continuous CPU/memory profiling
- Visualization: Grafana for unified dashboards and telemetry correlation
- Alerting: Prometheus Alertmanager → Slack/PagerDuty with symptom-based alerts (e.g., high API errors)
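A symptom-based alert of the kind described above could be expressed as a Prometheus rule; the metric names, job label, and thresholds here are illustrative:

```yaml
# Hypothetical symptom-based alert: page on what users experience
# (5xx error rate), not on internal causes.
groups:
  - name: api-symptoms
    rules:
      - alert: HighApiErrorRate
        expr: |
          sum(rate(http_requests_total{job="api", status=~"5.."}[5m]))
            / sum(rate(http_requests_total{job="api"}[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "API 5xx error rate above 5% for 10 minutes"
```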
Cloud & Infrastructure Security
- AWS WAF on CloudFront
- Pod Security Admission (PSA): Enforces restricted Pod Security Standards on application workloads
- Network Policies: Kubernetes restricts pod ingress/egress
- Pod Identity: EKS Pod Identity grants fine-grained permissions (e.g., specific S3 access)
- Secrets Management: AWS Secrets Manager with External Secrets Operator (ESO)
- Note: ESO automatically syncs secrets from AWS Secrets Manager into Kubernetes, enabling GitOps workflows without storing sensitive values in Git.
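For example, an ESO `ExternalSecret` declares which AWS Secrets Manager entry to sync, so only the reference lives in Git (all names and keys below are placeholders):

```yaml
# Hypothetical ExternalSecret: ESO reads the value from AWS Secrets
# Manager and materializes it as a Kubernetes Secret in the cluster.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager # ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: api-db-credentials # Kubernetes Secret created by ESO
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: prod/easy-upload/database
        property: url
```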
Cost Management
- Visibility:
- AWS Cost Explorer with cost allocation tags
- AWS Split Cost Allocation Data for EKS costs by namespace and pod
- Anomaly Detection: AWS Cost Anomaly Detection → Slack/PagerDuty
- Budgets: AWS Budgets with threshold alerts
Why This Infrastructure Design
- Application Portability: Open-source technologies and standard protocols (PostgreSQL, Redis, Kubernetes) enable application migration across clouds with minimal code changes
- Reduced Operational Complexity: AWS managed services reduce maintenance overhead while providing built-in redundancy and failover
- Cost-Optimized: Infrastructure cost reduction through elastic scaling with Spot instances, optimized storage policies, and ephemeral environments
- High Availability & Resilience: Multi-AZ deployments eliminate single points of failure while automated scaling and progressive rollouts ensure availability
- Fully Observable: Metrics, logs, traces, and profiles enable proactive monitoring and rapid incident resolution
- Secure by Design: Defense-in-depth approach with multiple protection layers (WAF, Pod Security Admission, network policies, pod identity, secrets management) enforces least privilege and limits blast radius
- Reproducible & Auditable: Terraform and GitOps ensure consistent deployments, complete audit trail for compliance, and instant rollback capability
Platform Infrastructure Pipeline (Terraform)
All infrastructure is defined in Terraform and flows through a dedicated pipeline that ensures every change is code-reviewed, security-scanned, and automatically applied.
Repository: platform-infra-repo
Workflow
- Infrastructure Change: Developer modifies the Terraform configuration.
- Pull Request: Triggers a GitHub Action for automated checks:
  - Linting: `tflint`
  - Security Scan: `trivy config .`
  - Plan Review: `terraform plan` output is posted as a PR comment.
- Merge: On approval, a protected GitHub Action runs `terraform apply`.
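A condensed GitHub Actions job for the PR checks could look like this. Action versions are illustrative, and the step that posts the plan as a PR comment is omitted for brevity:

```yaml
# Hypothetical PR workflow: lint, security-scan, and plan Terraform changes.
name: terraform-pr-checks
on:
  pull_request:
    paths: ["**.tf"]

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: terraform-linters/setup-tflint@v4
      - name: Lint
        run: tflint --recursive
      - name: Security scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: config
          scan-ref: .
      - name: Plan
        run: |
          terraform init -input=false
          terraform plan -no-color -input=false
```

A separate workflow, restricted to the `main` branch and a protected environment, would run `terraform apply`, giving the plan/apply separation described below.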
Why This Approach
- Code-Reviewed Infrastructure: Eliminates risky manual console changes through mandatory PR reviews.
- Automated Security Scanning: Catches misconfigurations before they reach production.
- Plan + Apply Separation: Safer deployments with full visibility into changes before execution.
- Git as Source of Truth: Complete audit trail and automated drift prevention through Git history
Application CI/CD Pipeline (GitOps)
- Development Model: Trunk-Based Development
  - All developers work on short-lived feature branches off the `main` branch
  - Feature branches are merged frequently to reduce integration conflicts
  - Keeps the `main` branch always deployable
  - Encourages continuous integration and fast feedback
  - Paired with PR previews for testing new features in ephemeral environments
- Repositories:
  - `app-repo` (Monorepo)
    - Contains all application source code
    - A monorepo simplifies dependency management and ensures version consistency
    - Packages are separated for an easy future microservices migration
  - `config-repo` (GitOps repo)
    - Stores Kubernetes manifests for staging and production
    - Separate from app code for independent deployments
- GitOps Approach: CI updates manifests; Argo CD deploys automatically.
CI/CD Pipeline Workflow Diagram
CI/CD Security
- Software Composition Analysis (SCA): Trivy for dependencies, IaC, and containers.
- Linting: ESLint and Prettier
- Vulnerability Scanning: npm audit for known package vulnerabilities.
- Dependency Updates: Renovate for automated dependency PRs on weekly schedule.
- SAST: Semgrep for PRs and pre-commit hooks.
- DAST: OWASP ZAP for staging and preview environments.
- Performance Testing: k6 load tests validate system performance and scalability in staging
- E2E Testing: Playwright.
Workflow
- Feature Development
  - Developer creates a feature branch to work on a new feature.
- Pre-commit (Optional)
  - Format code with ESLint + Prettier.
  - Run a quick SAST scan using Semgrep.
- Pull Request (CI)
  - A merge request from the `feature` branch to `main` triggers the CI pipeline.
  - Run Tests & Static Scans:
    - Unit & integration tests.
    - Full SAST scan with Semgrep.
    - SCA scan with Trivy on `package.json`.
    - `npm audit` for known package vulnerabilities.
- Preview (Optional)
  - Deploy PR changes to an ephemeral cloud environment using a `/deploy-preview` comment
  - Features:
    - Fast & cheap builds via cached Docker layers and Kubernetes diffs
    - Realistic data using cloud services and a “golden snapshot” database
    - Optional OWASP ZAP DAST scan against the preview URL
- Staging Pipeline
  - Build production-ready assets & images:
    - Frontend → static S3 assets
    - API & Worker → Docker images
  - Security Scans: Full Trivy scan of the Docker images
  - Push Artifacts:
    - Frontend assets to S3
    - API & Worker images to AWS ECR with unique tags (e.g., `api:v1.2.3`, `worker:v1.2.3`)
  - Update Kubernetes manifests in `config-repo` with the API & Worker image tags.
- Staging Deployment
  - Argo CD deploys the updated manifests to the staging cluster.
  - Performance tests run via k6.
  - E2E tests run via Playwright.
  - Full DAST scan using OWASP ZAP.
- Production Promotion & Rollout
  - Update production manifests in `config-repo` with the validated image tags.
  - Argo CD triggers Argo Rollouts for a canary deployment:
    - Gradual rollout (e.g., 5% → 25% → 100%).
    - Monitored via Prometheus metrics (error rates, latency).
    - Automatic rollback on failure.
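The canary strategy above maps to an Argo Rollouts spec roughly like the following; the weights, pause durations, image tag, and labels are illustrative placeholders:

```yaml
# Hypothetical Rollout: shift traffic to the new version in stages,
# pausing between steps so metrics can be evaluated.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
  strategy:
    canary:
      steps:
        - setWeight: 5 # send 5% of traffic to the new version
        - pause: { duration: 5m }
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 100
      # An AnalysisTemplate querying Prometheus (error rates, latency)
      # can be attached to these steps to trigger automatic rollback.
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: <ECR_REPO>/api:v1.2.3 # placeholder image reference
```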
Why This Approach
- Infrastructure Cost Optimization: Eliminates permanent dev/test/staging environments through local development, ephemeral previews, and on-demand staging with only production running 24/7 on reserved instances
- Early Vulnerability Detection: Shift-left security with SAST, DAST, and dependency scanning catches issues during development
- Git as Source of Truth:
- Complete audit trail for compliance through Git history of all infrastructure changes
- Prevents configuration drift through Git-only deployments and mandatory code review
- Safe Deployment Process: Argo Rollouts enable progressive canary releases with automatic rollback on failure
- Reduced Merge Conflicts: Trunk-based development keeps main always deployable with frequent merges
- Fast Feature Validation: Ephemeral preview environments validate code against real cloud services without waiting for staging environment
- Independent Changes: App code and infrastructure updates occur separately without affecting each other
- Validated Deployments: Automated E2E and performance testing ensures production readiness
Local Development Strategy
The local development environment mirrors production architecture while enabling rapid iteration with full IDE integration.
Development follows a streamlined process that maintains consistency across the team:
- Application services (API, Worker, Frontend) run inside development containers with VSCode integration
- Infrastructure services (PostgreSQL, Redis, MinIO) run in Docker Compose, providing production-identical dependencies
- VSCode Dev Containers extension connects your IDE directly into the running containers for seamless development
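A minimal Docker Compose file for the infrastructure services might look like this; image versions, ports, and credentials are illustrative, not taken from the project:

```yaml
# Hypothetical local-dev stack: the same backing services the app uses
# in production (PostgreSQL, Redis, S3-compatible storage via MinIO).
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
      POSTGRES_DB: easyupload
    ports: ["5432:5432"]

  redis:
    image: redis:7
    ports: ["6379:6379"]

  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: dev
      MINIO_ROOT_PASSWORD: devdevdev
    ports: ["9000:9000", "9001:9001"]
```

Pointing the app at MinIO instead of S3 locally keeps the code path identical while avoiding cloud costs for work-in-progress changes.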
Features & Benefits
- Production Parity: Docker Compose mirrors production services with shared configuration, eliminating “works on my machine” issues
- Quick Onboarding: Standardized Docker Compose setup enables new developers to get productive quickly with one-click environment initialization
- Fast Feedback Loop: Hot-reloading and pre-commit hooks provide immediate local validation before CI pipeline
- Cost Savings: Local development eliminates cloud resource consumption for work-in-progress code
- Development in Containers: VSCode’s Dev Containers extension enables development directly inside containers with full IntelliSense and debugging
- Ephemeral Preview Options:
- Full-stack previews: Deploy PRs to ephemeral cloud environments (frontend, backend, workers) for complete validation before staging/production
- Hybrid previews: Spin up ephemeral cloud dependencies (S3, RDS, Redis) and connect them to local Docker Compose for testing features without full stack costs
Simplified Alternative: Native Development
For this guide’s implementation, we use native development (Node.js on host + Docker Compose for infrastructure) to accelerate learning of core DevSecOps patterns. Transitioning to the full Dev Containers setup is straightforward once you grasp the deployment concepts.
Choose based on your team:
- Dev Containers → Large teams, strict consistency
- Native Development → Small teams, fast iteration
Complete Request Flow
The entire lifecycle of a user action demonstrates how all components work together:
- User Request: User visits `https://www.easyupload.com`
- Security Layer: CloudFront routes the request, WAF inspects it for threats
- Frontend Delivery: Static frontend served from S3 via CloudFront Origin Access Control (OAC)
- API Routing: API requests flow through Network Load Balancer → Ingress → API pod
- File Processing:
  - API streams the upload to the S3 raw bucket
  - Records metadata in RDS
  - Enqueues a job in Redis
- Background Processing:
  - Worker processes the file
  - Updates the processed bucket
  - Updates the database
  - Publishes Redis Pub/Sub updates
- Real-time Updates: User receives progress updates and signed URLs for downloads
Complete Request Flow Diagram
Conclusion
This architecture demonstrates how GitOps workflows, shift-left security, and cloud-native patterns work together to reduce costs, strengthen security, and accelerate delivery cycles.
Beyond the technical stack, this approach eliminates manual deployment risks and operational overhead, allowing teams to focus on building features instead of managing infrastructure.
What’s Next?
In Part 2, I will publish the implementation guide, including:
- Dockerfiles and Docker Compose for local dev
- Terraform IaC for AWS resources
- GitHub Actions workflows
- Kubernetes manifests with ArgoCD and Argo Rollouts
- Screenshots of successful deployments and the app live on AWS