AWS Production Foundation
AWS is the Phase 1 primary cloud target. The repo now contains the bootstrap, dev infrastructure, and runtime deploy readiness path, but application workload deployment remains protected and on-demand.
Implemented/scaffolded now:
- AWS deployment ADR.
- AWS bootstrap/state ADR.
- Service mesh decision note.
- infra/aws/ Terraform/OpenTofu scaffold.
- infra/aws/bootstrap/ Terraform/OpenTofu scaffold for state bucket, lock table, optional state KMS key, and CI OIDC role scaffolding.
- EKS dev target.
- RDS/Aurora PostgreSQL target.
- Narrow AWS dev ECR-only stack for image repositories.
- S3 evidence bucket with KMS.
- IAM/OIDC skeleton for CI role assumption.
- Helm values-aws-dev.yaml.
- AWS dev runtime deploy readiness:
  - protected Helm values file materialization,
  - runtime Kubernetes Secret helper,
  - in-cluster NATS JetStream dev StatefulSet,
  - Postgres migration Job.
- CI/CD workflow skeletons for plan, image release, and dev deploy.
- CI/CD workflow skeleton for bootstrap validation/planning.
- GitLab-first deployment-readiness pipeline with GitHub Actions parity.
- Shared CI scripts under scripts/ci/.
- Drift, rollback, cost, and safety guardrail docs.
- AWS dev bootstrap runbook.
- AWS dev deployment runbook.
- AWS pre-apply checklist, first-apply runbook, and CI variable matrix.
- S3 + CloudFront docs-site hosting scaffold.
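
The ECR-only stack listed above is meant to stay narrow. As a minimal sketch of how a reviewer or CI job might check that, the snippet below scans the JSON form of a plan (as produced by `terraform show -json` or `tofu show -json`) and reports any planned change that is not an ECR resource. The function name `unexpected_changes` and the `ALLOWED_TYPES` allow-list are hypothetical and not taken from the repo's scripts; adjust the list to match whatever the stack actually declares.

```python
# Assumption: these AWS provider resource types are the only ones the
# ECR-only stack should ever create or modify. The set is illustrative.
ALLOWED_TYPES = {
    "aws_ecr_repository",
    "aws_ecr_lifecycle_policy",
    "aws_ecr_repository_policy",
}


def unexpected_changes(plan: dict) -> list[str]:
    """Return addresses of planned changes whose type is outside ALLOWED_TYPES."""
    offenders = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # "no-op" and "read" entries do not modify infrastructure, so skip them.
        if all(action in ("no-op", "read") for action in actions):
            continue
        if rc.get("type") not in ALLOWED_TYPES:
            offenders.append(rc.get("address", "<unknown>"))
    return offenders
```

A caller would load the plan JSON with `json.load` and fail the job if the returned list is non-empty. This does not replace plan review and manual approval; it only flags plans that have drifted beyond image repositories.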
Current recommendation:
- Use EKS for the first AWS bootstrap.
- Do not add a service mesh yet.
- Keep agent/device mTLS at the app/API enrollment boundary.
- Keep AWS applies behind reviewed Terraform/OpenTofu plans and manual approval.
- Use CI OIDC role assumption, not long-lived keys.
- Bootstrap remote state and CI trust before any app infrastructure apply.
- Apply the ECR-only stack before publish_ecr if image repositories do not exist; it must create only image repositories.
- Use GitLab CI now and keep GitHub Actions portability for client handoff.
- Keep docs-site publishing manual/protected until audience and access are approved.
- Run AWS dev deploy pipelines only with DEPLOY_AWS_DEV=true.
- Configure AWS_DEV_HELM_VALUES_FILE before approving Helm deploy.
- Install/approve AWS Load Balancer Controller before expecting ALB Ingress to expose stable dev URLs.
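
The two deploy-variable recommendations above can be expressed as a small pre-flight gate. This is a sketch under stated assumptions, not the repo's actual CI script: the function name `deploy_gate_errors` is hypothetical, and it assumes `AWS_DEV_HELM_VALUES_FILE` holds a filesystem path to the materialized values file (as a file-type CI variable would).

```python
import os
from collections.abc import Mapping


def deploy_gate_errors(env: Mapping[str, str]) -> list[str]:
    """Return reasons the AWS dev deploy should not proceed (empty list if clear)."""
    errors = []
    # The deploy pipeline is opt-in: only the exact string "true" enables it.
    if env.get("DEPLOY_AWS_DEV") != "true":
        errors.append("DEPLOY_AWS_DEV is not set to 'true'")
    # Assumption: the variable contains a path to the Helm values file.
    values_file = env.get("AWS_DEV_HELM_VALUES_FILE", "")
    if not values_file:
        errors.append("AWS_DEV_HELM_VALUES_FILE is not configured")
    elif not os.path.isfile(values_file):
        errors.append(f"Helm values file not found: {values_file}")
    return errors
```

A CI step could call `deploy_gate_errors(os.environ)` and exit non-zero when the list is non-empty, so a misconfigured pipeline stops before the Helm deploy reaches the approval stage.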
Open decisions:
- EKS vs ECS long-term.
- NATS JetStream vs MSK/Kinesis.
- Aurora vs RDS PostgreSQL.
- Cognito/Auth0/Keycloak.
- OpenSearch and time-series adoption timing.
- Runtime KMS/S3 object storage implementation for ADR-0018 tenant key refs.
Source docs:
- docs/adr/ADR-0006-aws-phase1-deployment-architecture.md
- docs/adr/ADR-0007-aws-bootstrap-and-state-strategy.md
- docs/planning/aws-dev-bootstrap-runbook.md
- docs/planning/aws-dev-deployment-runbook.md
- docs/planning/aws-dev-pre-apply-checklist.md
- docs/planning/aws-dev-first-apply-runbook.md
- docs/planning/aws-dev-infra-plan-readiness.md
- docs/planning/aws-dev-runtime-deploy-readiness.md
- docs/architecture/aws-ci-variable-matrix.md
- docs/architecture/service-mesh-decision.md
- docs/architecture/aws-cicd-strategy.md
- docs/architecture/aws-drift-management.md
- docs/architecture/aws-cost-and-safety-guardrails.md