AWS Production Foundation

AWS is the Phase 1 primary cloud target. The repo now contains the bootstrap, dev infrastructure, and runtime deploy readiness path, but application workload deployment remains protected and on-demand.

Implemented/scaffolded now:

  • AWS deployment ADR.
  • AWS bootstrap/state ADR.
  • Service mesh decision note.
  • infra/aws/ Terraform/OpenTofu scaffold.
  • infra/aws/bootstrap/ Terraform/OpenTofu scaffold for state bucket, lock table, optional state KMS key, and CI OIDC role scaffolding.
  • EKS dev target.
  • RDS/Aurora PostgreSQL target.
  • Narrow AWS dev ECR-only stack for image repositories.
  • S3 evidence bucket with KMS.
  • IAM/OIDC skeleton for CI role assumption.
  • Helm values-aws-dev.yaml.
  • AWS dev runtime deploy readiness:
    • protected Helm values file materialization,
    • runtime Kubernetes Secret helper,
    • in-cluster NATS JetStream dev StatefulSet,
    • Postgres migration Job.
  • CI/CD workflow skeletons for plan, image release, and dev deploy.
  • CI/CD workflow skeleton for bootstrap validation/planning.
  • GitLab-first deployment-readiness pipeline with GitHub Actions parity.
  • Shared CI scripts under scripts/ci/.
  • Drift, rollback, cost, and safety guardrail docs.
  • AWS dev bootstrap runbook.
  • AWS dev deployment runbook.
  • AWS pre-apply checklist, first-apply runbook, and CI variable matrix.
  • S3 + CloudFront docs-site hosting scaffold.

Current recommendation:

  • Use EKS for the first AWS bootstrap.
  • Do not add a service mesh yet.
  • Keep agent/device mTLS at the app/API enrollment boundary.
  • Keep AWS applies behind reviewed Terraform/OpenTofu plans and manual approval.
  • Use CI OIDC role assumption, not long-lived keys.
  • Bootstrap remote state and CI trust before any app infrastructure apply.
  • If the image repositories do not exist yet, apply the ECR-only stack before publish_ecr; that stack must create only image repositories.
  • Use GitLab CI now and keep GitHub Actions portability for client handoff.
  • Keep docs-site publishing manual/protected until audience and access are approved.
  • Run AWS dev deploy pipelines only with DEPLOY_AWS_DEV=true.
  • Configure AWS_DEV_HELM_VALUES_FILE before approving Helm deploy.
  • Install/approve AWS Load Balancer Controller before expecting ALB Ingress to expose stable dev URLs.
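The two deploy preconditions above (DEPLOY_AWS_DEV=true and a configured AWS_DEV_HELM_VALUES_FILE) could be enforced with a small preflight check before the Helm step. The following is a minimal sketch, not the repo's actual script: the function name deploy_gate_errors is hypothetical, and it assumes that AWS_DEV_HELM_VALUES_FILE holds a filesystem path to the materialized values file and that only the exact string "true" enables the deploy.

```python
import os
from typing import Mapping


def deploy_gate_errors(env: Mapping[str, str]) -> list[str]:
    """Return the reasons an AWS dev deploy should not proceed (empty list means OK).

    Hypothetical helper for illustration; the variable semantics are assumptions.
    """
    errors = []

    # The deploy pipeline is opt-in: anything other than the exact string "true" blocks it.
    if env.get("DEPLOY_AWS_DEV") != "true":
        errors.append("DEPLOY_AWS_DEV is not set to 'true'")

    # The protected Helm values file must be configured and present on disk
    # (assumes the variable contains a path, not the file contents).
    values_file = env.get("AWS_DEV_HELM_VALUES_FILE", "")
    if not values_file:
        errors.append("AWS_DEV_HELM_VALUES_FILE is not configured")
    elif not os.path.isfile(values_file):
        errors.append(f"Helm values file not found: {values_file}")

    return errors
```

A CI job could call deploy_gate_errors(os.environ) and exit non-zero when the returned list is non-empty, so a misconfigured pipeline fails before anyone is asked to approve the Helm deploy. Returning a list rather than raising on the first problem lets the job report every missing precondition in one run.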

Open decisions:

  • EKS vs ECS long-term.
  • NATS JetStream vs MSK/Kinesis.
  • Aurora vs RDS PostgreSQL.
  • Identity provider: Cognito vs Auth0 vs Keycloak.
  • OpenSearch and time-series adoption timing.
  • Runtime KMS/S3 object storage implementation for ADR-0018 tenant key refs.

Source docs:

  • docs/adr/ADR-0006-aws-phase1-deployment-architecture.md
  • docs/adr/ADR-0007-aws-bootstrap-and-state-strategy.md
  • docs/planning/aws-dev-bootstrap-runbook.md
  • docs/planning/aws-dev-deployment-runbook.md
  • docs/planning/aws-dev-pre-apply-checklist.md
  • docs/planning/aws-dev-first-apply-runbook.md
  • docs/planning/aws-dev-infra-plan-readiness.md
  • docs/planning/aws-dev-runtime-deploy-readiness.md
  • docs/architecture/aws-ci-variable-matrix.md
  • docs/architecture/service-mesh-decision.md
  • docs/architecture/aws-cicd-strategy.md
  • docs/architecture/aws-drift-management.md
  • docs/architecture/aws-cost-and-safety-guardrails.md