AWS Platform Foundation
Status
Partial: AWS production platform foundation, bootstrap/state strategy, CI/CD readiness, docs-site hosting scaffold, and AWS dev runtime deploy readiness exist. Application workloads are still deployed only through an explicit, protected pipeline run.
Related Requirements
- SRS references: scalable SaaS platform deployment.
- Client response references: AWS is Phase 1 primary; Azure failover is Phase 2 evaluation; operational simplicity remains a constraint.
- ADR references: AWS deployment ADR, AWS bootstrap/state ADR, service mesh note.
- Task board references: OP-D039, OP-D040, OP-D041, OP-D042, OP-043, OP-044.
Problem Statement
OneProtect needs a safe, isolated AWS path that does not disturb existing workloads, does not use long-lived CI keys, and gates real infrastructure/app deployments behind reviewed IaC and CI/CD.
Architectural Intent
AWS dev is the first target. Terraform/OpenTofu owns infrastructure, Helm owns application deployment, and GitLab/GitHub CI wrappers call shared scripts. The platform starts with bootstrap/state and CI trust before VPC/EKS/RDS/app workloads.
What Was Implemented
- AWS Phase 1 deployment ADR.
- Service mesh decision note: no service mesh for first bootstrap.
- infra/aws/modules for network, EKS, RDS/Postgres, KMS, S3 evidence, ECR, IAM, observability, and docs-site hosting.
- infra/aws/bootstrap/ for state bucket, lock table, optional KMS, and CI OIDC trust roles.
- AWS dev Helm values.
- GitLab-first CI/CD scripts and GitHub Actions parity skeletons.
- AWS pre-apply checklist, first-apply runbook, CI variable matrix, drift docs, and cost/safety guardrails.
- Docs-site S3 + CloudFront hosting scaffold with private S3 and Origin Access Control.
- A narrow AWS dev ECR-only stack for api-service and frontend image repositories so CI image publishing can be unblocked before VPC/EKS/RDS.
- On-demand AWS deploy gates so normal feature/develop pipelines do not publish images or deploy unless DEPLOY_AWS_DEV=true.
- Mutable ECR BuildKit cache repositories for faster on-demand image publish.
- Helm runtime readiness for AWS dev:
  - in-cluster NATS JetStream dev StatefulSet,
  - Postgres migration pre-install/pre-upgrade Job,
  - protected Helm values materialization,
  - runtime Kubernetes Secret creation helper.
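The on-demand deploy gate listed above can be sketched as a small shell helper. This is a minimal illustration, not the repository's actual script: only the variable name DEPLOY_AWS_DEV comes from this note, while the function names and the strict match on the literal string true are assumptions.

```shell
# Hypothetical helper: succeeds only when the on-demand gate is explicitly open.
# Assumption: the gate is an exact match on "true"; unset, empty, "1", or "TRUE"
# all keep the gate closed.
aws_dev_deploy_enabled() {
  [ "${DEPLOY_AWS_DEV:-}" = "true" ]
}

# Hypothetical wrapper: runs the given command only when the gate is open.
# A closed gate is treated as a skip, not a failure, so normal feature/develop
# pipelines stay green without publishing or deploying anything.
run_if_gated() {
  if aws_dev_deploy_enabled; then
    "$@"
  else
    echo "DEPLOY_AWS_DEV is not 'true'; skipping: $*" >&2
    return 0
  fi
}
```

A CI job could then wrap its publish or deploy step as `run_if_gated <command>`, so the step becomes a no-op unless the pipeline run sets DEPLOY_AWS_DEV=true.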
Components Involved
- infra/aws/
- deploy/helm/oneprotect/values-aws-dev.yaml
- .gitlab-ci.yml
- .github/workflows/
- scripts/ci/
- scripts/aws/preflight-check.sh
- scripts/aws/create-dev-k8s-secrets.sh
- scripts/ci/materialize-helm-values.sh
APIs / Events / Schemas
No product APIs or event contracts were added by the AWS foundation work.
Deployment Notes
Current intended real-world order:
- Fill local untracked bootstrap tfvars.
- Run AWS preflight.
- Plan bootstrap only.
- Manually review the plan.
- Apply bootstrap only: state bucket, lock table, optional KMS, CI OIDC roles.
- Plan/apply the ECR-only dev stack if publish_ecr is blocked.
- Publish immutable images to ECR with registry-backed BuildKit cache.
- Prepare broader dev infra in a later reviewed plan.
- Create runtime Kubernetes Secrets from approved secret sources.
- Deploy app workloads only after protected pipeline gates are ready.
The ECR-only stack must create only:
- oneprotect/dev/api-service
- oneprotect/dev/frontend
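One way to enforce that constraint during plan review is an allowlist check over the repository names a plan would create. The sketch below is illustrative only: the two repository names come from this note, while the function name and the one-name-per-line input format are assumptions.

```shell
# Allowlist taken from this note; everything else in this block is hypothetical.
ALLOWED_ECR_REPOS="oneprotect/dev/api-service
oneprotect/dev/frontend"

# Reads planned repository names on stdin, one per line.
# Reports every name outside the allowlist and exits non-zero if any is found.
check_ecr_repos() (
  status=0
  while IFS= read -r repo; do
    # Ignore blank lines in the input.
    [ -n "$repo" ] || continue
    # -F: fixed string, -x: whole-line match, so prefixes do not pass.
    if ! printf '%s\n' "$ALLOWED_ECR_REPOS" | grep -Fxq -- "$repo"; then
      echo "unexpected ECR repository in plan: $repo" >&2
      status=1
    fi
  done
  exit "$status"
)
```

The input could come from the JSON form of a saved plan, for example by listing the planned name of each aws_ecr_repository resource. How the names are extracted depends on how the dev-ecr stack is written, so that step is left out here.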
AWS dev infra plan readiness now defines the non-app apply order: ECR-only if image publishing is blocked, then VPC/foundational IAM, private RDS/Postgres, EKS, kubectl access, runtime Kubernetes Secrets, Helm render, then gated workload deploy.
AWS dev now prefers Graviton/arm64 EKS nodes for cost, keeps an x86 fallback path, and requires multi-arch OneProtect images before arm64 workload scheduling.
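The multi-arch requirement can be expressed as a simple check on the platforms an image reports. This is a rough sketch under stated assumptions: the function name is hypothetical, and the platform list is assumed to have already been obtained from the registry, for example by inspecting the image manifest with buildx.

```shell
# Hypothetical check: succeeds only if both platforms needed for the
# Graviton/arm64 path and the x86 fallback are present.
# $1: whitespace-separated platform list, e.g. "linux/amd64 linux/arm64".
# Matching is exact, so a variant-suffixed entry such as linux/arm64/v8
# would not satisfy the linux/arm64 requirement in this sketch.
image_is_multi_arch() (
  # Normalise newlines and tabs to spaces and pad with spaces so that
  # each entry can be matched as a whole word.
  available=" $(printf '%s' "$1" | tr '\n\t' '  ') "
  for required in linux/amd64 linux/arm64; do
    case "$available" in
      *" $required "*) ;;
      *) echo "missing platform: $required" >&2; exit 1 ;;
    esac
  done
  exit 0
)
```

Running a check like this before any node selector or affinity pins workloads to arm64 nodes keeps the x86 fallback usable if an image is published for a single architecture by mistake.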
AWS dev runtime readiness now adds an in-cluster NATS JetStream dev instance and a Helm migration Job. These are dev bootstrap choices, not final production decisions for the event backbone or database operations model.
Security / Tenant Isolation
- AWS resources are isolated under OneProtect naming and tags.
- No real account IDs, domains, secrets, kubeconfigs, or tfvars are committed.
- CI uses OIDC role assumption, not static AWS keys.
- Manual console drift is prohibited except documented break-glass.
- Docs-site bucket is private and served through CloudFront OAC when enabled.
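The OIDC-only rule can be backed by a guard that CI jobs run before touching AWS. The sketch below is a heuristic, not the project's actual check: the function name is hypothetical, and it relies on the fact that temporary credentials from role assumption always carry a session token, while long-lived access keys do not.

```shell
# Hypothetical guard: fails when the job environment looks like it holds
# long-lived AWS keys. AWS_ACCESS_KEY_ID and AWS_SESSION_TOKEN are the
# standard AWS CLI/SDK environment variables.
# Heuristic: an access key without a session token is treated as static.
assert_no_static_aws_keys() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -z "${AWS_SESSION_TOKEN:-}" ]; then
    echo "static AWS access key detected; CI must use OIDC role assumption" >&2
    return 1
  fi
  return 0
}
```

A job with no AWS credentials at all passes, as does one that has already exchanged its OIDC token for temporary credentials. Only the access-key-without-session-token case is rejected.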
Validation Steps
UI Validation
No application UI validation exists until AWS dev app workloads are deployed. For docs-site hosting, validate the CloudFront URL only after publishing is approved and enabled.
API Validation
No product API validation exists until app workloads are deployed to AWS dev. For bootstrap, validate Terraform/OpenTofu outputs and AWS resource names in the approved account.
Smoke Validation
make aws-preflight-check
make aws-bootstrap-plan
make aws-dev-ecr-plan-dryrun
make aws-dev-ecr-plan
make aws-dev-plan-dryrun
make aws-dev-helm-render
make aws-dev-k8s-secrets-dryrun
make docker-buildx-check
make aws-terraform-validate
make aws-iac-check
make aws-helm-template
make aws-preflight-check, make aws-bootstrap-plan, and make aws-dev-plan are expected to fail closed without real local AWS inputs and untracked tfvars.
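Fail-closed behaviour of this kind can be sketched as an input check that reports every missing prerequisite and returns non-zero before any plan is attempted. This is an assumption-laden illustration, not the content of scripts/aws/preflight-check.sh: the function name is hypothetical, and checking AWS_REGION plus a non-empty tfvars file is only one plausible set of required inputs.

```shell
# Hypothetical fail-closed input check.
# $1: path to the local, untracked tfvars file (placeholder, supplied by the caller).
# Assumption: a region in AWS_REGION and a non-empty tfvars file are required.
preflight_inputs_ok() (
  tfvars_file="${1:-}"
  status=0
  if [ -z "${AWS_REGION:-}" ]; then
    echo "preflight: AWS_REGION is not set" >&2
    status=1
  fi
  # -s is true only for a file that exists and has a size greater than zero.
  if [ -z "$tfvars_file" ] || [ ! -s "$tfvars_file" ]; then
    echo "preflight: tfvars file is missing or empty: ${tfvars_file:-<none>}" >&2
    status=1
  fi
  exit "$status"
)
```

Collecting all failures before exiting, rather than stopping at the first one, lets an operator fix every missing input in a single pass.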
Known Limitations
- App workloads are not deployed to AWS dev by this note.
- The ECR-only stack does not prove runtime deployment; it only prepares image repositories.
- Runtime Kubernetes Secrets still require approved values from local/CI secret sources.
- The AWS Load Balancer Controller is installed separately through a protected GitLab Agent job; ALB Ingress does not produce a stable AWS URL until that job has run.
- Public app DNS is manual in Namecheap for watchtower-app.mergematter.io.
- Public docs DNS is manual in Namecheap for docs.watchtower-app.mergematter.io.
- arm64 image support must stay proven with buildx before workloads are pinned to Graviton nodes.
- EKS vs ECS, NATS vs MSK/Kinesis, Aurora vs RDS, production IdP, OpenSearch, time-series store, and runtime KMS/S3 enforcement remain open or queued decisions. The logical tenant key model is accepted in ADR-0018.
- Docs-site publishing is scaffolded but disabled until audience/access are approved.
Follow-Up Work
- Execute approved bootstrap apply only.
- Execute approved ECR-only apply if image publishing is blocked.
- Prepare broader AWS dev infra plan readiness.
- Configure protected GitLab OIDC variables.
- Configure protected AWS dev Helm values file.
- Create runtime Kubernetes Secrets.
- Deploy app workloads only through gated Helm pipeline.
Acceptance Criteria Mapping
| Acceptance criterion | Evidence |
|---|---|
| AWS primary path is scaffolded | AWS ADR and infra/aws/ |
| CI uses OIDC, not static keys | CI/CD docs and workflow skeletons |
| Bootstrap is gated | First-apply runbook and preflight script |
| ECR-only apply is narrow | infra/aws/envs/dev-ecr/ and deployment runbooks |
| Runtime deploy path is gated | DEPLOY_AWS_DEV, AWS_DEV_HELM_VALUES_FILE, Helm migration/NATS templates |
| No app workloads deployed by docs branch | Deployment docs and task board status |
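The protected Helm values step referenced in the table can be sketched as follows. This is a minimal illustration, not the content of scripts/ci/materialize-helm-values.sh: the function name is hypothetical, and it assumes AWS_DEV_HELM_VALUES_FILE is a protected file-type CI variable, which GitLab exposes to the job as a path to a temporary file.

```shell
# Hypothetical materialization helper.
# $1: destination path for the rendered-time values file.
# Assumption: AWS_DEV_HELM_VALUES_FILE holds a path to the protected values file.
materialize_helm_values() (
  dest="$1"
  src="${AWS_DEV_HELM_VALUES_FILE:-}"
  # Fail closed: no protected values means no Helm render or deploy.
  if [ -z "$src" ] || [ ! -r "$src" ]; then
    echo "AWS_DEV_HELM_VALUES_FILE is unset or unreadable; refusing to continue" >&2
    exit 1
  fi
  # Restrict permissions on a newly created copy to the job user only.
  umask 077
  cp -- "$src" "$dest"
)
```

Because the function body runs in a subshell, the umask change does not leak into the rest of the job, and a missing variable stops the pipeline before Helm is invoked.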