20 battle-tested decision rules. When an exam question gives you 4 plausible answers, these heuristics tell you which one AWS expects.
For any production workload requiring high availability, deploy across at least two Availability Zones.
Why: Single-AZ deployments fail when the AZ fails. Multi-AZ is the cheapest path to HA within a region.
Store session and state outside compute (ElastiCache, DynamoDB, S3), not on EC2 local disk.
Why: Stateless servers can scale horizontally and be replaced freely. Local state couples scaling to state loss.
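A minimal sketch of the idea, with an in-memory class standing in for an external store such as ElastiCache or DynamoDB. The class and method names are hypothetical; the point is only that the servers hold a reference to the store and never the session data itself.

```python
class SessionStore:
    """In-memory stand-in for an external store (ElastiCache, DynamoDB, S3)."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, session):
        self._data[session_id] = dict(session)

    def get(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """A stateless server: it keeps a handle to the store, not the sessions."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, user):
        self.store.put(session_id, {"user": user})

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

server_a.login("sess-1", "alice")
del server_a  # simulate the instance being terminated by a scale-in event

# The session is still readable from a different server.
print(server_b.whoami("sess-1"))
```

Had the session lived in a dictionary on `server_a`, terminating that instance would have logged the user out.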
Do NOT default to multi-region. Only add it when RPO/RTO requirements cannot be met by multi-AZ.
Why: Multi-region is 3-10x the cost and complexity of multi-AZ. Most workloads don't need it.
Route 53 and ALB should route using health checks, not TTL-only routing.
Why: Failover should be automatic and near-instant, not DNS-cached.
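The routing decision behind health-check failover can be sketched in a few lines. This is an illustration of the logic only, not how Route 53 or ALB implement it; the endpoint names and the health table are made up.

```python
def pick_endpoint(endpoints, is_healthy):
    """Return the first healthy endpoint in priority order, or None if all fail."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical endpoints, listed in priority order.
endpoints = ["primary.example.internal", "secondary.example.internal"]

# Hypothetical health-check results: the primary is currently failing.
health = {
    "primary.example.internal": False,
    "secondary.example.internal": True,
}

chosen = pick_endpoint(endpoints, lambda endpoint: health[endpoint])
print(chosen)  # secondary.example.internal
```

With TTL-only routing there is no `is_healthy` step at all: clients keep hitting the failed primary until their cached record expires.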
Secrets NEVER in code, environment variables, or Git. Use Secrets Manager or Parameter Store.
Why: Hardcoded secrets = compromised on first leak, rotation impossible, audit trail missing.
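A hedged sketch of fetching a secret at runtime instead of embedding it. The function takes a client so it runs here against a small fake; in a real deployment the client would be `boto3.client("secretsmanager")`, whose `get_secret_value(SecretId=...)` call returns the secret under `SecretString`. The secret name `prod/app/db` and its JSON fields are assumptions for illustration.

```python
import json


def load_db_credentials(client, secret_id):
    """Fetch a JSON secret at runtime and parse it; nothing is hardcoded."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Test double that mimics the shape of the Secrets Manager response."""

    def get_secret_value(self, SecretId):
        payload = {"username": "app", "password": "example-only"}
        return {"Name": SecretId, "SecretString": json.dumps(payload)}


# Hypothetical secret name; swap FakeSecretsClient() for a real boto3 client.
creds = load_db_credentials(FakeSecretsClient(), "prod/app/db")
print(creds["username"])  # app
```

Because the application only knows the secret's name, rotating the value in Secrets Manager needs no code change or redeploy.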
Use IAM roles for AWS service-to-service access. Reserve access keys for external integrations only.
Why: Role credentials are temporary and rotated automatically by STS. Long-lived access keys are the #1 credential compromise vector.
Grant only the permissions required, nothing more. Start restrictive and loosen as needed.
Why: Over-permissive policies expand the blast radius if credentials leak.
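As a rough illustration, here is a narrowly scoped IAM policy document built in Python, plus a small helper that flags bare `*` grants. The bucket name, prefix, and `Sid` are hypothetical, and the helper is a toy check, not a substitute for IAM Access Analyzer.

```python
import json

# Hypothetical bucket and prefix: grant read access to one prefix only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadReportsOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-reports-bucket/reports/*",
        }
    ],
}


def as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def has_wildcard_grant(policy_doc):
    """Return True if any Allow statement uses a bare '*' Action or Resource."""
    for statement in policy_doc["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        if "*" in as_list(statement["Action"]):
            return True
        if "*" in as_list(statement["Resource"]):
            return True
    return False


print(json.dumps(policy, indent=2))
print(has_wildcard_grant(policy))  # False
```

If these credentials leak, the attacker can read one prefix in one bucket, which is the whole point of starting restrictive.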
Enable encryption on every data store. Use TLS/SSL for every in-flight connection.
Why: Compliance regimes (HIPAA, PCI, GDPR) expect it, and AWS-managed keys carry no monthly key charge.
Databases and internal services live in PRIVATE subnets. Only LBs and bastions are in public subnets.
Why: Public databases = public attack surface. Period.
For HIPAA, PCI, or regulated workloads, use customer managed KMS keys (CMKs), not AWS-managed keys.
Why: Customer managed keys give you control over the key policy, rotation, and deletion, with every use logged in CloudTrail. AWS-managed keys are fine otherwise.
Reserved Instances and Savings Plans only AFTER workloads are right-sized. Don't reserve oversized capacity.
Why: Savings Plans compound your oversizing mistakes.
Batch, CI/CD, and fault-tolerant workloads = Spot. Stateful databases and user-facing APIs = NOT Spot.
Why: Spot can save 70-90%. Interruptions are tolerable for batch but deadly for stateful services.
Every S3 bucket needs a lifecycle policy: transition cold data to Glacier, delete expired data.
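Why: Storage costs compound. Moving from S3 Standard to Glacier Deep Archive is roughly 95% cheaper per GB stored at list prices; Glacier Flexible Retrieval saves somewhat less. Retrieval fees and minimum storage durations apply.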
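A sketch of what such a lifecycle rule could look like. The dictionary follows the shape boto3 accepts as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`; the `logs/` prefix, the rule ID, and the 90/365-day thresholds are assumptions, and `storage_class_on_day` is a hypothetical helper that only illustrates the effect of the rule.

```python
# Hypothetical prefix and day counts; tune them to the real access pattern.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_on_day(rule, age_days):
    """Return the storage class of an object at a given age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (30, 120, 400):
    print(age, storage_class_on_day(rule, age))
```

Under these assumed thresholds an object is in Standard at day 30, in Glacier at day 120, and gone by day 400.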
For AWS service traffic from private subnets, use VPC endpoints — NOT NAT Gateway.
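Why: A NAT Gateway costs $0.045/hr plus per-GB data processing charges. Gateway endpoints (S3, DynamoDB) are FREE. Interface endpoints for other services are billed hourly and per GB, but keep traffic on the AWS network.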
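The arithmetic is worth seeing once. This sketch uses the $0.045/hr figure from the rule above; the $0.045/GB data processing rate and the 730-hour month are assumptions (rates vary by region and change over time), so treat the numbers as illustrative.

```python
HOURS_PER_MONTH = 730  # common billing approximation (assumption)


def monthly_nat_cost(gb_processed, hourly_rate=0.045, per_gb_rate=0.045):
    """Hourly charge plus data processing charge for a single NAT Gateway.

    per_gb_rate is an assumed list price, not a figure from the rule above.
    """
    return hourly_rate * HOURS_PER_MONTH + per_gb_rate * gb_processed


def monthly_gateway_endpoint_cost(gb_processed):
    """Gateway endpoints (S3, DynamoDB) have no hourly or per-GB charge."""
    return 0.0


for gb in (0, 1000, 10000):
    nat = round(monthly_nat_cost(gb), 2)
    endpoint = monthly_gateway_endpoint_cost(gb)
    print(f"{gb} GB: NAT ${nat} vs gateway endpoint ${endpoint}")
```

Even with zero traffic the NAT Gateway bills its hourly charge, and the gap widens linearly with every GB sent to S3 or DynamoDB through it.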
Global users → CloudFront in front of origin. Static + dynamic, both benefit.
Why: Edge caching, persistent connections to the origin, and DDoS protection (Shield Standard) are all included with CloudFront at no extra charge.
Read-heavy databases (session lookups, product catalog reads) → put ElastiCache in front.
Why: Sub-ms latency vs 5-10ms RDS. Dramatically reduces DB CPU and cost.
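A minimal cache-aside sketch, assuming a plain dictionary as a stand-in for ElastiCache and a counting class as a stand-in for RDS. All names are hypothetical; a real cache would also set a TTL and invalidate entries on writes.

```python
class SlowDatabase:
    """Stand-in for RDS that counts how many queries actually reach it."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def fetch_product(self, product_id):
        self.query_count += 1
        return self.rows.get(product_id)


def get_product(product_id, cache, db):
    """Cache-aside read: check the cache first, fall back to the DB on a miss."""
    key = f"product:{product_id}"
    if key in cache:
        return cache[key]
    row = db.fetch_product(product_id)
    if row is not None:
        cache[key] = row  # populate the cache for later readers
    return row


db = SlowDatabase({42: {"name": "widget", "price": 9.99}})
cache = {}

for _ in range(100):
    get_product(42, cache, db)

print(db.query_count)  # 1: the other 99 reads were served from the cache
```

One hundred reads cost the database a single query, which is where the CPU and cost savings come from.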
DynamoDB: on-demand mode for unknown or bursty traffic. Provisioned mode with auto scaling for predictable, high-throughput traffic.
Why: On-demand = zero tuning, pays for actual usage. Provisioned is cheaper at sustained high TPS.
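Choose DynamoDB partition keys that are high-cardinality and evenly distributed. Hot partitions throttle everything.
Why: Each partition has its own throughput limit, so traffic concentrated on one key gets throttled even when the table as a whole has spare capacity. The exam loves this trap.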
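To see why key choice matters, this sketch hashes keys onto a fixed number of partitions and measures the busiest partition's share of the traffic. MD5 and the count of 8 partitions are stand-ins chosen for illustration; DynamoDB's real hash function and partition layout are internal.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions=8):
    """Map a key to a partition using a stable hash (illustrative stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def max_partition_share(keys, partitions=8):
    """Fraction of all requests that land on the busiest partition."""
    counts = Counter(partition_for(key, partitions) for key in keys)
    return max(counts.values()) / len(keys)


# Low cardinality: every item written today shares one date key.
low_cardinality = ["2024-01-01"] * 10_000

# High cardinality: one key per user.
high_cardinality = [f"user-{i}" for i in range(10_000)]

hot_share = max_partition_share(low_cardinality)
even_share = max_partition_share(high_cardinality)

print(f"date key: busiest partition takes {hot_share:.0%} of requests")
print(f"user key: busiest partition takes {even_share:.1%} of requests")
```

The date key sends every request to a single partition, while the per-user key spreads load close to the ideal one-eighth share.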
Asynchronous processing = SQS queue between producer and consumer. Don't couple them directly.
Why: Decoupling absorbs spikes, enables retries, and lets consumers scale independently.
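A toy simulation of the pattern, using a `deque` as a stand-in for an SQS queue. The message names and the one-time failure are invented; putting a failed message back on the queue loosely mirrors a message becoming visible again after its visibility timeout.

```python
from collections import deque


def process(message, fail_once):
    """Consumer logic that fails the first time it sees a flagged message."""
    if message in fail_once:
        fail_once.discard(message)
        raise RuntimeError("transient failure")
    return f"done:{message}"


queue = deque()

# Producer: a burst of messages arrives faster than the consumer works.
for i in range(5):
    queue.append(f"order-{i}")

fail_once = {"order-2"}  # hypothetical message that fails on first attempt
results = []

# Consumer: drains at its own pace; failures go back on the queue for retry.
while queue:
    message = queue.popleft()
    try:
        results.append(process(message, fail_once))
    except RuntimeError:
        queue.append(message)

print(results)
```

The producer never waits on the consumer, and `order-2` is completed on its second delivery instead of being lost.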
One event, N consumers = SNS topic with N SQS subscriptions. Each consumer gets its own queue.
Why: Each consumer can fail, retry, and scale independently. Much better than SNS-direct-to-Lambda for most cases.
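The fan-out shape can be sketched with one in-memory queue per subscriber. `Topic` is a hypothetical stand-in for an SNS topic and each `deque` for a subscribed SQS queue; the consumer names are made up.

```python
from collections import deque


class Topic:
    """Stand-in for an SNS topic that copies each event to every subscriber queue."""

    def __init__(self):
        self.subscriber_queues = {}

    def subscribe(self, name):
        subscriber_queue = deque()
        self.subscriber_queues[name] = subscriber_queue
        return subscriber_queue

    def publish(self, event):
        for subscriber_queue in self.subscriber_queues.values():
            subscriber_queue.append(dict(event))  # each consumer gets its own copy


topic = Topic()
billing = topic.subscribe("billing")
email = topic.subscribe("email")
analytics = topic.subscribe("analytics")

topic.publish({"type": "order_placed", "order_id": 1})
topic.publish({"type": "order_placed", "order_id": 2})

# Billing keeps up; email and analytics are slow or down for now.
while billing:
    billing.popleft()

print(len(billing), len(email), len(analytics))  # 0 2 2
```

The slow consumers keep their backlog without holding up billing, which is the independence the rule is after.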