
Real-Time Clickstream Analytics

An e-commerce site needs to ingest 100K clicks/sec, enrich each event with user context, and feed both a real-time dashboard (under 5 seconds of latency) and a daily batch load into a data warehouse for BI.

Key Constraints

100K events/sec ingest
Real-time dashboard (<5s latency)
Daily batch to warehouse
Store raw events for 1 year
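To make the shard math behind these constraints concrete, here is a small sketch. It assumes a ~1 KB average event; the per-shard limits of 1,000 records/sec and 1 MB/sec are the standard Kinesis Data Streams ingest quotas:

```python
import math

# Kinesis per-shard ingest limits (standard shards)
RECORDS_PER_SHARD_SEC = 1_000
BYTES_PER_SHARD_SEC = 1_000_000  # 1 MB/sec

def shards_needed(events_per_sec: int, avg_event_bytes: int) -> int:
    """Return the shard count required by whichever limit binds first."""
    by_records = math.ceil(events_per_sec / RECORDS_PER_SHARD_SEC)
    by_bytes = math.ceil(events_per_sec * avg_event_bytes / BYTES_PER_SHARD_SEC)
    return max(by_records, by_bytes)

# 100K clicks/sec at ~1 KB each: both limits land at 100 shards
print(shards_needed(100_000, 1_000))  # 100
```

With larger events the byte limit binds instead, so the same helper shows why shard counts must be re-derived whenever the payload size changes.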

Reference Architecture


Lambda Architecture with Kinesis

  1. Kinesis Data Streams ingests raw events. At 100K records/sec you need ≈100 shards (each shard accepts up to 1,000 records/sec and 1 MB/sec).
  2. A Lambda consumer enriches each batch by looking up user context in DynamoDB.
  3. Enriched events go to a second Kinesis stream, which separates concerns and enables fan-out to multiple consumers.
  4. Kinesis Data Firehose tees raw events to S3, partitioned by date (year=/month=/day=).
  5. S3 lifecycle: Standard (30d) → Standard-IA (90d) → Glacier (until expiry at 1 year).
  6. A dashboard Lambda aggregates the enriched stream into a low-latency DynamoDB table.
  7. Nightly, AWS Glue or a Redshift COPY loads the partitioned S3 data into Redshift for BI.
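A minimal sketch of the enrichment step (step 2). The DynamoDB lookup is injected as a callable so the core decode-and-merge logic stays testable without AWS access; the record fields (`user_id`, `page`) and the shape of the user context are illustrative assumptions, not a fixed schema:

```python
import base64
import json
from typing import Callable

def enrich_records(event: dict, lookup_user: Callable[[str], dict]) -> list[dict]:
    """Decode a Kinesis Lambda event and merge user context into each click.

    In production `lookup_user` would wrap a DynamoDB GetItem/BatchGetItem;
    here it is injected so the logic can be exercised locally.
    """
    enriched = []
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded inside the Lambda event
        click = json.loads(base64.b64decode(record["kinesis"]["data"]))
        context = lookup_user(click["user_id"])  # e.g. segment, signup date
        enriched.append({**click, "user": context})
    return enriched
```

In the real consumer, the enriched batch would then be written to the second Kinesis stream (e.g. via PutRecords) rather than returned.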

Common Traps (Wrong Answers)

  • Using SQS instead of Kinesis (loses ordering, no replay, can't fan out to multiple consumers)
  • Writing each event directly to DynamoDB or RDS (throttles the database under burst)
  • Not partitioning S3 by date (Athena/Redshift queries scan terabytes unnecessarily)
  • Sizing shards based on average traffic — peak hour matters, not average
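The date-partitioning trap is cheap to avoid. A sketch of building the Hive-style S3 key (the `raw/clicks` prefix is a made-up example; Firehose can emit this layout via its prefix configuration):

```python
from datetime import datetime, timezone

def s3_partition_key(ts: datetime, prefix: str = "raw/clicks") -> str:
    """Build a Hive-style partition path so Athena/Redshift Spectrum can
    prune by date instead of scanning the whole bucket."""
    return (f"{prefix}/year={ts.year:04d}/month={ts.month:02d}/"
            f"day={ts.day:02d}/")

print(s3_partition_key(datetime(2024, 3, 7, tzinfo=timezone.utc)))
# raw/clicks/year=2024/month=03/day=07/
```

Queries that filter on `year`/`month`/`day` then read only the matching prefixes, which is what keeps the nightly Redshift load and ad-hoc Athena queries from scanning a full year of raw events.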
