Context: Everything here lives in personal repos (docker_multilang_project, Car-Match backend, ProjectHub proxy). I’ve never owned production containers.
AI assist: ChatGPT condensed my Docker/EKS notes; I validated each callout against the actual repos and AWS lab scripts on 2025-10-15.
Status: Learning log. Use it to gauge my current level, not as evidence of SRE tenure.

Reality snapshot

  • Day-to-day dev happens in Docker Compose (Node + Python + Postgres or Mongo).
  • When I want to stretch, I follow AWS workshops to stand up an EKS cluster with Terraform, deploy the same services, and watch how health checks + autoscaling behave.
  • Observability equals stdout JSON logs, /healthz routes, and occasionally Prometheus/Grafana during labs. No 24/7 pager yet.

Compose: my default sandbox

Stack anatomy

```yaml
services:
  api:
    build: ./api
    env_file: .env.api
    ports: ["4000:4000"]
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
  frontend:
    build:
      context: ./frontend
      target: production
    ports: ["8080:80"]
    depends_on:
      api:
        condition: service_healthy
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```
  • Why it works: Health checks surface container state and, paired with depends_on conditions, gate startup ordering; .env files keep secrets out of the compose file; named volumes preserve data between runs.
  • Observability: Services log JSON with request IDs so docker compose logs api stitches calls together.
  • Chaos drills: I kill containers with docker compose kill api to verify the frontend fails gracefully and recovers once the container restarts.
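The JSON-log habit above can be sketched in a few lines of Python. This is an illustrative pattern, not the actual service code: the JsonFormatter class, field names, and make_logger helper are assumptions for the sketch.

```python
import json
import logging
import sys
import uuid

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "msg": record.getMessage(),
            # request_id is what lets `docker compose logs api` stitch one call together
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

def make_logger(name="api"):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)  # stdout so Docker collects it
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

logger = make_logger()
# Attach a per-request ID via `extra`; middleware would normally generate this.
logger.info("handled /healthz", extra={"request_id": str(uuid.uuid4())})
```

Because every line is a self-contained JSON object, grepping by request ID across services is a one-liner.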

Lessons

  • Multi-stage builds keep images small (e.g., node:20-alpine + npm ci in builder stage).
  • Mounting local certs into containers makes HTTPS dev possible without messing with the host.
  • Documenting every command (docs/dev-runbook.md) stops classmates from asking, “why doesn’t this container start?”
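The multi-stage pattern from the first lesson looks roughly like this for the Node api service; the stage layout, build output path (dist/), and entrypoint are illustrative assumptions, not copied from the repo:

```dockerfile
# builder stage: full toolchain, deterministic install
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# production stage: runtime deps only, so the final image stays small
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```

Only the second stage ships; the builder's node_modules and source tree never reach the final image.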

EKS labs: leveling up (still sandboxed)

What I practice

  1. Terraform provisioning – VPC, node groups, IAM roles. All lives in labs/eks/terraform.
  2. Deployments & services – Basic Deployment + Service manifests, ConfigMaps for environment variables, Secrets for credentials.
  3. Ingress – AWS Load Balancer Controller with TLS certs for the sample domain.
  4. Observability – Prometheus + Grafana via Helm, scraping the demo pods.
  5. Autoscaling & rolling updates – HPA based on CPU + custom metrics, kubectl rollout status demos.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: $ECR_URI/api:${GITHUB_SHA}
          ports:
            - containerPort: 4000
          envFrom:
            - secretRef:
                name: api-secrets
          readinessProbe:
            httpGet:
              path: /ready
              port: 4000
            initialDelaySeconds: 5
            periodSeconds: 10
```
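The CPU-based autoscaling from item 5 looks roughly like this in the labs; the replica bounds and utilization target are placeholder values, not pulled from labs/eks/terraform:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With this in place, driving synthetic load through k6 makes the scale-out and scale-in behavior visible in kubectl get hpa.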

Honest caveats

  • Traffic is synthetic (k6 scripts + curl). No paying customers.
  • IAM roles follow workshop defaults. Before touching production I’d need a full review.
  • I rely on AWS Cloud9 + workshop accounts. Costs are low, but I still tear everything down immediately after lab time.

Tooling & guardrails

  • Container builds: npm run build:docker or scripts/docker-build.sh ensures every multi-stage build uses the same pinned base images.
  • Security: npm audit, docker scan, and occasional Trivy runs keep dependencies honest. Findings go in the repo issues list.
  • Docs: Every repo has docs/runbook.md (start/stop commands, health checks, log locations, TODOs). For EKS labs I add Terraform diagrams + destroy instructions.

What I’m working on next

  • Add automated smoke tests (k6 or Playwright) that hit the running Compose stack on CI before merging.
  • Package Terraform/EKS lab into a repeatable template so I can spin it up faster (and share with classmates).
  • Explore App Mesh or Linkerd to understand service meshes before I make claims about them.
  • Figure out how to shrink cold-start time on the Render backend (Car-Match) without leaving the free tier.
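As a stepping stone toward those automated smoke tests, a stdlib-only poller can gate CI on the Compose stack's /healthz before heavier k6 or Playwright runs. The function name, URL, and timing values below are illustrative assumptions, and the probe parameter exists purely so the retry logic is testable without a live stack:

```python
import time
import urllib.request
import urllib.error

def wait_for_healthy(url, attempts=10, delay=1.0, probe=None):
    """Poll a health endpoint until it returns HTTP 200 or attempts run out.

    `probe` is injectable for testing; by default it issues a real GET.
    Returns True once healthy, False if the service never came up.
    """
    if probe is None:
        def probe(u):
            try:
                with urllib.request.urlopen(u, timeout=5) as resp:
                    return resp.status
            except (urllib.error.URLError, OSError):
                return None  # connection refused, DNS failure, timeout, ...
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        time.sleep(delay)
    return False

# In CI this would gate the merge, e.g.:
# assert wait_for_healthy("http://localhost:4000/healthz"), "api never became healthy"
```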

References