# AWS Cloud Support Internship: Mastering Troubleshooting and Architecture
Context: Summer 2025 AWS Cloud Support Associate internship in Seattle (Oscar building). My cohort lived inside labs, mock tickets, and certification prep—no direct customer production support.
AI assist: ChatGPT + Amazon Q Business helped summarize service docs and draft troubleshooting checklists; I edited everything before submitting to mentors.
Status: Honest reflection so recruiters see exactly what I touched (and what remains on the “practice only” list).
## Reality snapshot
- 12-week program split between mornings (Cloud Practitioner/SAA coursework, architecture reviews) and afternoons (ticket simulations, hands-on labs, runbook writing).
- I rotated through EC2, S3, IAM, networking, and observability labs; each lab ended with a quiz + short retro shared with a senior engineer.
- Capstone was a media metadata pipeline built entirely in AWS sandbox accounts: S3 → Lambda (FFmpeg) → DynamoDB → API Gateway + static dashboard. No external users relied on it.
## Weekly structure
| Week(s) | Focus | Deliverables |
|---|---|---|
| 1–2 | Orientation, Cloud Practitioner refresh | Daily lab reports, IAM policy walk-through, “how to escalate” checklist. |
| 3–4 | Linux + networking deep dive | Troubleshoot EC2 boot loops, build VPC peering diagrams, script CloudWatch log exports. |
| 5–6 | Storage & security | S3 bucket policy labs, KMS envelope encryption exercises, Bedrock prompt-logging prototype. |
| 7–8 | Observability + automation | CloudWatch dashboard for mock SaaS, Cost Explorer alarms, npm audit playbooks. |
| 9–10 | Capstone build | S3→Lambda→DynamoDB metadata pipeline, runbook, health checks, IaC template. |
| 11 | Support simulations | Pager-style ticket drills, on-call shadowing, Amazon Leadership Principles reviews. |
| 12 | Presentations + retros | Capstone demo, personal growth plan, peer feedback write-up. |
## Capstone: media metadata pipeline (lab-only)

### Architecture
- Input: Files land in `media-ingest-bucket`.
- Processing: Node.js 20 Lambda pulls the object, shells out to FFmpeg to extract metadata, and pushes a compact JSON doc to DynamoDB.
- API: API Gateway exposes read-only endpoints so a static dashboard (S3 + CloudFront) can query the table.
- Observability: CloudWatch logs, metrics, and alarms track Lambda duration, DynamoDB throttle counts, and FFmpeg exits. Budgets/Cost Explorer alerts guard the lab account.
IaC excerpt (CloudFormation):

```yaml
Resources:
  MediaBucket:
    Type: AWS::S3::Bucket
    Properties:
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt MetadataLambda.Arn
  MetadataLambda:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: nodejs20.x
      Handler: index.handler
      Code:
        S3Bucket: !Ref ArtifactBucket
        S3Key: lambda.zip
      Environment:
        Variables:
          TABLE_NAME: !Ref MetadataTable
  MetadataTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: FileKey
          AttributeType: S
      KeySchema:
        - AttributeName: FileKey
          KeyType: HASH
```
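The Lambda's mapping step (ffprobe output → compact DynamoDB item) can be kept as a pure function so it is unit-testable without AWS credentials. A minimal sketch, assuming ffprobe is run with `-print_format json -show_format -show_streams`; the function name `buildMetadataItem` and the non-key attribute names are illustrative, only `FileKey` matches the table schema above:

```javascript
// Pure helper: map ffprobe's JSON output to the compact item we store.
// Attribute names other than FileKey are placeholders for illustration.
function buildMetadataItem(s3Key, ffprobeJson) {
  const probe = JSON.parse(ffprobeJson);
  const video = (probe.streams || []).find((s) => s.codec_type === "video");
  return {
    FileKey: s3Key, // DynamoDB partition key
    durationSec: Number(probe.format?.duration ?? 0),
    sizeBytes: Number(probe.format?.size ?? 0),
    container: probe.format?.format_name ?? "unknown",
    width: video?.width ?? null,
    height: video?.height ?? null,
    codec: video?.codec_name ?? null,
  };
}

// Example: a trimmed ffprobe result for a sample upload.
const sample = JSON.stringify({
  format: { duration: "12.48", size: "1048576", format_name: "mov,mp4,m4a" },
  streams: [{ codec_type: "video", codec_name: "h264", width: 1920, height: 1080 }],
});

const item = buildMetadataItem("uploads/clip.mp4", sample);
console.log(item.durationSec, item.codec); // 12.48 h264
```

The real handler then wraps this in the S3 event loop and a DynamoDB `PutItem` call; keeping the parse step pure is what let the local integration tests run without touching the sandbox account.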
### What worked
- Lambda stayed within its 2 GB memory and 30 s timeout limits even when FFmpeg processed 250 MB sample files.
- DynamoDB recorded ~300 sample rows with zero throttling thanks to on-demand mode.
- CloudWatch dashboard (latency, invocations, FFmpeg exit codes, DynamoDB consumed RCUs) made it easy to talk through the design review.
- A Step Functions “stretch goal” doc lists how I’d fan out enrichment jobs if this ever handled more than demo traffic.
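Rather than hand-editing the dashboard JSON, the repo's copy can be generated. A hedged sketch that emits a CloudWatch dashboard body for two of the widgets above; the function and table names are placeholders, and the widget shape follows CloudWatch's dashboard body format (the structure `PutDashboard` accepts):

```javascript
// Build a CloudWatch dashboard body (JSON string) for the pipeline's key
// metrics. Function/table names here are placeholders, not the real stack.
function buildDashboardBody(fnName, tableName, region) {
  const metricWidget = (title, metrics) => ({
    type: "metric",
    width: 12,
    height: 6,
    properties: { title, region, stat: "Sum", period: 300, metrics },
  });
  const body = {
    widgets: [
      metricWidget("Lambda invocations & errors", [
        ["AWS/Lambda", "Invocations", "FunctionName", fnName],
        ["AWS/Lambda", "Errors", "FunctionName", fnName],
      ]),
      metricWidget("DynamoDB read throttles", [
        ["AWS/DynamoDB", "ReadThrottleEvents", "TableName", tableName],
      ]),
    ],
  };
  return JSON.stringify(body, null, 2);
}

const dashboardJson = buildDashboardBody("metadata-extractor", "MetadataTable", "us-west-2");
console.log(JSON.parse(dashboardJson).widgets.length); // 2
```

Generating the body keeps the checked-in `cloudwatch-dashboard.json` reproducible instead of drifting from the console.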
### What still needs work
- Replace API key auth with Cognito + IAM authorizers (on the backlog).
- Integration tests exist locally, but CI/CD only runs lint + unit tests. Need to script `sam validate`, deploy to a staging stack, and capture screenshots automatically.
- The FFmpeg binary came from a public layer; I owe the team a security review and pinning strategy before recommending it anywhere else.
## Tooling & automations I leaned on
- Docs-as-code: Every lab ended with a markdown runbook + diagram (Mermaid + Excalidraw). These live in `notes/aws-internship/` and were reviewed by mentors weekly.
- Cost visibility: Budgets (email + Slack) triggered at 10% and 25% of the sandbox allowance, mostly to prove I could wire alerts.
- Security workflows: npm audit CI (both frontend + backend), OWASP ZAP baseline workflow for the Render-deployed intern app, Bedrock prompt logging experiments with Amazon Q.
- AI helpers: Amazon Q Business answered “where does this service log?” while ChatGPT helped translate dense docs into playbooks. Every AI-assisted snippet is annotated in the repo so it’s obvious what I edited.
## Proof & artifacts
- Lab tracker: https://github.com/BradleyMatera/aws-internship-journal (private; screenshots/redacted excerpts available).
- Dashboards: `dashboards/cloudwatch-dashboard.json` plus PNG exports in the repo.
- Runbooks: See `notes/runbooks/*.md`; each links to the relevant log groups, budgets, or config files.
- Capstone deck: PDF stored under `presentations/capstone-media-pipeline.pdf` with the architecture diagram, metrics, and TODO table.
## Gaps & next steps
- Earn Developer Associate and re-run the labs with IaC-first deployments.
- Pair with a real AWS Support engineer on a shadow shift to see how customer tickets differ from our simulations.
- Harden IAM knowledge (resource-level permissions, SCPs, org design) beyond what the internship covered.
- Turn the FFmpeg Lambda into a public tutorial once I replace the binary layer and add full test coverage.