The DevOps-to-AI Toolkit

17 Claude Code Prompts for Infrastructure Engineers

By Ebrahim Nassiep | ebie.online | Cape Town, South Africa

I spent 8 years in production DevOps — AWS, Terraform, Kubernetes, on-call at 2AM. Six months ago I started using Claude Code for infrastructure work. These are the 17 prompts that actually save hours, not the toy examples that break in production.

No fluff. Copy, paste, adjust for your stack.

01 Terraform & Infrastructure as Code

Prompt 1: Refactor for Modules

I have a Terraform root module with [X] resources that are copy-pasted across environments. Identify which resources should become reusable child modules. Output the module structure, variable definitions, and a migration plan that avoids state file corruption.

Prompt 2: Cost Optimisation Audit

Audit this Terraform configuration for AWS cost waste. Look for: oversized EC2 instances, EBS volumes still on gp2 that could move to gp3, unused NAT Gateways, and steady-state workloads not covered by Reserved Instances or Savings Plans. Provide specific instance type recommendations and estimated monthly savings.

Prompt 3: State Drift Detection

Write a Python script that compares terraform show -json output against actual AWS resources using boto3. Report drift in: instance types, security group rules, tag values, and attached volumes. Exit code 0 = no drift, 1 = drift detected.
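For reference, here is a minimal sketch of the shape a reasonable answer to this prompt might take, covering instance types only. It assumes the state JSON has the usual values.root_module.resources layout and skips child modules. The function names (planned_instances, find_drift, live_instances, main) are hypothetical, and the boto3 call needs working AWS credentials.

```python
import json


def planned_instances(state: dict) -> dict:
    """Map instance ID -> instance type from `terraform show -json` output.

    Only the root module is read; child modules are ignored in this sketch.
    """
    resources = state.get("values", {}).get("root_module", {}).get("resources", [])
    return {
        r["values"]["id"]: r["values"]["instance_type"]
        for r in resources
        if r.get("type") == "aws_instance"
    }


def find_drift(expected: dict, actual: dict) -> list:
    """Return one message per instance that is missing or has a different type."""
    drift = []
    for instance_id, want in sorted(expected.items()):
        have = actual.get(instance_id)
        if have is None:
            drift.append(f"{instance_id}: in state but not found in AWS")
        elif have != want:
            drift.append(f"{instance_id}: state={want} actual={have}")
    return drift


def live_instances(instance_ids: list) -> dict:
    """Fetch live instance types with boto3 (imported lazily)."""
    if not instance_ids:
        return {}
    import boto3  # third-party; only needed when talking to AWS

    ec2 = boto3.client("ec2")
    found = {}
    # A filter (rather than InstanceIds=) avoids an error for IDs that no longer exist.
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-id", "Values": instance_ids}]
    )
    for page in pages:
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                found[inst["InstanceId"]] = inst["InstanceType"]
    return found


def main(path: str) -> int:
    """Return 0 when no drift is found, 1 otherwise."""
    with open(path) as fh:
        expected = planned_instances(json.load(fh))
    drift = find_drift(expected, live_instances(list(expected)))
    for line in drift:
        print(line)
    return 1 if drift else 0
```

To use it, run terraform show -json > state.json and pass main("state.json") to sys.exit. Security group rules, tags, and volumes would follow the same pattern: one function to read the expected values, one to fetch the live ones, and the same comparison.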

Prompt 4: Multi-Environment CI/CD Pipeline

Design a GitHub Actions workflow that runs terraform plan on PR, requires approval for terraform apply, and posts the plan output as a PR comment. Include OIDC authentication to AWS (no long-lived secrets), state locking via DynamoDB, and a rollback strategy.

02 Kubernetes & Containers

Prompt 5: Resource Right-Sizing

Given this Kubernetes Deployment YAML and 7 days of Prometheus metrics (CPU/memory percentiles), calculate optimal requests and limits. Factor in: HPA target 70%, headroom for traffic spikes, and node packing efficiency. Output updated YAML with justified values.
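It helps to know the arithmetic you expect back so you can sanity-check the output. The sketch below is one simple sizing rule, not the only one: because the HPA measures CPU utilisation against the request, the request is set so that observed p95 usage lands at the HPA target, and the limit adds burst headroom on top. The function name, the 150% limit multiplier, and the 10m rounding step are all assumptions for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding surprises."""
    return -(-a // b)


def suggest_cpu(p95_millicores: int, hpa_target_pct: int = 70,
                limit_headroom_pct: int = 150, step: int = 10) -> dict:
    """Suggest a CPU request and limit from observed p95 usage in millicores."""
    # Size the request so p95 usage equals the HPA target share of the request.
    raw_request = ceil_div(p95_millicores * 100, hpa_target_pct)
    request = ceil_div(raw_request, step) * step
    # The limit is the request plus burst headroom (assumed multiplier).
    raw_limit = ceil_div(request * limit_headroom_pct, 100)
    limit = ceil_div(raw_limit, step) * step
    return {"request": f"{request}m", "limit": f"{limit}m"}


print(suggest_cpu(350))  # {'request': '500m', 'limit': '750m'}
```

Memory usually needs a different rule (requests closer to peak, since memory is not compressible), so treat this as a check on the CPU numbers only.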

Prompt 6: Pod Disruption Budget Strategy

I have a StatefulSet with 3 replicas and a 99.9% uptime SLA. Design PodDisruptionBudgets, topology spread constraints, and a node drain procedure that allows cluster upgrades without breaching SLA. Include runbook steps.

Prompt 7: Security Hardening Checklist

Audit this Kubernetes namespace for security misconfigurations. Check: privileged containers, hostPath mounts, missing NetworkPolicies, RBAC over-permissions, containers running as root, and secrets in env vars. Output a prioritized remediation list, with CVE references for vulnerable images or components where applicable.

03 Observability & Incident Response

Prompt 8: Alert Tuning

These are my current Prometheus alert rules. Identify which alerts are noisy (high false-positive rate), which are missing, and which thresholds need tuning. Use these SLOs: 99.9% availability, P99 latency < 500ms. Output optimized alert YAML with runbook links.
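When reviewing the thresholds that come back, it is useful to have the error budget behind the 99.9% SLO in hand. The sketch below assumes a 30-day SLO window and, as an assumption beyond this prompt, uses burn-rate alerting in the style of the Google SRE Workbook. Function names are hypothetical.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of full unavailability allowed by the SLO over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def burn_rate_threshold(budget_fraction: float, alert_window_hours: float,
                        window_days: int = 30) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent within
    `alert_window_hours`. A burn rate of 1 spends the budget exactly over
    the full SLO window."""
    return budget_fraction * window_days * 24 / alert_window_hours


print(round(error_budget_minutes(99.9), 1))      # 43.2
print(round(burn_rate_threshold(0.02, 1), 1))    # 14.4
print(round(burn_rate_threshold(0.05, 6), 1))    # 6.0
```

An alert whose threshold implies a burn rate well below these values is likely to be noisy; one far above them is likely to fire too late.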

Prompt 9: LogQL Query Builder

I need to detect [specific error pattern] in Loki logs across [service names]. Build a LogQL query that: filters by namespace, extracts structured fields from JSON, and groups by pod. Add an alert rule on top of it that fires when the rate exceeds 5 errors/min over 5 minutes. Include a Grafana dashboard panel JSON.
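As a reference point for judging the answer, here is a minimal sketch of the kind of query this prompt should produce, built as a string so the placeholders are explicit. It assumes your streams carry namespace, app, and pod labels (common with Kubernetes log collectors, but check yours), that the window is given in minutes, and that the pattern contains no double quotes. The function name build_logql is hypothetical.

```python
def build_logql(namespace: str, services: list, pattern: str,
                window: str = "5m", per_minute: int = 5) -> str:
    """Build a LogQL metric query that is true when a pod logs more than
    `per_minute` matching lines per minute, averaged over `window`."""
    minutes = int(window.rstrip("m"))  # sketch only handles minute windows
    threshold = per_minute * minutes   # 5/min over 5m -> 25 lines in the window
    selector = '{namespace="%s", app=~"%s"}' % (namespace, "|".join(services))
    # The line filter runs before the json parser so fewer lines get parsed.
    return (
        f'sum by (pod) (count_over_time({selector} |= "{pattern}" | json [{window}]))'
        f" > {threshold}"
    )


print(build_logql("prod", ["api", "worker"], "timeout"))
```

The same expression can serve as the alert condition in a Loki ruler rule and, without the trailing comparison, as the query behind a Grafana time-series panel.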

Prompt 10: Incident Postmortem Template

Based on this PagerDuty timeline and Slack thread export, draft an incident postmortem following the SRE book format: summary, impact, timeline, root cause, resolution, lessons learned, action items. Keep it under 2 pages. Assign action items with owners and due dates.

04 Migration & Modernisation

Prompt 11: Lift-and-Shift Assessment

I have [X] EC2 instances running [list of services]. Assess which can move to ECS Fargate, which should stay on EC2, and which need re-architecting. Include a migration sequence that minimizes downtime and a cost comparison table.

Prompt 12: Database Migration Runbook

Write a step-by-step runbook for migrating a PostgreSQL [version] database from EC2 to RDS with minimal downtime. Include: replication setup using pglogical, cutover procedure with exact commands, rollback plan, and verification checklist. Target: < 5 minutes downtime.

Prompt 13: CI/CD Migration (Jenkins → GitHub Actions)

Convert this Jenkinsfile pipeline to GitHub Actions. Maintain these requirements: matrix builds across [Python versions], Docker image builds with layer caching, semantic versioning, and artifact uploads to S3. Include a migration checklist for secrets and credentials.

05 AI-Native Development Patterns

Prompt 14: Test Data Generation

Generate realistic test data for this PostgreSQL schema: [paste schema]. Create [X] rows with referential integrity, realistic distributions (not uniform random), and edge cases (nulls, long strings, Unicode). Output as COPY-compatible SQL and a Python script for regeneration.
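The regeneration script is the part worth checking closely. The sketch below shows the properties to look for, using a hypothetical two-table schema (users and orders) that is not from any real project: a fixed seed for reproducibility, weighted rather than uniform choices, deliberate nulls and long strings, Unicode names, foreign keys drawn only from existing parents, and escaping for PostgreSQL's COPY text format.

```python
import random

# Hypothetical reference data; weights are skewed on purpose.
COUNTRIES = ["ZA", "US", "DE", "JP"]
WEIGHTS = [50, 30, 15, 5]
NAMES = ["Thandi", "José", "Zoë", "李伟", "O'Brien"]


def copy_escape(value) -> str:
    """Render one value for COPY text format: \\N for NULL, escaped specials."""
    if value is None:
        return r"\N"
    text = str(value)
    return (text.replace("\\", "\\\\").replace("\t", "\\t")
                .replace("\n", "\\n").replace("\r", "\\r"))


def generate_users(count: int, seed: int = 42) -> list:
    """Rows of (id, name, country, bio) with nulls and long-string edge cases."""
    rng = random.Random(seed)
    rows = []
    for user_id in range(1, count + 1):
        name = rng.choice(NAMES)
        country = rng.choices(COUNTRIES, weights=WEIGHTS, k=1)[0]
        roll = rng.random()
        if roll < 0.10:
            bio = None                    # about 10% NULL
        elif roll < 0.15:
            bio = "x" * 10_000            # about 5% very long strings
        else:
            bio = f"Bio for {name}\twith a tab"
        rows.append((user_id, name, country, bio))
    return rows


def generate_orders(users: list, count: int, seed: int = 43) -> list:
    """Rows of (id, user_id, amount_cents); user_id always references a user."""
    rng = random.Random(seed)
    user_ids = [u[0] for u in users]
    return [
        (order_id, rng.choice(user_ids), int(rng.lognormvariate(8, 1)))
        for order_id in range(1, count + 1)
    ]


def to_copy_text(rows: list) -> str:
    """Tab-separated lines, one per row, ready for COPY ... FROM STDIN."""
    return "".join("\t".join(copy_escape(v) for v in row) + "\n" for row in rows)
```

The text from to_copy_text can be loaded with COPY users FROM STDIN (or psql's \copy) in the default text format. Load parents before children so the foreign keys resolve.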

Prompt 15: Documentation from Code

Given this Terraform module / Python service / Kubernetes manifest, generate production-ready documentation: architecture diagram description, input/output tables, deployment instructions, and troubleshooting FAQ. Write it in the style of AWS documentation — precise, no fluff.

Prompt 16: Runbook from Playbook

Convert this Ansible playbook into an interactive runbook. Break it into: pre-flight checks, main procedure with go/no-go decision points, verification steps, and rollback commands. Format as a checklist an SRE can follow during an incident at 3AM.

Prompt 17: Architecture Decision Record

I need to choose between [Option A: EKS] and [Option B: ECS Fargate] for [specific workload]. Write an ADR following the Nygard format (title, status, context, decision, consequences), with extra sections for compliance and notes. Include: cost projection at 1K/10K/100K requests/day, operational complexity score (1-5), and team skill requirements.

06 How to Use These

1. Copy the prompt into Claude Code or Claude.ai.
2. Paste your context where brackets [like this] appear.
3. Iterate: the first output is a draft. Ask for refinements.
4. Test in staging: never trust AI-generated infra code in production without review.
5. Save the good ones: build your own prompt library.

The Real Lesson

These prompts won't replace your judgment. What they do is compress the time from "I need to figure this out" to "I have a working draft."

The engineers who win the next 5 years aren't the ones who know the most Terraform. They're the ones who can leverage AI to ship infrastructure that would have taken a team of three.

Start with one prompt. Use it this week. Build from there.

Get the next toolkit

Subscribe at ebie.online for the monthly AI + Infrastructure dispatch.