TL;DR
  • Cloud waste is an architecture problem, not a procurement problem
  • The five biggest waste categories each have a specific AWS tool to fix them
  • Quick wins (EC2 rightsizing, idle IPs, orphaned volumes) can be done today
  • Structural fixes (VPC endpoints, CloudFront, tagging) unlock the material savings and reduce data transfer costs
30% average cloud spend wasted (Flexera)
$0.045 per GB through NAT Gateway
$0.01 per GB cross-AZ, each direction

Cloud waste is not overspending on deliberate choices. It is spending on things that should have been switched off months ago, on services provisioned for peak capacity that never arrived, or on data moving between places it has no business moving. These are architectural decisions, or the absence of them. Most AWS cost optimisation fails because teams try to negotiate unit costs instead of fixing the architecture that generates the waste.

The accountant asks "how much?" The engineer asks "why?" Only one of those questions leads to a smaller bill next month.

Accountant reads
  • Total spend up 18%
  • EC2 is the biggest line item
  • Monthly delta flagged to engineering
Engineer reads
  • NAT Gateway processed 4 TB last month - why?
  • 3 RDS instances running at 8% CPU
  • Resources still running in ap-southeast-1

Five waste categories: where your AWS bill is actually going

1. Idle and over-provisioned EC2 instances

A proof-of-concept instance becomes production and nobody questions the size again. Idle instances from decommissioned workloads compound the problem. EC2 rightsizing is often the fastest way to reduce monthly costs.

  • Open Compute Optimizer - recommendations are already waiting for you
  • Filter Cost Explorer by instance and sort by lowest data transfer out - instances that send almost nothing are likely idle
  • Action the top five rightsizing recommendations immediately
Tool: AWS Compute Optimizer + Trusted Advisor
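
One way to pick the "top five" is to rank over-provisioned instances by estimated monthly savings. The sketch below assumes `response` is the dictionary returned by the boto3 Compute Optimizer call `get_ec2_instance_recommendations()`; the field names follow the documented response shape as I understand it, so check them against the current API reference. The function name and output keys are hypothetical.

```python
from typing import Any


def top_rightsizing_candidates(response: dict[str, Any], limit: int = 5) -> list[dict[str, Any]]:
    """Rank over-provisioned instances by estimated monthly savings (largest first)."""
    candidates = []
    for rec in response.get("instanceRecommendations", []):
        # Normalise the finding so "Overprovisioned" and "OVER_PROVISIONED" both match.
        finding = rec.get("finding", "").replace("_", "").lower()
        if finding != "overprovisioned":
            continue
        options = rec.get("recommendationOptions") or []
        if not options:
            continue
        # Assumption: rank 1 is the option Compute Optimizer prefers.
        best = min(options, key=lambda option: option.get("rank", 0))
        savings = (
            (best.get("savingsOpportunity") or {})
            .get("estimatedMonthlySavings", {})
            .get("value", 0.0)
        )
        candidates.append(
            {
                "instance_arn": rec.get("instanceArn"),
                "current_type": rec.get("currentInstanceType"),
                "suggested_type": best.get("instanceType"),
                "monthly_savings": savings,
            }
        )
    candidates.sort(key=lambda c: c["monthly_savings"], reverse=True)
    return candidates[:limit]
```

With real credentials this would be called as `top_rightsizing_candidates(boto3.client("compute-optimizer").get_ec2_instance_recommendations())`. The sketch reads a single page of results and ignores pagination.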

2. NAT Gateway charges

The $0.045/hour per-gateway cost looks fine at provisioning time. The $0.045/GB data processing charge is where it compounds - especially when Lambda or ECS in private subnets call AWS services at volume.

  • Gateway VPC endpoints for S3 and DynamoDB are free
  • Interface endpoints for Secrets Manager, SSM, ECR cost ~$0.01/hour per AZ plus a small per-GB charge, but take that traffic off the NAT Gateway entirely
  • Check your top NAT data sources in VPC Flow Logs
Tool: VPC Endpoints (free for S3/DynamoDB)
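
The trade-off between NAT processing and interface endpoints is simple arithmetic. The sketch below uses the NAT prices quoted above and the ~$0.01/hour endpoint price; the $0.01/GB endpoint data charge, the per-AZ billing of endpoints, and the 720-hour month are assumptions on my part, so substitute the current prices for your region.

```python
HOURS_PER_MONTH = 720          # assumption: 30-day month

NAT_PER_GB = 0.045             # NAT Gateway data processing, from the text above
ENDPOINT_HOURLY = 0.01         # interface endpoint, per endpoint per AZ (per-AZ billing assumed)
ENDPOINT_PER_GB = 0.01         # assumption: endpoint data processing charge


def endpoint_fixed_cost(endpoints: int, azs: int) -> float:
    """Monthly hourly charges for interface endpoints deployed in several AZs."""
    return ENDPOINT_HOURLY * HOURS_PER_MONTH * endpoints * azs


def monthly_saving(gb_moved: float, endpoints: int, azs: int) -> float:
    """NAT processing avoided minus endpoint cost, for traffic moved off the NAT Gateway."""
    nat_processing = NAT_PER_GB * gb_moved
    endpoint_cost = endpoint_fixed_cost(endpoints, azs) + ENDPOINT_PER_GB * gb_moved
    return nat_processing - endpoint_cost


def break_even_gb(endpoints: int, azs: int) -> float:
    """Monthly volume above which the endpoints are cheaper than NAT processing."""
    return endpoint_fixed_cost(endpoints, azs) / (NAT_PER_GB - ENDPOINT_PER_GB)
```

Under these assumptions, moving 4,000 GB a month onto three interface endpoints in two AZs saves roughly $97 a month, and the endpoints pay for themselves above roughly 1,234 GB. The NAT Gateway's own hourly charge only disappears if the gateway itself can be removed, so it is left out of the comparison.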

3. Cross-AZ and egress data transfer

Cross-AZ is $0.01/GB in each direction. In a high-volume microservices architecture with services spread across AZs, this becomes a meaningful number fast. Egress to the internet starts at $0.09/GB.

  • Instrument traffic at the AZ level to find cross-AZ patterns
  • Serve large assets through CloudFront - origin fetch from within AWS is free or near-free
  • CloudFront eliminates a large proportion of origin requests entirely via caching
Tool: CloudFront + VPC Flow Logs
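
Instrumenting traffic at the AZ level can start as a small aggregation over flow records. The sketch below assumes the flow logs have already been reduced to `(source IP, destination IP, bytes)` tuples and that you maintain a map from subnet CIDR to AZ; both inputs and all function names are hypothetical. It applies the $0.01/GB each-direction price from above, so every cross-AZ gigabyte is costed at $0.02.

```python
import ipaddress
from typing import Optional

CROSS_AZ_PER_GB_EACH_WAY = 0.01  # from the text above


def az_for(ip: str, subnet_az: dict[str, str]) -> Optional[str]:
    """Return the AZ whose subnet contains the address, or None if it is outside the VPC."""
    addr = ipaddress.ip_address(ip)
    for cidr, az in subnet_az.items():
        if addr in ipaddress.ip_network(cidr):
            return az
    return None


def cross_az_bytes(flows: list[tuple[str, str, int]], subnet_az: dict[str, str]) -> dict[tuple[str, str], int]:
    """Sum bytes per (source AZ, destination AZ) pair, keeping only cross-AZ traffic."""
    totals: dict[tuple[str, str], int] = {}
    for src, dst, nbytes in flows:
        src_az, dst_az = az_for(src, subnet_az), az_for(dst, subnet_az)
        if src_az is None or dst_az is None or src_az == dst_az:
            continue
        totals[(src_az, dst_az)] = totals.get((src_az, dst_az), 0) + nbytes
    return totals


def cross_az_cost(totals: dict[tuple[str, str], int]) -> float:
    """Estimated charge: each GB is billed once leaving one AZ and once entering the other."""
    gigabytes = sum(totals.values()) / 1e9
    return gigabytes * CROSS_AZ_PER_GB_EACH_WAY * 2
```

Sorting the result of `cross_az_bytes` by value shows which AZ pairs carry the most traffic, which is usually enough to identify the chatty service pair behind them.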

4. Over-provisioned RDS

A db.r5.2xlarge for a dev environment that is active four hours a day is not a reasonable trade-off. A stopped RDS instance restarts automatically after seven days, so stopping instances by hand does not scale.

  • Compute Optimizer now covers RDS and will flag specific downsize recommendations
  • Migrate variable-load databases to Aurora Serverless v2 - scales to 0.5 ACUs when idle
Tool: Compute Optimizer + Aurora Serverless v2
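
Where a dev database stays on provisioned RDS, the seven-day restart can be handled by a small reconciliation job that runs on a schedule and puts the instance back into the state it should be in. This is a minimal sketch, assuming `rds` is a boto3 RDS client and that the active window is a fixed weekday time range; the function names and the window are hypothetical, and Aurora clusters would need the cluster-level calls instead.

```python
from datetime import datetime, time


def desired_state(now: datetime, start: time, stop: time) -> str:
    """Return 'running' on weekdays inside [start, stop), otherwise 'stopped'."""
    is_weekday = now.weekday() < 5  # Monday is 0
    return "running" if is_weekday and start <= now.time() < stop else "stopped"


def reconcile(rds, instance_id: str, now: datetime, start: time, stop: time) -> str:
    """Start or stop one DB instance so it matches the schedule; report what was done."""
    described = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    status = described["DBInstances"][0]["DBInstanceStatus"]
    wanted = desired_state(now, start, stop)
    if wanted == "running" and status == "stopped":
        rds.start_db_instance(DBInstanceIdentifier=instance_id)
        return "started"
    if wanted == "stopped" and status == "available":
        # Also catches an instance that AWS restarted after seven days.
        rds.stop_db_instance(DBInstanceIdentifier=instance_id)
        return "stopped"
    return "unchanged"
```

Run hourly from a scheduler, `reconcile` stops the instance again shortly after an automatic restart. Transitional statuses such as "starting" or "stopping" are deliberately left alone.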

5. Orphaned Elastic IPs and EBS volumes

Elastic IPs cost $0.005/hour when unattached ($3.60/month each). Detached EBS volumes from terminated instances accumulate silently. Manual snapshots from backup scripts never expire unless you enforce a retention policy.

  • Trusted Advisor surfaces unattached Elastic IPs directly
  • Run a monthly sweep of detached EBS volumes in Cost Explorer or the EC2 console
  • Audit EBS and RDS snapshots older than your retention window
Tool: Trusted Advisor + Cost Explorer
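
The monthly sweep can also be scripted. The sketch below assumes `ec2` is a boto3 EC2 client and reports candidates only; it deletes nothing. The function names are hypothetical, only the first page of each API response is read, and volume age is based on `CreateTime`, which is the creation time rather than the detach time, so treat the 30-day filter as an approximation.

```python
from datetime import datetime, timedelta, timezone

EIP_HOURLY = 0.005  # unattached Elastic IP price from the text above


def unattached_elastic_ips(ec2) -> list[dict]:
    """Elastic IPs with no association to an instance or network interface."""
    addresses = ec2.describe_addresses()["Addresses"]
    return [address for address in addresses if "AssociationId" not in address]


def stale_detached_volumes(ec2, now: datetime, min_age_days: int = 30) -> list[dict]:
    """Volumes in the 'available' (detached) state created more than min_age_days ago."""
    response = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    cutoff = now - timedelta(days=min_age_days)
    return [volume for volume in response["Volumes"] if volume["CreateTime"] < cutoff]


def idle_eip_monthly_cost(count: int, hours: int = 720) -> float:
    """Monthly cost of leaving `count` Elastic IPs unattached."""
    return count * EIP_HOURLY * hours
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`, because boto3 returns `CreateTime` as a timezone-aware value.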

Quick wins vs structural fixes

Do today
  • Run Compute Optimizer, action top 5 rightsizing recommendations
  • Release unattached Elastic IPs
  • Delete detached EBS volumes older than 30 days
  • Check for resources in regions you are not actively using
Sprint-level fixes
  • Deploy VPC endpoints for top AWS services used from private subnets
  • Move static asset delivery to CloudFront
  • Migrate dev databases to Aurora Serverless v2
  • Implement resource tagging and cost allocation by team
  • Add Infracost to CI/CD to surface cost impact at PR time
The material savings are in the structural fixes. A 10% reduction from EC2 rightsizing is useful. A 40% reduction from eliminating unnecessary data transfer and fixing NAT routing is a different conversation entirely. This is why AWS cost optimisation is fundamentally an architecture problem.

Ongoing visibility and cloud cost management

The goal is not to read the bill better once. Build the infrastructure so cost anomalies surface automatically, before they compound.

  • Tagging enforcement. Every resource needs at minimum an environment tag and a team tag. Use AWS Config rules to flag untagged resources on creation. Without tags, Cost Explorer tells you how much EC2 costs - not whose EC2 costs that much.
  • AWS Budgets alerts. Set alerts at 80% and 100% per environment and per top service. Send to a channel engineers actually read - not email.
  • Cost Anomaly Detection. Free, 15 minutes to configure. Catches a runaway Lambda loop or misconfigured pipeline the same day it starts, not at month end.
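
The tagging and budget rules above are small enough to express as data. This sketch assumes tags arrive in the `[{"Key": ..., "Value": ...}]` shape that many AWS APIs return, and that alerts go to an SNS topic you have already wired to a chat channel. The helper names and the case-insensitive key comparison are my own choices, not AWS behaviour (tag keys are case-sensitive in AWS).

```python
REQUIRED_TAGS = {"environment", "team"}  # minimum tag set from the text above


def missing_tags(tags: list[dict[str, str]]) -> set[str]:
    """Return the required tag keys absent from a resource's tag list."""
    # Assumption: compare keys case-insensitively so "Team" and "team" both count.
    present = {tag["Key"].lower() for tag in tags}
    return REQUIRED_TAGS - present


def budget_notifications(sns_topic_arn: str, thresholds: tuple[float, ...] = (80.0, 100.0)) -> list[dict]:
    """Build one actual-spend alert per percentage threshold, delivered to an SNS topic."""
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": percent,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "SNS", "Address": sns_topic_arn}],
        }
        for percent in thresholds
    ]
```

The list returned by `budget_notifications` is shaped to be passed as the `NotificationsWithSubscribers` argument of the boto3 Budgets client's `create_budget` call, once per environment or service budget. `missing_tags` can back a periodic audit alongside the AWS Config rule rather than replace it.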