How to Fetch EC2 CPU and IOPS Data using Python (Boto3) and CloudWatch

Fetching CPU utilization is straightforward, but fetching IOPS (Input/Output Operations Per Second) is a common stumbling block. This is because AWS splits storage metrics between the instance itself (for instance store) and the EBS service (for attached volumes). This guide will walk you through the correct way to retrieve both using Python.

Prerequisites

  • Python 3.x installed.
  • Boto3 library (pip install boto3).
  • AWS Credentials configured (via ~/.aws/credentials or environment variables).
  • IAM Permissions: Your user/role needs cloudwatch:GetMetricStatistics and ec2:DescribeInstances.

Part 1: The Easy Part (CPU Utilization)

CPU metrics are standard for all instances and live in the AWS/EC2 namespace. We can use the get_metric_statistics API to fetch them.

Key Concept: CloudWatch returns data points. To get a single “current” number, we usually request the last few minutes of data and take the average.
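
To make that concrete, here is a minimal sketch of Part 1 on its own (the instance ID and region are placeholders; the full script later in this guide wraps the same call with error handling):

Python

import boto3
import datetime

# Minimal sketch: average CPUUtilization over the last 10 minutes.
# 'i-0123456789abcdef0' is a placeholder instance ID.
cw = boto3.client('cloudwatch', region_name='us-east-1')

end_time = datetime.datetime.now(datetime.timezone.utc)
start_time = end_time - datetime.timedelta(minutes=10)

response = cw.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
    StartTime=start_time,
    EndTime=end_time,
    Period=300,  # one datapoint per 5 minutes
    Statistics=['Average']
)

# Take the most recent datapoint, if any exist
datapoints = sorted(response['Datapoints'], key=lambda p: p['Timestamp'])
if datapoints:
    print(f"CPU Utilization: {datapoints[-1]['Average']:.2f}%")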

Part 2: The Hard Part (IOPS & The “Missing” Data)

If you look for IOPS metrics in the AWS/EC2 namespace, you will see DiskReadOps and DiskWriteOps.

  • The Trap: These metrics only track “Instance Store” (ephemeral) volumes.
  • The Reality: Most modern EC2 instances use EBS (Elastic Block Store) volumes. EBS metrics live in the AWS/EBS namespace and are reported per volume, not per instance.

To get the “Total IOPS” for an instance, your script must:

  1. Identify all EBS volumes attached to the instance (a sketch of this step follows the list).
  2. Fetch VolumeReadOps and VolumeWriteOps for each volume from the AWS/EBS namespace.
  3. Sum them up.
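
For step 1, the complete script below reads BlockDeviceMappings from describe_instances; an equivalent approach, shown here as a sketch with a placeholder instance ID, is to filter describe_volumes by attachment:

Python

import boto3

# Sketch of step 1: find EBS volumes attached to an instance by filtering
# describe_volumes on attachment.instance-id (an alternative to reading
# BlockDeviceMappings, which the full script below uses).
ec2 = boto3.client('ec2', region_name='us-east-1')

response = ec2.describe_volumes(
    Filters=[{'Name': 'attachment.instance-id',
              'Values': ['i-0123456789abcdef0']}]  # placeholder ID
)
volume_ids = [vol['VolumeId'] for vol in response['Volumes']]
print(volume_ids)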

The Complete Script

This script solves the aggregation problem. It fetches CPU utilization and then calculates the total Read/Write IOPS across all attached EBS volumes.

Python

import boto3
import datetime
from botocore.exceptions import ClientError

def get_ec2_metrics(instance_id, region='us-east-1'):
    cw = boto3.client('cloudwatch', region_name=region)
    ec2 = boto3.client('ec2', region_name=region)

    # Time window: Last 10 minutes
    end_time = datetime.datetime.now(datetime.timezone.utc)  # utcnow() is deprecated in Python 3.12+
    start_time = end_time - datetime.timedelta(minutes=10)
    period = 300  # 5 minute intervals

    print(f"--- Metrics for {instance_id} ({region}) ---")

    # 1. Fetch CPU Utilization
    try:
        cpu_response = cw.get_metric_statistics(
            Namespace='AWS/EC2',
            MetricName='CPUUtilization',
            Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
            StartTime=start_time,
            EndTime=end_time,
            Period=period,
            Statistics=['Average']
        )
        
        if cpu_response['Datapoints']:
            # Get the most recent data point
            latest_cpu = sorted(cpu_response['Datapoints'], key=lambda x: x['Timestamp'])[-1]
            print(f"CPU Utilization: {latest_cpu['Average']:.2f}%")
        else:
            print("CPU Utilization: No data available")

    except ClientError as e:
        print(f"Error fetching CPU: {e}")

    # 2. Fetch EBS IOPS (Aggregated across all attached volumes)
    try:
        # Find attached volumes
        instance_info = ec2.describe_instances(InstanceIds=[instance_id])
        volumes = instance_info['Reservations'][0]['Instances'][0].get('BlockDeviceMappings', [])
        
        total_read_ops = 0
        total_write_ops = 0
        has_volumes = False

        for vol in volumes:
            vol_id = vol['Ebs']['VolumeId']
            has_volumes = True
            
            for metric_name in ['VolumeReadOps', 'VolumeWriteOps']:
                response = cw.get_metric_statistics(
                    Namespace='AWS/EBS',
                    MetricName=metric_name,
                    Dimensions=[{'Name': 'VolumeId', 'Value': vol_id}],
                    StartTime=start_time,
                    EndTime=end_time,
                    Period=period,
                    Statistics=['Sum']
                )
                
                if response['Datapoints']:
                    # Take the most recent datapoint; its Sum is the ops count for that period
                    latest_point = sorted(response['Datapoints'], key=lambda x: x['Timestamp'])[-1]
                    value = latest_point['Sum']
                    
                    if metric_name == 'VolumeReadOps':
                        total_read_ops += value
                    else:
                        total_write_ops += value

        if has_volumes:
            # CloudWatch returns "Ops" (Total count in the period). 
            # To get IOPS (Ops Per Second), divide by Period.
            read_iops = total_read_ops / period
            write_iops = total_write_ops / period
            
            print(f"Total Read IOPS:  {read_iops:.2f}")
            print(f"Total Write IOPS: {write_iops:.2f}")
        else:
            print("No EBS volumes attached.")

    except ClientError as e:
        print(f"Error fetching IOPS: {e}")

# Usage
# Replace with your actual Instance ID
get_ec2_metrics('i-0123456789abcdef0') 
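
Design note: the script reports a snapshot, taking the most recent datapoint for each metric rather than averaging over the whole window. Because each volume's latest datapoint is fetched independently, the timestamps may differ slightly across volumes; for a point-in-time health check this is usually acceptable.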

Critical Implementation Details

  1. Ops vs. IOPS: CloudWatch returns the count of operations in the period (VolumeReadOps). To get the rate (IOPS), you must divide this count by the period in seconds (e.g., Count / 300); see the worked example after this list.
  2. Latency: CloudWatch metrics are not instant. Standard (basic) monitoring publishes EC2 metrics at 5-minute intervals, so data can lag by several minutes. If you need 1-minute granularity, enable Detailed Monitoring, which incurs extra cost.
  3. Namespace Confusion: Always verify whether you are monitoring an Instance Store volume (AWS/EC2 > DiskReadOps) or an EBS volume (AWS/EBS > VolumeReadOps). 90% of the time, you want the latter.
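
As a quick sanity check of point 1 (with made-up numbers): a volume reporting a Sum of 15,000 VolumeReadOps for a 300-second period averaged 50 read IOPS.

Python

# Worked example with illustrative numbers: converting an Ops count to a rate.
volume_read_ops_sum = 15_000  # 'Sum' statistic for one 300-second period
period_seconds = 300
read_iops = volume_read_ops_sum / period_seconds
print(read_iops)  # 50.0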