Developing With AWS Cost and Usage (CUR) Files

Building internal cost tools with AWS starts from understanding the CUR schema.

Everett Berry

Oct. 05, 22 · Tutorial

Likes (1)

Comment

Save

5.5K Views

If you have found this blog post, you have stumbled upon the bowels of AWS billing, a place where the complexity and variety of options are the whole reason cloud cost management tools exist. Welcome to Cost and Usage Reports (CUR), the raw firehose of billing data that provides the most detail about service cost and consumption on AWS, with files that grow to 10s or 100s of gigabytes.

To work with CUR data, you will need to understand its schema, or data dictionary, which spans 10 tables and hundreds of columns. Fortunately, for most billing cases, only these tables will be relevant.

identity	bill	lineItem	pricing	product

Columns from the following tables are only populated if there is data, for example, by buying a Reserved Instance or tagging resources.

reservation	resourceTags	savingsPlan	costCategory	discount

Cost and usage reports must be enabled first, and then there are several methods to parse them. The identity, bill, and lineItem schemas are a good foundation for working with the other tables if needed. Finally, it is possible to write SQL against CUR data. Let's review the basic CUR schema and investigate methods to query it, parse it, and use it for your own internal tools.

Enable and Download CUR

A brand new cost and usage report is configured in the billing console.

To enable CUR, head to the AWS billing console. You will need to have an S3 bucket setup first to be the report destination and have Cost Explorer itself turned on. For Vantage users, these resources are created upon connection.

It can initially take up to 24 hours for the first reports to generate, but after this AWS will deliver the reports daily. The data is not finalized until the end of the month, wh discounts or support amounts will be applied. There are also some particulars around how reports create multiple files, size limits, enabling additional things like Resource IDs, and more in the AWS docs on CUR.

Parsing Cost and Usage Reports

A snippet of a cost and usage report showing columns from the identity and bill tables.

CUR data comes as a CSV file, as shown above, and can therefore be opened in Excel or Google Sheets and easily parsed with a CSV library from any programming language. Of course, just having this data is not very useful. It needs to be deconstructed to pull out useful cost data.

At the end of this article, we cover various data destinations like Redshift and Quicksight, but CUR data can be queried programmatically from boto3. Again, this capability is only useful if you understand the CUR schema.

   
  
 
   import boto3
import datetime as dt

start = dt.datetime.now()
end = start - dt.timedelta(30)

cd = boto3.client('ce', 'us-east-1')
data = cd.get_cost_and_usage(
    TimePeriod={'Start': start, 'End':  end},
    Granularity='DAILY', Metrics=['UnblendedCost'],
    GroupBy=[
        {'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'},
        {'Type': 'DIMENSION', 'Key': 'SERVICE'}
    ], **kwargs) 
  

Using boto3 to get cost and usage data from the Cost Explorer API. Full code.

The Cost and Usage Report (CUR) Schema

The identity , bill , and lineItem tables will always have data. Identity is a metadata table, Billing deals with the actual billing period, entity, and type (e.g., Amazon refunded Account for Month), and finally, the Line item table contains all the actual cost and usage data. Most likely you will have some data in the product and pricing columns unless you are fully on the free tier, using only spot instances, making only Marketplace purchases, or other edge cases. With a mix of services, inspecting the product and pricing tables will give you up-to-date information on the list prices and product configurations of the services in your account.

The other five tables will be populated if you are using savings plans or reserved instances, cost categories, or tags. In fact, CUR is most useful for understanding the impact that discounts are having and where there may be additional opportunities for savings.

For the tables below, note that the column header in the CUR CSV is tableName/ColumnName, for example ,lineItem/BlendedCost.

identity Table

The usefulness of these columns comes from the fact that a single CUR report may be split over multiple files, especially if Resource IDs are enabled. In that case, the LineItemId will be the same across those files for the same resource.

Column	Notes
LineItemId	Not for identifying same lineItem across reports
TimeInterval	e.g. 2017-11-01T00:00:00Z/2017-12-01T00:00:00Z

bill Table

The biggest thing to be aware of with the bill table is that CUR data is not considered “finalized” until after the end of the billing period. AWS applies refunds and some discounts at the end of the month, before it actually invoices the account. In this table, you can check for this by determining if the invoiceID is blank. Using the Cost Explorer API, check for the estimate parameter to be set to False for the report to be finalized.

Column	Notes
BillingEntity	[AWS, AISPL, AWS Marketplace]
BillingPeriodEndDate	e.g. 2021-10-01T00:00:00Z
BillingPeriodStartDate	e.g. 2021-10-01T00:00:00Z
BillType	[Anniversary, Purchase, Refund]
InvoiceId	Until the report is final, the InvoiceId is blank.
InvoicingEntity	[Amazon Web Services, Amazon Web Services EMEA SARL, AISPL]
PayerAccountId	For an organization in AWS Organizations, this is the account ID of the management account.

lineItem Table

Two concepts make or break this table. The first is Blended vs. NetUnblended vs. Unblended costs. Blended costs incorporate the average of on-demand, and committed-use spend across accounts within organizations. Unblended costs are essentially the costs that show up on your bill. NetUnblended costs incorporate discounts. AWS recommends using UnblendedCosts, a “cash basis” of accounting for cloud computing, unless your organization has specialized needs.

ProductCode, UsageType, and Operation are the other set of columns to focus on. Between them it is possible to get very granular on costs. Make sure to GROUP BY in SQL on these and use aggregates.

Column	Notes
AvailabilityZone
BlendedCost	BlendedCost is blank for line items that have a LineItemType of Discount, which are only calculated for unblended costs.
BlendedRate	Average cost for organization level SKUs.
CurrencyCode	Available AWS currencies.
LegalEntity	[Amazon Web Services, AISPL]
LineItemDescription	Usage description. For size-flexible RIs, this could be a different instance than the one actually being used.
LineItemType	e.g. Usage is all on-demand spend. See all types.
NetUnblendedCost	Only relevant for discounts.
NetUnblendedRate	Only relevant for discounts.
NormalizationFactor	Only for size-flexible RIs.
NormalizedUsageAmount
Operation	For example, a value of RunInstances indicates the operation of an Amazon EC2 instance.
ProductCode	e.g. ‘Amazon EC2’. See a full list of AWS products.
ResourceId	Needs to be enabled. Increases report size significantly.
TaxType
UnblendedCost
UnblendedRate	Zero for EC2 and RDS using RIs.
UsageAccountId	You can use this field to track costs or usage by account.
UsageAmount	Certain subscription charges will have a UsageAmount of 0.
UsageEndDate	e.g. 2021-10-01T00:00:00Z
UsageStartDate	e.g. 2021-10-01T00:00:00Z
UsageType	For example, USW2-BoxUsage:m2.2xlarge describes an M2 High Memory Double Extra Large instance in the US West (Oregon) Region.

Transforming CUR for Cost Visibility

There are many destinations for CUR which can help organize and make the data queryable. The cloud pricing API provides HTTP query params where you can manipulate data that has already been aggregated from CUR. The console provides filter sets, visualizations, and notifications in Slack based on CUR data.

Amazon also provides direct integrations to the following SQL services:

Athena
Quicksight
Redshift

Each of these requires configuration, but once they are up and running queries like the following can be written (source):

   
  
 
   SELECT
  line_item_product_code,
  line_item_line_item_description,
  SUM(line_item_unblended_cost) AS sum_line_item_unblended_cost
FROM
  ${table_name}
WHERE
  ${date_filter}
  AND line_item_product_code = 'AmazonEC2'
  AND line_item_line_item_type NOT IN ('Tax','Refund','Credit')
GROUP BY
  line_item_product_code,
  line_item_line_item_description
ORDER BY
  sum_line_item_unblended_cost DESC; 
  

Totaling EC2 costs using the CUR schema.

As we saw in the lineItem section above, this query pulls from the ProductCode and UnblendedCost columns to sum the cost of EC2 for the time period covered in the report. Note that it excludes taxes, refunds, and credits.

Conclusion

This post is an introduction to the world of cost and usage reports, an area of AWS that we have been developing for years. Whether you write your own programs, pipe CUR into cost visibility tools, or build SQL queries, the breadth and depth of the data Amazon provide can be overwhelming. By understanding the basics of the CUR schema, you can zoom in on the columns you need quickly to be successful in developing with CUR.

AWS

Published at DZone with permission of Everett Berry. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending