Developing With AWS Cost and Usage (CUR) Files
Building internal cost tools with AWS starts from understanding the CUR schema.
Join the DZone community and get the full member experience.
Join For FreeIf you have found this blog post, you have stumbled upon the bowels of AWS billing, a place where the complexity and variety of options are the whole reason cloud cost management tools exist. Welcome to Cost and Usage Reports (CUR), the raw firehose of billing data that provides the most detail about service cost and consumption on AWS, with files that grow to 10s or 100s of gigabytes.
To work with CUR data, you will need to understand its schema, or data dictionary, which spans 10 tables and hundreds of columns. Fortunately, for most billing cases, only these tables will be relevant.
identity | bill | lineItem | pricing | product |
---|
Columns from the following tables are only populated if there is data, for example, by buying a Reserved Instance or tagging resources.
reservation | resourceTags | savingsPlan | costCategory | discount |
---|
Cost and usage reports must be enabled first, and then there are several methods to parse them. The identity
, bill
, and lineItem
schemas are a good foundation for working with the other tables if needed. Finally, it is possible to write SQL against CUR data. Let's review the basic CUR schema and investigate methods to query it, parse it, and use it for your own internal tools.
Enable and Download CUR
To enable CUR, head to the AWS billing console. You will need to have an S3 bucket setup first to be the report destination and have Cost Explorer itself turned on. For Vantage users, these resources are created upon connection.
It can initially take up to 24 hours for the first reports to generate, but after this AWS will deliver the reports daily. The data is not finalized until the end of the month, wh discounts or support amounts will be applied. There are also some particulars around how reports create multiple files, size limits, enabling additional things like Resource IDs, and more in the AWS docs on CUR.
Parsing Cost and Usage Reports
CUR data comes as a CSV file, as shown above, and can therefore be opened in Excel or Google Sheets and easily parsed with a CSV library from any programming language. Of course, just having this data is not very useful. It needs to be deconstructed to pull out useful cost data.
At the end of this article, we cover various data destinations like Redshift and Quicksight, but CUR data can be queried programmatically from boto3. Again, this capability is only useful if you understand the CUR schema.
import boto3
import datetime as dt
start = dt.datetime.now()
end = start - dt.timedelta(30)
cd = boto3.client('ce', 'us-east-1')
data = cd.get_cost_and_usage(
TimePeriod={'Start': start, 'End': end},
Granularity='DAILY', Metrics=['UnblendedCost'],
GroupBy=[
{'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'},
{'Type': 'DIMENSION', 'Key': 'SERVICE'}
], **kwargs)
Using boto3 to get cost and usage data from the Cost Explorer API. Full code.
The Cost and Usage Report (CUR) Schema
The identity
, bill
, and lineItem
tables will always have data. Identity is a metadata table, Billing deals with the actual billing period, entity, and type (e.g., Amazon refunded Account for Month), and finally, the Line item table contains all the actual cost and usage data. Most likely you will have some data in the product
and pricing
columns unless you are fully on the free tier, using only spot instances, making only Marketplace purchases, or other edge cases. With a mix of services, inspecting the product and pricing tables will give you up-to-date information on the list prices and product configurations of the services in your account.
The other five tables will be populated if you are using savings plans or reserved instances, cost categories, or tags. In fact, CUR is most useful for understanding the impact that discounts are having and where there may be additional opportunities for savings.
For the tables below, note that the column header in the CUR CSV is tableName/ColumnName
, for example ,lineItem/BlendedCost
.
identity Table
The usefulness of these columns comes from the fact that a single CUR report may be split over multiple files, especially if Resource IDs are enabled. In that case, the LineItemId
will be the same across those files for the same resource.
Column | Notes |
---|---|
LineItemId | Not for identifying same lineItem across reports |
TimeInterval | e.g. 2017-11-01T00:00:00Z/2017-12-01T00:00:00Z |
bill Table
The biggest thing to be aware of with the bill table is that CUR data is not considered “finalized” until after the end of the billing period. AWS applies refunds and some discounts at the end of the month, before it actually invoices the account. In this table, you can check for this by determining if the invoiceID
is blank. Using the Cost Explorer API, check for the estimate parameter to be set to False for the report to be finalized.
Column | Notes |
---|---|
BillingEntity | [AWS, AISPL, AWS Marketplace] |
BillingPeriodEndDate | e.g. 2021-10-01T00:00:00Z |
BillingPeriodStartDate | e.g. 2021-10-01T00:00:00Z |
BillType | [Anniversary, Purchase, Refund] |
InvoiceId | Until the report is final, the InvoiceId is blank. |
InvoicingEntity | [Amazon Web Services, Amazon Web Services EMEA SARL, AISPL] |
PayerAccountId | For an organization in AWS Organizations, this is the account ID of the management account. |
lineItem Table
Two concepts make or break this table. The first is Blended vs. NetUnblended vs. Unblended costs. Blended costs incorporate the average of on-demand, and committed-use spend across accounts within organizations. Unblended costs are essentially the costs that show up on your bill. NetUnblended costs incorporate discounts. AWS recommends using UnblendedCosts, a “cash basis” of accounting for cloud computing, unless your organization has specialized needs.
ProductCode, UsageType, and Operation are the other set of columns to focus on. Between them it is possible to get very granular on costs. Make sure to GROUP BY in SQL on these and use aggregates.
Column | Notes |
---|---|
AvailabilityZone | |
BlendedCost | BlendedCost is blank for line items that have a LineItemType of Discount, which are only calculated for unblended costs. |
BlendedRate | Average cost for organization level SKUs. |
CurrencyCode | Available AWS currencies. |
LegalEntity | [Amazon Web Services, AISPL] |
LineItemDescription | Usage description. For size-flexible RIs, this could be a different instance than the one actually being used. |
LineItemType | e.g. Usage is all on-demand spend. See all types. |
NetUnblendedCost | Only relevant for discounts. |
NetUnblendedRate | Only relevant for discounts. |
NormalizationFactor | Only for size-flexible RIs. |
NormalizedUsageAmount | |
Operation | For example, a value of RunInstances indicates the operation of an Amazon EC2 instance. |
ProductCode | e.g. ‘Amazon EC2’. See a full list of AWS products. |
ResourceId | Needs to be enabled. Increases report size significantly. |
TaxType | |
UnblendedCost | |
UnblendedRate | Zero for EC2 and RDS using RIs. |
UsageAccountId | You can use this field to track costs or usage by account. |
UsageAmount | Certain subscription charges will have a UsageAmount of 0. |
UsageEndDate | e.g. 2021-10-01T00:00:00Z |
UsageStartDate | e.g. 2021-10-01T00:00:00Z |
UsageType | For example, USW2-BoxUsage:m2.2xlarge describes an M2 High Memory Double Extra Large instance in the US West (Oregon) Region. |
Transforming CUR for Cost Visibility
There are many destinations for CUR which can help organize and make the data queryable. The cloud pricing API provides HTTP query params where you can manipulate data that has already been aggregated from CUR. The console provides filter sets, visualizations, and notifications in Slack based on CUR data.
Amazon also provides direct integrations to the following SQL services:
- Athena
- Quicksight
- Redshift
Each of these requires configuration, but once they are up and running queries like the following can be written (source):
SELECT
line_item_product_code,
line_item_line_item_description,
SUM(line_item_unblended_cost) AS sum_line_item_unblended_cost
FROM
${table_name}
WHERE
${date_filter}
AND line_item_product_code = 'AmazonEC2'
AND line_item_line_item_type NOT IN ('Tax','Refund','Credit')
GROUP BY
line_item_product_code,
line_item_line_item_description
ORDER BY
sum_line_item_unblended_cost DESC;
Totaling EC2 costs using the CUR schema.
As we saw in the lineItem section above, this query pulls from the ProductCode
and UnblendedCost
columns to sum the cost of EC2 for the time period covered in the report. Note that it excludes taxes, refunds, and credits.
Conclusion
This post is an introduction to the world of cost and usage reports, an area of AWS that we have been developing for years. Whether you write your own programs, pipe CUR into cost visibility tools, or build SQL queries, the breadth and depth of the data Amazon provide can be overwhelming. By understanding the basics of the CUR schema, you can zoom in on the columns you need quickly to be successful in developing with CUR.
Published at DZone with permission of Everett Berry. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments