AWS APIs and Extending Style in OpenAPI

A multi part series on my talk from the API Specifications Conference

Morad Ankri · Oct 30, 2019

Photo by Lee Robinson on Unsplash of Lions Gate Bridge, Vancouver, BC, Canada

Recently, I gave a talk at the API Specifications Conference in Vancouver, BC. The talks were not recorded, but attendees and people following along online found it interesting how we are using OpenAPI extensions at Transposit and how some of these extensions might be useful for the future progression of the OpenAPI Specification.

This blog post is the first in a series of a few posts where I’ll cover:

This series is for anyone who wants to understand APIs better, extend openAPI, or get into the guts of Amazon Web Service (AWS) APIs.

This first post will focus on parameter serialization by extending the style property in OpenAPI with a brief explanation on why and how we are using the OpenAPI Specification at Transposit.

Let’s first take a look at how SQL queries on Transposit work. For example, within Transposit a user might write a simple SELECT statement like this:

SELECT *
FROM api.list_items

This query is basically the same thing as this cURL request:

curl -X GET 'https://www.myapi.com/v3/items' 
-H 'Accept: application/json'
-H 'Authorization: Bearer ya29.GGGGG'

But our platform abstracts away from the developer concerns like:

A more complex example would be this JOIN statement:

SELECT *
FROM api.list_items AS L
JOIN api.get_item_details AS D ON L.itemId = D.id

Which is the same as if the user had to call the list items API:

curl -X GET 'https://www.myapi.com/v3/items' 
-H 'Accept: application/json'
-H 'Authorization: Bearer ya29.GGGGG'

And then for each item in the results, the user would have to call an API to get details on each item, for example:

curl -X GET 'https://www.myapi.com/v3/items/0' 
-H 'Accept: application/json'
-H 'Authorization: Bearer ya29.GGGGG'
curl -X GET 'https://www.myapi.com/v3/items/1' 
-H 'Accept: application/json'
-H 'Authorization: Bearer ya29.GGGGG'

And so on and in the end combine all the results together.

OpenAPI for dynamic API clients

What we have basically described is a dynamic API client. It can work with different APIs, different data schemas, and do more for the developer. In these examples, we choose to show SQL queries because we can compare them to a relational database. We know that relational database can run SQL queries efficiently because they control the schema and the data. With third party APIs we can't control anything. The closest thing that we can do is to use an API description language.

This is where the OpenAPI Specification comes into play. We chose OpenAPI because it has grown to be the most widely adopted and continuing to grow. We use OpenAPI documents to get a consistent view of different 3rd party APIs. This allows us to build the dynamic API client with a single code flow that will handle almost any API that can be documented with OpenAPI.

Amazon Web Service (AWS) APIs

The reality is, not all API vendors choose OpenAPI. AWS APIs aren’t described with OpenAPI. Let's see how we can support them in our dynamic API client.

The AWS CLI and the AWS Python SDK are using the same base library, Boto, that provides similar functionality. Boto is another example of a dynamic API client. As a developer that uses the Python SDK, you would access an S3 bucket, like this:

import boto3 

# Retrieve the list of existing buckets
s3 = boto3.client('s3') response = s3.list_buckets()

# Output the bucket names
print('Existing buckets:')
for bucket in response['Buckets']:
print(f' {bucket["Name"]}')

(Source: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-creating-buckets.html)

And you can do things like start EC2 instances:

import boto3 

ec2 = boto3.client('ec2')
response = ec2.describe_instances()
print(response)

ec2.start_instances(InstanceIds=[instance_id])

(Source: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/ec2-example-managing-instances.html)

Let's see how the Boto library works...

A picture of the Boto repository located here https://github.com/boto/botocore/tree/develop/botocore/data

In the Boto repository, you can see the list of the supported APIs. Each API has multiple files. The example below is for S3 but the file names are the same in all the services:

A picture of the Boto repository located here https://github.com/boto/botocore/tree/develop/botocore/data/s3/2006-03-01

The main file with the actual API schema is 'service-2.json' and the structure of this file is something like this:

{ 
"version": "2.0",
"metadata": { ... },
"operations": { ... },
"shapes": { ... }
}

Metadata has global information about the API, which contains the endpoint, protocol, names, API version, and more.

"metadata": {
"apiVersion": "2006-03-01",
"checksumFormat": "md5",
"endpointPrefix": "s3",
"globalEndpoint": "s3.amazonaws.com",
"protocol": "rest-xml",
"serviceAbbreviation": "Amazon S3",
"serviceFullName": "Amazon Simple Storage Service",
"serviceId": "S3",
"signatureVersion": "s3",
"uid": "s3-2006-03-01"
}

The API operations are described in the 'operations' object name, which contains the HTTP method, URL path, entry point for the request and response schema, and more.

"operations": {
"ListObjects": {
"name": "ListObjects",
"http": {
"method": "GET",
"requestUri": "/{Bucket}"
},
"input": {
"shape": "ListObjectsRequest"
},
"output": {
"shape": "ListObjectsOutput"
},
"errors": [{
"shape": "NoSuchBucket"
}],
},
...

The actual schemas are in the 'shapes' object, which is like the 'ref' object in OpenAPI. It can recursively describe nested objects and reuse definitions. The request schema includes the parameters and how they should be provided - path, query or body.

"shapes": {
"ListObjectsRequest": {
"type": "structure",
"required": ["Bucket"],
"members": {
"Bucket": {
"shape": "BucketName",
"location": "uri",
"locationName": "Bucket"
},
"Marker": {
"shape": "Marker",
"location": "querystring",
"locationName": "marker"
},
"MaxKeys": {
"shape": "MaxKeys",
"location": "querystring",
"locationName": "max-keys"
}

}
}

Boto -> OpenAPI

This is basically what we expect from an API schema so we tried to see if we could convert them to OpenAPI, but it didn't work.

AWS APIs don't use the path as a unique key for each API operation. In AWS APIs, all the operations have the same endpoint host and path as you can see in these two examples:

POST / HTTP/1.1 
Host: logs.us-west-2.amazonaws.com
X-Amz-Target: Logs_20140328.FilterLogEvents
Authorization: ...
X-Amz-Date: ...
X-Amz-Security-Token: ...
Content-Type: application/x-amz-json-1.1
Accept: application/json
POST / HTTP/1.1 
Host: logs.us-west-2.amazonaws.com
X-Amz-Target: Logs_20140328.GetLogEvents
Authorization: ...
X-Amz-Date: ...
X-Amz-Security-Token: ...
Content-Type: application/x-amz-json-1.1
Accept: application/json

AWS APIs use a header (i.e. X-Amz-Target) to identify the actual operation. In order to support OpenAPI and AWS APIs and still keep the same code for both, we had to create something in-between that looks mostly like OpenAPI but allows multiple APIs with the same path. We basically moved the operationId in OpenAPI to be the top level ID and the HTTP method and path to be a property of the API operation.

The second place that we had a problem was encoding the body as application/x-www-form-urlencoded. For request body with a content type of application/x-www-form-urlencoded, OpenAPI has three fields that can configure how it will work: style, explode, and allowReserved as you can see in the specification.

The combinations of style and the boolean flag explode tells us how to serialize or deserialize a value in different ways.

As described on swagger.io, "serialization means translating data structures or object state into a format that can be transmitted and reconstructed later."

For example, suppose we have two sets of parameters (for two different API calls). One is an array of color names, colorA -> ["blue","black"], and the other is an object which contains RGB values, colorB -> { "R": 100, "G": 200, "B": 150 }. Let's examine how we'd serialize these two types of parameters in an OpenAPI document:

style explode array object
matrix false ;colorA=blue,black ;colorB=R,100,G,200,B,150
matrix true ;colorA=blue;colorA=black ;R=100;G=200;B=150
label false .blue.black .R.100.G.200.B.150
label true .blue.black .R=100.G=200.B=150
form false colorA=blue,black colorB=R,100,G,200,B,150
form true colorA=blue&colorA=black R=100&G=200&B=150
simple false blue,black R,100,G,200,B,150
simple true blue,black R=100,G=200,B=150
spaceDelimited false blue%20black R%20100%20G%20200%20B%20150
pipeDelimited false blue|black R|100|G|200
deepObject true n/a colorB[R]=100&
colorB[G]=200&
colorB[B]=150

As with almost everything else in OpenAPI, it tries to capture the different ways that APIs work, but AWS found a different way to serialize values as application/x-www-form-urlencoded.

For example, here's JSON from the AWS CloudWatch API. How does AWS serialize it?

{
MetricName: "NetworkOut",
Namespace = "AWS/EC2",

Dimensions: [{
"Name": "AutoScalingGroupName",
"Value": "asg-A"
}],

Statistics: ["Sum", "Average"]

Period = 300,
StartTime = "2019-06-26T19:46:55.706Z",
EndTime: "2019-06-26T20:46:55.708Z",
}

The Dimensions and Statistics arrays are serialized with the prefix member and the index of the item:

MetricName=NetworkOut&
Namespace=AWS/EC2&

Dimensions.member.1.Name=AutoScalingGroupName&
Dimensions.member.1.Value=asg-A&

Statistics.member.1=Sum&
Statistics.member.2=Average&

Period=300&
StartTime=2019-06-26T19:46:55.706Z&
EndTime=2019-06-26T20:46:55.708Z&

There are a few different ways to serialize arrays and objects in AWS APIs. In general, EC2 works in one way and the other form-urlencoded services are working in a different way, but even within these services there are small differences. You have to examine the schema for each service closely. We found that we can configure the different serializations in the schema with three fields.

Extending Style

Finally, our first extension! The basic encoding for AWS APIs is style = form and explode = true. We added these additional configuration in the x-aws-encoding extension:

For example:

requestBody:
content:
application/x-www-form-urlencoded:
schema:
type: object
properties:
QueueName:
type: object
encoding:
QueueName:
style: form
explode: true
x-aws-encoding:
type: <enum: list | map>
prefix: <string>
indexBeforeName: <boolean>

We can see how this serialization works in these cases. In array serialization, if there is a prefix we use it before the item index.

type prefix indexBeforeName array object
list member false color.member.1=blue&
color.member.2=black
color.R=100&
color.G=200&
color.B=150
map n/a false n/a color.1.Name=R&
color.1.Value=100&
color.2.Name=G&...
list n/a true color.1=blue&
color.2=black
color.R=100&
color.G=200&
color.B=150
map n/a true n/a color.1.Name=R&
color.1.Value=100&
color.2.Name=G&...

In most of these cases, the result is the same, but the other configuration is needed when the object is nested:

type prefix indexBeforeName array of objects object of arrays
list member false color.blue.1.R=100&
color.blue.1.G=200&
color.blue.1.B=150
color.blue.member.1=100&
color.blue.member.2=200&
color.blue.member.3=150
map n/a false n/a color.1.Name=blue&
color.1.Value.member.1=100&
color.1.Value.member.2=200&
color.1.Value.member.3=150
list n/a true color.1.blue.R=100&
color.1.blue.G=200&
color.1.blue.B=150
color.blue.1=100&
color.blue.2=200&
color.blue.3=150
map n/a true n/a color.1.Name=blue&
color.1.Value.1=100&
color.1.Value.2=200&
color.1.Value.3=150

Some APIs expect the index to be before the nested field and sometimes after. As you can see in this example of blue.1 and 1.blue:

Some APIs serialize the entire object as one string. We use the type list to identify that. In other APIs we use the type map for if an object is serialized with the name and value as two different components, like you can see in color.blue.member.1 and color.1.Name=blue&color.1.Value.member.1=100.

Conclusion

With these two changes we can take each service-2.json file from boto and convert it to our intermediate API schema (that is still very similar to OpenAPI) and use our existing dynamic API client. Now we can talk with any AWS API. And more importantly, the SQL interface for our users is compatible across AWS APIs and those defined with OpenAPI.

If you think any of the extensions we shared in this post are useful to you, let us know! We are currently considering writing up a OpenAPI Specification proposal for them or publicly sharing more about them. Tweet @transposit or send an email to support@transposit.com to let us know.

In part two of this series, we’ll cover pagination in AWS APIs, how we wrote an extension for it, and how we generalize it to work with almost any API.

Lastly, thanks to Taylor Barnett for helping me with this blog post series.