AWS DynamoDB Throughput Capacity

Calculating throughput capacity!

Read capacity units

  • 1 strongly consistent read per second for an item up to 4 KB
  • 2 eventually consistent reads per second for an item up to 4 KB

Write capacity unit

  • 1 write per second for an item up to 1 KB in size

Example: 1000 writes per 10 seconds @ 512 bytes, evenly distributed.

Solution: 1000/10 = 100 writes per second; 512 bytes rounds up to 1 KB; so 100 write capacity units are required.
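The arithmetic above can be sketched in a few lines of Python (a helper of my own, not an AWS API):

```python
# Sketch of the write-capacity-unit math: 1 WCU = 1 write per second
# for an item up to 1 KB, with item size rounded up to the next 1 KB.
import math

def write_capacity_units(writes_per_second, item_size_bytes):
    units_per_write = math.ceil(item_size_bytes / 1024.0)
    return int(writes_per_second * units_per_write)

# 1000 writes per 10 seconds @ 512 bytes:
print(write_capacity_units(1000 // 10, 512))  # 100
```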

Good certification exam information… 🙂


AWS Linux 2 – VirtualBox

So you want to run the AWS Linux 2 OS on your laptop for development purposes?  It's no problem now!

I followed the instructions (here) to get Linux 2 running with a VirtualBox VM.

These are the high-level steps I took.

  1. Create a cloud-init configuration ISO
    • Create my user-data and meta-data files
    • Validate the user-data YAML file syntax in a validator (important)
    • Create a new ISO with these 2 files on a linux server using genisoimage
      • genisoimage -output mySeed.iso -volid cidata -joliet -rock user-data meta-data
  2. Download the latest Linux 2 vdi file from AWS
  3. Copy the ISO file back to my Windows machine
  4. Create my new VM in VirtualBox (Linux\Other Linux 64-bit) using the new ISO and the downloaded vdi
    • IDE Primary Master = Linux 2 vdi file
    • IDE Secondary Master (Optical) = mySeed ISO file


Now you should be able to start up your VM and connect as any user defined in your user-data file.
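For reference, here is a minimal user-data sketch along the lines of what the steps above call for; the user name and SSH key below are placeholders of mine, not values from the walkthrough:

```yaml
#cloud-config
# Minimal cloud-init user-data sketch; "devuser" and the key are placeholders.
users:
  - name: devuser
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-rsa AAAA... devuser@laptop
```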


JMESPATH and Visual Studio Code

The default format in which AWS CLI data is returned is JSON.  The JSON data that is returned can be queried with the query language JMESPath.

For example, using the AWS CLI, I can list the details of the volumes associated with my CLI profile.  (aws ec2 describe-volumes)

Example output:

    "Volumes": [{
        "AvailabilityZone": "myZone",
        "VolumeType": "standard",
        "VolumeId": "myVolumeID1",
        "State": "in-use",
        "SnapshotId": "mySnapID",
        "CreateTime": "2017-01-02T00:55:03.000Z",
        "Size": 1
    },
    {
        "AvailabilityZone": "myZone",
        "VolumeType": "standard",
        "VolumeId": "myVolumeID2",
        "State": "in-use",
        "SnapshotId": "mySnapID",
        "CreateTime": "2016-01-02T00:55:03.000Z",
        "Size": 1
    }]
Now, let's say there are many more volumes returned from this CLI command, and from that output I want to find the volume(s) created before a specific date (January 1, 2017).

I can utilize the query option on the AWS CLI command to return specifics from the JSON results.  (aws ec2 describe-volumes --query 'Volumes[?CreateTime<`2017-01-01`].VolumeId')

However, getting the JMESPath syntax correct is not always easy.  The extra tool I utilize to help is Visual Studio Code with the JMESPath plugin.


So with this VSCode plugin, you can quickly validate your JMESPath syntax when you need to.  One note when using the "--query" option on the CLI: the processing happens on the client machine, unlike the "--filters" option, which is processed on the server.


AWS Lambda and S3

Below are a couple of problems I ran into when writing a Python 2.7 Lambda function that created a file and then uploaded it to S3. (s3.upload_file)

  1. The file I was creating and writing to in the function was empty in S3 after the upload.
    • Turns out I needed the "( )" parentheses on the Python "close" call.  Silly issue, but it took me like 20 minutes to figure out….
  2. In your Lambda function, you need to create your files under /tmp, which is your function's ephemeral storage.
    • fileName = '/tmp/' + name


EC2 – Run Instances – InstanceId

So you are creating EC2 instances from the AWS CLI and\or Python using Boto3 and you want to get the InstanceId afterwards.  Below are the method(s) I use in each scenario.

Create 1 EC2 Instance with AWS CLI:

Example Command:

aws ec2 run-instances --profile <value> --image-id <value> --security-group-ids <value> --count 1 --instance-type <value> --subnet-id <value> --query 'Instances[0].InstanceId'

Create multiple EC2 Instances with Python\Boto3:

Example Command:

ec2_session_client = session.client('ec2')
response = ec2_session_client.run_instances(ImageId="value", SecurityGroupIds=["value"], MaxCount=value, MinCount=value, InstanceType="value", SubnetId="value")
for instance in response["Instances"]:
    if 'InstanceId' in instance:
        print(instance['InstanceId'])

So these are just a couple ways to grab the InstanceId for use later in your script.


CapitalOne – Cloud Custodian

This free open-source tool, Cloud Custodian, is an interesting program that can be used to help manage your AWS environment(s), ensuring compliance via policies written in YAML.
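As a rough sketch of what such a policy looks like (the policy name and tag key here are made up for illustration):

```yaml
policies:
  - name: stop-untagged-instances   # hypothetical policy name
    resource: ec2
    filters:
      - "tag:Owner": absent         # match instances missing an Owner tag
    actions:
      - stop
```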

Thinking about this more, it seems like you could execute the policies via Lambda, or even a local Jenkins instance on a recurring schedule.

I would suggest storing your YAML policy files in Git and pulling them from there as needed.

Useful Links:

  • Python Home Instance –
  • Capital One Custodian Home –
  • Capital One Custodian Docs –
  • Git Location –


AWS S3 Storage Classes – Tech Talk Notes

I listened to a Tech Talk on AWS S3 recently.  They covered some high-level stuff, and then some low-level stuff.

High-Level (S3 Storage Classes)

  • Amazon S3 Standard – Active Data (S)
  • Amazon S3 Standard-Infrequent Access Data (SIA)
  • Amazon Glacier – Archive Data

Low-Level (Storage Class Analysis)


AWS CLI and Output Filtering

I love the AWS CLI --query option.  It allows you to pull out the relevant data you are looking for and display it in a nice table format.

There are a couple different ways to determine what the correct query parameters are.

#1 – Run your command and parse through the JSON that is returned.  The JSON will show you exactly what element(s) you can query on.

Example: aws ec2 describe-instances --query 'Reservations[*].Instances[*].[InstanceId, Monitoring.State]' --output table

#2 – Review the AWS CLI documentation. (e.g. describe-instance-status)  On this page you will want to examine the “Output” section, which gives you the same information as #1, but with a different view.


Example: aws ec2 describe-instance-status --query 'InstanceStatuses[*].AvailabilityZone' --output table

Overall using the query option is pretty easy and at the same time pretty powerful!


AWS Chalice – I must try it!

Traditional REST API Setup in AWS

  • Multiple EC2 instances
  • Part of an Auto Scaling group
  • Setup with an Elastic Load Balancer
  • Code is manually deployed with your tool of choice (e.g. Ansible)
  • CloudWatch for monitoring

Serverless REST API Setup in AWS

  • Amazon API Gateway (front door)
  • API Gateway handles monitoring, access, and authorization
  • AWS Lambda is configured as the back end (pay for compute time only)
  • No EC2 servers to manage, AWS handles it
  • Manual setup or extensive SDK scripting needed

Chalice Serverless REST API Setup in AWS

  • Chalice is a Python package with syntax similar to Flask.
  • Main Chalice components
    • App object, routes, and file
  • Chalice allows you to quickly deploy your Python API to AWS via the Chalice CLI
  • Auto-generated IAM policy
  • API Gateway and Lambda are used behind the scenes and automatically configured


AWS S3 – Versioning

By default, your S3 bucket in AWS will have versioning disabled.  The following link explains the different S3 storage classes.

There are three S3 versioning states that you should be aware of.  Once you enable versioning on your bucket, you can't go back to the unversioned state, only to the suspended state.

  • unversioned (the default)
  • versioning-enabled
  • versioning-suspended

Enabling versioning is relatively easy.  Be aware that there is a cost associated with this as additional data is stored.

To view the versioning state of an S3 bucket, you can use any of the following tools.