AWS

Takeaway

Takeaway - Security

  • Data Encryption / Decryption

    • Usually refers to encryption of data at rest, where users explicitly specify a (symmetric) encryption key.
    • Because the key must be specified explicitly, encryption at rest is generally not enabled by default.
    • Encryption in transit is enabled by default and needs no user intervention, provided clients connect to TLS-enabled endpoints.

AWS Architecture

AWS Architecture Center

Migrate & Modernize

Migrate & Modernize

AWS Whitepapers

AWS Well-Architected Framework

AWS Well-Architected Framework

Operational excellence

Security

Reliability

Disaster Recovery (DR)

Disaster Recovery (DR)

  • Recovery Time Objective (RTO) is the maximum acceptable delay between the interruption of service and restoration of service. This determines what is considered an acceptable time window when service is unavailable.

  • Recovery Point Objective (RPO) is the maximum acceptable amount of time since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service.

  • DR strategies

    • Backup & Restore

      • RPO / RTO : Hours
      • Lower priority use cases
      • Provision all AWS resources after event
      • Restore backups after event
      • Cost $
    • Pilot Light

      • RPO / RTO : 10s of minutes
      • Data live
      • Services idle
      • Provision some AWS resources and scale after event
      • Cost $$
    • Warm standby

      • RPO / RTO : Minutes
      • Always running, but smaller
      • Business critical
      • Scale AWS resources after event
      • Cost $$$
    • Multi-site

      • RPO / RTO : Real-time
      • Zero downtime
      • Near zero data loss
      • Mission Critical Services
      • Cost $$$$
  • Resources

Performance efficiency

Cost optimization

Sustainability

CLI

CLI - Pagination

Pagination

  • By default, the AWS CLI uses a page size of 1000 and retrieves all available items.

  • If more items are available than the page size, multiple API calls are made until all items have been returned.

  • Parameters

    • --no-paginate

      Returns only the first page of results, so only a single API call is made

    • --page-size

      Specifies the number of items per page (1000 by default)

    • --max-items

      Specifies the total number of items to return (all available items by default)

    • --starting-token

      When --max-items is smaller than the number of available items, the output includes a NextToken; pass it to --starting-token to retrieve the remaining items
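
The NextToken flow can be sketched with any paginated command; the bucket name and the jq extraction below are illustrative, not from the source:

```shell
# Fetch only the first 50 items; the response includes "NextToken" when more remain.
aws s3api list-objects-v2 --bucket my-bucket --max-items 50 > page1.json

# Resume where the previous call stopped.
token=$(jq -r '.NextToken' page1.json)
aws s3api list-objects-v2 --bucket my-bucket --max-items 50 --starting-token "$token"
```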

CLI - Tagging

Find resources by specified tags in the specific Region

aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=Environment,Values=Production \
  --tags-per-page 100

CLI - Filter

CLI - Cheatsheet

CLI - CloudWatch - Get Log Groups

  • aws logs describe-log-groups

CLI - CloudWatch - Get Log Streams

  • aws logs describe-log-streams --log-group-name <log-group-name>

CLI - CloudWatch - Get Log Events

  • aws logs get-log-events --log-group-name <log-group-name> --log-stream-name <log-stream-name> --limit 100

  • aws logs get-log-events --log-group-name <log-group-name> --log-stream-name <log-stream-name> --start-time <start-time> --end-time <end-time>

CLI - CloudWatch - Get paginated all log events of a log group in text output

  • aws logs filter-log-events --log-group-name <log-group-name> --output text

    Suitable for general browsing

CLI - CloudWatch - Search keyword in log events of a log group

  • aws logs filter-log-events --log-group-name <log-group-name> --limit 100 --filter-pattern %Keyword%

CLI - S3 - Listing all user owned buckets

  • aws s3 ls

Cost Management

AWS Docs - Cost Management

Savings Plans

AWS Docs - Savings Plans

  • Besides EC2, also applicable to Fargate and Lambda

  • Aims to simplify savings planning for compute usage (compared to Reserved Instances)

  • Types

    • Compute Savings Plans

      • Most flexible

        • EC2
        • ECS Fargate
        • Lambda
      • Up to 66% off of On-Demand rates

    • EC2 Instance Savings Plans

      • Provide the lowest prices, offering savings up to 72% in exchange for commitment to usage of individual instance families in a Region (e.g. M5 usage in N. Virginia)

      • Up to 72% off of On-Demand rates

    • SageMaker Savings Plans

      • Up to 64% off of On-Demand rates
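
As a worked example of the advertised discounts (the $0.10/hr On-Demand rate is hypothetical):

```shell
# Effective hourly rates at the maximum advertised discounts
on_demand=0.10
compute=$(awk -v r="$on_demand" 'BEGIN { printf "%.4f", r * (1 - 0.66) }')  # Compute Savings Plans
ec2=$(awk -v r="$on_demand" 'BEGIN { printf "%.4f", r * (1 - 0.72) }')      # EC2 Instance Savings Plans
echo "Compute SP: \$${compute}/hr, EC2 Instance SP: \$${ec2}/hr"
```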

VPC

AWS Docs - VPC

  • A VPC spans all AZs in the Region.

  • CLI

    • aws ec2 create-default-vpc

      create a default VPC

    • aws ec2 create-default-subnet --availability-zone <AZ>

      create a default subnet

  • Recipes

    • Calculate subnet CIDR block based on VPC CIDR block

      Use ipcalc

  • References

VPC - Subnet

  • A subnet always belongs to one VPC once created.

  • A subnet is associated with only one AZ.

  • Subnet CIDR block must be a subset of the VPC CIDR block.

  • 172.16.0.0/21 means the first 21 bits identify the network (subnet) and the remaining bits identify hosts. Here, 21 bits are used for network identification, while 32 - 21 = 11 bits are used for host identification. When assigning IP addresses, the first 21 bits stay fixed, while the host bits increment until all addresses are allocated.

  • A public subnet is a subnet associated with a route table that has a route to an internet gateway.

  • You can make a default subnet into a private subnet by removing the route from the destination 0.0.0.0/0 to the internet gateway.

  • Resources
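
The /21 host-bit arithmetic above can be checked in any POSIX shell:

```shell
prefix=21
host_bits=$((32 - prefix))     # bits left for host addressing
addresses=$((1 << host_bits))  # total addresses in the block (2^11)
echo "/$prefix -> $host_bits host bits, $addresses addresses"
# -> /21 -> 11 host bits, 2048 addresses
```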

VPC - Route Table

  • A route table always belongs to one VPC once created.
  • A subnet can only be associated with one route table at a time, but you can associate multiple subnets with the same route table.
  • Each subnet in your VPC must be associated with a route table, which controls the routing for the subnet (subnet route table).
  • If not explicitly specified, the subnet is implicitly associated with the main route table.
  • Your VPC has an implicit router, and you use route tables to control where network traffic is directed.
  • If your route table has multiple routes, we use the most specific route (longest prefix match) that matches the traffic to determine how to route the traffic.

VPC - Static IP Address

  • When you stop an EC2 instance, its public IP address is released. When you start it again, a new public IP address is assigned.

VPC - Elastic IP Address

  • If you require a public IP address to be persistently associated with the instance, allocate an Elastic IP address (essentially a reserved public IP address).
  • An Elastic IP address is free of charge while associated with a running EC2 instance; a charge applies while it is allocated but not in use.

VPC - Network ACL

  • One Network ACL always belongs to one VPC once created.

  • Operates at the subnet level; one Network ACL can be associated with multiple subnets within the same VPC. Rules act as stateless filters.

  • Supports both allow (white list) and deny (black list) rules

  • Return traffic must be explicitly allowed by rules

  • Rules evaluation order

    • By rule number in ascending order
    • First match wins, like an if/else chain

VPC - Security Group

  • Operates at the instance level, so it is only in effect when associated with instance(s); connection tracking makes it stateful.
  • By default, a security group includes an outbound rule that allows all outbound traffic.
  • White list only, you can specify allow rules, but not deny rules.
  • Return traffic is automatically allowed, regardless of Inbound or Outbound
  • Inbound rules only specify source IP, while Outbound rules only specify destination IP.
  • All rules are evaluated before a decision is made.
  • By default, at most 5 Security Groups can be associated with an instance's network interface, and the union of rules from all associated Security Groups applies to the instance.
  • When you specify a security group as the source for an inbound or outbound rule, traffic is allowed from the network interfaces that are associated with the source security group for the specified protocol and port. Incoming traffic is allowed based on the private IP addresses of the network interfaces that are associated with the source security group (and not the public IP or Elastic IP addresses). Adding a security group as a source does not add rules from the source security group.
  • Default security group cannot be deleted.

VPC - Security Group - CLI Cheatsheet

VPC - Security Group - Get all Security Group rules permitting inbound traffic on the given TCP port

aws_ec2_describe_security_groups_rules_ingress () {
  local protocol=$1
  local port=$2
  # JMESPath filter: ingress rules whose protocol matches (or is -1, meaning all protocols)
  # and whose port range covers the given port (or is -1..-1, meaning all ports)
  local filters='!IsEgress && (IpProtocol == `'${protocol}'` || IpProtocol == `-1`) && (FromPort <= `'${port}'` && ToPort >= `'${port}'` || FromPort == `-1` && ToPort == `-1`)'
  aws ec2 describe-security-group-rules \
    --query "sort_by(SecurityGroupRules, &GroupId)[? $filters].{GroupID: GroupId, From: FromPort, To: ToPort, CIDR: CidrIpv4, RuleID: SecurityGroupRuleId}" \
    --output table
}

aws_ec2_describe_security_groups_rules_ingress tcp 22
VPC - Security Group - Create a Security Group in the given VPC

aws ec2 create-security-group \
  --group-name $group_name \
  --description $description \
  --vpc-id $vpc_id

VPC - Security Group - Add an inbound rule to the given Security Group

aws ec2 authorize-security-group-ingress \
  --group-id $group_id \
  --protocol $protocol \
  --port $port \
  --cidr $cidr

# e.g. allow SSH from a single host (a /32 CIDR matches exactly one IP)
# aws ec2 authorize-security-group-ingress \
#   --group-id sg-1234567890abcdef0 \
#   --protocol tcp \
#   --port 22 \
#   --cidr 10.64.1.121/32

VPC - ENI (Elastic network interface)

AWS Docs - ENI (Elastic network interface)

  • Once created, an ENI is specific to a subnet, but an Elastic IP can be disassociated from an ENI and become available for reuse.
  • ENI can be detached from an EC2 instance, and attached to another instance.
  • The primary ENI cannot be detached from an EC2 instance.

VPC Connection Options

VPC - Internet Gateway

AWS Docs - Internet Gateway

  • Only one Internet Gateway can be attached to one VPC at a time.
  • Instances must have public IPs.
  • Attaching an Internet Gateway to a VPC allows instances with public IPs to access the internet.

VPC - Egress-only Internet Gateway

AWS Docs - Egress-only Internet Gateway

  • IPv6

    • An egress-only internet gateway is for use with IPv6 traffic only.
    • IPv6 addresses are globally unique, and are therefore public by default.
  • IPv4

    • To enable outbound-only internet communication over IPv4, use a NAT gateway instead.

VPC - NAT Gateway

NAT Gateway

  • Fully managed, highly available NAT service (in contrast to a self-managed NAT instance)
  • NAT Gateway allows instances in a private subnet to access the internet.
  • NAT Gateway must have an EIP.
  • NAT Gateway traffic must be routed to Internet Gateway in the route table.
  • It only works one way. The internet cannot get through your NAT to your private resources unless you explicitly allow it.
  • EIP cannot be detached.
  • Bandwidth up to 45 Gbps
  • Cannot be associated with a Security Group
  • Cannot function as a Bastion host

VPC - NAT Instance

  • Self managed, but with more flexibility and customization
  • An EC2 instance configured to perform NAT
  • EIP can be detached.
  • Can be associated with a Security Group
  • Can function as a Bastion host

VPC - VPC endpoint

  • A VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.

  • VPC endpoint types

    • Interface endpoint
    • Gateway Load Balancer endpoint
    • Gateway endpoint
  • Key points

    • Pros

      • Secure and private connection
      • No internet needed
    • Cons

      • Not all services are supported
      • Not all Regions are supported
      • Cross region not supported
VPC - Interface endpoint

  • An interface endpoint is an ENI with a private IP address from the IP address range of your subnet that serves as an entry point for traffic destined to a supported service.
  • Interface endpoints are powered by AWS PrivateLink, which bills you for each hour that your VPC endpoint remains provisioned in each AZ, irrespective of the state of its association with the service.

VPC - Gateway endpoint

  • A gateway endpoint is a gateway that you specify as a target for a route in your route table for traffic destined to a supported AWS service.
  • Doesn't use PrivateLink, therefore no hourly charge.
  • Only works within the same Region
  • Only S3 and DynamoDB are supported
  • Gateway endpoints do not allow access from on-premises networks, from peered VPCs in other Regions, or through a transit gateway.

VPC peering

AWS Docs - VPC peering

  • A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses. Instances in either VPC can communicate with each other as if they are within the same network. You can create a VPC peering connection between your own VPCs, or with a VPC in another AWS account. The VPCs can be in different Regions (also known as an inter-Region VPC peering connection).

EC2

AWS Docs - EC2

EC2 - Cheatsheet

Use metadata service to get instance metadata within the instance

Get instances by keyword in name

  • aws ec2 describe-instances --filters "Name=tag:Name,Values=*<keyword>*"

    Server-side filter with AWS CLI v2

  • aws ec2 describe-instances | jq '.Reservations[].Instances[] | select(any(.Tags[]?; .Key == "Name" and (.Value | contains("<keyword>"))))'

    Client-side filter with jq

Get instances by state

  • aws ec2 describe-instances --filters "Name=instance-state-name,Values=running,stopped"

Get instance types and specification

  • aws ec2 describe-instance-types

Get public key of SSH key pair

  • aws ec2 describe-key-pairs --key-names <key-pair-name> --include-public-key

Create/Update a tag of an instance

  • aws ec2 create-tags --resources <instance-id> --tags 'Key=<key>,Value=<value>'

List all tags of an instance

  • aws ec2 describe-tags --filters "Name=resource-id,Values=<instance-id>"

ELB (Elastic Load Balancing)

  • To distribute traffic between instances (often in an Auto Scaling group)

  • ELB can be enabled within a single AZ or across multiple AZs to maintain consistent application performance.

  • Sticky Session

    AKA session affinity: the load balancer binds a user's session to a specific instance. This ensures that all requests from the user during the session are sent to the same instance, so the user doesn't need to keep re-authenticating.

  • Load balancers

    • Application Load Balancer

      • Operate at OSI Layer 7

      • Supports WebSocket and HTTP/2

      • Register targets in target groups and route traffic to target groups.

      • Cross-zone load balancing is always enabled.

      • Access logs capture detailed information about requests sent to the ALB.

      • ALB exposes a static DNS for access.

      • Listeners

        • A listener is a process that checks for connection requests, using the protocol and port that you configure. The rules that you define for a listener determine how the load balancer routes requests to its registered targets.

        • Listener rule condition types

          • host-header
          • http-header
          • http-request-method
          • path-pattern
          • query-string
          • source-ip
        • Authenticate users

          • You can configure an ALB to securely authenticate users as they access your applications. This enables you to offload the work of authenticating users to your load balancer so that your applications can focus on their business logic.
    • Network Load Balancer

      • Operate at OSI Layer 4
      • Exposes a static IP per AZ (optionally an Elastic IP) for access.
      • Cross-zone load balancing is by default disabled.
      • Target type
        • EC2 Instances
        • IP addresses
    • Classic Load Balancer

      • Only for EC2 Instances
      • CLB exposes a static DNS for access.
      • A CLB with HTTP or HTTPS listeners might route more traffic to higher-capacity instance types.
  • Target group

    • Target type

      • One to many EC2 Instances

        • Supports load balancing to EC2 instances within a specific VPC.
        • Facilitates the use of EC2 Auto Scaling to manage and scale your EC2 capacity.
      • One to many IP addresses

        • Supports load balancing to VPC and on-premises resources.
        • Facilitates routing to multiple IP addresses and network interfaces on the same instance.
        • Offers flexibility with microservice based architectures, simplifying inter-application communication.
        • Supports IPv6 targets, enabling end-to-end IPv6 communication, and IPv4-to-IPv6 NAT.
      • Single Lambda function

        • Facilitates routing to a single Lambda function.
        • Accessible to ALB only.
      • Application Load Balancer

        • Offers the flexibility for a NLB to accept and route TCP requests within a specific VPC.
        • Facilitates using static IP addresses and PrivateLink with an ALB.
    • Protocol

      • HTTP/1.1
      • HTTP/2
      • gRPC
  • Health Check
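
Sticky sessions (see above) are configured as target group attributes; a sketch with a placeholder target group ARN:

```shell
# Enable duration-based (lb_cookie) stickiness on a target group
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/0123456789abcdef \
  --attributes Key=stickiness.enabled,Value=true \
               Key=stickiness.type,Value=lb_cookie \
               Key=stickiness.lb_cookie.duration_seconds,Value=86400
```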

ELB - Cheatsheet

Describe all load balancers

aws elbv2 describe-load-balancers \
  --query 'sort_by(LoadBalancers,&LoadBalancerName)[].{LoadBalancer:LoadBalancerName,Type:Type,DNS:DNSName}' \
  --output table

Describe all listeners and their target group of the given load balancer

aws elbv2 describe-listeners \
  --load-balancer-arn <load-balancer-arn> \
  --query 'sort_by(Listeners,&ListenerArn)[].{Protocol:Protocol,Port:Port,TargetGroup:DefaultActions[0].TargetGroupArn}' \
  --output table

Describe the given target groups

aws elbv2 describe-target-groups \
  --names <target-group-name> \
  --query 'sort_by(TargetGroups,&TargetGroupName)[].{TargetGroup:TargetGroupName,Protocol:Protocol,Port:Port,VPC:VpcId}' \
  --output table

Associate a Security Group with the given Load Balancer

aws elbv2 set-security-groups \
  --load-balancer-arn $load_balancer_arn \
  --security-groups $security_group_id

Show health state of all target groups

#!/bin/bash
 
# Get a list of all target groups
target_group_arns=($(aws elbv2 describe-target-groups --query "TargetGroups[].TargetGroupArn" --output text))
 
# Loop through the target groups and check if there are running instances
for arn in "${target_group_arns[@]}"; do
    echo "Checking target group: $arn"
    aws elbv2 describe-target-health \
      --target-group-arn "$arn" \
      --query 'TargetHealthDescriptions[].{"Target ID":Target.Id, Port:Target.Port, State:TargetHealth.State} | sort_by(@, &State)' \
      --output table
done

EC2 - Auto Scaling

AWS Docs - Auto Scaling

  • Auto Scaling group can span across multiple AZs within a Region, but not across multiple Regions.
  • Auto Scaling works with all 3 load balancers.
  • CloudWatch Alarms can be used to trigger Auto Scaling actions.

EC2 - Launch Template

AWS Docs - Launch Template

  • Improvements over Launch Configuration

    • Supports versioning, while Launch Configuration is immutable

    • Supports multiple instance types and purchase options

    • More EC2 options

      • Systems Manager parameters (AMI ID)
      • The current generation of EBS Provisioned IOPS volumes (io2)
      • EBS volume tagging
      • T2 Unlimited instances
      • Elastic Inference
      • Dedicated Hosts

EC2 - ASG Capacity limits

ASG Capacity limits

  • After you have created your Auto Scaling group, the Auto Scaling group starts by launching enough EC2 instances to meet its minimum capacity (or its desired capacity, if specified).
  • The minimum and maximum capacity are required to create an Auto Scaling group.
  • Desired capacity (either by manual scaling or automatic scaling) must fall between the minimum and maximum capacity.
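
The capacity limits map directly to the CLI; the group name below is a placeholder:

```shell
# Desired capacity must satisfy: min-size <= desired-capacity <= max-size
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --min-size 2 \
  --desired-capacity 4 \
  --max-size 10
```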

EC2 - Scaling policy

AWS Docs - Scaling policy

  • A scaling policy instructs Auto Scaling to track a specific CloudWatch metric, and it defines what action to take when the associated CloudWatch alarm is in ALARM. The metrics that are used to trigger an alarm are an aggregation of metrics coming from all of the instances in the Auto Scaling group.

  • Target tracking scaling

    • The scaling policy adds or removes capacity as required to keep the metric at, or close to, the specified target value.
    • Triggered by an automatically created and managed CloudWatch Alarm by EC2 Auto Scaling, which users shouldn't modify.
    • You don't need to specify scaling action.
    • eg: Configure a target tracking scaling policy to keep the average aggregate CPU utilization of your Auto Scaling group at 40 percent.
  • Step scaling

    • Triggered by a specified existing CloudWatch Alarm
    • Scaling action (add, remove, set) is based on multiple step adjustments
  • Simple scaling

    • Triggered by a specified existing CloudWatch Alarm
    • Scaling action (add, remove, set) is based on a single scaling adjustment
  • Scaling cooldown

    A scaling cooldown helps you prevent your Auto Scaling group from launching or terminating additional instances before the effects of previous activities are visible.
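
The 40 percent CPU target tracking example above might be configured like this (group and policy names are placeholders):

```shell
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu40-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": { "PredefinedMetricType": "ASGAverageCPUUtilization" },
    "TargetValue": 40.0
  }'
```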

EC2 - Scheduled Actions

Scheduled actions

  • Set up your own scaling schedule according to predictable load changes
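
A scheduled action for a predictable weekday peak might look like this (names and schedule are illustrative):

```shell
# Scale out every weekday at 09:00 UTC
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name weekday-scale-out \
  --recurrence "0 9 * * 1-5" \
  --min-size 4 \
  --max-size 12 \
  --desired-capacity 8
```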

EC2 - Termination Policy

Termination Policy

  • Default termination policy

    1. Determine whether any of the instances eligible for termination use the oldest launch template or launch configuration.
    2. After applying the preceding criteria, if there are multiple unprotected instances to terminate, determine which instances are closest to the next billing hour.

EC2 Monitoring

  • Instances
    • By default, basic monitoring is enabled when you create a launch template or when you use the AWS Management Console to create a launch configuration.
    • By default, detailed monitoring is enabled when you create a launch configuration using the AWS CLI or an SDK.
  • Health check
    • Auto Scaling can determine the health status of an instance using one or more of the following:
      • EC2 Status Checks
      • ELB Health Checks
      • Custom Health Checks
    • The default health checks for an Auto Scaling group are EC2 status checks only.

EBS (Elastic Block Store)

  • Can only be attached to an instance in the same AZ

  • Snapshot backup and restore can be used to share data with instances in another AZ.

  • Usually one volume can only be attached to one instance at a time (Multi-Attach is not common)

  • Block-level storage can only be used when attached to an EC2 instance with a running OS

  • After you attach an EBS volume to your instance, it is exposed as a block device. You must create a file system if there isn't one and then mount it before you can use it.

    • New volumes are raw block devices without a file system.
    • Volumes that were created from snapshots likely have a file system on them already.
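
The create-and-mount steps above, assuming the volume is attached as /dev/xvdf (the device name varies by instance type):

```shell
# "data" in the output means the device is raw (no file system yet)
sudo file -s /dev/xvdf

# Create an XFS file system only if the device is raw
sudo mkfs -t xfs /dev/xvdf

# Mount it
sudo mkdir -p /data
sudo mount /dev/xvdf /data
```
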
  • Amazon Data Lifecycle Manager

    • Automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs
  • Snapshot

    • Incremental, tracking changes only

    • A restored volume becomes available as soon as the restore operation begins, even though the actual data has not yet been fully copied to the disk

    • Backups occur asynchronously; the point-in-time snapshot is created immediately, but the status of the snapshot is pending until the snapshot is complete

    • Stored in S3

    • Be aware of the performance penalty when initializing volumes from snapshots

    • Fast Snapshot Restore

      enables you to create a volume from a snapshot that is fully initialized at creation. This eliminates the latency of I/O operations on a block when it is accessed for the first time.

  • Volume types

  • Performance Characteristics

    • Throughput = Size per IO Operation * IOPS
    • Size per IO Operation
      • the amount of data written/read in a single IO request.
      • data / request
      • EBS merges smaller, sequential I/O operations that are 32 KiB or over to form a single I/O of 256 KiB before processing.
      • EBS splits I/O operations larger than the maximum 256 KiB into smaller operations.
    • IOPS
      • the number of IO requests on a single block that can be completed by the storage device in a second
      • requests / second
    • Throughput
      • the amount of data transferred from/to a storage device in a second. Typically stated in KB/MB/GB/s
      • data / second
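
A quick sanity check of the throughput formula above (the I/O size and IOPS values are illustrative):

```shell
io_size_kib=16  # size per I/O operation
iops=3000       # I/O operations per second
throughput_kib_s=$((io_size_kib * iops))
echo "Throughput: ${throughput_kib_s} KiB/s (~$((throughput_kib_s / 1024)) MiB/s)"
# -> Throughput: 48000 KiB/s (~46 MiB/s)
```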
  • Network bandwidth limits

    • EC2 instances access EBS volumes over network connections.
    • EBS volumes can be accessed using dedicated networks (available on EBS-optimized instances) and shared networks (non EBS-optimized instances).
  • Encryption

    • You encrypt EBS volumes by enabling encryption, either using encryption by default or by enabling encryption when you create a volume that you want to encrypt.
    • EBS encryption uses KMS CMK when creating encrypted volumes and snapshots.
    • Encryption operations occur on the servers that host EC2 instances, ensuring the security of both data-at-rest and data-in-transit between an instance and its attached EBS storage.
    • Encryption by default is a Region-specific setting. If you enable it for a Region, you cannot disable it for individual volumes or snapshots in that Region.
    • Volumes
      • Can only be encrypted upon creation
      • Encrypted volumes cannot be unencrypted.
    • Snapshots
      • Snapshots created from an encrypted volume are always encrypted.
      • Encrypted snapshots cannot be unencrypted.
      • Unencrypted snapshots can only be encrypted when being copied.
    • Encrypted data include:
      • Data at rest inside the volume
      • Data in transit between the volume and the instance
      • All snapshots created from the volume
      • All volumes created from those snapshots

EFS (Elastic File System)

  • Region-specific
  • Traditional filesystem hierarchy
  • The main difference between EBS and EFS is that an EBS volume is accessible from a single EC2 instance in a particular AZ, while EFS can be mounted by multiple instances across AZs.
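
From an instance, mounting EFS is typically a plain NFSv4.1 mount (the file system ID and Region are placeholders; the amazon-efs-utils mount helper is an alternative):

```shell
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```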

Elastic Beanstalk

  • PaaS based on EC2, using CloudFormation under the hood.

  • Application

    • Application version lifecycle settings
      • If you don't delete versions that you no longer use, you will eventually reach the application version quota and be unable to create new versions of that application.
      • You can avoid hitting the quota by applying an application version lifecycle policy to your applications.
    • Removing an application also triggers removal of all associated resources such as environments, EC2 instances, etc.
  • Environment

    • You can run either a web server environment or a worker environment.
    • Use Validate VPC Settings button in Environment tab to troubleshoot network.
    • If you associate an existing RDS instance to an existing EB environment, the RDS instance must be launched from a snapshot.
    • Environment type can be Load Balanced or Single Instance.
    • When you terminate an environment, you can save its configuration to recreate it later.
    • HTTPS
      • The simplest way to use HTTPS with an Elastic Beanstalk environment is to assign a server certificate to your environment's load balancer.
  • Configuration (all under project root)

    • .ebextensions directory

      • Configuration files are YAML or JSON-formatted documents with a .config file extension.

      • Options can be specified as below, and are overridden according to precedence rules

        option_settings:
          - namespace: namespace
            option_name: option name
            value: option value
          - namespace: namespace
            option_name: option name
            value: option value
    • .elasticbeanstalk directory

      • Saved configuration

        • Saved configurations are YAML formatted templates that define an environment's platform version, tier, configuration option settings, and tags.
        • Saved configurations are located under .elasticbeanstalk > saved_configs in project directory.
    • Config files in the project directory

      • env.yaml

        You can include a YAML formatted environment manifest in the root of your application source bundle to configure the environment name, solution stack and environment links to use when creating your environment.

      • cron.yaml (Worker environment)

        You can define periodic tasks in a file named cron.yaml in your source bundle to add jobs to your worker environment's queue automatically at a regular interval.

    • Elastic Beanstalk supports CloudFormation functions (Ref, Fn::GetAtt, Fn::Join), and one Elastic Beanstalk-specific function, Fn::GetOptionSetting.

  • Platform

    • Docker

      • Single-container
      • Multi-container
    • Custom platform

      • A custom platform lets you develop an entire new platform from scratch, customizing the operating system, additional software, and scripts that Elastic Beanstalk runs on platform instances.
      • To create a custom platform, you build an AMI from one of the supported operating systems and add further customizations.
  • EB CLI

    • Installation

      • Install python3
      • Install pip3
      • Install awsebcli
    • Useful commands

      • eb status

        Gets environment information and status

      • eb printenv

        Shows the environment variables

      • eb list

        Lists all environments

      • eb setenv <env-variable-value-pairs>

        Sets environment variables

        eg: eb setenv HeapSize=256m Site_Url=mysite.elasticbeanstalk.com

      • eb ssh

        Opens the SSH client to connect to an instance

  • AWS CLI

  • Deployment Strategies

    • Update existing instances

      • All-at-once

        Deploy the new version to all instances simultaneously.

      • Rolling

        Updates are applied in a batch to running instances. The batch will be out of service while being updated. Once the batch is completed, the next batch will be started.

      • Rolling with an additional batch

        The same as Rolling, except launching an additional batch of instances of the old version to rollback in case of failure. This option can maintain full capacity. When the deployment completes, Elastic Beanstalk terminates the additional batch of instances.

    • Deploying to new instances

      • Immutable

        Instances of the new version are deployed as instances of the old version are terminated. There's no update to existing instances.

      • Traffic-splitting

        Elastic Beanstalk launches a full set of new instances just like during an immutable deployment. It then forwards a specified percentage of incoming client traffic to the new application version for a specified evaluation period. If the new instances stay healthy, Elastic Beanstalk forwards all traffic to them and terminates the old ones.

    • Blue/Green deployment (opens in a new tab)

      A new environment will be created for the new version (Green) independent of the current version (Blue). When the Green environment is ready, you can swap the CNAMEs of the environments to redirect traffic to the newer running environment.

      • Blue/green deployments require that your environment runs independently of your production database, if your application uses one.
    • Summary

      | Method | Impact of failed deployment | Deploy time | Zero downtime | No DNS change | Rollback process | Code deployed to |
      | --- | --- | --- | --- | --- | --- | --- |
      | All-at-once | Downtime | ⌚ | ✗ | ✓ | Redeploy | Existing instances |
      | Rolling | Single batch out of service; any successful batches before failure running new application version | ⌚⌚ | ✓ | ✓ | Redeploy | Existing instances |
      | Rolling with additional batch | Minimal if first batch fails; otherwise, similar to Rolling | ⌚⌚⌚ | ✓ | ✓ | Redeploy | Existing instances |
      | Blue/Green | Minimal | ⌚⌚⌚⌚ | ✓ | ✗ | Swap URL | New instances |
      | Immutable | Minimal | ⌚⌚⌚⌚ | ✓ | ✓ | Redeploy | New instances |
  • Java

    • The default port is 5000; to change it, set the PORT environment variable.
    • When uploading from the Management Console, the application must be an executable JAR file containing all the compiled bytecode, packaged in a ZIP archive.

CodeCommit

  • Region specific

  • No public access

  • Authentication

    • SSH

      Dedicated SSH key pair of current user for CodeCommit only

    • HTTPS

      Dedicated HTTPS Git credentials of current user for CodeCommit only

    • MFA

  • Authorization

    • IAM

      You must have a CodeCommit managed policy attached to your IAM user, belong to a CodeStar project team, or have the equivalent permissions.

  • Cross-account access to a repository in a different account

    • Create a policy for access to the repository
    • Attach this policy to a role in the same account
    • Allow other users to assume this role
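The repository-access policy from the first step above might look like the following sketch; the Region, account ID, and repository name are hypothetical placeholders.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["codecommit:GitPull", "codecommit:GitPush"],
      "Resource": "arn:aws:codecommit:us-east-1:111111111111:MySharedRepo"
    }
  ]
}
```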
  • Notifications

    • Events that trigger notifications (opens in a new tab) (CloudWatch Events)
      • Comments
        • On commits
        • On pull requests
      • Approvals
        • Status changed
        • Rule override
      • Pull request
        • Source updated
        • Created
        • Status changed
        • Merged
      • Branches and tags
        • Created
        • Deleted
        • Updated
    • Targets
      • SNS topic
      • AWS Chatbot (Slack)
  • Triggers

    • Triggers do not use CloudWatch Events rules to evaluate repository events. They are more limited in scope.
    • Use case
      • Send emails to subscribed users every time someone pushes to the repository.
      • Notify an external build system to start a build after someone pushes to the main branch of the repository.
    • Events
      • Push to existing branch
      • Create branch or tag
      • Delete branch or tag
    • Target
      • SNS
      • Lambda

CodeBuild

  • When setting up CodeBuild projects to access VPC, choose private subnets only.

  • CodeBuild needs access to S3 for the code source, so there are two approaches:

    1. NAT Gateway (additional charge)
    2. S3 Gateway Endpoint
  • Caching Dependencies (opens in a new tab)

    • S3

      stores the cache in an S3 bucket that is available across multiple build hosts

    • Local

      stores a cache locally on a build host that is available to that build host only

      • Docker layer cache

        Caches existing Docker layers so they can be reused. Requires privileged mode.

      • Source cache

        Caches .git metadata so subsequent builds only pull the change in commits.

      • Custom cache

        Caches directories specified in the buildspec file.
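A custom cache is declared in the buildspec's cache section. This minimal sketch assumes a Maven project whose local repository lives under /root/.m2 (both the build command and path are illustrative):

```yaml
version: 0.2
phases:
  build:
    commands:
      - mvn -B package
cache:
  paths:
    # Directories listed here are saved and restored between builds
    - '/root/.m2/**/*'
```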

CodeDeploy

  • Application Revision

    • A revision contains a version of the source files CodeDeploy will deploy to your instances or scripts CodeDeploy will run on your instances.
  • AppSpec (opens in a new tab)

    • Configuration: appspec.yml must be present in the root directory of the application revision archive.
    • files section (opens in a new tab)
      • The paths used in source are relative to the appspec.yml file, which should be at the root of your revision.
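A minimal appspec.yml for the EC2/On-Premises platform might look like the following sketch; the destination path and hook script name are hypothetical.

```yaml
version: 0.0
os: linux
files:
  # Copy the entire revision archive to the web root
  - source: /
    destination: /var/www/html
hooks:
  AfterInstall:
    - location: scripts/restart_server.sh  # hypothetical script in the revision
      timeout: 300
      runas: root
```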
  • Compute platforms

  • Deployment types

    • In-place

    • Blue/green

      • Only EC2 instances (not on-premises instances) support blue/green deployments.

      • All Lambda and ECS deployments are blue/green.

      • Deployment configurations

        • EC2

          • One at a time

            Routes traffic to one instance in the replacement environment at a time.

        • ECS (opens in a new tab)

          • All at once

          • Canary

            Traffic is shifted in two increments, 10% in the first increment, and the remaining 90% after 5 / 15 minutes.

          • Linear

            Traffic is shifted in equal increments (10%) with a fixed interval (1 / 3 minutes).

        • Lambda (opens in a new tab)

          • All at once

          • Canary

            Traffic is shifted in two increments, 10% in the first increment, and the remaining 90% after 5 / 10 / 15 / 30 minutes.

          • Linear

            Traffic is shifted in equal increments (10%) with a fixed interval (1 / 2 / 3 / 10 minutes).

  • Deployment Group

    • A deployment group contains individually tagged instances, EC2 instances in EC2 Auto Scaling groups, or both.
    • EC2 instances must have tags to be added into a deployment group.
  • CodeDeploy agent

  • Deployment

    • Rollback (opens in a new tab)

      • CodeDeploy rolls back deployments by redeploying a previously deployed revision of an application as a new deployment.

      • CodeDeploy first tries to remove from each participating instance all files that were last successfully installed; this starts with the instances that caused the deployment failure, and the remaining, untouched instances are handled afterwards.

      • Automatic rollback

        • The last known good version of an application revision is deployed.
      • Steps

        1. First tries to remove from each participating instance all files that were last successfully installed.

        2. If existing files are detected, the options are as follows.

          1. Fail the deployment
          2. Overwrite the content
          3. Retain the content
  • Resources

CodePipeline

  • In a default setup, a pipeline is kicked-off whenever a change in the configured pipeline source is detected. CodePipeline currently supports sourcing from CodeCommit, GitHub, ECR, and S3.

  • When using CodeCommit, ECR, or S3 as the source for a pipeline, CodePipeline uses a CloudWatch Event to detect changes in the source and immediately kick off a pipeline.

  • When using GitHub as the source for a pipeline, CodePipeline uses a webhook to detect changes in a remote branch and kick off the pipeline.

  • CodePipeline also supports beginning pipeline executions based on periodic checks, although this is not a recommended pattern.

  • To customize the logic that controls pipeline executions in the event of a source change, you can introduce a custom CloudWatch Event.

  • The pipeline stops when it reaches the manual approval action. If an SNS topic ARN was included in the configuration of the action, a notification is published to the SNS topic, and a message is delivered to any subscribers to the topic or subscribed endpoints, with a link to review the approval action in the console.

  • Resources

ECR

ECR - Cheatsheet

Docker login to ECR

aws ecr get-login-password --region <region> | \
  docker login \
    --username AWS \
    --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com

Describe repositories

aws ecr describe-repositories \
  --query 'sort_by(repositories,&repositoryName)[].{Repo:repositoryName,URI:repositoryUri}' \
  --output table

Describe images

repoName=<repo-name>
aws ecr describe-images --repository-name $repoName \
  --query 'reverse(sort_by(imageDetails,&imagePushedAt))[].{Repo:repositoryName,Tag:imageTags[] | [0],Digest:imageDigest,PushedAt:imagePushedAt}' \
  --output table

Find images with the given digest

repoName=<repo-name>
sha256Hash=<sha256-hash>
aws ecr describe-images --repository-name $repoName \
  --query "imageDetails[?imageDigest=='sha256:${sha256Hash}'].{Repo:repositoryName,Tag:imageTags[] | [0],Digest:imageDigest,PushedAt:imagePushedAt}" \
  --output table

Find images with the given tag

repoName=<repo-name>
tagKeyword=<tagKeyword>
aws ecr describe-images --repository-name $repoName \
  --query "imageDetails[?contains(imageTags, '${tagKeyword}')].{Repo:repositoryName,Tag:imageTags[] | [0],Digest:imageDigest,PushedAt:imagePushedAt}" \
  --output table

ECS

  • Container Instance

    • If you terminate a container instance in the RUNNING state, that container instance is automatically removed, or deregistered, from the cluster. However, if you terminate a container instance in the STOPPED state, that container instance isn't automatically removed from the cluster.
  • ECS Container Agent (opens in a new tab)

    • ECS_ENABLE_TASK_IAM_ROLE

      Whether IAM roles for tasks should be enabled on the container instance for task containers with the bridge or default network modes.

  • EC2 Launch Type

    • An ECS cluster is a logical group of EC2 instances, also called container instances.

    • Each container instance has an ECS container agent (a Docker container) installed.

    • Container instances typically run an ECS-optimized AMI (e.g. Amazon Linux)

    • ECS container agent registers the container instance to the cluster.

    • ECS container agent configuration

      • /etc/ecs/ecs.config
    • Load balancing

      • ALB and NLB support dynamic host port mapping (opens in a new tab), allowing you to have multiple tasks from a single service on the same container instance.

      • To enable dynamic host port mapping, host port must be set to 0 or empty in task definition.

      • CLB does not allow you to run multiple copies of a task on the same instance because the ports conflict.
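In the task definition, dynamic host port mapping comes down to a container-definition fragment like this sketch (the container port is arbitrary):

```json
"portMappings": [
  { "containerPort": 80, "hostPort": 0, "protocol": "tcp" }
]
```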

    • Task definition (opens in a new tab)

      • A task is similar to a pod in Kubernetes.
      • Container definitions (opens in a new tab)
        • Define one or multiple containers
        • Standard parameters: Name, Image, Memory, Port Mappings
      • Every container in a task definition must land on the same container instance.
      • Need to specify resources needed
      • Need to specify configuration specific to the task
      • Need to specify the IAM role that your task should use
    • Task placement (opens in a new tab)

      • Strategy (opens in a new tab)

        • binpack

          Tasks are placed on container instances so as to leave the least amount of unused CPU or memory to minimize the number of container instances in use.

        • random

          Random places tasks on instances at random. This still honors the other constraints that you specified, implicitly or explicitly. Specifically, it still makes sure that tasks are scheduled on instances with enough resources to run them.

        • spread

          Tasks are placed evenly based on the specified value.

      • Constraint (opens in a new tab)

        • distinctInstance

          Place each task on a different container instance.

        • memberOf

          Place tasks on container instances that satisfy a cluster query expression.

      • Cluster query language (opens in a new tab)

        • Cluster queries are expressions for targeting container instances, which can be used in task placement memberOf constraint.
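A memberOf constraint in a task definition could be sketched as follows, using an attribute-based cluster query expression from the AWS docs:

```json
"placementConstraints": [
  {
    "type": "memberOf",
    "expression": "attribute:ecs.instance-type =~ t2.*"
  }
]
```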
  • Fargate Launch Type

    • Fully managed
    • Serverless
  • IAM

  • Resources

EKS

Lambda

Lambda - Invocation Models

Lambda - Invocation Models - synchronous

Synchronous invocation (default) (opens in a new tab)

  • RPC style

  • Invocation Type: RequestResponse

  • Services

    • ELB (Application Load Balancer)
    • Cognito
    • Lex
    • Alexa
    • API Gateway
    • CloudFront (Lambda@Edge)
    • Kinesis Data Firehose
  • Details about the function response, including errors, are included in the response body and headers.

Lambda - Invocation Models - asynchronous

Asynchronous invocation (opens in a new tab)

  • Invocation Type: Event

  • Services

    • S3
    • SNS
    • SES
    • CloudFormation
    • CloudWatch Logs
    • CloudWatch Events
    • CodeCommit
    • AWS Config
  • Lambda adds events to a queue before sending them to your function. If your function does not have enough capacity to keep up with the queue, events may be lost.

  • Suitable for services producing events at a lower rate than the function can process; there is usually no message retention, so messages would be lost if the function is overwhelmed.

  • For higher throughput, consider using SQS or Kinesis and Lambda event source mapping.

  • DLQ (opens in a new tab)

    • Either an SNS topic or an SQS queue, serving as the destination for all failed invocation events.
    • An alternative to an on-failure destination, but part of a function's version-specific configuration, so it is locked in when you publish a version.
  • Destinations for asynchronous invocation (opens in a new tab)

    • Types

      • SQS – A standard SQS queue
      • SNS – An SNS topic
      • Lambda – A Lambda function
      • EventBridge – An EventBridge event bus
    • You can configure the destination's condition to be on success or on failure.
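As a sketch, the destination configuration passed to PutFunctionEventInvokeConfig might look like this (the ARNs are hypothetical):

```json
{
  "DestinationConfig": {
    "OnSuccess": { "Destination": "arn:aws:sqs:us-east-1:111111111111:success-queue" },
    "OnFailure": { "Destination": "arn:aws:sns:us-east-1:111111111111:alerts-topic" }
  }
}
```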

Lambda - event source mapping

Event source mapping (poll-based) (opens in a new tab)

  • A Lambda integration setup for poll-based event sources (with data in potentially large volume) such as queues and streams.

  • Lambda pulls records from the data stream of event sources and invokes your function synchronously with an event that contains stream records. Lambda reads records in batches and invokes your function to process records from the batch.

  • Process items from a stream or queue in services that don't invoke Lambda functions directly

  • Event source mappings that read from a stream are limited by the number of shards in the stream.

  • Services

    • SQS
    • DynamoDB Streams
    • Kinesis
    • MQ
    • MSK (Managed Streaming for Apache Kafka)
    • Self-managed Apache Kafka
  • Parallelization Factor

    • Kinesis and DynamoDB Streams only

Lambda - authorization

  • Execution permissions

    • Assigned to Lambda function
    • Enable the Lambda function to access other AWS resources in your account.
  • Invocation permissions

    • Assigned to event source
    • Enable the event source to communicate with your Lambda function.

Lambda - runtime

Custom runtime (opens in a new tab)

  • You can implement a Lambda custom runtime in any programming language.

  • A runtime is a program that runs a Lambda function's handler method when the function is invoked. You can include a runtime in your function's deployment package in the form of an executable file named bootstrap.

  • A runtime is responsible for running the function's setup code, reading the handler name from an environment variable, and reading invocation events from the Lambda runtime API. The runtime passes the event data to the function handler, and posts the response from the handler back to Lambda.

  • The runtime can be included in your function's deployment package, or in a layer.

  • Scripting-language runtimes such as Node.js and Python have better native support than Java, since tooling allows deploying source code directly.

  • Resources

Lambda - execution environment lifecycle

Execution environment lifecycle (opens in a new tab)

  • Init

    • Happens at the time of the first function invocation

    • In advance of function invocations if you have enabled provisioned concurrency.

    • 3 Tasks

      • Extension Init

      • Runtime Init

      • Function Init

        Runs the function’s initialization code (the code outside the main handler)

  • Invoke

  • Shutdown

Lambda - function deployment

Lambda - function handler

Lambda - function configuration

  • The total size of all environment variables doesn't exceed 4 KB.

  • Memory

    • From 128 MB to 3008 MB in 64-MB increments
    • You can only directly configure the memory for your function, and Lambda allocates CPU power in proportion to the amount of memory configured.
  • Timeout

    • Default is 3 seconds, and max is 15 minutes (900 seconds).
    • AWS charges based on execution time in 100-ms increments.
  • Network

    • Network configuration
      • default
      • VPC
    • A Lambda function in your VPC has no internet access.
    • Deploying a Lambda function in a public subnet doesn't give it internet access or a public IP.
    • Deploying a Lambda function in a private subnet gives it internet access if you have a NAT Gateway / Instance.
    • Use VPC endpoints to privately access AWS services without a NAT.
  • Concurrency

    • By default, the concurrent execution limit is enforced against the sum of the concurrent executions of all functions.

    • By default, the account-level concurrency within a given Region is set with 1000 concurrent execution as a maximum to provide you 1000 concurrent functions to execute. You can open a support ticket with AWS to request an increase in your account level concurrency limit.

    • Lambda requires at least 100 unreserved concurrent executions per account.

    • Concurrency = (average requests per second) * (average request duration in seconds)

    • Reserved concurrency

      Applies to the entire function, including all versions and aliases

    • Provisioned concurrency (opens in a new tab)

      • To enable a function to scale without fluctuations in latency.
      • Provisioned concurrency cannot exceed reserved concurrency.
      • Provisioned concurrency initializes the assigned capacity upfront to avoid cold starts, hence no noticeable latency.
    • Parallelization Factor (opens in a new tab)

      • For stream processing (event source mapping), one Lambda function invocation processes one shard at a time, namely Parallelization Factor is 1.
      • Parallelization Factor can be set to increase concurrent Lambda invocations for each shard, which by default is 1.
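The concurrency formula above can be checked with a quick shell sketch (the request rate and duration are hypothetical numbers):

```shell
# Concurrency = (average requests per second) * (average request duration in seconds)
rps=100          # average requests per second (hypothetical)
duration_ms=500  # average request duration, in milliseconds (hypothetical)
concurrency=$(( rps * duration_ms / 1000 ))
echo "$concurrency"   # → 50
```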
  • Version (opens in a new tab)

    • Each Lambda function version has a unique ARN. After you publish a version, it is immutable, so you cannot change it.

    • A function version includes:

      • function code and all associated dependencies
      • Lambda runtime that invokes the function
      • All of the function settings, including the environment variables
      • A unique ARN to identify the specific version of the function
  • Alias (opens in a new tab)

    • An alias is a pointer to a version, and therefore it also has a unique ARN. Assign an alias to a particular version and use that alias in the application to avoid updating all references to the old version.

    • An alias cannot point to $LATEST.

    • Weighted alias

      • An alias allows you to shift traffic between 2 versions based on specified weights (%).
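As a sketch, an UpdateAlias routing configuration that sends 10% of traffic to version 2 (and the remaining 90% to version 1) would look like this; the alias name is hypothetical:

```json
{
  "Name": "live",
  "FunctionVersion": "1",
  "RoutingConfig": {
    "AdditionalVersionWeights": { "2": 0.1 }
  }
}
```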
  • Layers (opens in a new tab)

    • A layer is a .zip file archive that contains libraries, a custom runtime, or other dependencies. With layers, you can use libraries in your function without needing to include them in your deployment package.
    • A function can use up to 5 layers at a time. The total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250 MB.
    • Layers are extracted to the /opt directory in the function execution environment. Each runtime looks for libraries in a different location under /opt, depending on the language.
  • Environment variables (opens in a new tab)

    • X-Ray

      • _X_AMZN_TRACE_ID

        X-Ray tracing header

      • AWS_XRAY_CONTEXT_MISSING: RUNTIME_ERROR (default), LOG_ERROR

        Lambda sets this to LOG_ERROR to avoid throwing runtime errors from the X-Ray SDK.

Lambda - monitoring

  • Metrics (opens in a new tab)

    • Invocations

      the number of requests billed

    • Duration

      the amount of time that your function code spends processing an event

Lambda - service integration

Using AWS Lambda with other services (opens in a new tab)

Step Functions

  • Workflow type is either Standard or Express (opens in a new tab), and cannot be changed once created.

  • Standard Workflow

    • Maximum execution time: 1 year
    • Priced per state transition. A state transition is counted each time a step in your execution is completed.
  • Express Workflow

    • Maximum execution time: 5 minutes

    • Priced by the number of executions you run, their duration, and memory consumption.

    • Types

      • Synchronous
      • Asynchronous
  • States (opens in a new tab)

IAM

IAM - Access Analyzer

IAM - Access Advisor

IAM - User

  • Uniquely identified identity

  • Long-term effective

  • Access

    • Programmatic (Access key ID and Secret Access key)
    • Web (Web Management Console)

IAM - Role

  • Similar to a User with attached Permissions policies

  • Not uniquely identified, but a distinct identity with its own permissions

  • Temporarily effective for a designated timeframe

  • If an IAM user assumes a Role, only the policies of the assumed Role are evaluated. The user's own policies wouldn't be evaluated.

  • Cannot be added to IAM groups

  • Trust policy specifies who can assume a Role.

  • An IAM role is both an identity and a resource that supports resource-based policies (Trust policy).

  • Service-Linked Role

  • Cross account access (opens in a new tab) can be given by allowing principals in account A to assume roles in account B.

    • When the principal and the resource are in different AWS accounts, an IAM administrator in the trusted account must also grant the principal entity (user or role) permission to access the resource.

    • Trust policy to authorize the specified account to assume the role. For roles from a different account, the Principal ARN contains its AWS account ID.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": ["arn:aws:iam::<another-Account-ID>:role/<DesiredRoleName>"]
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
    • Example

      • Account A

        • Trust policy to authorize a Role in Account B
      • Account B

        • Identity-based policy to authorize a User in Account B to access the resource in Account A
  • Instance profile (opens in a new tab)

    • EC2 uses an instance profile as a container for an IAM role.
    • If you use the AWS Management Console to create a role for EC2, the console automatically creates an instance profile and gives it the same name as the role.
    • An instance profile is not an AWS CLI profile.

IAM - Policy

AWS Organizations (opens in a new tab)

  • Features

    • Centralized management of all of your AWS accounts
    • Consolidated billing for all member accounts
    • Hierarchical grouping of your accounts to meet your budgetary, security, or compliance needs
    • Service control policies (SCPs)
    • Tag policies
    • AI services opt-out policies
    • Backup policies
    • Free to use

Service control policies (SCP) (opens in a new tab)

  • Affect only the member accounts in an Organization
  • SCPs offer central control over the maximum available permissions for all accounts in an Organization.
  • SCPs are similar to IAM permission policies and use almost the same syntax. However, an SCP never grants permissions. Instead, SCPs are JSON policies that specify the maximum permissions for the affected accounts.
  • SCPs can be used to restrict even the root user of member accounts.
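As a sketch, an SCP that prevents any member account from leaving the Organization could look like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyLeavingOrganization",
      "Effect": "Deny",
      "Action": "organizations:LeaveOrganization",
      "Resource": "*"
    }
  ]
}
```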

STS (Security Token Service) (opens in a new tab)

  • GetSessionToken (opens in a new tab)

    • Returns a set of temporary credentials for an AWS account or IAM user. The credentials consist of an access key ID, a secret access key, and a security token.
    • Using the temporary credentials that are returned from the call, IAM users can then make programmatic calls to API operations that require MFA authentication.
    • Credentials based on account credentials can range from 900 seconds (15 minutes) up to 3600 seconds (1 hour), with a default of 1 hour.
  • AssumeRole (opens in a new tab)

    Returns a set of temporary security credentials that you can use to access AWS resources that you might not normally have access to. These temporary credentials consist of an access key ID, a secret access key, and a security token.

  • DecodeAuthorizationMessage (opens in a new tab)

    Decodes additional information about the authorization status of a request from an encoded message returned in response to an AWS request.

STS - Cheatsheet

STS - Get Caller Identity

  • GetCallerIdentity returns details about the IAM user or role whose credentials are used to call the operation.

    aws sts get-caller-identity

STS - View the maximum session duration setting for a role

S3

  • Data Consistency (opens in a new tab)

    • Strong read-after-write (GET or LIST) consistency for PUTs and DELETEs of objects
    • Strong read consistency for S3 Select, S3 Access Control Lists, S3 Object Tags, and object metadata
    • Updates to a single object key are atomic, and there is no way to make atomic updates across keys.
    • High availability by replicating data across multiple servers within AWS data centers.
    • Bucket configurations have an eventual consistency model.
    • Wait for 15 minutes after enabling versioning before issuing write operations (PUT or DELETE) on objects in the bucket.
    • S3 does not support object locking for concurrent writers.

S3 - Bucket

  • S3 lists all buckets globally, but each bucket is created in a specific Region; Cross-Region Replication (CRR) can replicate objects (with their metadata and tags) into other Regions.
  • Flat structure; folders in S3 are simply shared name prefixes.
  • Bucket names must be globally unique and cannot be changed once created.
  • Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
  • To ensure Bucket names are DNS-friendly, it's preferable to avoid dots in names.
  • Objects in Bucket are private by default.
  • There are no limits to the number of prefixes in a bucket.

S3 - Bucket - Versioning

  • Buckets can be in one of 3 states

    • Unversioned (default)
    • Versioning-enabled
    • Versioning-suspended
  • Once you enable versioning on a bucket, it can never return to the unversioned state. You can, however, suspend versioning on that bucket.

  • If you have not enabled versioning, S3 sets the value of the version ID to null.

  • Objects stored in your bucket before you set the versioning state have a version ID of null.

  • Suspend

    This suspends the creation of object versions for all operations but preserves any existing object versions.

S3 - Bucket - Lifecycle

  • Multiple lifecycle rules

    • Precedence when rules conflict: permanent deletion > transition > creation of delete markers (versioned bucket)

    • Transition

      S3 Glacier Flexible Retrieval > S3 Standard-IA / S3 One Zone-IA
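A lifecycle configuration combining transitions and expiration might be sketched as follows (the prefix and day counts are hypothetical):

```json
{
  "Rules": [
    {
      "ID": "ArchiveThenExpire",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```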

S3 - Bucket - Object Lock

Object Lock (opens in a new tab)

  • Prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.

  • Object Lock works only in versioned buckets, and retention periods and legal holds apply to an individual object version.

  • Use Object Lock to meet regulatory requirements that require WORM storage, or add an extra layer of protection against object changes and deletion.

  • Retention mode

    • Compliance mode

      The protected object version can't be overwritten or deleted by any user, including the root user in your AWS account. When an object is locked in compliance mode, its retention mode can't be changed, and its retention period can't be shortened.

    • Governance mode

      You protect objects against being deleted by most users, but you can still grant some users permission to alter the retention settings or delete the objects if necessary. You can also use governance mode to test retention-period settings before creating a compliance-mode retention period.

S3 - Bucket - Replication

Replication (opens in a new tab)

  • Both source and destination buckets must have versioning enabled.

  • Destination buckets can be in different Regions or within the same Region as the source bucket.

  • New objects

    • Replicate new objects as they are written to the bucket
    • Use live replication such as CRR or SRR
    • CRR and SRR are implemented with the same API, and differentiated by the destination bucket configuration.
  • Existing objects

    • Use S3 Batch Operations

S3 - Bucket - Static Website Hosting

Static website hosting (opens in a new tab)

  • An index document must be specified; an error document is optional.
  • If you create a folder structure in your bucket, you must have an index document at each level. In each folder, the index document must have the same name, for example, index.html.
  • S3 website endpoints do not support HTTPS. Use CloudFront in that case.
  • Access a website hosted in a S3 bucket with a custom domain
    • The Bucket is configured as a static website.
    • Bucket name must match the domain name exactly.
    • Add an alias record in Route53 to route traffic for the domain to the S3 Bucket

S3 - Bucket - Event Notifications

S3 Event Notifications (opens in a new tab)

  • Destination
    • Lambda function
    • SNS topic
    • SQS standard queue (FIFO queue not supported)
    • EventBridge event bus
  • If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent.
  • If you want to ensure that an event notification is sent for every successful write, you can enable versioning on your bucket. With versioning, every successful write will create a new version of your object and will also send an event notification.
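A notification configuration targeting a Lambda function might be sketched as follows (the function ARN, Id, and suffix filter are hypothetical):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "thumbnail-on-upload",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111111111111:function:MakeThumbnail",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": { "FilterRules": [{ "Name": "suffix", "Value": ".jpg" }] }
      }
    }
  ]
}
```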

S3 - Bucket - Management - Inventory

S3 Inventory (opens in a new tab)

  • Audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs.
  • Generates inventories of the objects in the bucket on a daily or weekly basis, and the results are published to a flat file.
  • The bucket that is inventoried is called the source bucket, and the bucket where the inventory flat file is stored is called the destination bucket.
  • The destination bucket must be in the same Region as the source bucket.
  • S3 Inventory gives you a complete list of your objects, published to the destination bucket in Parquet, ORC, or CSV format, so it can be analyzed with Athena.

S3 - Bucket - Select

S3 Select (opens in a new tab)

  • Use a subset of SQL statements to filter the contents of S3 objects and retrieve just the subset of data that you need.
  • By using S3 Select to filter this data, you can reduce the amount of data that S3 transfers, which reduces the cost and latency to retrieve this data.
  • S3 Select works on objects stored in CSV, JSON, or Apache Parquet format with compression of GZIP or BZIP2.
  • You can only query one object at a time.
  • If you use FileHeaderInfo.USE, you can only reference columns by column name.
  • Column name must be quoted with " if it contains special characters or is a reserved word. e.g. SELECT s."column name" FROM S3Object s

S3 - Bucket - Transfer Acceleration

Transfer Acceleration (opens in a new tab)

  • Use the edge locations of CloudFront network to accelerate transfer between your client and the specified S3 bucket.
  • Not recommended for small files or when the client is already close to the S3 Region.

S3 - Bucket - Analytics

S3 Analytics (opens in a new tab)

  • You use storage class analysis to observe your data access patterns over time to gather information to help you improve the lifecycle management of your STANDARD_IA storage.
  • Analyze storage access patterns to help you decide when to transition the right data to the right storage class.

S3 - Bucket - Access Points

Access Points (opens in a new tab)

  • Simplify managing data access at scale for shared datasets in S3, enabling different teams to access shared data with different permissions.

  • Traits

    • Access points are named network endpoints attached to buckets that you can use to perform S3 object operations, such as GetObject and PutObject.

    • For S3 object operations, you can use the access point ARN in place of a bucket name.

    • Each access point has distinct permissions and network controls that S3 applies for any request that is made through that access point.

    • You can only use access points to perform operations on objects.

    • S3 operations compatible with access points

      Access point compatibility with S3 operations (opens in a new tab)

S3 - Bucket - Access Points - Object Lambda

S3 Object Lambda (opens in a new tab)

S3 - Object

  • At-Rest Encryption

    • S3 only supports symmetric CMKs, not asymmetric CMKs.

    • Server-side Encryption (opens in a new tab)

      • Add the x-amz-server-side-encryption header to the HTTP request to request server-side encryption.

      • SSE-S3 (opens in a new tab)

        • Uses keys that S3 creates and manages for you to generate the data key for encryption; no user intervention needed
        • x-amz-server-side-encryption: AES256
      • SSE-KMS (opens in a new tab)

        • Use a CMK you created in KMS to generate data key for encryption, requiring permission for KMS access

        • x-amz-server-side-encryption: aws:kms

        • When you upload an object, you can specify the AWS KMS CMK using the x-amz-server-side-encryption-aws-kms-key-id header. If the header is not present in the request, S3 assumes the AWS managed CMK.

        • Permissions

          • kms:GenerateDataKey (opens in a new tab)

            Returns a plaintext copy of the data key and a copy that is encrypted under a customer master key (CMK) that you specify.

          • kms:Decrypt (opens in a new tab)

            Multipart uploading needs this permission to decrypt the encrypted data key kept with the encrypted data as the plain text one is deleted after the first part is uploaded.

      • SSE-C (opens in a new tab)

        • Provide your own encryption key with every encryption and decryption request

        • Must use HTTPS

        • S3 does not store the encryption key you provide. Instead, it stores a randomly salted HMAC value of the encryption key to validate future requests.

        • x-amz-server-side-encryption-customer-algorithm

          must be AES256

        • x-amz-server-side-encryption-customer-key

          the 256-bit, base64-encoded encryption key

        • x-amz-server-side-encryption-customer-key-MD5

          message integrity check to ensure that the encryption key was transmitted without error
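The three header values can be derived with the standard library alone. A minimal sketch (illustrative, not an S3 client; it only shows how the base64 key and its MD5 check value relate to the raw key bytes):

```python
import base64
import hashlib
import os

# Generate a 256-bit customer-provided key and derive the SSE-C header
# values described above.
key = os.urandom(32)  # 256-bit AES key

headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    # base64 encoding of the raw key bytes
    "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
    # base64 encoding of the MD5 digest of the raw (unencoded) key bytes
    "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
        hashlib.md5(key).digest()
    ).decode(),
}
```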

    • Client-side Encryption (opens in a new tab)

      • Encryption and decryption happen on the client side with S3 only saving your data.
      • You can use your CMK stored locally or CMK stored in KMS.
    • As an analogy, suppose you go to work on any business day, and need to figure out how to have lunch.

      • Client-side encryption is like having lunch at home.
      • SSE-S3 is like ordering takeaway from your office.
      • SSE-KMS is like having lunch at your company's onsite canteen.
      • SSE-C is like bringing your lunch from home to work.
  • S3 Batch Operations

    S3 Batch Operations (opens in a new tab)

    • Large-scale batch operations on S3 objects
    • EB scale
    • Uses a manifest, such as an S3 Inventory report or a CSV list of objects, to identify the objects to act on
  • Uploading

    • When a file is over 100 MB, multipart upload is recommended as it will upload many parts in parallel, maximizing the throughput of your bandwidth and also allowing for a smaller part to retry in case that part fails.
    • A single PUT operation can upload an object up to 5 GB; above 5 GB, you must use multipart upload.
    • Part size: 5 MB to 5 GB. The last part of a multipart upload has no minimum size limit.
    • Object size: 0 to 5 TB
    • To perform a multipart upload with encryption using an AWS KMS key, the requester must have kms:GenerateDataKey permissions to initiate the upload, and kms:Decrypt permissions to upload object parts. The requester must have kms:Decrypt permissions so that newly uploaded parts can be encrypted with the same key used for previous parts of the same object.
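The part-size limits above translate into simple arithmetic for how many parts an upload needs. A small sketch (the 10,000-part maximum is an additional S3 quota not listed above; the function name is made up):

```python
import math

MIB = 1024 * 1024

def part_count(object_size: int, part_size: int = 100 * MIB) -> int:
    """Number of parts a multipart upload needs at the given part size."""
    if not (5 * MIB <= part_size <= 5 * 1024 * MIB):
        raise ValueError("part size must be between 5 MB and 5 GB")
    parts = math.ceil(object_size / part_size)
    if parts > 10_000:
        raise ValueError("S3 allows at most 10,000 parts per upload")
    return parts

# A 5 GiB object in 100 MiB parts: 51 full parts plus one smaller final part
print(part_count(5 * 1024 * MIB))
```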
  • Quota

    • 3500 PUT/COPY/POST/DELETE and 5500 GET/HEAD requests per second per prefix in a bucket
    • No limits to the number of prefixes in a bucket

S3 - Object - Presigned URL

Presigned URL (opens in a new tab)

  • Grants the URL holder temporary access to the specified S3 object without requiring separate authentication and authorization.

  • Generated programmatically

  • GET for downloading and PUT for uploading

  • As a general rule, AWS recommends using bucket policies or IAM policies for access control. ACLs are a legacy access control mechanism that predates IAM.

  • S3 stores access logs as objects in a bucket. Athena supports analysis of S3 objects and can be used to query S3 access logs.

S3 - Security

S3 - Security - Block public access

Block public access (opens in a new tab)

A shortcut switch to block all public access granted in Bucket Policy or ACLs.

S3 - Security - ACL

Access Control List (opens in a new tab)

  • Can define which AWS accounts or groups are granted access and the type of access.
  • Can manage permissions of Objects.

S3 - Bucket - Permissions - CORS

CORS (opens in a new tab)

  • To configure your bucket to allow cross-origin requests, you create a CORS configuration.
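A minimal CORS configuration in its JSON form might look like the following (the allowed origin is a placeholder):

```json
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT"],
    "AllowedOrigins": ["https://www.example.com"],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3000
  }
]
```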

S3 - Storage Lens

Cloud storage analytics solution with support for AWS Organizations to give you organization-wide visibility into object storage, with point-in-time metrics and trend lines as well as actionable recommendations.

All these things combined in an interactive dashboard will help you discover anomalies, identify cost efficiencies, and apply data protection best practices across accounts.

S3 - Storage classes

Storage classes (opens in a new tab)

S3 - Storage classes - S3 Standard

S3 - Storage classes - S3 Intelligent-Tiering

S3 Intelligent-Tiering (opens in a new tab)

  • Characteristics

    • No retrieval charges

    • Automatic storage cost savings when data access patterns change, without performance impact or operational overhead

    • Access tiers

      • Frequent Access tier

        Objects uploaded to S3 Intelligent-Tiering are stored in the Frequent Access tier.

      • Infrequent Access tier

        Objects not accessed for 30 consecutive days are automatically moved to the Infrequent Access tier.

      • Archive Instant Access tier

        Objects not accessed for 90 consecutive days are automatically moved to the Archive Instant Access tier.

    • Frequent Access, Infrequent Access, and Archive Instant Access tiers have the same low-latency and high-throughput performance of S3 Standard

    • The Infrequent Access tier saves up to 40% on storage costs

    • The Archive Instant Access tier saves up to 68% on storage costs

  • Use cases

    • Suitable for objects with unknown or changing access patterns
    • Suitable for objects equal to or larger than 128 KB
  • Anti patterns

    • Objects smaller than 128 KB will not be monitored and will always be charged at the Frequent Access tier rates, with no monitoring and automation charge.
    • Data retrieval or modification is more frequent than the transition intervals.
    • Access patterns are predictable and you can manage the storage classes transitions explicitly.

S3 - Storage classes - S3 Standard-IA

  • For data that is accessed less frequently, but requires rapid access when needed.
  • Incurs a data retrieval fee

S3 - Storage classes - S3 One Zone-IA (S3 One Zone-Infrequent Access)

  • Stores data in a single AZ and costs 20% less than S3 Standard-IA
  • Incurs a data retrieval fee

S3 on Outposts

S3 - CLI Cheatsheet

S3 Glacier

S3 Glacier (opens in a new tab)

S3 Glacier - Instant Retrieval

  • Ideal for long-lived archive data accessed once or twice per quarter with instant retrieval in milliseconds
  • The lowest cost archive storage with milliseconds retrieval
  • Offers cost savings compared to S3 Standard-IA, with the same latency and throughput performance as S3 Standard-IA.
  • Higher data access costs than S3 Standard-IA
  • Min storage duration of 90 days

S3 Glacier - Flexible Retrieval

  • Ideal for long-lived archive data accessed once a year with retrieval times of minutes to hours

  • Min storage duration of 90 days

  • Archive Retrieval Options

    • Expedited: 1–5 minutes

      • Incurs a data retrieval fee
    • Standard: 3–5 hours

      • Incurs a data retrieval fee
    • Bulk: 5–12 hours

      • Free data retrieval

S3 Glacier - Deep Archive

  • Ideal for long-lived archive data accessed less than once a year with retrieval times of hours
  • Default retrieval time of 12 hours
  • Min storage duration of 180 days
  • Incurs a data retrieval fee

CloudFront

  • Distribution (opens in a new tab)

  • Lambda@Edge (opens in a new tab)

    • Lambda functions with Python and Node.js runtimes can be deployed at CloudFront edge locations
    • Lambda@Edge allows you to pass each request through a Lambda to change the behaviour of the response.
    • Authorization@Edge (opens in a new tab): You can use Lambda@Edge to help authenticate and authorize users for the premium pay-wall content on your website, filtering out unauthorized requests before they reach your origin infrastructure.
  • Origin access (opens in a new tab)

    • Benefits

      • Restricts access to the AWS origin so that it's not publicly accessible
    • Origin type

      • S3

        • OAC / Origin Access Control

          • S3 SSE-KMS

          • Dynamic requests (PUT and DELETE) to S3

        • OAI / Origin Access Identity (legacy)

          • Restricting Access to S3 content by using an Origin Access Identity, a special CloudFront user, which the target S3 bucket can reference in bucket policy. Once set up, users can only access files through CloudFront, not directly from the S3 bucket.
      • MediaStore
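The OAI setup described above is enforced through the bucket policy. A typical policy shape (bucket name and OAI ID are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity EXAMPLE-OAI-ID"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```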

  • Serving private content (opens in a new tab)

    • To use signed URLs or signed cookies, you need a signer. A signer is either a trusted key group (Recommended) that you create in CloudFront, or an AWS account that contains a CloudFront key pair (can only be created by root user).

    • You cannot use either signed URLs or signed cookies if the original URL contains the Expires, Policy, Signature, or Key-Pair-Id query parameters.

    • Signed URL (opens in a new tab)

      • Uses a JSON policy statement (canned or custom) to specify the restrictions of the signed URL
      • Use signed URLs when you want to restrict access to individual files.
      • Use signed URLs when your users are using a client that doesn't support cookies.
    • Signed cookies (opens in a new tab)

      • Use signed cookies when you want to provide access to multiple restricted files.
      • Use signed cookies when you don't want to change your current URLs.
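A custom policy statement for a signed URL or signed cookie has roughly the following shape (resource URL, epoch time, and IP range are placeholders):

```json
{
  "Statement": [
    {
      "Resource": "https://d111111abcdef8.cloudfront.net/private/*",
      "Condition": {
        "DateLessThan": { "AWS:EpochTime": 1767225600 },
        "IpAddress": { "AWS:SourceIp": "192.0.2.0/24" }
      }
    }
  ]
}
```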
  • Using HTTPS with CloudFront (opens in a new tab)

    • Both connections between viewers and CloudFront, and connections between CloudFront and origin can be encrypted by using HTTPS.
    • You can't use a self-signed SSL certificate for HTTPS communication between CloudFront and your origin; the certificate must be issued by a trusted certificate authority.
    • You don't need to add an SSL certificate if you only require HTTPS for communication between the viewers and CloudFront (default certificate provided by CloudFront).
  • Availability

    • Origin failover (opens in a new tab)
      • An origin group pairs two origins: a primary and a secondary. If the primary origin is unavailable, or returns specific HTTP response status codes that indicate a failure, CloudFront automatically switches to the secondary origin.
      • To set up origin failover, you must have a distribution with at least 2 origins.

RDS

AWS Docs - RDS (Relational Database Service) (opens in a new tab)

  • Authentication

    • IAM database authentication (opens in a new tab)
      • Only works with MySQL and PostgreSQL.
      • Instead of password, an authentication token is generated by RDS when you connect to a DB instance.
      • Each authentication token has a lifetime of 15 minutes.
      • Recommended as a temporary and personal access
  • Read Replicas (opens in a new tab) (for Scalability)

    • Operates as a DB instance that only allows read-only connections; applications can connect to a read replica just as they would to any DB instance.

    • Asynchronous replication to a Read Replica

    • Uses a different DB connection string than the one used by the master instance

      To switch at runtime, the application needs two connection pools, one for the master and one for the read replica.

    • Can be promoted to the master

    • Support Cross-Region read replicas (opens in a new tab)

  • Multi-AZ deployments (opens in a new tab) (for High Availability)

    • Synchronous replication to a standby instance in a different AZ

    • In case of an infrastructure failure, RDS performs an automatic failover to the standby instance (or to a read replica in the case of Amazon Aurora), so that you can resume database operations as soon as the failover is complete.

    • The endpoint for your DB instance remains the same after a failover

    • The failover mechanism automatically changes the DNS CNAME record of the DB instance to point to the standby instance.

    • The standby instance cannot be used as a read replica.

    • Multi-AZ DB instance deployment

      • 1 standby DB instance
      • failover support
      • no read traffic support
    • Multi-AZ DB cluster deployment

      • 3 DB instances
      • failover support
      • read traffic support
    • Resources

  • Snapshot

    • When you perform a restore operation to a point in time or from a DB snapshot, a new DB instance is created with a new endpoint (the old DB instance can be deleted if so desired). This is done to enable you to create multiple DB instances from a specific DB snapshot or point in time.

    • Automated backups are limited to a single Region while manual snapshots and read replicas are supported across multiple Regions.

    • Manual snapshot

      • When you delete a DB instance, you can create a final DB snapshot upon deletion.
      • Manual snapshots are kept after the deletion of the DB instance.
    • Automated snapshot

      • Configurable retention period: 7 days by default, up to 35 days
      • Cannot be manually deleted, automatically deleted when the DB instance is deleted
      • Stored in S3
      • Storage of automated snapshots is free while the DB instance is running. If the DB instance is stopped, the snapshot storage is charged at standard pricing.
  • Encryption

  • Monitoring

    • Enhanced Monitoring (opens in a new tab)
      • RDS provides metrics in real time for the OS that your DB instance runs on.
      • Enhanced Monitoring metrics are stored in CloudWatch Logs instead of in CloudWatch Metrics.
      • After you have enabled Enhanced Monitoring for your DB instance, you can view the metrics for your DB instance using CloudWatch Logs, with each log stream representing a single DB instance being monitored.
      • CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.

RDS - Aurora

AWS Docs - Aurora (opens in a new tab)

  • Fully managed RDBMS compatible with MySQL and PostgreSQL, with a serverless option.
  • Up to 5 times the throughput of MySQL and up to 3 times the throughput of PostgreSQL without requiring changes to most of your existing applications.
  • Up to 15 read replicas
  • Automatic backup

RDS - RDS Proxy

RDS Proxy (opens in a new tab)

  • Establishes a database connection pool and reuses connections in this pool.
  • Makes applications more resilient to database failures by automatically connecting to a standby DB instance while preserving application connections.

RDS - Cheatsheet

List clusters

aws rds describe-db-clusters \
--query 'sort_by(DBClusters,&DBClusterIdentifier)[].{ClusterID:DBClusterIdentifier, ClusterARN:DBClusterArn, Port:Port, Engine:Engine, Version:EngineVersion, Status:Status}' \
--output table

List DB instances

aws rds describe-db-instances \
--query 'sort_by(DBInstances,&DBInstanceIdentifier)[].{InstanceID:DBInstanceIdentifier, InstanceARN:DBInstanceArn, Engine:Engine, Version:EngineVersion, Status:DBInstanceStatus}' \
--output table

DynamoDB

  • Schemaless, you can only specify keys upon creation of tables, non-key attributes can only be added as part of new records.

DynamoDB - Availability

  • Region specific
  • Data replicated among multiple AZs in a Region

DynamoDB - Table Class

  • Standard

    • Offers lower throughput costs than DynamoDB Standard-IA and is the most cost-effective option for tables where throughput is the dominant cost.
  • Standard-IA

    • Offers lower storage costs than DynamoDB Standard, and is the most cost-effective option for tables where storage is the dominant cost.
    • When storage exceeds 50% of the throughput (reads and writes) cost of a table using the DynamoDB Standard table class, the DynamoDB Standard-IA table class can help you reduce your total table cost.

DynamoDB - Primary Key

DynamoDB - GSI

  • To speed up queries on non-key attributes
  • An index with a partition key and a sort key that can be different from those on the base table
  • It is considered global because queries on the index can span all of the data in the main table across all partitions.
  • The main table's primary key attributes are always projected into an index.
  • Up to 20 GSI / table (soft limit)
  • Can be created after table creation
  • RCU and WCU provisioned independently of main table, and therefore a Query operation on a GSI consumes RCU from the GSI, not the main table. When you change items in a table, the GSI on that table are also updated. These index updates consume WCU from the GSI, not from the main table.
  • If the writes are throttled on the GSI, the write activity on the main table will also be throttled.
  • Only supports eventually consistent reads (cannot provide strong consistency)
  • In a DynamoDB table, each key value must be unique. However, the key values in a GSI do not need to be unique.

DynamoDB - LSI

  • An index with the same Partition key but a different Sort key
  • Up to 5 LSI / table (hard limit)
  • Cannot be created after table creation
  • Use the WCU and RCU of the base table
  • No special throttling considerations
  • Supports both strongly and eventually consistent reads
  • A LSI lets you query over a single partition, as specified by the partition key value in the query.

DynamoDB - Read Consistency

  • Read committed isolation level

  • base table

    • Strongly consistent read
    • Eventually consistent read
  • LSI

    • Strongly consistent read
    • Eventually consistent read
  • GSI

    • Eventually consistent read
  • DynamoDB streams

    • Eventually consistent read

DynamoDB - Capacity

  • Throughput mode: Provisioned or On-Demand

  • Read Capacity Unit (RCU)

    • 1 RCU = 1 strongly consistent read/s or 2 eventually consistent read/s, for an item up to 4 KB in size.
    • For items larger than 4 KB, additional RCUs are consumed (one per additional 4 KB, rounded up).
    • For items smaller than 4 KB, one full RCU is still consumed.
    • Calculation
      • strongly consistent
        1. Round data up to nearest 4
        2. Divide data by 4
        3. Multiplied by number of reads
      • eventual consistent
        1. Round data up to nearest 4
        2. Divide data by 4
        3. Multiplied by number of reads
        4. Divide final number by 2
        5. Round up to the nearest whole number
  • Write Capacity Unit (WCU)

    • 1 WCU = 1 write/s for an item up to 1 KB in size.
    • For items larger than 1 KB, additional WCUs are consumed (one per additional 1 KB, rounded up).
    • For items smaller than 1 KB, one full WCU is still consumed.
    • Calculation
      1. Round data up to nearest 1
      2. Multiplied by number of writes
  • If your application consumes more throughput than configured in the provisioned throughput settings, requests are throttled.

  • Adaptive Capacity (opens in a new tab)

    • Boost Throughput Capacity to High-Traffic Partitions
      • Enables your application to continue reading and writing to hot partitions without being throttled, provided that traffic does not exceed your table’s total provisioned capacity or the partition maximum capacity.
    • Isolate Frequently Accessed Items
      • If your application drives disproportionately high traffic to one or more items, adaptive capacity rebalances your partitions such that frequently accessed items don't reside on the same partition.
  • To retrieve consumed capacity by an operation, parameter ReturnConsumedCapacity (opens in a new tab) can be included in the request to API, with 3 options: INDEXES, TOTAL, NONE.
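The RCU/WCU calculation steps above can be sketched as small helper functions (illustrative only; the function names are made up):

```python
import math

def rcus(item_size_kb: float, reads_per_sec: int,
         eventually_consistent: bool = False) -> int:
    """RCUs needed: one RCU per 4 KB (rounded up) per strongly consistent read."""
    units = math.ceil(item_size_kb / 4) * reads_per_sec
    if eventually_consistent:
        units = math.ceil(units / 2)  # eventually consistent reads cost half
    return units

def wcus(item_size_kb: float, writes_per_sec: int) -> int:
    """WCUs needed: one WCU per 1 KB (rounded up) per write."""
    return math.ceil(item_size_kb) * writes_per_sec

# 10 strongly consistent reads/s of 6 KB items: ceil(6/4) = 2 RCUs each
print(rcus(6, 10))        # 20
print(rcus(6, 10, True))  # 10 eventually consistent
print(wcus(1.5, 12))      # ceil(1.5) = 2 WCUs each -> 24
```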

DynamoDB - Query

Query (opens in a new tab)

  • Query requires the partition key value and returns all items with it. Optionally, you can provide a sort key attribute and use a comparison operator to refine the search results.
  • A filter expression determines which items within the Query results are returned to you. Filtering happens after the items are read, so it does not reduce consumed capacity or improve performance.
  • A single Query operation can retrieve a maximum of 1 MB of data.
  • Query results are always sorted by the sort key value, by default in ascending order.

DynamoDB - Scan

Scan (opens in a new tab)

  • Reads every item in a table or a secondary index
  • By default, a Scan operation returns all of the data attributes for every item in the table or index.
  • If the total number of scanned items exceeds the maximum dataset size limit of 1 MB (default page size), the scan stops and results are returned together with a LastEvaluatedKey value, which you use to continue the scan in a subsequent operation.
  • Because a Scan operation reads an entire page (by default, 1 MB), you can reduce its impact by setting a smaller page size.
  • Each Query or Scan request that has a smaller page size uses fewer read operations and creates a "pause" between each request.
  • Scan uses the Limit parameter to set the page size for your request.
  • Parallel Scan is preferable when:
    • The table size is 20 GB or larger.
    • The table's provisioned RCU is not being fully used.
    • Default sequential Scan operations are too slow.
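The LastEvaluatedKey pagination contract can be sketched with a stubbed scan function. Everything below is illustrative: a real application would call the DynamoDB Scan API through an SDK, but the loop shape is the same.

```python
# Stub data standing in for a DynamoDB table.
TABLE = [{"pk": i} for i in range(10)]

def fake_scan(limit: int, exclusive_start_key=None):
    """Mimics Scan: returns a page of Items, plus LastEvaluatedKey
    when more data remains (absent on the final page)."""
    start = 0 if exclusive_start_key is None else exclusive_start_key + 1
    page = TABLE[start:start + limit]
    resp = {"Items": page}
    last = start + len(page) - 1
    if last < len(TABLE) - 1:
        resp["LastEvaluatedKey"] = last
    return resp

def scan_all(limit=4):
    """Drain every page by feeding LastEvaluatedKey back in."""
    items, key = [], None
    while True:
        resp = fake_scan(limit, key)
        items.extend(resp["Items"])
        key = resp.get("LastEvaluatedKey")
        if key is None:  # absent key marks the final page
            break
    return items

print(len(scan_all()))  # 10 items across 3 pages
```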

DynamoDB - TTL

TTL (opens in a new tab)

  • Must identify a specific attribute name that the service will look for when determining if an item is eligible for expiration.
  • The attribute should be of the Number data type, containing the expiration time in Unix epoch format (seconds).
  • Once the timestamp expires, the corresponding item is deleted from the table in the background.
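Computing the TTL attribute value is just epoch arithmetic. A minimal sketch (the attribute name "expireAt" is a hypothetical choice; DynamoDB only requires that the configured TTL attribute hold a Number in epoch seconds):

```python
import time

def item_with_ttl(pk: str, days: int) -> dict:
    """Build an item whose TTL attribute expires the given number of days from now."""
    return {"pk": pk, "expireAt": int(time.time()) + days * 24 * 60 * 60}

item = item_with_ttl("session#123", days=7)
```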

DynamoDB - Data type

DynamoDB - DAX

AWS Docs - DynamoDB Accelerator (DAX) (opens in a new tab)

  • Characteristics

    • A fully managed, in-memory, write-through cache for DynamoDB that runs as a cluster in your VPC.
    • Should be provisioned in the same VPC as the EC2 instances that are accessing it.
  • Pros

    • Reduces read response times to microseconds
    • Apps that read a small number of items more frequently
    • Apps that are read intensive
  • Cons

    • Reads must be eventually consistent, therefore apps requiring strongly consistent reads cannot use DAX
    • Not suitable for apps that do not require microsecond read response times
    • Not suitable for apps that are write intensive, or that do not perform much read activity
  • Supports the following read operations in eventually consistent read mode: GetItem, BatchGetItem, Query, Scan

  • The following DAX API operations are considered write-through

    • BatchWriteItem
    • UpdateItem
    • DeleteItem
    • PutItem
  • Misc

    • ElastiCache can be used with other DBs and applications, while DAX is for DynamoDB only.

DynamoDB - Transaction

  • Supports transactions via the TransactWriteItems and TransactGetItems API calls.
  • Transactions group reads or writes, possibly across multiple tables, into a single all-or-nothing operation.

DynamoDB - Global table

Global table (opens in a new tab)

  • HA and fault tolerance
  • Lower latency for users in different Regions
  • With global tables you can specify the Regions where you want the table to be available. DynamoDB performs all of the necessary tasks to create identical tables in these Regions and propagate ongoing data changes to all of them.
  • DynamoDB global tables use a “last writer wins” reconciliation between concurrent updates, and therefore doesn't support optimistic locking.

DynamoDB - Streams

  • Capture item-level changes in your table, and push the changes to a DynamoDB stream. You then can access the change information through the DynamoDB Streams API.

  • View type

    • Keys only

      Only the key attributes of the modified item

    • New image

      The entire item, as it appears after it was modified

    • Old image

      The entire item, as it appeared before it was modified

    • New and old image

      Both the new and the old images of the item

  • Streams do not consume RCUs.

  • All data in DynamoDB Streams is subject to a 24-hour lifetime.

DynamoDB - Conditional operations

DynamoDB - Atomic counter

Atomic counter (opens in a new tab)

  • A numeric attribute that is incremented unconditionally, without interfering with other write requests
  • The numeric value increments each time you call UpdateItem.
  • An atomic counter would not be appropriate where overcounting or undercounting can't be tolerated.

DynamoDB - Quota

  • The maximum item size is 400 KB, which includes both attribute name lengths (UTF-8 binary length) and attribute value lengths (binary length). Attribute names count towards the size limit.
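For string attributes the accounting is straightforward to sketch (illustrative only: numbers, sets, and other types have their own encodings, so this helper handles strings alone):

```python
MAX_ITEM_BYTES = 400 * 1024

def string_item_size(item: dict) -> int:
    """Approximate item size for string-only items:
    UTF-8 bytes of each attribute name plus UTF-8 bytes of each value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in item.items())

item = {"pk": "user#1", "note": "a" * 1000}
size = string_item_size(item)   # (2 + 6) + (4 + 1000) = 1012 bytes
assert size <= MAX_ITEM_BYTES
```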

DynamoDB - Point-in-time recovery (PITR)

  • Continuous backup with per-second granularity so that you can restore to any given second in the preceding 35 days.
  • Using PITR, you can back up tables with hundreds of TB of data, with no impact on the performance or availability of your production applications.

DynamoDB - Resources

ElastiCache

  • ElastiCache is only accessible to resources operating within the same VPC, to ensure low latency.

  • Caching Strategies (opens in a new tab)

    • Lazy Loading

      • On-demand loading of data from database if a cache miss occurs
    • Write-Through

      • Update cache whenever data is written to the database, ensuring cache is never stale.
    • TTL specifies the number of seconds until the key expires.
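The two strategies above can be sketched in-process, using plain dicts in place of ElastiCache and a backing database (illustrative only):

```python
# Stand-ins for ElastiCache and the backing database.
db = {"k1": "v1"}
cache = {}

def lazy_get(key):
    """Lazy loading: populate the cache only on a miss."""
    if key in cache:
        return cache[key]       # cache hit
    value = db.get(key)         # cache miss: load from the database
    if value is not None:
        cache[key] = value      # cache on demand
    return value

def write_through(key, value):
    """Write-through: every database write also updates the cache."""
    db[key] = value
    cache[key] = value          # cache is never stale for written keys

print(lazy_get("k1"))           # v1 (miss, then cached)
write_through("k2", "v2")
print(cache["k2"])              # v2
```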

  • Memcached

    • Simple key/value store that only supports strings, making it suitable for small, static data such as HTML code fragments
    • Multi-threaded; scaling in or out causes loss of cached data
    • Marginal performance advantage because of its simplicity
  • Redis

    • Supports advanced data structures
    • Single-threaded, scaling causes no loss of data
    • Finer-grained control over eviction
    • Supports persistence, transactions and replication
  • Use case

  • Resources

Route 53

  • Supported DNS record types (opens in a new tab)

    • A (Address) records

      Associate a domain name or subdomain name with the IPv4 address of the corresponding resource

    • AAAA (Address) records

      Associate a domain name or subdomain name with the IPv6 address of the corresponding resource

    • CAA

      A CAA record specifies which certificate authorities (CAs) are allowed to issue certificates for a domain or subdomain. Creating a CAA record helps to prevent the wrong CAs from issuing certificates for your domains.

    • CNAME (opens in a new tab)

      • Reroute traffic from one domain name (example.net) to another domain name (example.com)
      • The DNS protocol does not allow you to create a CNAME record for the top node of a DNS namespace (zone apex).
    • DS

      A delegation signer (DS) record refers a zone key for a delegated subdomain zone. You might create a DS record when you establish a chain of trust when you configure DNSSEC signing.

    • MX (Mail server) records

      Route traffic to mail servers

    • NAPTR

      A Name Authority Pointer (NAPTR) is a type of record that is used by Dynamic Delegation Discovery System (DDDS) applications to convert one value to another or to replace one value with another.

    • NS

      An NS record identifies the name servers for the hosted zone.

    • PTR

      A PTR record maps an IP address to the corresponding domain name.

    • SOA

      A start of authority (SOA) record provides information about a domain and the corresponding Amazon Route 53 hosted zone.

    • SPF

      Deprecated, TXT is recommended instead.

    • SRV

      SRV records are used for accessing services, such as a service for email or communications.

    • TXT

      A TXT record contains one or more strings that are enclosed in double quotation marks (").

  • Alias records (opens in a new tab)

    • Unlike a CNAME record, you can create an alias record at the top node of a DNS namespace (zone apex).
    • To route domain traffic to an ELB load balancer, use Route 53 to create an alias record that points to your load balancer.
    • A zone apex record is a DNS record at the root of a DNS zone, and the zone apex must be an A record.
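An alias A record pointing the zone apex at an ELB has the following change-batch shape (domain, load balancer DNS name, and hosted-zone ID are placeholders; each ELB type and Region has its own canonical hosted-zone ID):

```json
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z35SXDOTRQ7X7K",
          "DNSName": "my-load-balancer-1234567890.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
```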
  • Routing policy

    • Simple routing policy

      Use for a single resource that performs a given function for your domain, for example, a web server that serves content for the example.com website.

    • Failover routing policy

      Use when you want to configure active-passive failover.

    • Geolocation routing policy

      Use when you want to route traffic based on the location of your users.

    • Geoproximity routing policy

      Use when you want to route traffic based on the location of your resources and, optionally, shift traffic from resources in one location to resources in another.

    • Latency routing policy

      Use when you have resources in multiple Regions and you want to route traffic to the region that provides the best latency.

    • Multivalue answer routing policy

      Use when you want Route 53 to respond to DNS queries with up to eight healthy records selected at random.

    • Weighted routing policy

      Use to route traffic to multiple resources in specified proportions.

  • TTL

    • DNS records cache has a TTL. Any DNS update will not be visible until TTL has elapsed.
    • TTL should be set to strike a balance between how long the value should be cached and how much query load goes to the DNS servers.
  • Health checks

    • Health checks that monitor an endpoint
    • Health checks that monitor other health checks (calculated health checks)
    • Health checks that monitor CloudWatch alarms

Route 53 Resolver (opens in a new tab)

  • A Route 53 Resolver automatically answers DNS queries for:

    • Local VPC domain names for EC2 instances

      e.g. ec2-192-0-2-44.compute-1.amazonaws.com

    • Records in private hosted zones

      e.g. acme.example.com

    • For public domain names, Route 53 Resolver performs recursive lookups against public name servers on the internet.

Route53 - Cheatsheet

Update the given DNS record(s)

aws route53 change-resource-record-sets \
--hosted-zone-id <hosted-zone-id> \
--change-batch \
'{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "<old-DNS-name>",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          {
            "Value": "<new-DNS-name>"
          }
        ]
      }
    }
  ]
}'

Get the key-signing key (KSK) public key and the DS record to add to your parent hosted zone

# Reference: https://repost.aws/knowledge-center/route-53-configure-dnssec-domain
aws route53 get-dnssec --hosted-zone-id <hosted-zone-id>

CloudWatch

CloudWatch Events (Amazon EventBridge)

  • Rule

    • Event Source

      • Timing
        • Event Pattern
        • Schedule
      • Supported services
    • Target

      • A variety of AWS services
  • AWS service events are free

  • Custom events (PutEvents actions) may incur additional charges.

  • EventBridge

    • supports many more targets, meaning you can integrate a wider variety of services
    • Its cross-account delivery capability further amplifies its reach. It’s easy to distribute events to Kinesis, Step Functions, and many other services running in another AWS account.
    • supports native AWS events as well as third-party partner events.
    • supports content-based filtering.
    • supports input transformation.
    • has built-in schema discovery capabilities.

CloudWatch Metrics

  • Metrics are Region based.

  • Default CloudWatch metrics (opens in a new tab)

  • Namespace

  • Dimension

    • A dimension is a unique identifier of a metric, such as InstanceId.
    • Up to 10 dimensions per metric, and each dimension is defined by a name and value pair.
  • Custom Metrics (opens in a new tab)

    • Can only be published to CloudWatch using the AWS CLI or an API.
    • Use PutMetricData API action programmatically
  • Metric Math (opens in a new tab)

    • Enables you to query multiple CloudWatch metrics and use math expressions to create new time series based on these metrics.
  • Resolution

    • Predefined Metrics produced by AWS services are standard resolution.
    • When you publish a Custom Metric, you can define it as either standard resolution or high resolution.
    • Standard resolution: 1 minute granularity
    • High resolution: 1 second granularity
  • EC2 (opens in a new tab)

    • CloudWatch AWS/EC2 namespace (opens in a new tab)

      • These metrics are collected by CloudWatch Metrics under the namespace AWS/EC2. There are 2 modes for metrics collection: basic monitoring and detailed monitoring.

        • Basic Monitoring

          • EC2 sends metric data to CloudWatch in 5-minute periods at no charge.
        • Detailed Monitoring

          • EC2 sends metric data to CloudWatch in 1-minute periods for an additional charge.

          • Enable detailed monitoring using AWS CLI

            aws ec2 monitor-instances --instance-ids <instance-IDs>

    • Metrics collected by the CloudWatch Agent (opens in a new tab)

      • Metrics not available under the namespace AWS/EC2 can be collected by the CloudWatch Agent.
      • The collected metrics are available under the namespace CWAgent in CloudWatch Metrics.
      • CloudWatch Agent also can collect logs.
  • List AWS services publishing CloudWatch Metrics (opens in a new tab)

    • aws cloudwatch list-metrics [--namespace <namespace>] [--metric-name <metric-name>]
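As a sketch of publishing a custom metric via the PutMetricData action: the request below is built as a plain dict so it runs without AWS credentials (the boto3 call is shown commented out; the namespace, metric name, and dimension are hypothetical). Note StorageResolution, which marks the custom metric as high resolution (1) or standard (60):

```python
# Sketch: parameters for a CloudWatch PutMetricData call (hypothetical metric).
metric_data = {
    "Namespace": "MyApp",                        # custom namespace
    "MetricData": [{
        "MetricName": "RequestLatency",
        "Dimensions": [{"Name": "Service", "Value": "checkout"}],
        "Value": 123.0,
        "Unit": "Milliseconds",
        "StorageResolution": 1,                  # 1 = high resolution, 60 = standard
    }],
}
# import boto3
# boto3.client("cloudwatch").put_metric_data(**metric_data)
assert metric_data["MetricData"][0]["StorageResolution"] in (1, 60)
```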

CloudWatch Alarms

  • Metric

    • An Alarm watches a single metric over a specified time period, and performs one or more specified actions, based on the value of the metric relative to a threshold over time.

    • A value of the metric is a data point.

    • Period for AWS Metrics cannot be lower than 1 minute.

    • Alarm on High Resolution Custom Metrics

      | Alarm Period | Metrics Standard Resolution (60 Seconds) | Metrics High Resolution (1 Second) |
      | --- | --- | --- |
      | 10 Seconds | — | ✅ (additional charge) |
      | 30 Seconds | — | ✅ (additional charge) |
      | 60 Seconds | ✅ | ✅ |
  • Evaluation (opens in a new tab)

    • Period

      The length of time in seconds to evaluate the metric or expression to create each individual data point for an alarm

    • Evaluation Periods

      The number of the most recent periods, or data points, to evaluate when determining alarm state.

    • Data points to alarm

      Define the number of data points within the evaluation period that must be breaching to cause the alarm to go to ALARM state.

  • Action

    • a notification sent to a SNS topic
    • Auto Scaling actions
    • EC2 actions (only applicable to EC2 Per-Instance Metrics)
  • States

    • OK

      The metric is within the defined threshold

    • ALARM

      The metric is beyond the defined threshold

    • INSUFFICIENT_DATA

      The alarm has only just been configured, the metric is unavailable, or there is not enough data for the metric to determine the alarm state.
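The Period / Evaluation Periods / Data points to alarm interaction above can be sketched as an "M out of N" rule: look at the last N data points and go to ALARM when at least M of them breach the threshold (a simplification — real alarms also handle missing data):

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified 'M out of N' alarm evaluation (ignores missing-data treatment)."""
    recent = datapoints[-evaluation_periods:]            # the N most recent data points
    breaching = sum(1 for v in recent if v > threshold)  # count threshold breaches
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

# 3 of the last 5 data points must breach 80 for the alarm to fire
assert evaluate_alarm([10, 90, 95, 20, 85], 80, 5, 3) == "ALARM"
assert evaluate_alarm([10, 90, 95, 20, 30], 80, 5, 3) == "OK"
```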

CloudWatch Logs

CloudWatch - Logs Insights

CloudWatch - Application Signals

CloudWatch - Application Signals - Synthetics Canaries

  • Synthetic monitoring works by issuing automated, simulated transactions from a robot client to your application in order to mimic what a typical user might do.
  • Based on Puppeteer

CloudWatch - Cheatsheet

List all metrics

  • aws cloudwatch list-metrics

List all metrics of a namespace

  • aws cloudwatch list-metrics --namespace <namespace>

    e.g. aws cloudwatch list-metrics --namespace "AWS/Route53"

CloudTrail

  • Trail
    • Applies to all Regions, recording events in all Regions
    • Applies to one Region, recording events in that Region only
    • Organization trail (opens in a new tab)
      • If you have created an Organization, you can also create a trail that will log all events for all AWS accounts in that Organization.
      • Organization trails can apply to all Regions or one Region.
      • Organization trails must be created in the management account.
      • Member accounts will be able to see the Organization trail, but cannot modify or delete it.
      • By default, member accounts will not have access to the log files for the Organization trail in the S3 bucket.
  • Events (opens in a new tab)
    • Management events
    • Data events (additional charges apply)
    • CloudTrail Insights events

CloudTrail - Data Events

  • High-volume activities, including operations such as S3 object-level API operations and the Lambda function Invoke API.

CloudTrail - CloudTrail Lake

CloudTrail Lake (opens in a new tab)

  • Converts existing events in row-based JSON format to ORC format

X-Ray

  • A distributed tracing solution, especially for apps built using a microservices architecture

  • Segment

    • At a minimum, a segment records the name, ID, start time, trace ID, and end time of the request.
    • A segment document can be up to 64 KB and contain a whole segment with subsegments, a fragment of a segment that indicates that a request is in progress, or a single subsegment that is sent separately. You can send segment documents directly to X-Ray by using the PutTraceSegments API.
    • When you instrument your application with the X-Ray SDK, the SDK generates segment documents for you. Instead of sending segment documents directly to X-Ray, the SDK transmits them over a local UDP port to the X-Ray daemon.
  • Subsegment

    • Subsegment provides more granular timing information and details about downstream calls that your app made to fulfill the original request.

    • Subsegments can contain other subsegments, so a custom subsegment that records metadata about an internal function call can contain other custom subsegments and subsegments for downstream calls.

    • A subsegment records a downstream call from the point of view of the service that calls it.

    • Field

      • namespace - aws for AWS SDK calls; remote for other downstream calls.
  • Service Graph is a flow chart visualization of average response for microservices and to visually pinpoint failure.

  • Trace collects all Segments generated by a single request so you can track the path of requests through multiple services.

    • Trace ID in HTTP header (Tracing header) is named X-Amzn-Trace-Id.
  • Sampling is an algorithm that decides which requests should be traced. By default, X-Ray records the first request each second and 5% of any additional requests.

  • Annotations

    • Use Annotations (opens in a new tab) to record information on Segments or Subsegments that you want indexed for search.
    • Annotations support 3 data types: String, Number and Boolean.
    • Keys must be alphanumeric (underscore is allowed) in order to work with filters; other symbols and whitespace are ignored.
    • X-Ray indexes up to 50 annotations per trace.
  • Use Metadata to record data you want to store in the trace but don't need to use for searching traces.

  • Daemon

    • The X-Ray daemon gathers raw segment data and relays it to the X-Ray API

    • The daemon works in conjunction with the X-Ray SDKs and must be running so that data sent by the SDKs can reach the X-Ray service.

    • By default listens on UDP port 2000

    • -r, --role-arn: Assume the specified IAM role to upload segments to a different account.

    • ECS

      create a Docker image that runs the X-Ray daemon, upload it to a Docker image repository, and then deploy it to your ECS cluster.

  • Environment variables (opens in a new tab)

  • Instrumentation (opens in a new tab)

    • Automatic
    • Manual
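The default sampling rule above (the first request each second, plus 5% of any additional requests) can be sketched like this — a simplification of the real reservoir-based sampler, with an injected random source so the behavior is reproducible:

```python
import random

def make_default_sampler(rng):
    """Sketch of X-Ray's default rule: trace the first request each second,
    then 5% of the rest (reservoir size 1, fixed rate 0.05)."""
    state = {"second": None}
    def should_sample(now_seconds):
        second = int(now_seconds)
        if state["second"] != second:
            state["second"] = second
            return True               # first request in this second: always traced
        return rng.random() < 0.05    # remaining requests: 5% chance
    return should_sample

sampler = make_default_sampler(random.Random(42))
assert sampler(10.0) is True    # first request of second 10 is traced
assert sampler(10.1) is False   # with this seed, the 5% roll fails (0.639 >= 0.05)
```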

KMS

  • Multi-tenant key store management service operated by AWS.

  • KMS can use its own hardware security modules (HSMs) or a customer managed CloudHSM key store.

  • Region specific, a key that is created in one region can't be used in another region

  • KMS centrally stores and manages encryption keys called KMS keys; KMS keys never leave KMS unencrypted and are symmetric by default.

  • The Encrypt, Decrypt and ReEncrypt API actions use a KMS key directly and can only operate on up to 4 KB of data, so they are primarily designed to encrypt and decrypt data keys.

  • Data over 4 KB can only be encrypted with Envelope Encryption using a data key.

  • Types of KMS Key

    | Description | Customer-managed | AWS-managed | AWS-owned |
    | --- | --- | --- | --- |
    | Key creation | customer | AWS on behalf of customer | AWS |
    | Key usage | customer can control key usage through the KMS and IAM policy | can be used only with specific AWS services where KMS is supported | implicitly used by AWS to protect customer data; customer can't explicitly use it |
    | Key rotation | manually configured by customer | rotated automatically once a year | rotated automatically by AWS without any explicit mention of the rotation schedule |
    | Key deletion | can be deleted | can't be deleted | can't be deleted |
    | User access | controlled by the IAM policy | controlled by the IAM policy | can't be accessed by users |
    | Key access policy | managed by customer | managed by AWS | N/A |
  • Encryption options in KMS

    • AWS managed keys

      • Encryption Method (AWS managed)
      • Keys Storage (AWS managed)
      • Keys Management (AWS managed)
    • Customer managed keys

      • Encryption Method (Customer managed)
      • Keys Storage (AWS managed, CloudHSM)
      • Keys Management (Customer managed)
    • Custom key stores

      • Encryption Method (Customer managed)
      • Keys Storage (Customer managed)
      • Keys Management (Customer managed)
  • API

    • Encrypt (opens in a new tab)

      Encrypts plaintext into ciphertext by using a KMS CMK.

    • Decrypt (opens in a new tab)

      Decrypts ciphertext that was encrypted by a KMS CMK.

    • GenerateDataKey (opens in a new tab)

      • Generates a unique symmetric data key for client-side encryption, including a plaintext copy of the data key and a copy that is encrypted under a CMK that you specify.

      • To encrypt data outside of KMS:

        • Use the GenerateDataKey operation to get a data key.
        • Use the plaintext data key (in the Plaintext field of the response) to encrypt your data outside of KMS (Using any 3rd party cryptography library)
        • Erase the plaintext data key from memory.
        • Store the encrypted data key (in the CiphertextBlob field of the response) with the encrypted data.
      • To decrypt data outside of KMS:

        • Use the Decrypt operation to decrypt the encrypted data key. The operation returns a plaintext copy of the data key.
        • Use the plaintext data key to decrypt data outside of KMS.
        • Erase the plaintext data key from memory.
    • GenerateDataKeyWithoutPlaintext (opens in a new tab)

      The same result as GenerateDataKey, only without the plaintext copy of the data key.

  • Symmetric and asymmetric CMKs (opens in a new tab)

    • All AWS services that encrypt data on your behalf require a symmetric CMK.

    • Symmetric key

      • Encrypt / Decrypt
    • Asymmetric key

      • Encrypt / Decrypt
      • Sign / Verify
      • Doesn't support automatic key rotation
      • The standard asymmetric encryption algorithms that KMS uses do not support an encryption context.
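The GenerateDataKey flow described above can be sketched end to end. KMS is replaced by a toy in-memory stub, and the "cipher" is a throwaway XOR keystream (never use it for real data) — the point is the envelope pattern: encrypt locally with the plaintext data key, store only the encrypted data key, and call Decrypt to unwrap it later.

```python
import hashlib, secrets

def keystream_xor(key, data):
    """Toy cipher (XOR with a SHA-256-derived keystream) — for illustration only."""
    out, counter = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

class ToyKMS:
    """Stand-in for the KMS GenerateDataKey / Decrypt API actions."""
    def __init__(self):
        self._cmk = secrets.token_bytes(32)   # the KMS key never leaves "KMS"
    def generate_data_key(self):
        plaintext_key = secrets.token_bytes(32)
        # returns (Plaintext, CiphertextBlob), like the real API response fields
        return plaintext_key, keystream_xor(self._cmk, plaintext_key)
    def decrypt(self, ciphertext_blob):
        return keystream_xor(self._cmk, ciphertext_blob)

kms = ToyKMS()
# Encrypt outside of "KMS": use the plaintext data key, keep only the encrypted copy.
data_key, encrypted_data_key = kms.generate_data_key()
ciphertext = keystream_xor(data_key, b"attack at dawn")
del data_key                                  # erase the plaintext key from memory
# Decrypt later: unwrap the stored encrypted data key, then decrypt locally.
recovered_key = kms.decrypt(encrypted_data_key)
assert keystream_xor(recovered_key, ciphertext) == b"attack at dawn"
```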

KMS - Cross account access

Allowing users in other accounts to use a KMS key (opens in a new tab)

  • Cross-account access requires permission in the key policy of the KMS key and in an IAM policy in the external user's account.

    • Add a key policy statement in the local account
    • Add IAM policies in the external account
  • Cross-account permission is effective only for certain API operations

CloudHSM

AWS Config

  • By default, the configuration recorder records all supported resources in the Region where AWS Config is running.
  • AWS Config Rules (opens in a new tab)
    • AWS Config Rules represent your ideal configuration settings. AWS Config continuously tracks the configuration changes. Any resource violating a rule will be flagged as non-compliant.
  • Costs
    • You are charged service usage fees when AWS Config starts recording configurations.
    • To control costs, you can stop recording by stopping the configuration recorder. After you stop recording, you can continue to access the configuration information that was already recorded. You will not be charged AWS Config usage fees until you resume recording.

Secrets Manager

  • Automatic secrets rotation without disrupting applications

Service Catalog

Systems Manager (formerly SSM)

Automation (opens in a new tab)

  • Automation helps you to build automated solutions to deploy, configure, and manage AWS resources at scale.

Parameter Store (opens in a new tab)

  • Centralized configuration data management and secrets management
  • You can store values as plain text (String) or encrypted data (SecureString).
  • For auditing and logging, CloudTrail captures Parameter Store API calls.
  • Parameter Store uses KMS CMKs (opens in a new tab) to encrypt and decrypt the parameter values of SecureString parameters when you create or change them.
  • You can use the AWS managed CMK that Parameter Store creates for your account or specify your own customer managed CMK.

Parameter Store - Cheatsheet

Search for a parameter with a name containing the given keyword

keyword=<keyword>
aws ssm describe-parameters --parameter-filters "Key=Name,Option=Contains,Values=$keyword" \
--query 'sort_by(Parameters,&Name)[]' --output table

CloudFormation

  • Template (opens in a new tab)

    • Use a JSON or YAML file called a Template to specify a declarative, static definition of an AWS service stack.

    • The Template file must be uploaded to S3 before being used.

    • Parameters

      • Parameter Type (opens in a new tab)
      • You use the Ref intrinsic function to reference a Parameter, and AWS CloudFormation uses the Parameter's value to provision the stack. You can reference Parameter from the Resources and Outputs sections of the same template.
      • Pseudo parameters
        • Pseudo parameters are Parameters that are predefined by CloudFormation.
        • Use them the same way as you would a Parameter, as the argument for the Ref function.
        • Their names start with AWS:: such as AWS::Region.
    • Resources

      • The only mandatory section
    • Conditions

      • The optional Conditions section contains statements that define the circumstances under which entities are created or configured.
      • Other sections such as Resources and Outputs can reference the conditions defined in the Conditions section.
      • Use Condition functions (opens in a new tab) to define conditions.
    • Mappings

      • The optional Mappings section matches a key to a corresponding set of named values, essentially a Map using String as key.
      • Fn::FindInMap
        • !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
    • Outputs

      • To share information between stacks, export a stack's output values. Other stacks that are in the same AWS account and Region can import the exported values.
      • To export a stack's output value, use the Export field in the Output section of the stack's template. To import those values, use the Fn::ImportValue function in the template for the other stacks.
      • Exported output names must be unique within your Region.
    • Intrinsic function (opens in a new tab)

      • Fn::Ref (opens in a new tab)
        • The intrinsic function Ref returns the value of the specified Parameter or Resource.
        • When you Ref the logical ID of another Resource in your template, Ref returns what you could consider a default attribute for that type of Resource: Ref on an EC2 instance returns the instance ID, and Ref on an S3 bucket returns the bucket name.
      • Fn::GetAtt (opens in a new tab): The Fn::GetAtt intrinsic function returns the value of an attribute from a resource in the template.
      • Fn::FindInMap (opens in a new tab): The intrinsic function Fn::FindInMap returns the value corresponding to keys in a two-level map that is declared in the Mappings section.
      • Fn::ImportValue (opens in a new tab): The intrinsic function Fn::ImportValue returns the value of an output exported by another stack. You typically use this function to create cross-stack references.
      • Fn::Join (opens in a new tab): The intrinsic function Fn::Join appends a set of values into a single value, separated by the specified delimiter. If a delimiter is the empty string, the set of values are concatenated with no delimiter.
      • Fn::Sub (opens in a new tab): The intrinsic function Fn::Sub substitutes variables in an input string with values that you specify.
    • Helper scripts (opens in a new tab)

      • CloudFormation provides Python helper scripts that you can use to install software and start services on an EC2 instance that you create as part of your stack.
  • Stack (opens in a new tab)

    • Change set (opens in a new tab)

      • Change sets allow you to preview how proposed changes to a stack might impact your running resources.
      • Similar to a diff to the stack.
  • StackSet (opens in a new tab)

    • StackSets extends the functionality of stacks by enabling you to create, update, or delete stacks across multiple accounts and regions with a single operation.
  • CLI

    • package (opens in a new tab)

      • This command is only needed when there are local artifacts.
      • The command performs the following tasks:
        • Packages the local artifacts (local paths) that your CloudFormation template references.
        • Uploads local artifacts, such as source code for a Lambda function or a Swagger file for an API Gateway REST API, to an S3 bucket. Note it is the local artifacts being uploaded, not the template.
        • Returns a copy of your template, replacing references to local artifacts with the S3 location where the command uploaded the local artifacts.
    • deploy (opens in a new tab)

      Deploys the specified CloudFormation template by creating and then executing a change set.

  • Resources
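Tying the template sections above together, a minimal hypothetical template using Parameters, a pseudo parameter, Fn::Sub, Fn::GetAtt, and an exported Output might look like:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  BucketPrefix:
    Type: String
Resources:                       # the only mandatory section
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "${BucketPrefix}-${AWS::Region}"   # AWS::Region is a pseudo parameter
Outputs:
  BucketArn:
    Value: !GetAtt MyBucket.Arn  # Ref would return the bucket name instead
    Export:
      Name: my-bucket-arn        # importable in other stacks via Fn::ImportValue
```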

SQS (Simple Queue Service)

  • A queue from which consumers pull data pushed by producers.

  • Messages larger than 256 KB (opens in a new tab) must be sent with the SQS Extended Client Library for Java, which uses S3 for message storage, supporting payload sizes up to 2 GB.

  • The number of messages to retrieve (up to 10) can be specified per receive request.

  • SQS message retention period ranges from 1 minute to 14 days, by default 4 days.

  • Visibility timeout (opens in a new tab)

    • After a message is polled by a consumer, it becomes invisible to other consumers.
    • The message visibility timeout is the time the consumer has to process the message; it is 30 seconds by default.
    • If not deleted within the visibility timeout window, the message will become visible to other consumers again.
    • The ChangeMessageVisibility action can be used to prolong the visibility timeout window.
    • If the visibility timeout is too high and the consumer crashes in the meantime, reprocessing is delayed.
    • If the visibility timeout is too low, consumers may receive duplicate messages.
  • Delivery delay (opens in a new tab)

    • The delay happens before a message can be consumed.
    • If you create a delay queue, any messages that you send to the queue remain invisible to consumers for the duration of the delay period. The default (minimum) delay for a queue is 0 seconds. The maximum is 15 minutes.
  • Polling (opens in a new tab)

    • SQS provides short polling and long polling to receive messages from a queue. By default, queues use short polling.
    • Long polling decreases the number of API calls made to SQS and increases efficiency by reducing empty responses.
    • Long polling is preferable to short polling.
    • Long polling can have a wait time from 1 to 20 seconds.
  • Queue type

    • Standard queues

      • Default queue type
      • Almost unlimited throughput, up to 120,000 in-flight messages
      • at-least-once message delivery, requiring manual deduplication
      • Out-of-order message delivery
    • FIFO queue

      • Throughput: 3,000 messages / second (with batching), up to 20,000 in-flight messages
      • Queue name must end with .fifo.
      • exactly-once message delivery
      • Message ordering via message grouping
        • Ordering across groups is not guaranteed.
        • Messages that share a common message group ID will be in order within the group.
      • Deduplication
        • If you retry the SendMessage action within the 5-minute deduplication interval, SQS doesn't introduce any duplicates into the queue.
        • If a message with a particular message deduplication ID is sent successfully, any messages sent with the same message deduplication ID are accepted successfully but aren't delivered during the 5-minute deduplication interval.
        • If your application sends messages with unique message bodies, you can enable content-based deduplication.
      • Cannot subscribe to a standard SNS topic
  • Dead-letter queue (DLQ)

    • The DLQ of a FIFO queue must also be a FIFO queue.
    • The DLQ of a standard queue must also be a standard queue.
    • The DLQ and its corresponding queue must be in the same region and created by the same AWS account.
    • Redrive policy
      • The redrive policy specifies the source queue, the DLQ, and the conditions under which SQS moves messages from the former to the latter when the consumer of the source queue fails to process a message a specified number of times.
      • Whenever a consumer polls a message, its Receive count increments by 1 whether or not processing succeeds, so Receive count is essentially a receive-attempt count.
      • If a message's Receive count exceeds the specified Maximum receives, the message is sent to the specified DLQ.
      • SQS counts a message you view in the AWS Management Console against the queue's redrive policy, because viewing a message in the console polls the queue, which increments the Receive count.
  • Resources
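The redrive behavior above can be sketched as a toy model (no timers, network, or visibility timeout): every receive attempt increments the receive count, and once it exceeds Maximum receives the message moves to the DLQ.

```python
class ToyQueue:
    """Toy SQS model: receive increments ReceiveCount; exceeding maxReceiveCount redrives to the DLQ."""
    def __init__(self, max_receive_count, dlq=None):
        self.max_receive_count, self.dlq = max_receive_count, dlq
        self.messages = []   # each message: {"body": ..., "receive_count": int}

    def send(self, body):
        self.messages.append({"body": body, "receive_count": 0})

    def receive(self):
        if not self.messages:
            return None
        msg = self.messages[0]
        msg["receive_count"] += 1            # counted whether or not processing succeeds
        if msg["receive_count"] > self.max_receive_count:
            self.messages.pop(0)
            self.dlq.send(msg["body"])       # redrive to the dead-letter queue
            return None
        return msg

dlq = ToyQueue(max_receive_count=10)
q = ToyQueue(max_receive_count=2, dlq=dlq)
q.send("payment-123")
q.receive(); q.receive()        # two failed processing attempts (message never deleted)
assert q.receive() is None      # third attempt exceeds Maximum receives = 2
assert dlq.messages[0]["body"] == "payment-123"
```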

SNS

  • Max message size: 256 KB, extended client library supporting 2 GB.

SNS - Topic

  • A Topic allows multiple receivers of the message to subscribe dynamically for identical copies of the same notification.
  • By default, SNS offers 10 million subscriptions per Topic and 100,000 Topics per account.

SNS - Subscription

  • A subscriber receives messages that are published only after they have subscribed to the Topic. The Topics do not buffer messages.

  • When several SQS queues act as subscribers, a publisher sends a message to an SNS topic, which distributes identical copies of the message to all subscribed SQS queues in parallel. This pattern is called fanout.
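The fanout pattern above, sketched with in-memory stand-ins for a topic and its queue subscribers (all names hypothetical):

```python
class Topic:
    """Toy SNS topic: publish delivers an identical copy to every subscriber."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, queue):
        self.subscribers.append(queue)
    def publish(self, message):
        for queue in self.subscribers:   # fanout: each subscribed queue gets its own copy
            queue.append(message)

orders, analytics = [], []               # stand-ins for two SQS queues
topic = Topic()
topic.subscribe(orders)
topic.subscribe(analytics)
topic.publish("order-created")
assert orders == ["order-created"] and analytics == ["order-created"]
```

Note the topic does not buffer: a queue subscribed after `publish` receives nothing.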

Cognito

API Gateway

  • REST API

    • Stage variables

      • A stage is a named reference to a deployment, which is a snapshot of the API.
      • Stage variables are name-value pairs that you can define as configuration attributes associated with a deployment stage of a REST API. They act like environment variables and can be used in your API setup and mapping templates.
      • A stage variable can be used anywhere in a mapping template: ${stageVariables.<variable_name>}
    • Integration type (opens in a new tab)

      • AWS (Lambda custom integration)

        expose AWS service actions, must configure both the integration request and integration response.

      • AWS_PROXY (Lambda proxy integration)

        • This is the preferred integration type to call a Lambda function through API Gateway and is not applicable to any other AWS service actions, including Lambda actions other than the function-invoking action.

        • In Lambda proxy integration (opens in a new tab), API Gateway requires the backend Lambda function to return output according to the following JSON format.

          {
              "isBase64Encoded": true|false,
              "statusCode": httpStatusCode,
              "headers": { "headerName": "headerValue", ... },
              "multiValueHeaders": { "headerName": ["headerValue", "headerValue2", ...], ... },
              "body": "..."
          }
      • HTTP

        expose HTTP endpoints in the backend, must configure both the integration request and integration response.

      • HTTP_PROXY

        expose HTTP endpoints in the backend, but you do not configure the integration request or the integration response.

      • MOCK

        API Gateway return a response without sending the request further to the backend, useful for testing integration set up.

    • Quota (opens in a new tab)

      • Integration timeout: 50 milliseconds to 29 seconds for all integration types.
    • API Gateway responses (opens in a new tab)

      • 502 Bad Gateway

        • Usually an incompatible output returned from a Lambda proxy integration backend
        • Occasionally for out-of-order invocations due to heavy loads.
      • 504 INTEGRATION_TIMEOUT

      • 504 INTEGRATION_FAILURE

  • Canary release (opens in a new tab)

    Total API traffic is separated at random into a production release and a canary release with a pre-configured ratio.

  • Mapping template

    • A script expressed in Velocity Template Language (VTL) and applied to the payload using JSONPath expressions to perform data transformation.
  • API cache

    • API Gateway caches responses from your endpoint for a specified TTL period, in seconds.
    • Default TTL is 300 seconds, and TTL=0 means caching is disabled.
    • Client can invalidate an API Gateway cache entry by specifying Cache-Control: max-age=0 header, and authorization can be enabled to ignore unauthorized requests.
  • Throttling

    • Server-side throttling limits are applied across all clients.
    • Per-client throttling limits are applied to clients that use API keys associated with your usage plan as client identifier.
  • Usage plan

    • Uses API keys to identify API clients and meters access to the associated API stages for each key.

    • Configure throttling limits and quota limits that are enforced on individual client API keys.

    • Throttling

      • Rate

        • Number of requests per second that can be served
        • The rate is evenly distributed across a given time period.
      • Burst

        • Maximum number of concurrent request submissions that API Gateway can fulfill at any moment without returning 429 Too Many Requests error responses
        • Burst is essentially the maximum number of requests that can be queued for processing; once Burst is exceeded, requests are dropped.
      • As an analogy, imagine you are in a bank branch waiting to be served. Rate is the number of customers being served at the same time; Burst is the number of customers that can wait in a queue in the branch lobby. The queue length is limited by the lobby space, so any further customers must wait outside or come back to the branch another time.

  • Security

    • IAM (opens in a new tab)

    • Cognito user pool (opens in a new tab)

      • Authentication: Cognito user pool
      • Authorization: API Gateway methods
      • Seamless integration, no custom code needed
    • Lambda authorizer (opens in a new tab)

      • Authentication: 3rd-party (invoked by Lambda authorizer)

      • Authorization: Lambda function

      • Authorizer type

        • TOKEN authorizer

          Token-based Lambda authorizer receives the caller's identity in a bearer token, such as a JWT or an OAuth token.

        • REQUEST authorizer

          Request parameter-based Lambda authorizer receives the caller's identity in a combination of headers, query string parameters, stageVariables, and $context variables. WebSocket only supports REQUEST authorizer.

  • Metrics (opens in a new tab)

    • 4XXError

      number of client-side errors captured in a given period

    • 5XXError

      number of server-side errors captured in a given period

    • Count

      total number of API requests in a given period

    • IntegrationLatency

      the responsiveness of the backend

    • Latency

      the overall responsiveness of your API calls

    • CacheHitCount & CacheMissCount

      optimize cache capacities to achieve a desired performance.

  • CORS

    • To enable CORS support, you may or may not need to implement the CORS preflight response depending on the situation.

      • Lambda or HTTP non-proxy integrations and AWS service integrations

        Manually adding CORS response headers may be needed

      • Lambda or HTTP proxy integrations

        Manually adding CORS response headers is required

  • Resources
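The Rate and Burst semantics under Usage plan above follow the token-bucket model; a minimal sketch (a simplification of API Gateway's actual throttling, with hypothetical numbers):

```python
class TokenBucket:
    """Token bucket: capacity = Burst, refill rate = Rate (requests/second)."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def allow(self, now):
        # refill tokens for the elapsed time, capped at the burst capacity
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True          # request served
        return False             # throttled: API Gateway returns 429 Too Many Requests

bucket = TokenBucket(rate=10, burst=5)
# 6 simultaneous requests: the first 5 fit in the burst, the 6th is throttled
results = [bucket.allow(now=0.0) for _ in range(6)]
assert results == [True] * 5 + [False]
assert bucket.allow(now=0.1) is True   # 0.1 s later, one token has refilled
```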

SAM

  • The declaration Transform: AWS::Serverless-2016-10-31 is required for SAM template files.

  • Globals section is unique to SAM templates.

  • Resource type

    • AWS::Serverless::Api
      • API Gateway
    • AWS::Serverless::Application
      • Embeds a serverless application
    • AWS::Serverless::Function
      • Lambda function
    • AWS::Serverless::HttpApi
      • API Gateway HTTP API
    • AWS::Serverless::LayerVersion
      • Creates a Lambda LayerVersion that contains library or runtime code needed by a Lambda Function.
    • AWS::Serverless::SimpleTable
      • A DynamoDB table with a single-attribute primary key
    • AWS::Serverless::StateMachine
      • A Step Functions state machine
  • Installation

  • Notes

    • Use SAM CLI for local Lambda function development. (sam local invoke)
    • Don't use SAM CLI for deployment as it creates additional resources.
    • Use CloudFormation for unified deployment and provisioning.
    • Use a container image for deployment but not for local development, as it's slow to build the image; IntelliJ also does not support debugging a Lambda function packaged as an image.
  • Resources

CDK (Cloud Development Kit)

  • Assets

    Assets are local files, directories, or Docker images that can be bundled into AWS CDK libraries and apps, e.g. a directory that contains the handler code for an AWS Lambda function. Assets can represent any artifact that the app needs to operate.

  • Bootstrapping

    • Deploying AWS CDK apps into an AWS environment (a combination of an AWS account and region) may require that you provision resources the AWS CDK needs to perform the deployment. These resources include an S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments. The process of provisioning these initial resources is called bootstrapping.
    • cdk bootstrap aws://<Account-ID>/<Region>

Billing and Cost Management

Savings Plans (opens in a new tab)

  • Types

    • Compute
    • EC2 Instance
    • SageMaker
  • Pricing

    • No upfront
    • Partial upfront
    • All upfront

Code Samples

Java project scaffolding

  • Maven

    mvn -B archetype:generate \
        -DarchetypeGroupId=software.amazon.awssdk \
        -DarchetypeArtifactId=archetype-lambda \
        -Dservice=s3 \
        -Dregion=US_EAST_1 \
        -DgroupId=cq.aws \
        -DartifactId=playground-aws

Best Practices

  • Tagging (opens in a new tab)

    • Both keys and values are case sensitive.
    • Use Tags to index resources so they can be found easily.
    • Typical tags
      • Name
      • Project
      • Environment
      • Version
      • Owner

Resources