Advanced Guide to AWS Fault Injection Service Actions

by Justin Cook

One of the things many of my clients struggle with is adapting to the action parameters and the IAM permissions required to build out Resilience Hub. This guide walks through the FIS actions, their parameters, and the permissions required to set up FIS.

aws:fis:inject-api-internal-error

Injects internal errors into requests made by the target IAM role.

Resource type

  • aws:iam:role

Parameters

  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
  • service – The target AWS API namespace. The supported value is ec2.
  • percentage – The percentage (1-100) of calls to inject the fault into.
  • operations – The operations to inject the fault into, separated using commas. For a list of the API actions for the ec2 namespace, see Actions in the Amazon EC2 API Reference.

Permissions

  • fis:InjectApiInternalError
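To make the parameter list concrete, here is a minimal Python sketch that assembles one entry for the actions map of an experiment template. The helper name, the target name ExampleRoleTarget, and the "Roles" target key are assumptions for illustration; check them against your own template before relying on them.

```python
import json


def build_api_fault_action(action_id, target_name, duration, operations, percentage, service="ec2"):
    """Return one entry for the 'actions' map of an experiment template (hypothetical helper)."""
    if not 1 <= percentage <= 100:
        raise ValueError("percentage must be between 1 and 100")
    if not operations:
        raise ValueError("at least one operation is required")
    return {
        "actionId": action_id,
        "parameters": {
            # Parameter values are passed as strings in this sketch.
            "duration": duration,
            "service": service,
            "percentage": str(percentage),
            # Operations are separated using commas, as described above.
            "operations": ",".join(operations),
        },
        # The "Roles" key and the target name are assumptions, not taken from this guide.
        "targets": {"Roles": target_name},
    }


action = build_api_fault_action(
    "aws:fis:inject-api-internal-error",
    target_name="ExampleRoleTarget",
    duration="PT5M",
    operations=["DescribeInstances", "DescribeVolumes"],
    percentage=25,
)
print(json.dumps(action, indent=2))
```

The same helper should work for the throttle and unavailable variants, since they take the same four parameters.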

aws:fis:inject-api-throttle-error

Injects throttling errors into requests made by the target IAM role.

Resource type

  • aws:iam:role

Parameters

  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
  • service – The target AWS API namespace. The supported value is ec2.
  • percentage – The percentage (1-100) of calls to inject the fault into.
  • operations – The operations to inject the fault into, separated using commas. For a list of the API actions for the ec2 namespace, see Actions in the Amazon EC2 API Reference.

Permissions

  • fis:InjectApiThrottleError

aws:fis:inject-api-unavailable-error

Injects Unavailable errors into requests made by the target IAM role.

Resource type

  • aws:iam:role

Parameters

  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
  • service – The target AWS API namespace. The supported value is ec2.
  • percentage – The percentage (1-100) of calls to inject the fault into.
  • operations – The operations to inject the fault into, separated using commas. For a list of the API actions for the ec2 namespace, see Actions in the Amazon EC2 API Reference.

Permissions

  • fis:InjectApiUnavailableError

Wait action

AWS FIS supports the following wait action.

aws:fis:wait

Runs the AWS FIS wait action.

Parameters

  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.

Permissions

  • None
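Since nearly every action takes an ISO 8601 duration string, a small formatter can save some trial and error. The sketch below is a hypothetical helper, not part of any AWS SDK; it defaults to the one minute to 12 hour range quoted above and lets you lower the minimum for actions that accept seconds.

```python
def to_fis_duration(seconds, minimum=60, maximum=12 * 60 * 60):
    """Format whole seconds as an ISO 8601 duration such as PT1M or PT1H30M."""
    if minimum < 1:
        raise ValueError("minimum must be at least one second")
    if not minimum <= seconds <= maximum:
        raise ValueError(f"duration must be between {minimum} and {maximum} seconds")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    # Only non-zero components are emitted, so 60 seconds becomes PT1M rather than PT0H1M0S.
    parts = [f"{value}{unit}" for value, unit in ((hours, "H"), (minutes, "M"), (secs, "S")) if value]
    return "PT" + "".join(parts)


print(to_fis_duration(60))            # PT1M
print(to_fis_duration(5400))          # PT1H30M
print(to_fis_duration(5, minimum=1))  # PT5S
```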

Amazon CloudWatch actions

AWS FIS supports the following Amazon CloudWatch action.

aws:cloudwatch:assert-alarm-state

Verifies that the specified alarms are in one of the specified alarm states.

Resource type

  • None

Parameters

  • alarmArns – The ARNs of the alarms, separated by commas. You can specify up to five alarms.
  • alarmStates – The alarm states, separated by commas. The possible alarm states are OK, ALARM, and INSUFFICIENT_DATA.

Permissions

  • cloudwatch:DescribeAlarms
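The two comma-separated parameters are easy to get wrong by hand. A minimal Python sketch, assuming a hypothetical helper name and a placeholder alarm ARN, that enforces the five alarm limit and the three allowed states:

```python
VALID_ALARM_STATES = {"OK", "ALARM", "INSUFFICIENT_DATA"}
MAX_ALARMS = 5


def build_assert_alarm_parameters(alarm_arns, alarm_states):
    """Return the parameters map for aws:cloudwatch:assert-alarm-state (hypothetical helper)."""
    if not 1 <= len(alarm_arns) <= MAX_ALARMS:
        raise ValueError(f"specify between 1 and {MAX_ALARMS} alarms")
    unknown = set(alarm_states) - VALID_ALARM_STATES
    if not alarm_states or unknown:
        raise ValueError(f"invalid alarm states: {sorted(unknown)}")
    return {
        "alarmArns": ",".join(alarm_arns),
        "alarmStates": ",".join(alarm_states),
    }


params = build_assert_alarm_parameters(
    ["arn:aws:cloudwatch:us-east-1:123456789012:alarm:ExampleAlarm"],  # placeholder ARN
    ["OK", "INSUFFICIENT_DATA"],
)
print(params)
```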

Amazon DynamoDB actions

AWS FIS supports the following Amazon DynamoDB action.

aws:dynamodb:global-table-pause-replication

Pauses Amazon DynamoDB global table replication to any replica table. Tables may continue to be replicated for up to 5 minutes after the action begins.

The following statement will be dynamically appended to the policy for the target DynamoDB global table:

{
   "Statement":[
      {
         "Sid":"DoNotModifyFisDynamoDbPauseReplicationEXPxxxxxxxxxxxxxxx",
         "Effect":"Deny",
         "Principal":{
            "AWS":"arn:aws:iam::123456789012:role/aws-service-role/replication.dynamodb.amazonaws.com/AWSServiceRoleForDynamoDBReplication"
         },
         "Action":[
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:DeleteItem",
            "dynamodb:DescribeTable",
            "dynamodb:UpdateTable",
            "dynamodb:Scan",
            "dynamodb:DescribeTimeToLive",
            "dynamodb:UpdateTimeToLive"
         ],
         "Resource":"arn:aws:dynamodb:us-east-1:123456789012:table/ExampleGlobalTable",
         "Condition":{
            "DateLessThan":{
               "aws:CurrentTime":"2024-04-10T09:51:41.511Z"
            }
         }
      }
   ]
}

The following statement will be dynamically appended to the stream policy for the target DynamoDB global table:

{
   "Statement":[
      {
         "Sid":"DoNotModifyFisDynamoDbPauseReplicationEXPxxxxxxxxxxxxxxx",
         "Effect":"Deny",
         "Principal":{
            "AWS":"arn:aws:iam::123456789012:role/aws-service-role/replication.dynamodb.amazonaws.com/AWSServiceRoleForDynamoDBReplication"
         },
         "Action":[
            "dynamodb:GetRecords",
            "dynamodb:DescribeStream",
            "dynamodb:GetShardIterator"
         ],
         "Resource":"arn:aws:dynamodb:us-east-1:123456789012:table/ExampleGlobalTable/stream/2023-08-31T09:50:24.025",
         "Condition":{
            "DateLessThan":{
               "aws:CurrentTime":"2024-04-10T09:51:41.511Z"
            }
         }
      }
   ]
}

If a target table or stream does not have any attached resource policies, a resource policy is created for the duration of the experiment, and automatically deleted when the experiment ends. Otherwise, the fault statement is inserted into an existing policy, without any additional modifications to the existing policy statements. The fault statement is then removed from the policy at the end of the experiment.
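The create-or-insert and remove-or-delete behavior described above can be modeled in a few lines of Python. This is only a sketch of the described behavior under my own assumptions (function names, matching on the Sid prefix); it is not how FIS is implemented, and it never calls AWS.

```python
import copy

# Sid prefix taken from the example statements above.
SID_PREFIX = "DoNotModifyFisDynamoDbPauseReplication"


def add_fault_statement(existing_policy, fault_statement):
    """Return a new policy with the fault statement appended.

    existing_policy is None when the table or stream has no resource policy attached.
    """
    policy = copy.deepcopy(existing_policy) if existing_policy else {"Statement": []}
    policy["Statement"].append(copy.deepcopy(fault_statement))
    return policy


def remove_fault_statement(policy):
    """Return the policy without fault statements, or None if the policy should be deleted."""
    remaining = [s for s in policy["Statement"] if not s.get("Sid", "").startswith(SID_PREFIX)]
    if not remaining:
        return None  # the policy existed only for the experiment
    restored = {key: value for key, value in policy.items() if key != "Statement"}
    restored["Statement"] = copy.deepcopy(remaining)
    return restored


existing = {"Statement": [{"Sid": "AllowExampleApp", "Effect": "Allow"}]}
fault = {"Sid": SID_PREFIX + "EXPxxxxxxxxxxxxxxx", "Effect": "Deny"}

during = add_fault_statement(existing, fault)
after = remove_fault_statement(during)
print(len(during["Statement"]), after == existing)
```

If you maintain your own resource policies on global tables, the takeaway is that statements with this Sid prefix belong to the experiment and should be left alone while it runs.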

Resource type

  • aws:dynamodb:global-table

Parameters

  • duration – In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.

Permissions

  • dynamodb:PutResourcePolicy
  • dynamodb:DeleteResourcePolicy
  • dynamodb:GetResourcePolicy
  • dynamodb:DescribeTable
  • tag:GetResources

Amazon EBS actions

AWS FIS supports the following Amazon EBS action.

aws:ebs:pause-volume-io

Pauses I/O operations on target EBS volumes. The target volumes must be in the same Availability Zone and must be attached to instances built on the Nitro System. The volumes can’t be attached to instances on an Outpost.

To initiate the experiment using the Amazon EC2 console, see Fault testing on Amazon EBS in the Amazon EC2 User Guide.

Resource type

  • aws:ec2:ebs-volume

Parameters

  • duration – The duration, from one second to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute, PT5S represents five seconds, and PT6H represents six hours. In the AWS FIS console, you enter the number of seconds, minutes, or hours. If the duration is small, such as PT5S, the I/O is paused for the specified duration, but it might take longer for the experiment to complete due to the time it takes to initialize the experiment.

Permissions

  • ec2:DescribeVolumes
  • ec2:PauseVolumeIO
  • tag:GetResources
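Each Permissions list in this guide maps directly onto an IAM policy statement for the experiment role. A minimal sketch, assuming a hypothetical helper and a wildcard resource that you would normally scope down:

```python
import json


def policy_from_permissions(permissions, resource="*"):
    """Build an IAM policy document that allows the given actions (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(permissions)),
                # "*" keeps the sketch short; restricting the resource is usually preferable.
                "Resource": resource,
            }
        ],
    }


EBS_PAUSE_PERMISSIONS = ["ec2:DescribeVolumes", "ec2:PauseVolumeIO", "tag:GetResources"]
policy = policy_from_permissions(EBS_PAUSE_PERMISSIONS)
print(json.dumps(policy, indent=2))
```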

Amazon EC2 actions

AWS FIS supports the following Amazon EC2 actions.

AWS FIS also supports fault injection actions through the AWS Systems Manager SSM Agent. Systems Manager uses an SSM document that defines actions to perform on EC2 instances. You can use your own document to inject custom faults, or you can use pre-configured SSM documents. For more information, see Use Systems Manager SSM documents with AWS FIS.

aws:ec2:api-insufficient-instance-capacity-error

Injects InsufficientInstanceCapacity error responses on requests made by the target IAM roles. Supported operations are RunInstances, CreateCapacityReservation, StartInstances, and CreateFleet. Requests that include capacity asks in multiple Availability Zones are not supported. This action doesn’t support defining targets using resource tags, filters, or parameters.

Resource type

  • aws:iam:role

Parameters

  • duration – In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
  • availabilityZoneIdentifiers – The comma-separated list of Availability Zones. Supports Zone IDs (e.g. “use1-az1, use1-az2”) and Zone names (e.g. “us-east-1a”).
  • percentage – The percentage (1-100) of calls to inject the fault into.

Permissions

  • ec2:InjectApiError with condition key ec2:FisActionId value set to aws:ec2:api-insufficient-instance-capacity-error and ec2:FisTargetArns condition key set to target IAM roles.

For an example policy, see Example: Use condition keys for ec2:InjectApiError.
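As a rough illustration of how the two condition keys fit into a policy statement, here is a Python sketch that builds one. The ForAnyValue:StringEquals operator, the wildcard resource, and the role ARN are my assumptions; treat the referenced AWS example as the authority on the exact shape.

```python
import json


def inject_api_error_statement(action_id, target_arns):
    """Build an ec2:InjectApiError statement scoped by the FIS condition keys (sketch)."""
    return {
        "Effect": "Allow",
        "Action": "ec2:InjectApiError",
        "Resource": "*",
        "Condition": {
            # The condition operator is an assumption for this sketch.
            "ForAnyValue:StringEquals": {
                "ec2:FisActionId": [action_id],
                "ec2:FisTargetArns": list(target_arns),
            }
        },
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        inject_api_error_statement(
            "aws:ec2:api-insufficient-instance-capacity-error",
            ["arn:aws:iam::123456789012:role/ExampleRole"],  # placeholder role ARN
        )
    ],
}
print(json.dumps(policy, indent=2))
```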

aws:ec2:asg-insufficient-instance-capacity-error

Injects InsufficientInstanceCapacity error responses on requests made by the target Auto Scaling groups. This action only supports Auto Scaling groups using launch templates. To learn more about insufficient instance capacity errors, see the Amazon EC2 user guide.

Resource type

  • aws:ec2:autoscaling-group

Parameters

  • duration – In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
  • availabilityZoneIdentifiers – The comma-separated list of Availability Zones. Supports Zone IDs (e.g. “use1-az1, use1-az2”) and Zone names (e.g. “us-east-1a”).
  • percentage – Optional. The percentage (1-100) of the target Auto Scaling group’s launch requests to inject the fault into. The default is 100.

Permissions

  • ec2:InjectApiError with condition key ec2:FisActionId value set to aws:ec2:asg-insufficient-instance-capacity-error and ec2:FisTargetArns condition key set to target Auto Scaling groups.
  • autoscaling:DescribeAutoScalingGroups

For an example policy, see Example: Use condition keys for ec2:InjectApiError.

aws:ec2:reboot-instances

Runs the Amazon EC2 API action RebootInstances on the target EC2 instances.

Resource type

  • aws:ec2:instance

Parameters

  • None

Permissions

  • ec2:RebootInstances
  • ec2:DescribeInstances

aws:ec2:send-spot-instance-interruptions

Interrupts the target Spot Instances. Sends a Spot Instance interruption notice to target Spot Instances two minutes before interrupting them. The interruption time is determined by the specified durationBeforeInterruption parameter. Two minutes after the interruption time, the Spot Instances are terminated or stopped, depending on their interruption behavior. A Spot Instance that was stopped by AWS FIS remains stopped until you restart it.

Immediately after the action is initiated, the target instance receives an EC2 instance rebalance recommendation. If you specified durationBeforeInterruption, there could be a delay between the rebalance recommendation and the interruption notice.

For more information, see Tutorial: Test Spot Instance interruptions using AWS FIS. Alternatively, to initiate the experiment by using the Amazon EC2 console, see Initiate a Spot Instance interruption in the Amazon EC2 User Guide.

Resource type

  • aws:ec2:spot-instance

Parameters

  • durationBeforeInterruption – The time to wait before interrupting the instance, from 2 to 15 minutes. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT2M represents two minutes. In the AWS FIS console, you enter the number of minutes.

Permissions

  • ec2:SendSpotInstanceInterruptions
  • ec2:DescribeInstances

aws:ec2:stop-instances

Runs the Amazon EC2 API action StopInstances on the target EC2 instances.

Resource type

  • aws:ec2:instance

Parameters

  • startInstancesAfterDuration – Optional. The time to wait before starting the instance, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours. If the instance has an encrypted EBS volume, you must grant AWS FIS permission to the KMS key used to encrypt the volume, or add the experiment role to the KMS key policy.
  • completeIfInstancesTerminated – Optional. If true, and if startInstancesAfterDuration is also specified, this action will not fail when targeted EC2 instances have been terminated by a separate request outside of FIS and cannot be restarted. For example, Auto Scaling groups may terminate stopped EC2 instances under their control before this action completes. The default is false.

Permissions

  • ec2:StopInstances
  • ec2:StartInstances
  • ec2:DescribeInstances – Optional. Required with completeIfInstancesTerminated to validate instance state at end of action.
  • kms:CreateGrant – Optional. Required with startInstancesAfterDuration to restart instances with encrypted volumes.
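Because two of these permissions depend on how the action is configured, it can help to derive the list from the parameters. A minimal sketch with a hypothetical helper; the encrypted_volumes flag is my own addition to capture the condition described for kms:CreateGrant.

```python
def stop_instances_permissions(start_after_duration=None, complete_if_terminated=False,
                               encrypted_volumes=False):
    """Return the IAM actions the experiment role needs for aws:ec2:stop-instances (sketch)."""
    permissions = ["ec2:StopInstances", "ec2:StartInstances"]
    if complete_if_terminated:
        # Needed to validate instance state at the end of the action.
        permissions.append("ec2:DescribeInstances")
    if start_after_duration and encrypted_volumes:
        # Needed to restart instances whose volumes are encrypted.
        permissions.append("kms:CreateGrant")
    return permissions


print(stop_instances_permissions())
print(stop_instances_permissions("PT10M", complete_if_terminated=True, encrypted_volumes=True))
```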

aws:ec2:terminate-instances

Runs the Amazon EC2 API action TerminateInstances on the target EC2 instances.

Resource type

  • aws:ec2:instance

Parameters

  • None

Permissions

  • ec2:TerminateInstances
  • ec2:DescribeInstances

Amazon ECS actions

AWS FIS supports the following Amazon ECS actions.

aws:ecs:drain-container-instances

Runs the Amazon ECS API action UpdateContainerInstancesState to drain the specified percentage of underlying Amazon EC2 instances on the target clusters.

Resource type

  • aws:ecs:cluster

Parameters

  • drainagePercentage – The percentage (1-100).
  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.

Permissions

  • ecs:DescribeClusters
  • ecs:UpdateContainerInstancesState
  • ecs:ListContainerInstances
  • tag:GetResources

aws:ecs:stop-task

Runs the Amazon ECS API action StopTask to stop the target task.

Resource type

  • aws:ecs:task

Parameters

  • None

Permissions

  • ecs:DescribeTasks
  • ecs:ListTasks
  • ecs:StopTask
  • tag:GetResources

aws:ecs:task-cpu-stress

Runs CPU stress on the target tasks. Uses the AWSFIS-Run-CPU-Stress SSM document. The tasks must be managed by AWS Systems Manager. For more information, see Use the ECS task actions.

Resource type

  • aws:ecs:task

Parameters

  • duration – The duration of the stress test, in ISO 8601 format.
  • percent – Optional. The target load percentage, from 0 (no load) to 100 (full load). The default is 100.
  • workers – Optional. The number of stressors to use. The default is 0, which uses all stressors.
  • installDependencies – Optional. If this value is True, Systems Manager installs the required dependencies on the sidecar container for the SSM agent, if they are not already installed. The default is True. The dependency is stress-ng.

Permissions

  • ssm:SendCommand
  • ssm:ListCommands
  • ssm:CancelCommand

aws:ecs:task-io-stress

Runs I/O stress on the target tasks. Uses the AWSFIS-Run-IO-Stress SSM document. The tasks must be managed by AWS Systems Manager. For more information, see Use the ECS task actions.

Resource type

  • aws:ecs:task

Parameters

  • duration – The duration of the stress test, in ISO 8601 format.
  • percent – Optional. The percentage of free space on the file system to use during the stress test. The default is 80%.
  • workers – Optional. The number of workers. Workers perform a mix of sequential, random, and memory-mapped read/write operations, forced synchronizing, and cache dropping. Multiple child processes perform different I/O operations on the same file. The default is 1.
  • installDependencies – Optional. If this value is True, Systems Manager installs the required dependencies on the sidecar container for the SSM agent, if they are not already installed. The default is True. The dependency is stress-ng.

Permissions

  • ssm:SendCommand
  • ssm:ListCommands
  • ssm:CancelCommand

aws:ecs:task-kill-process

Stops the specified process in the tasks, using the killall command. Uses the AWSFIS-Run-Kill-Process SSM document. The task definition must have pidMode set to task. The tasks must be managed by AWS Systems Manager. For more information, see Use the ECS task actions.

Resource type

  • aws:ecs:task

Parameters

  • processName – The name of the process to stop.
  • signal – Optional. The signal to send along with the command. The possible values are SIGTERM (which the receiver can choose to ignore) and SIGKILL (which cannot be ignored). The default is SIGTERM.
  • installDependencies – Optional. If this value is True, Systems Manager installs the required dependencies on the sidecar container for the SSM agent, if they are not already installed. The default is True. The dependency is killall.

Permissions

  • ssm:SendCommand
  • ssm:ListCommands
  • ssm:CancelCommand

aws:ecs:task-network-blackhole-port

Drops inbound or outbound traffic for the specified protocol and port. Uses the AWSFIS-Run-Network-Blackhole-Port SSM document. The task definition must have pidMode set to task. The tasks must be managed by AWS Systems Manager. You can’t set networkMode to bridge in the task definition. For more information, see Use the ECS task actions.

Resource type

  • aws:ecs:task

Parameters

  • duration – The duration of the test, in ISO 8601 format.
  • port – The port number.
  • trafficType – The type of traffic. The possible values are ingress and egress.
  • protocol – Optional. The protocol. The possible values are tcp and udp. The default is tcp.
  • installDependencies – Optional. If this value is True, Systems Manager installs the required dependencies on the sidecar container for the SSM agent, if they are not already installed. The default is True. The dependencies are atd, dig, and iptables.

Permissions

  • ssm:SendCommand
  • ssm:ListCommands
  • ssm:CancelCommand

aws:ecs:task-network-latency

Adds latency and jitter to the network interface using the tc tool for traffic to or from specific sources. Uses the AWSFIS-Run-Network-Latency-Sources SSM document. The task definition must have pidMode set to task. The tasks must be managed by AWS Systems Manager. You can’t set networkMode to bridge in the task definition. For more information, see Use the ECS task actions.

Resource type

  • aws:ecs:task

Parameters

  • duration – The duration of the test, in ISO 8601 format.
  • interface – Optional. The network interface. The default is eth0.
  • delayMilliseconds – Optional. The delay, in milliseconds. The default is 200.
  • jitterMilliseconds – Optional. The jitter, in milliseconds. The default is 10.
  • sources – Optional. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region. The default is 0.0.0.0/0, which matches all IPv4 traffic.
  • installDependencies – Optional. If this value is True, Systems Manager installs the required dependencies on the sidecar container for the SSM agent, if they are not already installed. The default is True. The dependencies are atd, dig, jq, and tc.

Permissions

  • ssm:SendCommand
  • ssm:ListCommands
  • ssm:CancelCommand
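The sources parameter accepts several kinds of values, so a quick local check before creating the template can catch typos early. The following is a sketch using only the Python standard library; the helper names and the simplified domain pattern are my assumptions, and FIS may accept or reject values differently.

```python
import ipaddress
import re

SERVICE_KEYWORDS = {"DYNAMODB", "S3"}
# Simplified hostname pattern, an assumption for this sketch.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$"
)


def classify_source(value):
    """Classify one source as 'service', 'ipv4', 'cidr', or 'domain'; raise ValueError otherwise."""
    value = value.strip()
    if value in SERVICE_KEYWORDS:
        return "service"
    try:
        ipaddress.IPv4Address(value)
        return "ipv4"
    except ValueError:
        pass
    try:
        ipaddress.IPv4Network(value, strict=False)
        return "cidr"
    except ValueError:
        pass
    if DOMAIN_RE.match(value):
        return "domain"
    raise ValueError(f"unsupported source: {value!r}")


def build_sources(values):
    """Validate each source and join them with commas for the sources parameter."""
    cleaned = [v.strip() for v in values]
    for v in cleaned:
        classify_source(v)
    return ",".join(cleaned)


print(build_sources(["10.0.0.0/16", "example.com", "S3"]))
```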

aws:ecs:task-network-packet-loss

Adds packet loss to the network interface using the tc tool. Uses the AWSFIS-Run-Network-Packet-Loss-Sources SSM document. The task definition must have pidMode set to task. The tasks must be managed by AWS Systems Manager. You can’t set networkMode to bridge in the task definition. For more information, see Use the ECS task actions.

Resource type

  • aws:ecs:task

Parameters

  • duration – The duration of the test, in ISO 8601 format.
  • interface – Optional. The network interface. The default is eth0.
  • lossPercent – Optional. The percentage of packet loss. The default is 7%.
  • sources – Optional. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region. The default is 0.0.0.0/0, which matches all IPv4 traffic.
  • installDependencies – Optional. If this value is True, Systems Manager installs the required dependencies on the sidecar container for the SSM agent, if they are not already installed. The default is True. The dependencies are atd, dig, jq, and tc.

Permissions

  • ssm:SendCommand
  • ssm:ListCommands
  • ssm:CancelCommand

Amazon EKS actions

AWS FIS supports the following Amazon EKS actions.

aws:eks:inject-kubernetes-custom-resource

Runs a ChaosMesh or Litmus experiment on a single target cluster. You must install ChaosMesh or Litmus on the target cluster.

When you create an experiment template and define a target of type aws:eks:cluster, you must target this action to a single Amazon Resource Name (ARN). This action doesn’t support defining targets using resource tags, filters, or parameters.

When you install ChaosMesh, you must specify the appropriate container runtime. Starting with Amazon EKS version 1.23, the default runtime changed from Docker to containerd. Starting with version 1.24, Docker was removed.

Resource type

  • aws:eks:cluster

Parameters

  • kubernetesApiVersion – The API version of the Kubernetes custom resource. The possible values are chaos-mesh.org/v1alpha1 | litmuschaos.io/v1alpha1.
  • kubernetesKind – The Kubernetes custom resource kind. The value depends on the API version.
    • chaos-mesh.org/v1alpha1 – The possible values are AWSChaos | DNSChaos | GCPChaos | HTTPChaos | IOChaos | JVMChaos | KernelChaos | NetworkChaos | PhysicalMachineChaos | PodChaos | PodHttpChaos | PodIOChaos | PodNetworkChaos | Schedule | StressChaos | TimeChaos |
    • litmuschaos.io/v1alpha1 – The possible value is ChaosEngine.
  • kubernetesNamespace – The Kubernetes namespace.
  • kubernetesSpec – The spec section of the Kubernetes custom resource, in JSON format.
  • maxDuration – The maximum time allowed for the automation execution to complete, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.

Permissions

No AWS Identity and Access Management (IAM) permissions are required for this action. The permissions required to use this action are controlled by Kubernetes using RBAC authorization. For more information, see Using RBAC Authorization in the official Kubernetes documentation. For more information about Chaos Mesh, see the official Chaos Mesh documentation. For more information about Litmus, see the official Litmus documentation.
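One detail that trips people up is that kubernetesSpec is a JSON string, not a nested object. A minimal Python sketch of the parameters map for a Chaos Mesh PodChaos resource follows; the namespace, label selector, and spec contents are illustrative assumptions, so check the Chaos Mesh documentation for the fields your version supports.

```python
import json

# Illustrative PodChaos spec; the selector values are placeholders.
pod_chaos_spec = {
    "selector": {
        "namespaces": ["default"],
        "labelSelectors": {"app": "example"},
    },
    "mode": "one",
    "action": "pod-kill",
}

parameters = {
    "kubernetesApiVersion": "chaos-mesh.org/v1alpha1",
    "kubernetesKind": "PodChaos",
    "kubernetesNamespace": "default",
    # The spec is serialized to a JSON string before it goes into the parameters map.
    "kubernetesSpec": json.dumps(pod_chaos_spec),
    "maxDuration": "PT5M",
}

print(json.dumps(parameters, indent=2))
```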

aws:eks:pod-cpu-stress

Runs CPU stress on the target pods. For more information, see Use the EKS pod actions.

Resource type

  • aws:eks:pod

Parameters

  • duration – The duration of the stress test, in ISO 8601 format.
  • percent – Optional. The target load percentage, from 0 (no load) to 100 (full load). The default is 100.
  • workers – Optional. The number of stressors to use. The default is 0, which uses all stressors.
  • kubernetesServiceAccount – The Kubernetes service account. For information about the required permissions, see Configure the Kubernetes service account.
  • fisPodContainerImage – Optional. The container image used to create the fault injector pod. The default is to use the images provided by AWS FIS. For more information, see Pod container images.
  • maxErrorsPercent – Optional. The percentage of targets that can fail before the fault injection fails. The default is 0.
  • fisPodLabels – Optional. The Kubernetes labels that are attached to the fault orchestration pod created by FIS.
  • fisPodAnnotations – Optional. The Kubernetes annotations that are attached to the fault orchestration pod created by FIS.
  • fisPodSecurityPolicy – Optional. The Kubernetes Security Standards policy to use for the fault orchestration pod created by FIS and the ephemeral containers. Possible values are privileged, baseline and restricted. This action is compatible with all policy levels.

Permissions

  • eks:DescribeCluster
  • ec2:DescribeSubnets
  • tag:GetResources

aws:eks:pod-delete

Deletes the target pods. For more information, see Use the EKS pod actions.

Resource type

  • aws:eks:pod

Parameters

  • gracePeriodSeconds – Optional. The duration, in seconds, to wait for the pod to terminate gracefully. If the value is 0, we perform the action immediately. If the value is nil, we use the default grace period for the pod.
  • kubernetesServiceAccount – The Kubernetes service account. For information about the required permissions, see Configure the Kubernetes service account.
  • fisPodContainerImage – Optional. The container image used to create the fault injector pod. The default is to use the images provided by AWS FIS. For more information, see Pod container images.
  • maxErrorsPercent – Optional. The percentage of targets that can fail before the fault injection fails. The default is 0.
  • fisPodLabels – Optional. The Kubernetes labels that are attached to the fault orchestration pod created by FIS.
  • fisPodAnnotations – Optional. The Kubernetes annotations that are attached to the fault orchestration pod created by FIS.
  • fisPodSecurityPolicy – Optional. The Kubernetes Security Standards policy to use for the fault orchestration pod created by FIS and the ephemeral containers. Possible values are privileged, baseline and restricted. This action is compatible with all policy levels.

Permissions

  • eks:DescribeCluster
  • ec2:DescribeSubnets
  • tag:GetResources
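The difference between a gracePeriodSeconds of 0 and leaving it unset matters, so here is a small sketch that keeps the two cases apart. The helper and the service account name are hypothetical; I am also assuming that omitting the parameter is how you express the nil case.

```python
def pod_delete_parameters(service_account, grace_period_seconds=None, max_errors_percent=None):
    """Build the parameters map for aws:eks:pod-delete (hypothetical helper)."""
    params = {"kubernetesServiceAccount": service_account}
    if grace_period_seconds is not None:
        if grace_period_seconds < 0:
            raise ValueError("gracePeriodSeconds cannot be negative")
        # 0 means delete immediately; leaving the key out keeps the pod's default grace period.
        params["gracePeriodSeconds"] = str(grace_period_seconds)
    if max_errors_percent is not None:
        if not 0 <= max_errors_percent <= 100:
            raise ValueError("maxErrorsPercent must be between 0 and 100")
        params["maxErrorsPercent"] = str(max_errors_percent)
    return params


print(pod_delete_parameters("example-fis-sa"))
print(pod_delete_parameters("example-fis-sa", grace_period_seconds=0, max_errors_percent=10))
```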

aws:eks:pod-io-stress

Runs I/O stress on the target pods. For more information, see Use the EKS pod actions.

Resource type

  • aws:eks:pod

Parameters

  • duration – The duration of the stress test, in ISO 8601 format.
  • workers – Optional. The number of workers. Workers perform a mix of sequential, random, and memory-mapped read/write operations, forced synchronizing, and cache dropping. Multiple child processes perform different I/O operations on the same file. The default is 1.
  • percent – Optional. The percentage of free space on the file system to use during the stress test. The default is 80%.
  • kubernetesServiceAccount – The Kubernetes service account. For information about the required permissions, see Configure the Kubernetes service account.
  • fisPodContainerImage – Optional. The container image used to create the fault injector pod. The default is to use the images provided by AWS FIS. For more information, see Pod container images.
  • maxErrorsPercent – Optional. The percentage of targets that can fail before the fault injection fails. The default is 0.
  • fisPodLabels – Optional. The Kubernetes labels that are attached to the fault orchestration pod created by FIS.
  • fisPodAnnotations – Optional. The Kubernetes annotations that are attached to the fault orchestration pod created by FIS.
  • fisPodSecurityPolicy – Optional. The Kubernetes Security Standards policy to use for the fault orchestration pod created by FIS and the ephemeral containers. Possible values are privileged, baseline and restricted. This action is compatible with all policy levels.

Permissions

  • eks:DescribeCluster
  • ec2:DescribeSubnets
  • tag:GetResources

aws:eks:pod-memory-stress

Runs memory stress on the target pods. For more information, see Use the EKS pod actions.

Resource type

  • aws:eks:pod

Parameters

  • duration – The duration of the stress test, in ISO 8601 format.
  • workers – Optional. The number of stressors to use. The default is 1.
  • percent – Optional. The percentage of virtual memory to use during the stress test. The default is 80%.
  • kubernetesServiceAccount – The Kubernetes service account. For information about the required permissions, see Configure the Kubernetes service account.
  • fisPodContainerImage – Optional. The container image used to create the fault injector pod. The default is to use the images provided by AWS FIS. For more information, see Pod container images.
  • maxErrorsPercent – Optional. The percentage of targets that can fail before the fault injection fails. The default is 0.
  • fisPodLabels – Optional. The Kubernetes labels that are attached to the fault orchestration pod created by FIS.
  • fisPodAnnotations – Optional. The Kubernetes annotations that are attached to the fault orchestration pod created by FIS.
  • fisPodSecurityPolicy – Optional. The Kubernetes Security Standards policy to use for the fault orchestration pod created by FIS and the ephemeral containers. Possible values are privileged, baseline and restricted. This action is compatible with all policy levels.

Permissions

  • eks:DescribeCluster
  • ec2:DescribeSubnets
  • tag:GetResources

aws:eks:pod-network-blackhole-port

Drops inbound or outbound traffic for the specified protocol and port. Only compatible with the Kubernetes Security Standards privileged policy. For more information, see Use the EKS pod actions.

Resource type

  • aws:eks:pod

Parameters

  • duration – The duration of the test, in ISO 8601 format.
  • protocol – Optional. The protocol. The possible values are tcp and udp. The default is tcp.
  • trafficType – The type of traffic. The possible values are ingress and egress.
  • port – The port number.
  • kubernetesServiceAccount – The Kubernetes service account. For information about the required permissions, see Configure the Kubernetes service account.
  • fisPodContainerImage – Optional. The container image used to create the fault injector pod. The default is to use the images provided by AWS FIS. For more information, see Pod container images.
  • maxErrorsPercent – Optional. The percentage of targets that can fail before the fault injection fails. The default is 0.
  • fisPodLabels – Optional. The Kubernetes labels that are attached to the fault orchestration pod created by FIS.
  • fisPodAnnotations – Optional. The Kubernetes annotations that are attached to the fault orchestration pod created by FIS.

Permissions

  • eks:DescribeCluster
  • ec2:DescribeSubnets
  • tag:GetResources

aws:eks:pod-network-latency

Adds latency and jitter to the network interface using the tc tool for traffic to or from specific sources. Only compatible with the Kubernetes Security Standards privileged policy. For more information, see Use the EKS pod actions.

Resource type

  • aws:eks:pod

Parameters

  • duration – The duration of the test, in ISO 8601 format.
  • interface – Optional. The network interface. The default is eth0.
  • delayMilliseconds – Optional. The delay, in milliseconds. The default is 200.
  • jitterMilliseconds – Optional. The jitter, in milliseconds. The default is 10.
  • sources – Optional. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region. The default is 0.0.0.0/0, which matches all IPv4 traffic.
  • kubernetesServiceAccount – The Kubernetes service account. For information about the required permissions, see Configure the Kubernetes service account.
  • fisPodContainerImage – Optional. The container image used to create the fault injector pod. The default is to use the images provided by AWS FIS. For more information, see Pod container images.
  • maxErrorsPercent – Optional. The percentage of targets that can fail before the fault injection fails. The default is 0.
  • fisPodLabels – Optional. The Kubernetes labels that are attached to the fault orchestration pod created by FIS.
  • fisPodAnnotations – Optional. The Kubernetes annotations that are attached to the fault orchestration pod created by FIS.
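
As a rough illustration of how these parameters fit together, the Python sketch below builds one entry for the actions map of an experiment template. It assumes that parameter values are passed as strings and that the target key for pod actions is Pods; the service account name, the target name myPods, and the helper function itself are hypothetical and not part of FIS.

```python
def pod_network_latency_action(duration, service_account, target_name="myPods",
                               delay_ms=200, jitter_ms=10,
                               sources="0.0.0.0/0", interface="eth0"):
    """Build one entry for the actions map of an experiment template.

    Defaults mirror the documented defaults (eth0, 200 ms, 10 ms, 0.0.0.0/0).
    """
    return {
        "actionId": "aws:eks:pod-network-latency",
        "parameters": {
            # Every value is sent as a string (assumption stated in the lead-in).
            "duration": duration,
            "interface": interface,
            "delayMilliseconds": str(delay_ms),
            "jitterMilliseconds": str(jitter_ms),
            "sources": sources,
            "kubernetesServiceAccount": service_account,
        },
        # "Pods" as the target key is an assumption; target_name must match
        # a target defined elsewhere in the same template.
        "targets": {"Pods": target_name},
    }


# Hypothetical usage: five minutes of 350 ms latency toward DynamoDB and S3.
action = pod_network_latency_action(
    "PT5M", "fis-service-account", delay_ms=350, sources="DYNAMODB,S3"
)
```

The resulting dictionary could be placed under a key of your choosing in the actions section of a template; the optional fisPod* parameters are left out so that the FIS defaults apply.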

Permissions

  • eks:DescribeCluster
  • ec2:DescribeSubnets
  • tag:GetResources

AWS managed policy

aws:eks:pod-network-packet-loss

Adds packet loss to the network interface using the tc tool. Only compatible with the Kubernetes Security Standards privileged policy. For more information, see Use the EKS pod actions.

Resource type

  • aws:eks:pod

Parameters

  • duration – The duration of the test, in ISO 8601 format.
  • interface – Optional. The network interface. The default is eth0.
  • lossPercent – Optional. The percentage of packet loss. The default is 7%.
  • sources – Optional. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region. The default is 0.0.0.0/0, which matches all IPv4 traffic.
  • kubernetesServiceAccount – The Kubernetes service account. For information about the required permissions, see Configure the Kubernetes service account.
  • fisPodContainerImage – Optional. The container image used to create the fault injector pod. The default is to use the images provided by AWS FIS. For more information, see Pod container images.
  • maxErrorsPercent – Optional. The percentage of targets that can fail before the fault injection fails. The default is 0.
  • fisPodLabels – Optional. The Kubernetes labels that are attached to the fault orchestration pod created by FIS.
  • fisPodAnnotations – Optional. The Kubernetes annotations that are attached to the fault orchestration pod created by FIS.

Permissions

  • eks:DescribeCluster
  • ec2:DescribeSubnets
  • tag:GetResources

AWS managed policy

aws:eks:terminate-nodegroup-instances

Runs the Amazon EC2 API action TerminateInstances on the target node group.

Resource type

  • aws:eks:nodegroup

Parameters

  • instanceTerminationPercentage – The percentage (1-100) of instances to terminate.

Permissions

  • ec2:DescribeInstances
  • ec2:TerminateInstances
  • eks:DescribeNodegroup
  • tag:GetResources

AWS managed policy

Amazon ElastiCache actions

AWS FIS supports the following ElastiCache action.

aws:elasticache:interrupt-cluster-az-power

Interrupts power to nodes in the specified Availability Zone for target Redis Replication Groups. When a primary node is targeted, the corresponding read replica with the least replication lag is promoted to primary. Read replica replacements in the specified Availability Zone are blocked for the duration of this action, which means that target Replication Groups operate with reduced capacity.

Resource type

  • aws:elasticache:redis-replicationgroup

Parameters

  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
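
Because the API expects the duration as an ISO 8601 string while most scripts track time in seconds, a small conversion helper can be handy. The sketch below is one possible way to do it; the function name fis_duration is hypothetical, and the range check simply enforces the one-minute to 12-hour limits described above.

```python
def fis_duration(seconds):
    """Convert whole seconds to an ISO 8601 duration string such as PT1M."""
    if not isinstance(seconds, int):
        raise TypeError("seconds must be an integer")
    # Enforce the documented range: one minute to 12 hours.
    if not 60 <= seconds <= 12 * 3600:
        raise ValueError("duration must be between 60 and 43200 seconds")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    # Emit only the non-zero components, largest unit first.
    parts = [(hours, "H"), (minutes, "M"), (secs, "S")]
    return "PT" + "".join(f"{value}{unit}" for value, unit in parts if value)


one_minute = fis_duration(60)      # "PT1M"
ninety_min = fis_duration(5400)    # "PT1H30M"
```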

Permissions

  • elasticache:InterruptClusterAzPower
  • elasticache:DescribeReplicationGroups
  • tag:GetResources

Network actions

AWS FIS supports the following network actions.

Actions

aws:network:disrupt-connectivity

Denies the specified traffic to the target subnets. Uses network ACLs.

Resource type

  • aws:ec2:subnet

Parameters

  • scope – The type of traffic to deny. When the scope is not all, the maximum number of entries in network ACLs is 20. The possible values are:
    • all – Denies all traffic entering and leaving the subnet. Note that this option allows intra-subnet traffic, including traffic to and from network interfaces in the subnet.
    • availability-zone – Denies intra-VPC traffic to and from subnets in other Availability Zones. The maximum number of subnets that can be targeted in a VPC is 30.
    • dynamodb – Denies traffic to and from the Regional endpoint for DynamoDB in the current Region.
    • prefix-list – Denies traffic to and from the specified prefix list.
    • s3 – Denies traffic to and from the Regional endpoint for Amazon S3 in the current Region.
    • vpc – Denies traffic entering and leaving the VPC.
  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
  • prefixListIdentifier – If the scope is prefix-list, this is the identifier of the customer managed prefix list. You can specify a name, an ID, or an ARN. The prefix list can have at most 10 entries.
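
The relationship between scope and prefixListIdentifier is easy to get wrong, so it may help to check it before submitting a template. The following Python sketch validates the scope against the values listed above and requires a prefix list identifier only for the prefix-list scope. The function name and the prefix list ID are hypothetical, and rejecting an identifier for other scopes is a choice made by this sketch rather than documented FIS behavior.

```python
VALID_SCOPES = {"all", "availability-zone", "dynamodb", "prefix-list", "s3", "vpc"}


def disrupt_connectivity_parameters(scope, duration, prefix_list_identifier=None):
    """Build the parameters map for aws:network:disrupt-connectivity."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"unsupported scope: {scope!r}")
    if scope == "prefix-list" and not prefix_list_identifier:
        raise ValueError("prefixListIdentifier is required when scope is prefix-list")
    if scope != "prefix-list" and prefix_list_identifier:
        # Sketch-level choice: fail loudly instead of silently dropping the value.
        raise ValueError("prefixListIdentifier only applies to the prefix-list scope")
    parameters = {"scope": scope, "duration": duration}
    if prefix_list_identifier:
        parameters["prefixListIdentifier"] = prefix_list_identifier
    return parameters


# Hypothetical usage with a made-up prefix list ID.
az_params = disrupt_connectivity_parameters("availability-zone", "PT10M")
pl_params = disrupt_connectivity_parameters("prefix-list", "PT10M", "pl-0123456789abcdef0")
```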

Permissions

  • ec2:CreateNetworkAcl – Creates the network ACL with the tag managedByFIS=true.
  • ec2:CreateNetworkAclEntry – The network ACL must have the tag managedByFIS=true.
  • ec2:CreateTags
  • ec2:DeleteNetworkAcl – The network ACL must have the tag managedByFIS=true.
  • ec2:DescribeManagedPrefixLists
  • ec2:DescribeNetworkAcls
  • ec2:DescribeSubnets
  • ec2:DescribeVpcs
  • ec2:GetManagedPrefixListEntries
  • ec2:ReplaceNetworkAclAssociation

AWS managed policy

aws:network:route-table-disrupt-cross-region-connectivity

Blocks traffic that originates in the target subnets and is destined for the specified Region. Creates route tables that include all routes for the Region to isolate. To allow FIS to create these route tables, raise the Amazon VPC quota for routes per route table to 250 plus the number of routes in your existing route tables.

Resource type

  • aws:ec2:subnet

Parameters

  • region – The code of the Region to isolate (for example, eu-west-1).
  • duration – The length of time the action lasts. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.

Permissions

  • ec2:AssociateRouteTable
  • ec2:CreateManagedPrefixList †
  • ec2:CreateNetworkInterface †
  • ec2:CreateRoute †
  • ec2:CreateRouteTable †
  • ec2:CreateTags †
  • ec2:DeleteManagedPrefixList †
  • ec2:DeleteNetworkInterface †
  • ec2:DeleteRouteTable †
  • ec2:DescribeManagedPrefixLists
  • ec2:DescribeNetworkInterfaces
  • ec2:DescribeRouteTables
  • ec2:DescribeSubnets
  • ec2:DescribeVpcPeeringConnections
  • ec2:DescribeVpcs
  • ec2:DisassociateRouteTable
  • ec2:GetManagedPrefixListEntries
  • ec2:ModifyManagedPrefixList †
  • ec2:ModifyVpcEndpoint
  • ec2:ReplaceRouteTableAssociation

† Scoped using the tag managedByFIS=true.

AWS managed policy

aws:network:transit-gateway-disrupt-cross-region-connectivity

Blocks traffic from the target transit gateway peering attachments that is destined for the specified Region.

Resource type

  • aws:ec2:transit-gateway

Parameters

  • region – The code of the Region to isolate (for example, eu-west-1).
  • duration – The length of time the action lasts. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.

Permissions

  • ec2:AssociateTransitGatewayRouteTable
  • ec2:DescribeTransitGatewayAttachments
  • ec2:DescribeTransitGatewayPeeringAttachments
  • ec2:DescribeTransitGateways
  • ec2:DisassociateTransitGatewayRouteTable

AWS managed policy

Amazon RDS actions

AWS FIS supports the following Amazon RDS actions.

Actions

aws:rds:failover-db-cluster

Runs the Amazon RDS API action FailoverDBCluster on the target Aurora DB cluster.

Resource type

  • aws:rds:cluster

Parameters

  • None

Permissions

  • rds:FailoverDBCluster
  • rds:DescribeDBClusters
  • tag:GetResources

AWS managed policy

aws:rds:reboot-db-instances

Runs the Amazon RDS API action RebootDBInstance on the target DB instance.

Resource type

  • aws:rds:db

Parameters

  • forceFailover – Optional. If the value is true, and if instances are Multi-AZ, forces failover from one Availability Zone to another. The default is false.

Permissions

  • rds:RebootDBInstance
  • rds:DescribeDBInstances
  • tag:GetResources

AWS managed policy

Amazon S3 actions

AWS FIS supports the following Amazon S3 action.

Actions

aws:s3:bucket-pause-replication

Pauses replication from target source buckets to destination buckets. Destination buckets can be in different AWS Regions or within the same Region as the source bucket. Existing objects may continue to be replicated for up to one hour after the action begins. This action only supports targeting by tags. To learn more about Amazon S3 Replication, see the Amazon S3 User Guide.

Resource type

  • aws:s3:bucket

Parameters

  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
  • region – The AWS Region where the destination buckets are located.
  • destinationBuckets – Optional. A comma-separated list of destination S3 buckets.
  • prefixes – Optional. A comma-separated list of S3 object key prefixes from replication rule filters. Replication rules of target buckets that have a filter based on one of these prefixes are paused.
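
Since destinationBuckets and prefixes are comma-separated strings rather than lists, a helper that joins Python sequences can reduce formatting mistakes. This is a minimal sketch; the function name, bucket names, and prefixes are hypothetical placeholders.

```python
def pause_replication_parameters(duration, region, destination_buckets=(), prefixes=()):
    """Build the parameters map for aws:s3:bucket-pause-replication."""
    parameters = {"duration": duration, "region": region}
    # Optional parameters are omitted entirely when no values are supplied.
    if destination_buckets:
        parameters["destinationBuckets"] = ",".join(destination_buckets)
    if prefixes:
        parameters["prefixes"] = ",".join(prefixes)
    return parameters


# Hypothetical usage: pause replication to two made-up buckets in us-west-2.
s3_params = pause_replication_parameters(
    "PT30M",
    "us-west-2",
    destination_buckets=["example-replica-a", "example-replica-b"],
    prefixes=["logs/"],
)
```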

Permissions

  • S3:PutReplicationConfiguration with condition key S3:IsReplicationPauseRequest set to True
  • S3:GetReplicationConfiguration with condition key S3:IsReplicationPauseRequest set to True
  • S3:PauseReplication
  • S3:ListAllMyBuckets
  • tag:GetResources

For an example policy, see Example: Use condition keys for aws:s3:bucket-pause-replication.

Systems Manager actions

AWS FIS supports the following Systems Manager actions.

Actions

aws:ssm:send-command

Runs the Systems Manager API action SendCommand on the target EC2 instances. The Systems Manager document (SSM document) defines the actions that Systems Manager performs on your instances. For more information, see Use the aws:ssm:send-command action.

Resource type

  • aws:ec2:instance

Parameters

  • documentArn – The Amazon Resource Name (ARN) of the document. In the console, this parameter is completed for you if you choose a value from Action type that corresponds to one of the pre-configured AWS FIS SSM documents.
  • documentVersion – Optional. The version of the document. If empty, the default version runs.
  • documentParameters – Conditional. The required and optional parameters that the document accepts. The format is a JSON object with keys that are strings and values that are either strings or arrays of strings.
  • duration – The duration, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
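
A common stumbling block is that documentParameters is a JSON object serialized into a single string, so it has to be encoded before it goes into the action. The Python sketch below shows one way to assemble such an action entry. The Instances target key, the target name, the document ARN, and the DurationSeconds parameter are assumptions for illustration; check them against the SSM document you actually use.

```python
import json


def send_command_action(document_arn, document_parameters, duration,
                        target_name="myInstances"):
    """Build one actions-map entry for aws:ssm:send-command."""
    return {
        "actionId": "aws:ssm:send-command",
        "parameters": {
            "documentArn": document_arn,
            # Serialize the dict so the value is a JSON string, not a nested object.
            "documentParameters": json.dumps(document_parameters),
            "duration": duration,
        },
        # "Instances" as the target key is an assumption of this sketch.
        "targets": {"Instances": target_name},
    }


# Hypothetical usage; the ARN and parameter name are illustrative placeholders.
ssm_action = send_command_action(
    "arn:aws:ssm:us-east-1::document/AWSFIS-Run-CPU-Stress",
    {"DurationSeconds": "120"},
    "PT3M",
)
```

Keeping the action duration slightly longer than the document's own run time, as in this usage, leaves room for the command to finish before the action ends.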

Permissions

  • ssm:SendCommand
  • ssm:ListCommands
  • ssm:CancelCommand

AWS managed policy

aws:ssm:start-automation-execution

Runs the Systems Manager API action StartAutomationExecution.

Resource type

  • None

Parameters

  • documentArn – The Amazon Resource Name (ARN) of the automation document.
  • documentVersion – Optional. The version of the document. If empty, the default version runs.
  • documentParameters – Conditional. The required and optional parameters that the document accepts. The format is a JSON object with keys that are strings and values that are either strings or arrays of strings.
  • maxDuration – The maximum time allowed for the automation execution to complete, from one minute to 12 hours. In the AWS FIS API, the value is a string in ISO 8601 format. For example, PT1M represents one minute. In the AWS FIS console, you enter the number of seconds, minutes, or hours.
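
The shape rule for documentParameters (string keys, values that are strings or arrays of strings) can be checked locally before a template is created. The following is a minimal sketch of such a check; the function name and the sample parameter names are hypothetical and do not come from any particular automation document.

```python
def validate_document_parameters(params):
    """Return params unchanged if it matches the documented shape, else raise."""
    if not isinstance(params, dict):
        raise TypeError("documentParameters must be a JSON object (dict)")
    for key, value in params.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        if isinstance(value, str):
            continue
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        raise TypeError(f"value for {key!r} must be a string or a list of strings")
    return params


# Hypothetical parameter names for illustration only.
checked = validate_document_parameters(
    {"InstanceId": ["i-0123456789abcdef0"], "Mode": "dry-run"}
)
```

Note that numbers must be quoted: a value such as 120 would be rejected, while "120" passes.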

Permissions

  • ssm:GetAutomationExecution
  • ssm:StartAutomationExecution
  • ssm:StopAutomationExecution
  • iam:PassRole – Optional. Required if the automation document assumes a role.

AWS managed policy
