AWS Spot Instances with Auto Scaling & FIS

Scalability, cost-effectiveness, and resilience are essential for contemporary cloud-native applications. Although up to 90% less expensive than On-Demand instances, AWS Spot Instances pose a risk to workload availability due to their transient nature. This is where resilience testing with AWS Fault Injection Simulator (FIS) and clever automation with EC2 Auto Scaling come into play.

In this article, we’ll look at how to use EC2 Auto Scaling to automate Spot Instance utilization and how to use AWS FIS to test your system’s fault tolerance and replicate real-world failures.

Why Spot Instances?

Spot Instances let you use spare Amazon EC2 capacity at a reduced price. However, when AWS needs that capacity back, it can interrupt (reclaim) a Spot Instance with only two minutes’ notice.

Ideal Use Cases for Spot Instances:

Batch processing jobs
CI/CD workloads
Containerized workloads (e.g., Kubernetes, ECS)
Fault-tolerant microservices

To effectively leverage Spot Instances in production, you need to automate their provisioning and ensure graceful fallback to On-Demand capacity during interruptions.

Automating Spot Instances with EC2 Auto Scaling

Amazon EC2 Auto Scaling automatically adjusts the number of instances in your application’s fleet based on demand, health checks, or schedules. By configuring a mixed instance policy, you can blend Spot and On-Demand capacity to optimize cost and availability.

Step 1: Define a Launch Template

Create an EC2 Launch Template that includes:

Instance type(s)
AMI ID
Key pair
Security group
User data (for bootstrapping)

This template forms the blueprint for launching EC2 instances in your Auto Scaling Group (ASG).
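As a rough sketch, the template fields listed above map onto the request body of the EC2 CreateLaunchTemplate API. The helper name `build_launch_template_request` and every ID below are placeholders for illustration, not values from a real account.

```python
import base64


def build_launch_template_request(name, ami_id, instance_type, key_name,
                                  security_group_ids, user_data_script):
    """Build keyword arguments for the EC2 CreateLaunchTemplate call."""
    # The EC2 API expects user data as a base64-encoded string.
    encoded_user_data = base64.b64encode(
        user_data_script.encode("utf-8")).decode("ascii")
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "KeyName": key_name,
            "SecurityGroupIds": list(security_group_ids),
            "UserData": encoded_user_data,
        },
    }


# Placeholder values for illustration only.
request = build_launch_template_request(
    name="spot-web-template",
    ami_id="ami-0123456789abcdef0",
    instance_type="m5.large",
    key_name="my-key-pair",
    security_group_ids=["sg-0123456789abcdef0"],
    user_data_script="#!/bin/bash\necho bootstrapping\n",
)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ec2").create_launch_template(**request)
```

The instance type set here acts only as a default; the mixed instances policy in the next step can override it with a list of alternative types.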

Step 2: Create an Auto Scaling Group with Mixed Instances

In your ASG configuration, select a Mixed Instances Policy. This allows you to set preferences like:

Spot allocation strategy: e.g., capacity-optimized or lowest-price
On-Demand base capacity: Minimum number of On-Demand instances to always have
Percentage split: Define how much of your fleet should be Spot vs On-Demand
Instance pools: Provide flexibility across multiple instance types and availability zones

Example:

{
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateId": "lt-0abcd1234",
        "Version": "$Latest"
      }
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 30,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  }
}
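To make the percentage split concrete, here is a small worked calculation of how a desired capacity divides between On-Demand and Spot. The function `split_capacity` is our own illustration, and it assumes that a fractional On-Demand count is rounded up; check the Auto Scaling documentation for the exact rounding behaviour before relying on it.

```python
import math


def split_capacity(desired, on_demand_base, on_demand_pct_above_base):
    """Return (on_demand, spot) instance counts for a desired capacity."""
    # The base capacity is filled with On-Demand instances first.
    base = min(on_demand_base, desired)
    above_base = desired - base
    # Assumption: a fractional On-Demand count is rounded up.
    extra_on_demand = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + extra_on_demand
    return on_demand, desired - on_demand


# 30% On-Demand above a base of 0, as in the example policy.
print(split_capacity(10, 0, 30))   # (3, 7)

# A base of 2 On-Demand instances, then 30% of the remaining 10.
print(split_capacity(12, 2, 30))   # (5, 7)
```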

Step 3: Attach Scaling Policies

To enable elasticity, attach:

Target tracking policies (e.g., CPU utilisation)
Scheduled actions (scale at specific times)
Step scaling policies (adjust based on thresholds)

With these policies attached, the group scales out and in automatically as demand changes, launching Spot or On-Demand instances according to the mixed instances policy.
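A minimal sketch of two of these policy types, expressed as the keyword arguments that the Auto Scaling PutScalingPolicy and PutScheduledUpdateGroupAction APIs accept. The group name, policy names, target value, and schedule are assumptions chosen for illustration.

```python
def target_tracking_policy(asg_name, target_cpu_percent):
    """Keyword arguments for the Auto Scaling PutScalingPolicy call."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                # Average CPU utilisation across the whole group.
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def scheduled_action(asg_name, action_name, cron, desired_capacity):
    """Keyword arguments for the PutScheduledUpdateGroupAction call."""
    return {
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": action_name,
        "Recurrence": cron,  # cron expression, evaluated in UTC by default
        "DesiredCapacity": desired_capacity,
    }


# Hypothetical group name and values.
cpu_policy = target_tracking_policy("spot-web-asg", 50)
morning_scale_out = scheduled_action(
    "spot-web-asg", "weekday-morning-scale-out", "0 8 * * 1-5", 10)

# With boto3 installed and credentials configured:
#   client = boto3.client("autoscaling")
#   client.put_scaling_policy(**cpu_policy)
#   client.put_scheduled_update_group_action(**morning_scale_out)
```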

Enhancing Fault Tolerance with AWS Fault Injection Simulator

Automation is only as effective as its resilience under stress. This is where AWS Fault Injection Simulator (FIS) comes in: a fully managed service for running controlled chaos engineering experiments on workloads hosted on AWS.

Why Use FIS?

FIS helps answer critical questions:

What happens when a Spot instance is interrupted?
Does the Auto Scaling Group replace lost capacity?
Is there failover to On-Demand instances?
Are application metrics and alerts triggered correctly?

By simulating failures, FIS lets you verify that your automation logic behaves predictably and recovers quickly.

Common Chaos Scenarios for EC2 Spot Instances:

Terminate EC2 Spot Instances: Simulates AWS reclaiming Spot capacity.
Simulate Network Latency or Packet Loss: Helps identify how dependent services handle degraded performance.
Inject CPU or Memory Stress: Validates that scaling policies are triggered as expected.

Step 1: Set Up IAM Roles

FIS requires a role with permissions to perform actions like:

ec2:TerminateInstances
autoscaling:UpdateAutoScalingGroup
cloudwatch:GetMetricData
Logging to CloudWatch

Reference this role’s ARN in your experiment templates. The role’s trust policy must allow the FIS service (fis.amazonaws.com) to assume it.
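The role setup can be sketched as two policy documents: a trust policy that lets FIS assume the role, and a permissions policy covering the actions listed above. This is a deliberately small example; the logging permissions are left out, and a real policy should be scoped to your own resources rather than the wildcards used here.

```python
import json

# Trust policy: allows the FIS service to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "fis.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: what FIS may do once it has assumed the role.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:TerminateInstances"],
            # Wildcard for illustration; narrow this in practice.
            "Resource": "arn:aws:ec2:*:*:instance/*",
        },
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:UpdateAutoScalingGroup",
                "cloudwatch:GetMetricData",
            ],
            "Resource": "*",
        },
    ],
}

trust_policy_json = json.dumps(trust_policy, indent=2)
permissions_policy_json = json.dumps(permissions_policy, indent=2)

# With boto3 installed and credentials configured (role name is hypothetical):
#   iam = boto3.client("iam")
#   iam.create_role(RoleName="fis-spot-experiments",
#                   AssumeRolePolicyDocument=trust_policy_json)
#   iam.put_role_policy(RoleName="fis-spot-experiments",
#                       PolicyName="fis-spot-permissions",
#                       PolicyDocument=permissions_policy_json)
```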

Step 2: Define an Experiment Template

An FIS experiment template contains:

Targets: e.g., EC2 instances in a specific Auto Scaling Group
Actions: e.g., terminate a Spot instance
Stop conditions: CloudWatch alarms that halt the experiment if thresholds are breached

Example:

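targets:
  spotInstances:
    resourceType: aws:ec2:instance
    # Assumed group name. Instances launched by an Auto Scaling group
    # carry this tag automatically; the filter then narrows to Spot.
    resourceTags:
      "aws:autoscaling:groupName": my-spot-asg
    selectionMode: COUNT(1)
    filters:
      - path: InstanceLifecycle
        values: ["spot"]
actions:
  terminateSpot:
    actionId: aws:ec2:terminate-instances
    # The action refers to the target by name; it takes no instance ID parameter.
    targets:
      Instances: spotInstances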
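For completeness, here is a sketch of the full request body for the FIS CreateExperimentTemplate API, including the stop condition and role that the fragment above leaves out. It reflects our understanding that an action points at its target through a `targets` map and that targets are identified by tag or ARN before filters are applied. The group name and ARNs are placeholders.

```python
import uuid


def build_experiment_template(asg_name, role_arn, alarm_arn):
    """Build keyword arguments for the FIS CreateExperimentTemplate call."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "description": f"Terminate one Spot instance in {asg_name}",
        "roleArn": role_arn,
        # Halt the experiment if this CloudWatch alarm goes into ALARM.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn},
        ],
        "targets": {
            "spotInstances": {
                "resourceType": "aws:ec2:instance",
                # Tag applied automatically to instances launched by the group.
                "resourceTags": {"aws:autoscaling:groupName": asg_name},
                "filters": [
                    {"path": "InstanceLifecycle", "values": ["spot"]},
                ],
                "selectionMode": "COUNT(1)",
            },
        },
        "actions": {
            "terminateSpot": {
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "spotInstances"},
            },
        },
    }


# Placeholder identifiers for illustration only.
template = build_experiment_template(
    asg_name="my-spot-asg",
    role_arn="arn:aws:iam::123456789012:role/fis-spot-experiments",
    alarm_arn="arn:aws:cloudwatch:us-east-1:123456789012:alarm:high-error-rate",
)

# With boto3 installed and credentials configured:
#   boto3.client("fis").create_experiment_template(**template)
```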

Step 3: Run Experiments and Analyse

Execute the experiment and verify that:

The Auto Scaling group replaces the terminated Spot instance
The replacement respects your instance type preferences
CloudWatch alarms and logs are triggered
Application availability is unaffected

This proactive testing hardens your system against real-world issues.
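One way to check the first of these points is to inspect the output of the Auto Scaling DescribeAutoScalingGroups API after the experiment. The helper `asg_has_recovered` below is a hypothetical name, and the sample response is hand-written to mimic the shape of the real one.

```python
def asg_has_recovered(describe_response, terminated_instance_id):
    """Return True once the group is back at desired capacity
    without the terminated instance."""
    group = describe_response["AutoScalingGroups"][0]
    in_service = [
        inst["InstanceId"]
        for inst in group["Instances"]
        if inst["LifecycleState"] == "InService"
        and inst["HealthStatus"] == "Healthy"
    ]
    return (
        terminated_instance_id not in in_service
        and len(in_service) >= group["DesiredCapacity"]
    )


# Hand-written sample in the shape of a DescribeAutoScalingGroups response.
sample_response = {
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "my-spot-asg",
            "DesiredCapacity": 2,
            "Instances": [
                {"InstanceId": "i-0aaa", "LifecycleState": "InService",
                 "HealthStatus": "Healthy"},
                {"InstanceId": "i-0bbb", "LifecycleState": "InService",
                 "HealthStatus": "Healthy"},
            ],
        }
    ]
}

recovered = asg_has_recovered(sample_response, terminated_instance_id="i-0ccc")

# With boto3, the response would come from:
#   boto3.client("autoscaling").describe_auto_scaling_groups(
#       AutoScalingGroupNames=["my-spot-asg"])
```

In practice you would poll this check for a few minutes after the experiment starts, since the replacement instance needs time to launch and pass health checks.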

Best Practices for Spot + Auto Scaling + FIS

Diversify instance types and AZs: Drawing on more Spot pools lowers the chance that an interruption affects a large part of the fleet.
Use capacity-optimized allocation: This strategy gives priority to the Spot pools with the most spare capacity, which tend to be more stable.
Maintain an On-Demand base capacity: A fixed On-Demand floor guarantees a minimum level of availability.
Monitor interruption notices: Use the two-minute Spot interruption notice to shut applications down gracefully.
Automate recovery logic: Use Systems Manager, EventBridge, or AWS Lambda to handle events triggered by FIS experiments.
Run chaos experiments regularly: Include FIS scenarios in your resilience tests or CI/CD pipeline.
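As an illustration of monitoring interruption notices, the sketch below reads the `spot/instance-action` document from the instance metadata service (using an IMDSv2 session token) and works out how much time is left. It only makes sense on an EC2 instance, so the network call is wrapped in a function that is not invoked here; the parsing step is shown with a hand-written sample document.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body, now):
    """Return (action, seconds_left) from a spot/instance-action document."""
    doc = json.loads(body)
    deadline = datetime.strptime(
        doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return doc["action"], (deadline - now).total_seconds()


def fetch_instance_action(timeout=1.0):
    """Return the pending interruption document, or None if there is none.

    Only works when run on an EC2 instance.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption is scheduled
            return None
        raise


# Hand-written sample document for illustration.
sample = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
action, seconds_left = parse_instance_action(
    sample, now=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
print(action, seconds_left)  # terminate 120.0
```

A small agent on each instance could call `fetch_instance_action` every few seconds and, once a document appears, stop accepting new work and drain in-flight requests before the deadline.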

Conclusion

Spot Instances offer substantial cost savings, but their unpredictable availability is risky without proper automation and resilience strategies. By combining EC2 Auto Scaling’s mixed instances policies with controlled chaos experiments in AWS Fault Injection Simulator, you can build infrastructure that is both cost-efficient and resilient. Whether you’re running stateless microservices, containerised workloads, or batch jobs, this combination lets you adopt Spot Instances with confidence while staying prepared for interruptions.
