On GitHub: mauricio/phillydevops-cf-talk
Just go there and provision a bunch of new machines. And please remember to take them down once the spike is over!
But our systems were not making use of this elasticity...
Revamp the provisioning process.
When you're auto-scaling to meet real-time customer demand, you can't waste any time.
k mint spi -b SPI-Bundle-RELEASE-1.5.370-NSDK-4.0.0.242
{ "environment" : "production", "role" : "thumbnailer" }
A separate script reads the user data and calls Chef with the given role and environment. The instance gets configured and its services are ready for action.
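A minimal sketch of what such a bootstrap script could look like, assuming the user data is the JSON document shown above and that the role maps to a Chef run-list entry of the form `role[...]`. The helper name `chef_command` is made up for illustration; this is not the script used in the talk.

```python
import json

# Real EC2 endpoint for instance user data; reachable only from inside an instance.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def chef_command(user_data):
    """Build a chef-client invocation from JSON user data (hypothetical helper)."""
    settings = json.loads(user_data)
    return [
        "chef-client",
        "--environment", settings["environment"],
        "--runlist", "role[{}]".format(settings["role"]),
    ]


if __name__ == "__main__":
    # On a real instance the text would be fetched from USER_DATA_URL and the
    # resulting command handed to subprocess.run(..., check=True).
    sample = '{ "environment" : "production", "role" : "thumbnailer" }'
    print(" ".join(chef_command(sample)))
```

Keeping the command construction separate from the fetch and the execution makes the piece that matters testable on a laptop.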
This is where it gets interesting.
The metric you scale on has to be in CloudWatch, but you can push anything there. Using a metric that AWS already provides is always simpler.
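If you do need a custom metric, pushing it is a single API call. The sketch below builds the arguments for CloudWatch's `put_metric_data`; the namespace `MyApp/Workers` and the metric name `PendingJobs` are invented for the example, and the `publish` step assumes boto3 is installed and AWS credentials are configured.

```python
def queue_depth_metric(queue_name, depth):
    """Keyword arguments for CloudWatch put_metric_data (names are illustrative)."""
    return {
        "Namespace": "MyApp/Workers",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "PendingJobs",  # hypothetical metric name
                "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
                "Value": float(depth),
                "Unit": "Count",
            }
        ],
    }


def publish(payload):
    """Send the payload with boto3; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the payload builder works without it

    boto3.client("cloudwatch").put_metric_data(**payload)
```

An alarm can then reference the custom namespace and metric name the same way it would reference `AWS/SQS`.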
Alarms trigger actions when their threshold is met.
"ScaleUpWorkerAlarm": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "AlarmDescription": "Scale-Up if queue depth exceeds our limit",
    "Namespace": "AWS/SQS",
    "MetricName": "ApproximateNumberOfMessagesVisible",
    "Dimensions": [ { "Name": "QueueName", "Value": "MyQueue" } ],
    "Statistic": "Average",
    "Period": "60",
    "EvaluationPeriods": "3",
    "Threshold": 100,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": [ { "Ref": "WorkerScaleUpPolicy" } ]
  }
}

"WorkerScaleUpPolicy": {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": { "Ref": "WorkerAutoScalingGroup" },
    "Cooldown": 300,
    "ScalingAdjustment": 1
  }
}
Metrics, alarms and policies work together to make your auto-scaling group grow or shrink as needed. You can have as many alarms, metrics and policies as you want; just make sure they actually represent how you want your app to grow.
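To make the interplay concrete, here is a toy simulation of the alarm and policy above: three consecutive 60-second averages over 100 trip the alarm, and the policy adds one instance. It is a simplification under stated assumptions (the cooldown is not modelled, the queue depths are made up, and `max_size` stands in for the group's maximum size), not a description of how CloudWatch evaluates alarms internally.

```python
def alarm_fires(period_averages, threshold=100, evaluation_periods=3):
    """True when the most recent `evaluation_periods` averages all exceed the threshold."""
    if len(period_averages) < evaluation_periods:
        return False
    recent = period_averages[-evaluation_periods:]
    return all(value > threshold for value in recent)


def apply_policy(current_capacity, max_size, adjustment=1):
    """ChangeInCapacity: add the adjustment, capped at the group's maximum size."""
    return min(current_capacity + adjustment, max_size)


capacity = 2
queue_depth = [80, 120, 150, 130]  # one average per 60-second period (made-up data)
if alarm_fires(queue_depth):
    capacity = apply_policy(capacity, max_size=10)
print(capacity)  # prints 3
```

A single spike is not enough: with `[120, 150, 90]` the last period is under the threshold, so nothing scales.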
Resource creation used to be all over the place; now only CloudFormation stacks do it.
Now you just open the CloudFormation stack associated with it and it should be there.
Templates must include their own security policies and allow access only to resources they themselves create, using IAM (Identity and Access Management) profiles.
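As an illustration of that rule, the sketch below declares a queue, a role whose policy points only at that queue, and the instance profile that carries the role. All logical names (`WorkerQueue`, `WorkerRole`, `WorkerInstanceProfile`) and the chosen SQS actions are assumptions for the example; it is written as Python data and printed as JSON so it can be pasted into a template's `Resources` section.

```python
import json

# Hypothetical logical names; "WorkerQueue" is a queue declared in the same template.
resources = {
    "WorkerQueue": {"Type": "AWS::SQS::Queue"},
    "WorkerRole": {
        "Type": "AWS::IAM::Role",
        "Properties": {
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    "Principal": {"Service": ["ec2.amazonaws.com"]},
                    "Action": ["sts:AssumeRole"],
                }],
            },
            "Policies": [{
                "PolicyName": "worker-queue-access",
                "PolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                        # Only the queue this template creates, never a wildcard.
                        "Resource": {"Fn::GetAtt": ["WorkerQueue", "Arn"]},
                    }],
                },
            }],
        },
    },
    "WorkerInstanceProfile": {
        "Type": "AWS::IAM::InstanceProfile",
        "Properties": {"Path": "/", "Roles": [{"Ref": "WorkerRole"}]},
    },
}

print(json.dumps(resources, indent=2))
```

Because the policy refers to the queue through `Fn::GetAtt`, deleting the stack removes the permission together with the resource it protected.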
The service went from being manually provisioned and scaled to a full-fledged auto-scaling solution. It now runs at half of the original cost and serves as an example for all new services being created.
and being bitten every once in a while.
If AWS can generate a name for it, do not name it. Use CloudFormation outputs to get their names.
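Reading the generated names back is straightforward. The sketch below turns the `Outputs` list of a `describe_stacks` response into a plain dictionary; the stack name and output values are invented, and the response is trimmed to the fields that matter here.

```python
def stack_outputs(describe_stacks_response):
    """Map OutputKey to OutputValue for the first stack in a describe_stacks response."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Shape of a boto3 cloudformation describe_stacks(StackName=...) response,
# trimmed to the relevant fields; the values are invented.
response = {
    "Stacks": [{
        "StackName": "thumbnailer-qa",
        "Outputs": [
            {"OutputKey": "QueueName", "OutputValue": "thumbnailer-qa-WorkerQueue-1A2B3C"},
        ],
    }]
}
print(stack_outputs(response)["QueueName"])
```

Applications then look up `QueueName` by its output key and never need to know the generated suffix.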
If you really need to do it, make sure the dependency tree is shallow or you will have trouble.
Don't place your RDS database in the same template as your webapp auto-scaling group.
And make sure these tools understand how to name stacks and validate parameters.
k cfn id2 server update -e qa -c neat
Whenever you want to deploy something, scale up the group that is not currently scaled and then scale down the one that was.
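The ordering is the whole trick, so here it is as a small sketch. It assumes exactly two auto-scaling groups with one of them at zero capacity; the group names `blue` and `green` and the helper `deployment_steps` are made up for the example. Each step could then be applied with the Auto Scaling `set_desired_capacity` call, waiting for the new instances to become healthy in between.

```python
def deployment_steps(groups, target_capacity):
    """Return ordered (group, desired_capacity) steps for a swap between two groups.

    `groups` maps each auto-scaling group name to its current desired capacity;
    exactly one of the two is expected to be at zero.
    """
    idle = [name for name, size in groups.items() if size == 0]
    active = [name for name, size in groups.items() if size > 0]
    if len(idle) != 1 or len(active) != 1:
        raise ValueError("expected one idle and one active group")
    # Scale up first so capacity never drops during the swap.
    return [(idle[0], target_capacity), (active[0], 0)]


print(deployment_steps({"blue": 4, "green": 0}, 4))
# prints [('green', 4), ('blue', 0)]
```

Refusing to plan when both groups are running (or both are empty) is deliberate: that state usually means a previous deploy was left half done.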
Yes, I'm repeating this.
We're all human, so send notifications for more than one threshold to make sure they won't be snoozed into oblivion.
Because all machines die.
Want to figure out what will change between the current template and the one deployed? Run it. If Justin Campbell were here he would say Terraform has diffs.
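A partial workaround is to diff the template text yourself before running it. The sketch below normalises two templates and prints a unified diff using only the standard library; the two sample templates are invented. It only shows what changed in the text, not which resources CloudFormation would update or replace.

```python
import difflib
import json


def template_diff(deployed, candidate):
    """Unified diff of two templates after normalising key order and indentation."""
    def lines(template):
        return json.dumps(template, indent=2, sort_keys=True).splitlines()

    return list(difflib.unified_diff(
        lines(deployed), lines(candidate),
        fromfile="deployed", tofile="candidate", lineterm="",
    ))


deployed = {"Resources": {"WorkerScaleUpPolicy": {"Properties": {"Cooldown": 300}}}}
candidate = {"Resources": {"WorkerScaleUpPolicy": {"Properties": {"Cooldown": 120}}}}
print("\n".join(template_diff(deployed, candidate)))
```

Sorting the keys first keeps the diff from lighting up just because two tools serialised the same template in a different order.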
You'll be in for a lot of trouble.
But there are tools that let you use other languages like Python or Ruby to declare templates.
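Even without a dedicated library, the idea can be sketched with plain Python dictionaries: a function stamps out the repeated parts and `json.dumps` produces the template. The helper `scaling_policy` and the `WorkerScaleDownPolicy` resource are illustrative additions, and the template is a fragment (the auto-scaling group it references is not declared here).

```python
import json


def scaling_policy(group, adjustment, cooldown=300):
    """One AWS::AutoScaling::ScalingPolicy resource for the given group."""
    return {
        "Type": "AWS::AutoScaling::ScalingPolicy",
        "Properties": {
            "AdjustmentType": "ChangeInCapacity",
            "AutoScalingGroupName": {"Ref": group},
            "Cooldown": cooldown,
            "ScalingAdjustment": adjustment,
        },
    }


template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "WorkerScaleUpPolicy": scaling_policy("WorkerAutoScalingGroup", 1),
        "WorkerScaleDownPolicy": scaling_policy("WorkerAutoScalingGroup", -1),
    },
}
print(json.dumps(template, indent=2))
```

The payoff is that the scale-up and scale-down policies can no longer drift apart by copy-and-paste accident.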
S3 notifications still don't have all the options available at the console/API.
You're investing and you're stuck.
Problems? Open a ticket and wait.
Check what we did at https://github.com/TheNeatCompany/cfn-bridge
We already have the numbers and the usage patterns are quite consistent. Scaling up based on time means customers wait even less when they actually start to use the app.
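CloudFormation has a resource for this, `AWS::AutoScaling::ScheduledAction`. The sketch below generates one; the schedule (weekdays at 12:00 UTC), the capacities and the resource name `WorkerMorningScaleUp` are placeholders, not numbers from our usage data.

```python
import json


def scheduled_action(group, cron, desired, minimum, maximum):
    """One AWS::AutoScaling::ScheduledAction resource for the given group."""
    return {
        "Type": "AWS::AutoScaling::ScheduledAction",
        "Properties": {
            "AutoScalingGroupName": {"Ref": group},
            "Recurrence": cron,  # cron expression, evaluated in UTC by default
            "DesiredCapacity": desired,
            "MinSize": minimum,
            "MaxSize": maximum,
        },
    }


# Hypothetical schedule: scale up on weekdays at 12:00 UTC.
action = scheduled_action("WorkerAutoScalingGroup", "0 12 * * 1-5", 6, 4, 10)
print(json.dumps({"WorkerMorningScaleUp": action}, indent=2))
```

A matching action later in the day would bring the group back down, and the metric-based alarms keep handling anything the schedule did not predict.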
While all new apps have moved to CF-based setups, our monolith is still a work in progress, but we will get there.
Right now the health checks are rudimentary and not very effective at spotting instances that are misbehaving.