“Computing may someday be organized as a public utility just as the telephone system is a public utility, each subscriber needs to pay only for the capacity he actually uses, but he has access to all programming languages characteristic of a very large system”
“The computer utility could become the basis of a new and important industry.”
John McCarthy
1961
The year of John McCarthy's speech.
29 October, 1969
The date two nodes of what would go on to become ARPANET were connected.
1997
Professor Ramnath Chellappa
Characteristics
- On-demand self-service
- Broad network access
- Resource pooling
- Rapid elasticity
- Measured service
Using the NIST definition of cloud computing
Resources?
- Compute
- Storage
- Network
- Telephony
- Queueing
- Caching, Database, etc...
Service Models
IaaS: Google Compute Engine, Amazon EC2, Microsoft Azure, Rackspace Cloud, GoGrid
PaaS: VMware Cloud Foundry, Google App Engine, Amazon Elastic Beanstalk, Force.com, Red Hat OpenShift, Heroku, Cloud Services for Windows Azure
SaaS: Google Apps, Zendesk, Twilio, GitHub
Why is cloud computing popular?
Reduces costs
No need to purchase hardware prior to a project.
You don't need to know how much capacity you need up front.
May reduce staff, admin, and licensing costs.
Spot and reserved instances.
Agility
Enables rapid prototyping and experimentation
You want to benchmark a ten node cluster? Go for it. $20 an hour or less.
Acquire resources as you need them.
Elasticity
Scale to cope with variable workloads
No need to spend money on resources that are only there to handle peak loads
Autoscaling - avoid the Slashdot effect without human intervention.
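The autoscaling idea can be sketched as a simple threshold rule. This is a minimal illustration under stated assumptions, not any provider's actual policy engine: the function name `desired_node_count`, the CPU thresholds, and the node limits are all hypothetical values chosen for the example.

```python
def desired_node_count(current_nodes, avg_cpu_percent,
                       scale_up_at=75.0, scale_down_at=25.0,
                       min_nodes=2, max_nodes=20):
    """Return how many nodes the cluster should run (hypothetical policy)."""
    if avg_cpu_percent > scale_up_at:
        target = current_nodes + 1      # busy: add one node
    elif avg_cpu_percent < scale_down_at:
        target = current_nodes - 1      # idle: remove one node
    else:
        target = current_nodes          # within the comfortable band
    # Never drop below the redundancy floor or exceed the budget ceiling.
    return max(min_nodes, min(max_nodes, target))
```

Stepping by one node at a time keeps the rule easy to reason about; a real policy would usually add a cool-down period so the cluster does not flap.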
Instant Resources!
Have an idea you want to test? Go for it.
Automatable
Cloud services tend to have good APIs.
As simple as git push - you will do this in the demo.
Chef & Opscode
Not just services which are automatable.
Examples of infrastructure automation tools. Not cloud-only.
Puppet is an alternative.
$ knife st env start -E test -r us-west-2 -z us-west-2a
This is all it takes to start the entire test environment of 3 app servers and 3 datastore servers, as well as copy the latest backup from production, and run the latest configuration on each node.
Vagrant
Create a vm from a box file and a configuration.
Useful for building a local cluster for testing.
Integrates with Puppet and Chef. Can use VMware or VirtualBox.
Free usage
Many providers (e.g. AWS, Heroku) have free tiers or initial credit you can experiment with.
Students can sometimes get educational credit. If you are doing a project and want to try, email support and ask!
Report bugs for fun and profit! E.g. $20 Twilio credit for reporting a doc mistake.
Unexpected features
Often worth exploring features as there can be some interesting side benefits. For example, you can host a static website on both GitHub and S3.
Architecting for the Cloud
Cloud != Desktop
Very different way of thinking.
Individual components fail!
Expect this and design for it!
If your design is good, your response will be "Meh."
Redundancy
Things fail. Have spares that take over automatically.
You have unlimited* resources, use them. On demand even.
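"Spares that take over automatically" can be sketched as client-side failover. This is a minimal illustration, assuming each replica is a plain callable that raises `ConnectionError` when it is down; the name `call_with_failover` is hypothetical.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful result."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure and try the next spare
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def dead_node(request):
    raise ConnectionError("node is down")


def healthy_node(request):
    return "ok:" + request


print(call_with_failover([dead_node, healthy_node], "ping"))  # ok:ping
```

The caller never learns that the first node was down, which is the "Meh." response a good design aims for.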
Fault Tolerance
What happens when a service goes down?
When a node dies?
An example is the Cassandra datastore. Its gossip protocol lets the cluster back off from a bad node, which can be re-added later with no issues.
SPOF
Think carefully about the things that can go wrong between clients and you.
Can turn up in some unexpected places.
What can break between you and the consumer? How can you mitigate the risk or deal with it when it happens?
E.g. the GoDaddy DNS issue: spidertracks.com did not resolve, although the website itself was working fine. The customer doesn't care. They know they can't access your site but they can access Google, therefore the internet is fine and the problem is with you.
Your cloud provider == SPOF?
If your provider fails, what are you going to do about it? Generally, nothing.
Cloud providers will fail. Split your resources usage between data centers and providers if appropriate.
* AWS, AppEngine, and Gmail have all had outages.
* Megaupload - established 2005, seized Jan 2012, allegedly dedicated to copyright infringement. Still unresolved. Paying customers lost data.
* Lavabit - An encrypted mail service shut down by operator last week, because they did not want to "become complicit in crimes against the American people". Cannot legally say why.
Scalability
Your design needs to be scalable horizontally to take full advantage of the cloud.
Add and remove resources as required without having to restart / redeploy.
State is bad; cluster-aware components are good.
m1.xlarge
to
m2.4xlarge
AWS EC2 allows scaling vertically by changing the node type. If your infrastructure is automated, this is easy!
Family            Instance type        vCPU  ECU  Memory (GiB)  Instance storage (GB)
General purpose   m1.xlarge (64-bit)   4     8    15            4 x 420
Memory optimized  m2.4xlarge (64-bit)  8     26   68.4          2 x 840
State is bad!
The next request from a client may not be serviced by the same machine.
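One way to cope with this is to keep session state out of the web servers entirely. The sketch below assumes a shared store that every server can reach; `SharedSessionStore` is a hypothetical in-memory stand-in for an external cache or database, used only so the example runs on its own.

```python
class SharedSessionStore:
    """Stand-in for an external store such as a cache or database (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {"count": 0})

    def put(self, session_id, session):
        self._data[session_id] = session


def handle_request(store, session_id):
    """Any server can run this, because all state lives in the shared store."""
    session = store.get(session_id)
    updated = {"count": session["count"] + 1}
    store.put(session_id, updated)
    return updated["count"]


store = SharedSessionStore()
handle_request(store, "alice")         # served by "server A"
print(handle_request(store, "alice"))  # served by "server B", prints 2
```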
Partition and Cache data
"Cloud scale" data does not fit on one node. It might at the start, but it won't forever. Design for this.
E.g. Cassandra, which provides redundancy in database storage.
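Partitioning with redundancy can be sketched by hashing each key onto a list of nodes and storing it on the next few nodes as well. This is a deliberately simple modulo placement, not Cassandra's actual token ring; the function name `nodes_for_key` and the node names are assumptions for the example.

```python
import hashlib


def nodes_for_key(key, nodes, replicas=2):
    """Pick `replicas` distinct nodes for a key by hashing it onto the node list."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    count = min(replicas, len(nodes))  # cannot replicate to more nodes than exist
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


cluster = ["node-a", "node-b", "node-c"]
print(nodes_for_key("user:42", cluster))
```

With plain modulo placement, adding a node moves most keys to a different home. Consistent hashing exists to limit that reshuffling, which matters once the cluster grows and shrinks on demand.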
Webhooks
API = pull. Webhook = push.
Basically a callback URL. Many web services use them. Allow integrations between systems.
Provide event support.
E.g. receiving an SMS through Twilio - how you respond controls what Twilio does.
HTTP Request
- SmsSid
- AccountSid
- From
- To
- Body
Makes a POST or GET request to your server.
HTTP Response
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Sms>Thanks for the message!</Sms>
</Response>
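The request and response above can be tied together in a small handler. This is a sketch that assumes the webhook arrives as a form-encoded POST body carrying the `From` and `Body` parameters listed above, and that the reply uses TwiML's capitalised `Response` and `Sms` elements. The function name `build_sms_reply` is hypothetical, and wiring it into a web framework is left out.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def build_sms_reply(form_body):
    """Parse a form-encoded webhook body and return a TwiML reply string."""
    params = parse_qs(form_body)
    sender = params.get("From", ["unknown"])[0]
    text = params.get("Body", [""])[0]
    reply = f"Thanks for the message, {sender}! You said: {text}"
    # Escape user-supplied text so it cannot break the XML document.
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f"<Response><Sms>{escape(reply)}</Sms></Response>")


print(build_sms_reply("From=%2B15551234567&Body=Hello"))
```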
Transient errors
Things fail. Sometimes for no apparent reason.
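A common way to cope with transient errors is to retry with exponential backoff. The sketch below assumes the operation signals a transient failure by raising `ConnectionError`; the name `retry` and its default values are hypothetical.

```python
import random
import time


def retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Add jitter so many clients do not all retry at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

Only retry operations that are safe to repeat. Retrying a non-idempotent call, such as sending a payment, can do the work twice.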
Vendor lock in
Ensure that you can extract your data from the provider.
Remember, cloud providers are not forever.
“If you don't own it, abstract away the details.”
@ammeep at Code Mania After Dark
There are libraries that will abstract away a particular cloud provider, e.g. Fog, a Ruby gem.
Abstraction is not always easy when different feature sets are involved.
Interface for your system to use + provider-specific implementations. Many cloud service providers provide client libraries for different languages.
Example: Abstraction of messaging provider in Go.
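The "interface + provider-specific implementations" shape is sketched here in Python rather than Go. All names (`MessageSender`, `InMemorySender`, `notify_user`) are hypothetical; a real implementation would wrap a provider's client library behind the same interface.

```python
from abc import ABC, abstractmethod


class MessageSender(ABC):
    """The interface the rest of the system depends on."""

    @abstractmethod
    def send(self, to, body):
        """Send `body` to `to` and return a provider-neutral message id."""


class InMemorySender(MessageSender):
    """Test double; a real class would call one provider's client library here."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
        return f"local-{len(self.sent)}"


def notify_user(sender, phone, text):
    """Application code knows only MessageSender, never a specific provider."""
    return sender.send(phone, text)


sender = InMemorySender()
print(notify_user(sender, "+15551234567", "Your report is ready"))  # local-1
```

Swapping providers then means writing one new `MessageSender` subclass, with no changes to the code that calls `notify_user`.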
Configuration
Developers may end up doing operations tasks. Installing and configuring software. How do you know you are doing it right and not leaving gaping holes in your infrastructure or creating performance problems?
Cutting Edge
You may not have a full understanding of your dependencies before it goes live.
Be careful with cutting edge technology. You will most likely get cut at some point. What if it gets abandoned by everyone else or a competitor becomes more popular?
Choose an appropriate tool/service for the task, not the latest fad. Be able to justify your choice against similar tools/services.
Backups and Recovery
Have a plan and test it regularly. How do you know all your nodes are being backed up?
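One way to answer "are all my nodes being backed up?" is to check backup timestamps automatically. The sketch below assumes you can already collect the time of each node's last successful backup; the function name `stale_backups`, the node names, and the 24-hour limit are hypothetical.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_by_node, now, max_age=timedelta(hours=24)):
    """Return the nodes whose last backup is missing or older than max_age."""
    stale = []
    for node in sorted(last_backup_by_node):
        last_backup = last_backup_by_node[node]
        if last_backup is None or now - last_backup > max_age:
            stale.append(node)
    return stale


now = datetime(2013, 8, 15, 12, 0)
backups = {
    "app-1": now - timedelta(hours=2),
    "db-1": now - timedelta(days=3),
    "db-2": None,  # never backed up
}
print(stale_backups(backups, now))  # ['db-1', 'db-2']
```

A check like this only proves that backups ran. It does not replace regularly testing a restore.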
Security
Does your industry have any requirements about how data can be stored? HIPAA, etc. Do you have to provide an audit trail?
Are you allowed to store it offshore / in an unknown place?
Government clouds - how do they differ from normal clouds?
Do you trust the cloud service provider to get it right? Encrypt data?
Privacy
The USA PATRIOT Act allows the US government to request data from US-based companies.
PRISM - cooperation of Google, Microsoft, Yahoo, etc. in making data available to the US government. Metadata != data?
Testing in the Cloud
Some cloud services offer test credentials - they authenticate but don't perform operations.
Test / Staging Environments
This will make your life better. Have them.
Simian Army
The Simian Army is a set of programs that can be run against your AWS infrastructure.
Doctor - checks the health of nodes.
Janitor - removes unused resources.
Conformity - ensures best practices are used.
Latency - simulates latency on services.
Security - finds violations or vulnerabilities.
Chaos Monkey - randomly terminates instances.
Chaos Gorilla - kills an entire availability zone.
Netflix has open-sourced a lot of software, much of it related to AWS and Java.
Exposing Your Server
Localtunnel
ssh port forwarding to internet addressable server - handy to know
Can also use ssh to set up proxies - useful for testing also.
Running an application on your local machine means it is easier to debug and monitor what is happening.
Proxies - eg Fiddler, Wiretap
Monitoring
Now your cloud application is up and running, how do you keep an eye on its health?
Many providers also provide their own monitoring systems.
Real time alerts - catch issues before they become more serious.
PagerDuty, etc.
Dashboard
Show the health of the application.
Graphs and metrics - see at a glance what is going on.
Log Aggregation
Cloud means that many different machines may be involved in the handling of a request / action. It pays to be able to follow the processing through.
Design with logging in mind. What information do you need?
Having the ability to search logs is incredibly useful.
Different priorities of logging. Some entries only relevant for a short time, some relevant forever, eg audit trail of user actions. Great help to support when you can say exactly what a user did and when.
Spidertracks uses log4j + Graylog. This provides searching and real-time monitoring of logs. Logs are also archived long term in Amazon S3.
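Following one request across many machines is easier when every log entry carries the same request id. The sketch below uses Python's standard `logging` module rather than log4j. The logger name `app.requests`, the field name `request_id`, and the helper names are assumptions for illustration.

```python
import logging
import sys
import uuid


def build_logger(stream):
    """Create a logger whose format includes a request_id field on every entry."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s request_id=%(request_id)s %(message)s"))
    logger = logging.getLogger("app.requests")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def for_request(logger, request_id=None):
    """Return an adapter that stamps each entry with this request's id."""
    request_id = request_id or uuid.uuid4().hex
    return logging.LoggerAdapter(logger, {"request_id": request_id})


log = for_request(build_logger(sys.stdout), "abc123")
log.info("payment accepted")  # INFO request_id=abc123 payment accepted
```

If each service passes the id along with its outgoing calls, a single search for `request_id=abc123` in the aggregated logs shows the whole path of the request.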
Remote monitoring
Not the same as logging.
Not the same as a dashboard.
Provides remote access to core platform management functionality.
JMX. Can be very useful to be able to remote debug.