The Cloud






Presentation for SoT 2013 on "the cloud"

On GitHub: ajesler / sot-cloud

The Cloud

latest tech fad?

“Computing may someday be organized as a public utility just as the telephone system is a public utility... Each subscriber needs to pay only for the capacity he actually uses, but he has access to all programming languages characteristic of a very large system...” “The computer utility could become the basis of a new and important industry.”

John McCarthy

1961

The year of John McCarthy's speech.

29 October, 1969

The date the first two nodes of what would go on to become ARPANET were connected.

1997

Professor Ramnath Chellappa

Credited with the first academic use of the term "cloud computing".

Characteristics

  • On demand self service
  • Broad network access
  • Resource pooling
  • Rapid elasticity
  • Measured service

Using the NIST definition of cloud computing

Resources on demand!

Resources?

  • Compute
  • Storage
  • Network
  • Telephony
  • Queueing
  • Caching, Database, etc...

Service Models

  • IaaS
  • PaaS
  • SaaS

IaaS: Google Compute Engine, Amazon EC2, Microsoft Azure, Rackspace Cloud, GoGrid
PaaS: VMware Cloud Foundry, Google App Engine, Amazon Elastic Beanstalk, Force.com, Red Hat OpenShift, Heroku, Cloud Services for Windows Azure
SaaS: Google Apps, Zendesk, Twilio, GitHub

Why is cloud computing popular?

Reduces costs

No need to purchase hardware before a project starts. You don't need to know up front how much capacity you need. May reduce staff, admin, and licensing costs. Spot and reserved instances can lower costs further.

Agility

Enables rapid prototyping and experimentation. You want to benchmark a ten node cluster? Go for it. $20 an hour or less. Acquire resources as you need them.

Elasticity

Scale to cope with variable workloads. No need to spend money on resources that are only there to handle peak loads. Autoscaling: avoid the Slashdot effect without human intervention.
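The autoscaling idea can be sketched as a simple target-tracking rule. This is a minimal illustration rather than any provider's actual algorithm; the function name `desired_node_count`, the 60% CPU target, and the node limits are all assumptions made for the example.

```python
import math


def desired_node_count(current_nodes, avg_cpu_percent,
                       target_cpu_percent=60.0, min_nodes=2, max_nodes=20):
    """Return how many nodes would bring average CPU close to the target.

    All thresholds here are hypothetical defaults chosen for illustration.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    # Total work is roughly nodes * utilisation; spread it so that each
    # node ends up near the target utilisation.
    needed = math.ceil(current_nodes * avg_cpu_percent / target_cpu_percent)
    # Never scale below the redundancy floor or above the budget ceiling.
    return max(min_nodes, min(max_nodes, needed))
```

Run on a schedule against real metrics, a rule like this adds nodes during a traffic spike and removes them afterwards, so nothing sits idle waiting for peak load.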

For developers

Instant Resources!

Have an idea you want to test? Go for it.

Automatable

Cloud services tend to have good APIs. As simple as git push - you will do this in the demo.

Chef & Opscode

It is not just services that are automatable. Chef (from Opscode) is an example of an infrastructure automation tool. It is not cloud-only. Puppet is an alternative.

$ knife st env start -E test -r us-west-2 -z us-west-2a

This is all it takes to start the entire test environment of 3 app servers and 3 datastore servers, as well as copy the latest backup from production, and run the latest configuration on each node.

Vagrant

Creates a VM from a box file and a configuration. Useful for building a local cluster for testing. Integrates with Puppet and Chef. Can use VMware or VirtualBox.

For Students

Free usage

Many providers (e.g. AWS, Heroku) have free tiers or initial credit you can experiment with. Students can sometimes get educational credit. If you are doing a project and want to try a service, email support and ask! Report bugs for fun and profit, e.g. $20 Twilio credit for reporting a documentation mistake.

Unexpected features

Often worth exploring features as there can be some interesting side benefits. For example, you can host a static website on both GitHub and S3.

Architecting for the Cloud

Cloud != Desktop

Very different way of thinking.

Loosely coupled services

Individual components fail!

Expect this and design for it!

If your design is good, your response will be "Meh."

Redundancy

Things fail. Have spares that take over automatically. You have unlimited* resources, use them. On demand even.
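A spare that takes over automatically can be sketched as a client that tries each replica in turn. This is only an illustration: `call_with_failover` is a hypothetical helper, and replicas are modelled as plain callables that raise `ConnectionError` when they are down.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            # This replica is unavailable; remember why and try the next spare.
            errors.append(exc)
    raise ConnectionError(f"all {len(replicas)} replicas failed: {errors}")
```

The caller never needs to know which replica answered, which is what lets a failed node be replaced without anyone noticing.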

Fault Tolerance

What happens when a service goes down? When a node dies? An example is the Cassandra datastore: its gossip protocol detects a bad node and the cluster backs off from it. The node can be re-added later with no issues.

SPOF

Think carefully about the things that can go wrong between clients and you. Single points of failure can turn up in some unexpected places. What can break between you and the consumer? How can you mitigate the risk or deal with it when it happens? For example, during a GoDaddy DNS issue spidertracks.com did not resolve, even though the website was working fine. The customer doesn't care: they know they can't access your site but they can access Google, therefore the internet is fine and the problem is with you.

Your cloud provider == SPOF?

If your provider fails, what are you going to do about it? Generally, nothing. Cloud providers will fail. Split your resource usage between data centers and providers if appropriate.

  • AWS, App Engine, and Gmail have all had outages.
  • Megaupload: established 2005, seized January 2012, allegedly dedicated to copyright infringement. Still unresolved. Paying customers lost data.
  • Lavabit: an encrypted mail service shut down by its operator last week, because they did not want to "become complicit in crimes against the American people". They cannot legally say why.

Scalability

Your design needs to scale horizontally to take full advantage of the cloud. Add and remove resources as required without having to restart or redeploy. State is bad; cluster-aware components are good.

m1.xlarge

to

m2.4xlarge

AWS EC2 allows scaling vertically by changing the node type. If your infrastructure is automated, this is easy!

Family            Instance type  Arch    vCPU  ECU  Memory (GiB)  Instance Storage (GB)
General purpose   m1.xlarge      64-bit  4     8    15            4 x 420
Memory optimized  m2.4xlarge     64-bit  8     26   68.4          2 x 840

State is bad!

The next request from a client may not be serviced by the same machine.
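One common way to cope with this is to keep session data out of the application servers entirely. The sketch below is an assumption-laden illustration: `SharedSessionStore` is a stand-in (a dict here) for whatever external cache or database you would really use, and `AppServer` is a hypothetical request handler.

```python
class SharedSessionStore:
    """Stand-in for an external store; a real system would use a cache or database."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """A stateless server: everything it knows about a client lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        session = self.store.get(session_id)
        session["visits"] = session.get("visits", 0) + 1
        self.store.put(session_id, session)
        return f"{self.name}: visit {session['visits']}"
```

Because neither server keeps anything in local memory, a load balancer can send each request to whichever machine is free, and a machine can be removed without losing anyone's session.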

Partition and Cache data

"Cloud scale" data does not fit on one node. It might at the start, but it won't forever. Design for this. An example is Cassandra, which partitions data across nodes and stores redundant copies of it.

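Partitioning with redundancy can be sketched with a small hash ring. This is a simplified illustration of the general technique, not how Cassandra is actually implemented; `HashRing`, its `replicas` parameter, and the node names used with it are hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Assigns each key to `replicas` distinct nodes (node names must be unique)."""

    def __init__(self, nodes, replicas=2):
        if not nodes:
            raise ValueError("at least one node is required")
        self.replicas = replicas
        self._ring = sorted((_hash(node), node) for node in nodes)
        self._points = [point for point, _ in self._ring]

    def nodes_for(self, key):
        """Return the nodes that should store this key, primary first."""
        # Find the first node clockwise from the key's position, wrapping around.
        start = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

Every client that knows the node list computes the same owners for a key, so no central lookup service is needed, and the second owner holds a spare copy if the first one dies.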
Webhooks

API = pull. Webhook = push. A webhook is basically a callback URL. Many web services use them. They allow integrations between systems and provide event support. For example, when Twilio receives an SMS it calls your URL, and how you respond controls what Twilio does.

HTTP Request

  • SmsSid
  • AccountSid
  • From
  • To
  • Body

Twilio makes a POST or GET request to your server with these parameters.

HTTP Response

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Sms>Thanks for the message!</Sms>
</Response>
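The receiving end of such a webhook can be sketched with Python's standard library alone. This is a minimal illustration, assuming the form-encoded `From` and `Body` parameters listed above and the `Response` / `Sms` element names of Twilio's XML reply format; `build_reply` and `SmsWebhook` are hypothetical names, and there is no request validation here.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def build_reply(form_body):
    """Turn the form-encoded webhook body into an XML reply."""
    fields = parse_qs(form_body)
    sender = fields.get("From", ["unknown"])[0]
    text = fields.get("Body", [""])[0]
    message = f"Thanks for the message, {sender}! You said: {text}"
    # Escape user-supplied text so it cannot break the XML document.
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f"<Response><Sms>{escape(message)}</Sms></Response>")


class SmsWebhook(BaseHTTPRequestHandler):
    """Answers each POST from the provider with the generated XML."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        reply = build_reply(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)
```

To try it locally, serve the handler with `http.server.HTTPServer(("", 8080), SmsWebhook).serve_forever()` and point the webhook URL at that port.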

Downsides to the cloud

Transient errors

Things fail. Sometimes for no apparent reason.
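A common way to live with transient errors is to retry with exponential backoff and jitter. The sketch below is illustrative only: `retry` is a hypothetical helper, transient failures are modelled as `ConnectionError`, and the delays are arbitrary defaults.

```python
import random
import time


def retry(operation, attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` tries have failed."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # Out of attempts: let the caller see the failure.
            # Double the ceiling each time, then pick a random delay below it
            # so that many clients do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Only retry operations that are safe to repeat; retrying a non-idempotent request can turn one failure into duplicate work.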

Vendor lock in

Ensure that you can extract your data from the provider. Remember, cloud providers are not forever.
“If you don't own it, abstract away the details.”

@ammeep at Code Mania After Dark

There are libraries that will abstract away a particular cloud provider, e.g. Fog, a Ruby gem. Abstraction is not always easy when different feature sets are involved. Define an interface for your system to use, plus provider-specific implementations. Many cloud service providers provide client libraries for different languages. Example: abstraction of a messaging provider in Go.
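The "interface plus provider-specific implementations" idea can be sketched as follows. It is written in Python for illustration and is not the Go example from the talk; `MessageSender`, `InMemorySender`, `ConsoleSender`, and `notify_user` are all hypothetical names, and a real implementation would wrap a provider's client library.

```python
from abc import ABC, abstractmethod


class MessageSender(ABC):
    """The only messaging interface the rest of the system is allowed to use."""

    @abstractmethod
    def send(self, to, body):
        """Send `body` to `to` and return a provider-specific message id."""


class InMemorySender(MessageSender):
    """Keeps messages in a list; useful in tests."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
        return f"memory-{len(self.sent)}"


class ConsoleSender(MessageSender):
    """Prints messages instead of sending them; useful in development."""

    def send(self, to, body):
        print(f"to {to}: {body}")
        return "console"


def notify_user(sender, to):
    """Application code depends on the interface, never on a concrete provider."""
    return sender.send(to, "Your report is ready.")
```

Switching providers then means writing one new subclass, while every caller of `notify_user` stays unchanged.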

Configuration

Developers may end up doing operations tasks: installing and configuring software. How do you know you are doing it right and not leaving gaping holes in your infrastructure or creating performance problems?

Cutting Edge

You may not have a full understanding of your dependencies before your system goes live. Be careful with cutting edge technology. You will most likely get cut at some point. What if it gets abandoned by everyone else or a competitor becomes more popular? Choose an appropriate tool/service for the task, not the latest fad. Be able to justify your choice against similar tools/services.

Backups and Recovery

Have a plan and test it regularly. How do you know all your nodes are being backed up?
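One way to answer that question is a scheduled check that flags nodes without a recent backup. This is a sketch under assumptions: `stale_backups` is a hypothetical helper, the 24 hour limit is arbitrary, and the timestamps would come from wherever your backup tool records them.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_by_node, now, max_age=timedelta(hours=24)):
    """Return the sorted names of nodes with no backup newer than `max_age`.

    `last_backup_by_node` maps node name to the datetime of its last
    successful backup, or None if it has never been backed up.
    """
    return sorted(
        node
        for node, finished_at in last_backup_by_node.items()
        if finished_at is None or now - finished_at > max_age
    )
```

A check like this only proves that backups ran; restoring from one regularly is still the only real test of the plan.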

Security

Does your industry have any requirements about how data can be stored? HIPAA, etc. Do you have to provide an audit trail? Are you allowed to store data offshore or in an unknown place? Government clouds: how do they differ from normal clouds? Do you trust the cloud service provider to get it right? Should you encrypt your data?

Privacy

The USA PATRIOT Act allows the US government to request data from US-based companies. PRISM: cooperation of Google, Microsoft, Yahoo, etc. in making data available to the US government. Metadata != data?

Testing in the Cloud

Some cloud services offer test credentials: they authenticate but don't perform operations.

Test / Staging Environments

This will make your life better. Have them.

Simian Army

Simian Army is a set of programs that can be run against your AWS infrastructure.

  • Doctor Monkey: checks the health of nodes.
  • Janitor Monkey: removes unused resources.
  • Conformity Monkey: ensures best practices are used.
  • Latency Monkey: simulates latency on services.
  • Security Monkey: finds violations or vulnerabilities.
  • Chaos Monkey: randomly terminates instances.
  • Chaos Gorilla: kills an entire availability zone.

Netflix has open sourced a lot of software, much of it related to AWS and Java.
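The Chaos Monkey idea is simple enough to sketch in a few lines. This is a toy illustration, not Netflix's implementation: `unleash_monkey` is a hypothetical function, and `terminate` is whatever callback would stop an instance in your environment.

```python
import random


def unleash_monkey(instances, terminate, probability=0.5, rng=random):
    """With the given probability, terminate one randomly chosen instance.

    Returns the instance that was terminated, or None if the monkey
    stayed quiet on this run.
    """
    if not instances or rng.random() >= probability:
        return None
    victim = rng.choice(instances)
    terminate(victim)
    return victim
```

Running something like this against a test environment first is a cheap way to find out whether your response to a dead node really is "Meh."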

Exposing Your Server

Localtunnel: SSH port forwarding to an internet-addressable server. Handy to know. You can also use SSH to set up proxies, which is useful for testing too. Running an application on your local machine makes it easier to debug and monitor what is happening. Proxies, e.g. Fiddler or Wiretap, are also useful.

Monitoring

Now that your cloud application is up and running, how do you keep an eye on its health? Many providers also provide their own monitoring systems. Real time alerts catch issues before they become more serious. PagerDuty, etc.

Dashboard

Show the health of the application. Graphs and metrics - see at a glance what is going on.

Log Aggregation

Cloud means that many different machines may be involved in the handling of a request or action. It pays to be able to follow the processing through. Design with logging in mind. What information do you need? Having the ability to search logs is incredibly useful. Log entries have different priorities: some are only relevant for a short time, some are relevant forever, e.g. an audit trail of user actions. It is a great help to support when you can say exactly what a user did and when. Spidertracks uses log4j + Graylog, which provides searching and real time monitoring of logs. Logs are also archived long term in Amazon S3.
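Following one request across machines usually means tagging every log entry with a shared request id and emitting entries in a searchable format. The sketch below uses Python's standard `logging` module rather than the log4j setup mentioned above; `JsonFormatter` and `logger_for_request` are hypothetical names.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Formats each record as one JSON object so an aggregator can index it."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        })


def logger_for_request(service, request_id=None):
    """Return a logger that stamps every entry with the same request id."""
    logger = logging.getLogger(service)
    # LoggerAdapter attaches the extra fields to each record it emits.
    return logging.LoggerAdapter(
        logger, {"request_id": request_id or uuid.uuid4().hex})
```

If each service passes the request id along with its outgoing calls, a single search for that id in the aggregated logs shows the whole path of the request.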

Remote monitoring

Not the same as logging. Not the same as a dashboard. Provides remote access to core platform management functionality. JMX. Can be very useful to be able to remote debug.

DEMO TIME