Artificial Intelligence – A Crash Course




Artificial Intelligence: A Crash Course Slides

On GitHub: jpdurham/Artificial-Intelligence-A-Crash-Course-Slides

Artificial Intelligence

A Crash Course

Josh Durham

Aim

  • Talk about Artificial Intelligence (A.I.) and where it came from
  • Discover what constitutes A.I.
  • Problem solving using A.I.
  • Build our own genetic algorithm
  • Build our own neural network
  • Explore some tools

Learn all the things!

What is Artificial Intelligence?

John McCarthy, who coined the term in 1955, defines it as "the science and engineering of making intelligent machines"

A.I. can be broken down into two classifications:

  • Strong - hypothetical artificial intelligence at least as smart as a human (doesn't exist yet)
  • Weak - non-sentient computer intelligence, typically focused on a narrow task (what we currently know as A.I.)
John McCarthy was a computer/cognitive scientist, one of the founders of A.I., and the developer of the Lisp programming language family.

(Genetic algorithms, covered later in this deck, even ship in production software: GEQO is PostgreSQL's genetic query optimizer.)

Where did it begin?

Alan Turing suggested that a machine shuffling symbols as simple as "0" and "1" could simulate any conceivable act of mathematical deduction. Naturally, this inspired researchers to begin considering building an electronic brain.

A.I. was founded as a field at a 1956 workshop at Dartmouth College. John McCarthy, along with several other notable attendees, achieved impressive results in a relatively short period of time, which led them to be overly optimistic about the future of A.I.

A.I. funding was drastically cut from the mid-70s until the early 80s (the first "A.I. winter"), when expert systems took off

By the late 80s, A.I. funding tapered off and thus began the second A.I. winter

In the late 90s, interest in A.I. was revived thanks to increased computational power, emphasis on solving subproblems, and the development of solid mathematical methods and scientific standards.

Thanks in part to increased computer speed, Garry Kasparov lost a six-game chess match to Deep Blue on May 11, 1997--the first match loss by a reigning world champion to a computer under standard time controls

Most recently, big data, deep learning, and faster computers have yielded incredible advances in machine learning and perception

Use of A.I. is widespread--in smartphones, games, design, decision-making, and more

Side note: Dealing with Data

Before we get too far into this, let's talk about data because it can play a significant role in the success (or failure) of your A.I.

  • Clean your data--make sure it is self-consistent
  • Separate data into training and verification/test data
  • Do not train on verification/test data and do not do validation testing on training data
  • There are a number of tools out there to make data cleaning easier--OpenRefine, Wrangler (free with restrictions), among others
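The training/verification split above can be sketched in a few lines of Python (the 80/20 ratio and the function name are my choices--a common convention, not a rule):

```python
import random

def split_data(rows, train_fraction=0.8, seed=42):
    """Shuffle, then split rows into training and verification/test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]                      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train_rows, test_rows = split_data(list(range(100)))
print(len(train_rows), len(test_rows))  # 80 20
```

Train only on the first set and validate only on the held-out second set.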

What Problems is A.I. Trying to Solve?

  • Reasoning
  • Planning
  • Learning
  • Natural Language Processing
  • Perception/Recognition
  • Object manipulation
  • General intelligence
Statistical approaches solve specific subproblems and are responsible for many recent A.I. successes

Approaches to A.I.

  • Knowledge based systems
    • Bring in experts to write custom logic (aka, expert systems) that applies to the problem being solved
  • Machine Learning
    • Simulate techniques people use to solve problems
    • Let the machine discover the best way to solve a given problem
  • Logic-based
    • Find the essence of abstract reasoning and problem solving, regardless of whether people use the same algorithms
  • Anti-logic-based (aka, scruffy)
    • Difficult problems require algorithms to be built by hand, complicated concept by complicated concept

Going Further Into Machine Learning

  • Four types:
    • Supervised - outputs are provided and are used to train the algorithm to get the desired outputs
    • Unsupervised - no desired outputs are provided, instead the data is usually clustered into various classes
    • Semi-supervised - the algorithm uses a small amount of "labeled" and a large amount of "unlabeled" data
    • Reinforcement - the algorithm starts with "unlabeled" data and, as it runs, receives "labeled" data about its performance. Finds a balance between data exploration and exploitation
  • Why?
  • What problems does it help solve?
  • Is there a better way? Sometimes... we could finely tune an algorithm to teach a robot to walk, but the next robot off the line might be different enough that the hand-coded algorithm is inefficient

Genetic Algorithms

Genetic algorithms (GAs) are inspired by nature and evolution and make a great general-purpose learning technique. Applications include:

  • First and foremost, optimization problems
  • Teaching a robot to learn to walk
  • Automated design (CAD, industrial equipment, trading systems, etc.)
  • Facial composites of suspects
  • Evolving game AI
  • Global optimization method for training recurrent neural networks (a network's weights are represented as a single chromosome)

Genetic Algorithms: A 1,000 ft View

  • First, we will generate and test potential solutions. Based on a feedback function, we will determine the fitness of each potential solution to determine how optimal each potential solution is

  • Potential solutions that are farthest from optimal will no longer be considered. Close-to-optimal solutions are crossed with other close-to-optimal solutions to create a new set of potential solutions

  • To help reach the best solution, we will introduce random mutations into a few of the chromosomes

  • Repeat (aka, create new generations) until we've reached a stopping point whether it be a limit on the number of generations, a "close enough" solution, or something else
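The generate-score-cull-breed-mutate loop above can be sketched as a generic skeleton in Python (the function names and the tiny integer demo are mine, not from the deck):

```python
import random

def run_ga(random_candidate, cost, crossover, mutate,
           population_size=20, survivors=10, max_generations=1000):
    """Generic GA loop: score, sort, kill the weakest, breed, mutate, repeat."""
    population = [random_candidate() for _ in range(population_size)]
    for generation in range(max_generations):
        population.sort(key=cost)              # lower cost = more optimal
        if cost(population[0]) == 0:           # "close enough" stopping test
            break
        population = population[:survivors]    # the weakest die off
        while len(population) < population_size:
            a, b = random.sample(population[:survivors], 2)
            population.append(mutate(crossover(a, b)))
    return population[0]

# Tiny demo (not the HWGA): evolve a random integer toward 42.
random.seed(0)
best = run_ga(lambda: random.randint(0, 100),           # generate a candidate
              lambda x: abs(x - 42),                    # cost: distance from 42
              lambda a, b: (a + b) // 2,                # crossover: average the parents
              lambda x: x + random.choice([-1, 0, 1]))  # mutation: nudge by one
print(best)
```

Because the sorted survivors are kept unchanged, the best candidate never gets worse from one generation to the next.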

Building Our own GA

Consider a genetic algorithm that learns how to say "Hello, World!" (HWGA)

This will be a contrived example, but it's a great way to understand the concepts, so stay with me

The Chromosome

  • Represents a solution candidate
  • In the case of our HWGA, this is represented as a string
  • Example chromosomes:
    • Fiht$, RorLd!
    • Hilto, Forld#
    • Hillt, Rorld!
    • Hillo, Rorld!
    • Hello, World!

The Cost/Fitness Function

  • Provides a measure of optimality of a chromosome
    • Cost function = lower scores are favorable
    • Fitness function = higher scores are favorable
  • In the case of our HWGA, we need to calculate how far the characters are from those of "Hello, World!", so we will use a cost function.
  • How are we going to come up with this cost function?
    We will make it up. There isn't a right or wrong answer here so long as you get results. Some functions may create an optimal chromosome sooner than others, so try experimenting with variations
  • Much of the power of GAs lies in this function because you can reconcile disparate parameters

Defining our cost function

  • Suppose the letter "A" (65 in ASCII) in one of our chromosomes is supposed to be the letter "H" (72 in ASCII). With this information, we can define our cost function as the square of the difference between the ASCII values of the two letters. In this case, the cost for "A" is 49 (65 - 72 = -7, and (-7)^2 = 49). Note: we square the difference so we always end up with a positive value
  • To calculate the cost of our chromosome, we can sum all of the values for each character in the chromosome
  • Example chromosomes:
    • Fiht$, RorLd! (489)
    • Hilto, Forld# (278)
    • Hillt, Rorld! (107)
    • Hillo, Rorld! (48)
    • Hello, World! (0)
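One possible implementation of this cost function in Python (note: the example costs above may have been produced by a slightly different function, so don't expect these exact numbers to reproduce):

```python
TARGET = "Hello, World!"

def cost(chromosome):
    """Sum of squared ASCII differences from the target; 0 is a perfect match."""
    return sum((ord(a) - ord(b)) ** 2 for a, b in zip(chromosome, TARGET))

print(cost("Hello, World!"))  # 0
print(cost("Iello, World!"))  # 1: 'I' (73) vs 'H' (72), squared difference is 1
```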
A tale of mating and death...

The Crossover Function

  • This is where chromosomes "mate" and produce the next generation of chromosomes. GAs are based on evolution, and mating is a significant part of that...as is death
    • The best chromosomes mate
    • The weakest chromosomes die off
  • In the case of our HWGA, we need to define a function that will return a new generation of chromosomes by mating the best chromosomes and pruning out the weakest.
  • How are we going to come up with this crossover function?
    Yup, we are going to make this up, too. Like the cost/fitness function, there isn't necessarily a right or wrong answer here. Some functions may breed new generations that converge to an optimal (or close-to-optimal) solution faster, so don't be afraid to experiment

Defining our crossover function

  • As far as our HWGA goes, defining this function is relatively easy. We will randomly pick two chromosomes and split both of them (keeping in mind they are just strings) at some point, chosen either randomly or at a hard-coded index. This will be our pivot point.

    Using our pivot point, we will create two new chromosomes by combining the first half of the first chromosome with the second half of the second chromosome and vice versa

  • Example generation x chromosomes:
    • Hello, wprld!
    • Iello, world!
    Using a pivot point of 6 yields the following generation x+1 chromosomes:
    • Iello, wprld!
    • Hello, world!
  • An intuition for why we also need mutation: think of the chromosomes as balls in a very hilly physical setting with lots of peaks and valleys
  • All the balls start at random locations. They will roll down hills into valleys, some of which are lower than others
  • We may not have found the lowest valley, so mutation essentially kicks a ball out of its current valley and gives it a chance to land in another, possibly lower, valley
  • We want the lowest valley because it minimizes our cost function
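The crossover described above, with the fixed pivot of 6 from the example, can be sketched as (the function name is mine):

```python
def crossover(parent_a, parent_b, pivot=6):
    """Swap the tails of two chromosomes at the pivot, producing two children."""
    child_1 = parent_a[:pivot] + parent_b[pivot:]
    child_2 = parent_b[:pivot] + parent_a[pivot:]
    return child_1, child_2

print(crossover("Hello, wprld!", "Iello, world!"))
# ('Hello, world!', 'Iello, wprld!')
```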

The Mutation Function

And the problems with inbreeding

  • The mutation function exists to add variation to chromosomes that otherwise might not be possible by mating alone. The goal of mutation is to help ensure we reach a "global optimum" and not just a "local optimum"

  • In the case of our HWGA, we are basically going to irradiate a few of our chromosomes, i.e., randomly change a few of the letters in the strings they represent
  • For example, consider the following chromosomes:

    • Hfllo, Gorld!
    • Hfllo, Gorld!

    They are identical, so mating them yields no new opportunity to find a better solution
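A possible mutation function for the HWGA (the 5% default rate is an arbitrary choice worth experimenting with):

```python
import random

def mutate(chromosome, rate=0.05):
    """Randomly nudge a few characters up or down by one ASCII value."""
    chars = []
    for ch in chromosome:
        if random.random() < rate:
            ch = chr(ord(ch) + random.choice([-1, 1]))
        chars.append(ch)
    return "".join(chars)
```

With rate=1.0 every character shifts by exactly one ASCII value, which is handy for testing.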

The population

The population is the number of chromosomes in a generation. Typically, this will remain constant throughout the execution of a GA.

Population size can range from 10 to 10,000 or anything else. There are pros and cons to using more or fewer chromosomes, so try tinkering with the population size.

Summary

GAs work by creating generations, which includes:

  • Calculating the cost/fitness score for each chromosome
  • Sorting the chromosomes by cost/fitness score
  • Killing a certain number of the weakest members -- you pick the number of chromosomes that will die
  • Mating a certain number of the strongest members -- again, you pick how you do this
  • Mutating members at random
  • Some kind of completeness test -- i.e., how do you determine when to consider the problem "solved"?

Okay, now that we've covered the basics of GAs and our HWGA, let's look at the code!
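The original deck linked to the presenter's code; here is a self-contained sketch that follows the steps in the summary (all names and parameter values are my own choices, not the original code):

```python
import random
import string

TARGET = "Hello, World!"
ALPHABET = string.ascii_letters + string.punctuation + " "

def random_chromosome():
    """A random string the same length as the target."""
    return "".join(random.choice(ALPHABET) for _ in TARGET)

def cost(chromosome):
    """Sum of squared ASCII differences; 0 means we've said Hello, World!"""
    return sum((ord(a) - ord(b)) ** 2 for a, b in zip(chromosome, TARGET))

def crossover(a, b):
    """Split both parents at a random pivot and swap tails."""
    pivot = random.randrange(1, len(TARGET))
    return a[:pivot] + b[pivot:], b[:pivot] + a[pivot:]

def mutate(chromosome, rate=0.3):
    """Nudge some characters up or down by one ASCII value."""
    chars = list(chromosome)
    for i in range(len(chars)):
        if random.random() < rate:
            chars[i] = chr(ord(chars[i]) + random.choice([-1, 1]))
    return "".join(chars)

def evolve(population_size=40, survivors=20, max_generations=5000):
    population = [random_chromosome() for _ in range(population_size)]
    for generation in range(max_generations):
        population.sort(key=cost)            # best (lowest cost) first
        if cost(population[0]) == 0:         # completeness test
            return population[0], generation
        population = population[:survivors]  # the weakest die
        while len(population) < population_size:
            a, b = random.sample(population[:survivors], 2)
            for child in crossover(a, b):    # the strongest mate
                population.append(mutate(child))
    return population[0], max_generations

if __name__ == "__main__":
    best, generations = evolve()
    print(best, generations)
```

Survivors are carried over unmutated (elitism), so the best chromosome never regresses while the mutated children explore around it.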

Neural Networks

Made popular by "Deep Learning"

  • Google Cat
  • Machines Dream
  • Natural language processing
  • Natural language understanding (Parsey McParseface)
  • Perception
  • Deep Neural Networks

The choice of the functions used to compute neurons' representations is loosely guided by neuroscientific observations. However, modern neural network research is guided by many mathematical and engineering disciplines, and the goal of neural networks is not to perfectly model the brain

It is best to think of feedforward networks as function approximation machines that are designed to achieve statistical generalization, occasionally drawing some insights from what we know about the brain, rather than as models of brain function

Neural Networks Overview

  • Neurons are the basic unit of a neural network (NN). They have:
    • Dendrites - input
    • Nucleus - processor
    • Axon - output (normalized to a value between 0 and 1)
  • Activated neurons sum all inputs and if the sum is over a threshold, they fire off output
  • In our neuron, there is always an extra input fixed at 1, the bias, which lets the neuron shift its activation threshold
  • The activation function ensures the output is between 0 and 1 using a sigmoid function (s-shaped graph)
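The neuron just described fits in a few lines of Python (the input values and weights below are arbitrary):

```python
import math

def sigmoid(x):
    """The s-shaped activation function; squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias_weight):
    """Sum the weighted inputs plus a bias input fixed at 1, then activate."""
    total = sum(i * w for i, w in zip(inputs, weights)) + 1.0 * bias_weight
    return sigmoid(total)

print(neuron([0.5, 0.9], [0.3, -0.2], 0.1))  # a value strictly between 0 and 1
```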

This is the simplest architecture: organize the neurons in layers and connect every neuron in one layer to every neuron in the next, so that the output of each layer (i.e., the output of every neuron in that layer) becomes the input to the next layer.

Learning

  • First (aka, input) layer receives inputs from the environment
  • It then activates all of its neurons
  • It then outputs the normalized value to each neuron of the next layer
  • Each connection has an assigned weight that influences the input into a neuron in the next layer
  • This repeats for each layer until the last (output) layer, where the output is measured and errors in the network are corrected via backpropagation

Now let's have a look at a neural network that solves xor!

Code for the xor trainer
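The deck linked to the presenter's xor trainer; below is a minimal stand-in in pure Python: a feedforward network with one hidden layer (4 units, my choice) trained by backpropagation. All names and hyperparameters here are assumptions, not the original code.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

H = 4  # hidden-layer size (arbitrary; two units is the theoretical minimum for xor)
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(x, w_hidden, w_out):
    """Input -> hidden -> output; every neuron carries a trailing bias weight."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    out = sigmoid(sum(w_out[i] * hidden[i] for i in range(H)) + w_out[H])
    return hidden, out

def train(seed, epochs=10000, lr=0.5):
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
    w_out = [rng.uniform(-1, 1) for _ in range(H + 1)]
    for _ in range(epochs):
        for x, target in DATA:
            hidden, out = forward(x, w_hidden, w_out)
            # backpropagation: measure the output error, push it back layer by layer
            d_out = (target - out) * out * (1 - out)
            for i in range(H):
                d_h = d_out * w_out[i] * hidden[i] * (1 - hidden[i])
                w_hidden[i][0] += lr * d_h * x[0]
                w_hidden[i][1] += lr * d_h * x[1]
                w_hidden[i][2] += lr * d_h
            for i in range(H):
                w_out[i] += lr * d_out * hidden[i]
            w_out[H] += lr * d_out
    return w_hidden, w_out

# Training can stall in a local minimum, so retry with fresh random weights.
for seed in range(10):
    w_hidden, w_out = train(seed)
    if all(round(forward(x, w_hidden, w_out)[1]) == t for x, t in DATA):
        break

for x, target in DATA:
    print(x, round(forward(x, w_hidden, w_out)[1], 3))
```

The restart loop is the crude fix for bad initializations; real trainers would instead tune initialization, learning rate, or architecture.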

OpenCV source

Tools

ConvNetJS

Additional Resources

The Future of A.I.

  • Can strong A.I. exist without a physical body?
  • Is there a single learning algorithm that can learn in the way humans do?
  • Are humans the best models for building smarter A.I.?
  • Are Isaac Asimov's Laws of Robotics Reasonable?
  • What governs current A.I. ethics?

Questions?

Thanks for attending my session! Don't forget to fill out the session evaluation survey!

Slides and code will be available on https://github.com/jpdurham

Josh Durham
