What is Big Data? – Light introduction – Data Scientists to the rescue!





cafe-web-nov-2015

Talk for a meetup in Manresa, November 2015

On GitHub: iskracat / cafe-web-nov-2015

What is Big Data?

Light introduction

Ramon Navarro

Berta Capdevila

Created by Iskra Big Data Solutions / @iskraTIC

LOTS of INFORMATION

Can we extract useful information from our data? What do we need to do to get it? Examples: business intelligence, data analysis, machine learning.

NO MAGIC

Can we extract useful information from our data? What do we need to do to get it?

Data Scientists to the rescue!

Data Scientist (n.): Person who is worse at statistics than any statistician and worse at software engineering than any software engineer.

@chdoig

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Teamwork

Software

Software: Data Scientist / Modeler

  • R
  • scikit-learn
  • PyBrain
  • TextBlob
  • Theano
  • TensorFlow

Software: Scientific Computing

  • SciPy
  • NumPy
  • Numba

Software: Data Analytics

  • Pandas
  • PostgreSQL
  • Excel

Software: Distributed Systems

  • Spark
  • Hadoop

Software: Data Web

  • Pyramid
  • Bokeh
  • Plone

Machine Learning

“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”

Arthur Samuel (1959)
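To make “learning without being explicitly programmed” concrete, here is a minimal Python sketch: instead of hard-coding a cut-off, the program picks the threshold that best separates a handful of labelled examples. The data (hours studied vs. exam passed) and the helper names are hypothetical, chosen only for illustration.

```python
# Hypothetical labelled examples: (hours studied, passed the exam?)
examples = [(1, False), (2, False), (3, False), (4, True), (5, True), (6, True)]


def learn_threshold(data):
    """Return the cut-off that classifies the most training examples correctly.

    Candidate cut-offs are the observed values; x is predicted positive
    when x >= cut-off.
    """
    best_t, best_correct = None, -1
    for t, _ in data:
        correct = sum((x >= t) == label for x, label in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


threshold = learn_threshold(examples)  # learned from data, not hard-coded


def predict(hours):
    return hours >= threshold


print(threshold)  # 4
```

Changing the examples changes the learned rule without touching the code, which is the idea behind Samuel's definition.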

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

Tom Mitchell (1997)

  • Task T: predict traffic patterns at a busy intersection
  • Experience E: past traffic data run through a machine learning algorithm
  • Performance measure P: how accurately future traffic patterns are predicted; if learning works, P improves as E grows
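Mitchell's T/E/P framing can be sketched in a few lines of Python. Assumptions: the counts come from a made-up generator (`observed`) with a fixed hourly pattern plus a small deterministic wobble, the “algorithm” is simply the per-hour average of past counts, and P is the mean absolute error on later, unseen days.

```python
HOURS = range(4)             # hypothetical: four time slots per day
BASE = [10, 50, 80, 30]      # hypothetical "true" cars per slot


def observed(day, hour):
    """Synthetic car count: the true pattern plus a deterministic wobble in -2..2."""
    noise = (3 * day + hour) % 5 - 2
    return BASE[hour] + noise


def train(days):
    """Experience E: average the past counts for each time slot."""
    return [sum(observed(d, h) for d in days) / len(days) for h in HOURS]


def mean_absolute_error(model, days):
    """Performance measure P: average gap between prediction and reality."""
    errors = [abs(model[h] - observed(d, h)) for d in days for h in HOURS]
    return sum(errors) / len(errors)


test_days = range(10, 15)  # held-out "future" days
for n in (1, 5):
    model = train(range(n))
    print(n, "day(s) of history -> MAE", mean_absolute_error(model, test_days))
```

On this synthetic data one day of history gives an error of 1.5 cars per slot and five days give 1.2: more experience E, better performance P on task T. Real traffic data would need a richer model.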

Predicting house prices

$150,000

$200,000
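One minimal way to predict a house price is to fit a straight line, price = intercept + slope × size, by ordinary least squares. The sizes and prices below are invented for illustration and are not the figures on the slide.

```python
# Hypothetical training data: floor area in m² and sale price in $.
sizes = [50, 70, 90, 110]
prices = [105_000, 155_000, 185_000, 235_000]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(sizes, prices)


def predict_price(size):
    return intercept + slope * size


print(predict_price(100))  # 212000.0
```

With these numbers the fitted line is price = 2,000 + 2,100 × size, so a 100 m² house is predicted at $212,000. The guess is not “perfect”, only good enough to be useful, in the spirit of the quote that follows.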

“The goal of ML is never to make ‘perfect’ guesses, because ML deals in domains where there is no such thing. The goal is to make guesses that are good enough to be useful.”

George E. P. Box

Steps

1. Reading in the data and cleaning it

2. Exploring and understanding the input data

3. Analyzing how best to present the data to the learning algorithm

4. Choosing the right model and learning algorithm

5. Measuring the performance correctly
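The five steps can be walked through end to end on a toy dataset using only the Python standard library. Everything concrete here is an assumption for illustration: the CSV contents, the rule of holding out every third row, and the choice of 1-nearest-neighbour regression as the model.

```python
import csv
import io

# Hypothetical raw data; the row for size 60 is missing its price.
RAW = """size,price
50,105000
60,
70,155000
80,170000
90,185000
105,220000
110,235000
"""

# 1. Read the data and clean it: drop rows with missing fields.
rows = [r for r in csv.DictReader(io.StringIO(RAW)) if r["size"] and r["price"]]
data = [(float(r["size"]), float(r["price"])) for r in rows]

# 2. Explore: basic summary of the input.
sizes = [s for s, _ in data]
print("rows:", len(data), "size range:", min(sizes), "-", max(sizes))

# Hold out every third row for testing before fitting anything.
test = data[1::3]
train = [row for i, row in enumerate(data) if i % 3 != 1]

# 3. Present the data to the algorithm: scale size to [0, 1] using the
#    training range only (with one feature this does not change 1-NN,
#    but it matters once features have different units).
lo = min(s for s, _ in train)
hi = max(s for s, _ in train)


def scale(size):
    return (size - lo) / (hi - lo)


# 4. Choose a model: 1-nearest-neighbour regression, plus a trivial baseline.
def predict_1nn(size):
    nearest = min(train, key=lambda row: abs(scale(row[0]) - scale(size)))
    return nearest[1]


mean_price = sum(p for _, p in train) / len(train)


def predict_baseline(size):
    return mean_price


# 5. Measure performance correctly: on held-out rows, against the baseline.
def mae(predict, rows):
    return sum(abs(predict(s) - p) for s, p in rows) / len(rows)


print("1-NN MAE:", mae(predict_1nn, test))
print("baseline MAE:", mae(predict_baseline, test))
```

Here the 1-NN model is off by $15,000 on average on the held-out rows versus $32,500 for always predicting the mean. That is the kind of comparison step 5 asks for: measured on data the model never saw, against a trivial baseline.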

Application Fields

Data mining

  • Web click data
  • Medical records
  • Bioinformatics
  • Engineering

Applications you cannot write by hand

  • Handwriting recognition
  • Natural Language Processing (NLP) and computer vision

Self-customizing programs

Understanding human learning

IS MACHINE LEARNING MAGIC?

NO MAGIC

Can we extract useful information from our data? What do we need to do to get it?

“Machine Learning is the extraction of knowledge from data. You have a question you are trying to answer, and you think the answer is in the data.”

THE END

Thank you

iskra.cat
