Getting Started With Pandas



Getting Started With Pandas

0 0


pandas-101

Pandas 101 tutorial

On Github ChillarAnand / pandas-101

Anand Reddy Pandikunta (@chillaranand) http://avilpage.com

Getting Started With Pandas

Topics

Why Pandas?

Vectorization

Pandas DS - Series, DataFrame, Panel

No extensive data analysis/modeling.

Prerequisites

Python - lists, dict

Why Pandas?

Dont reinvent wheel even if it is easy in Python.

Pandas - Defacto data analysis toolkit.

Useful for cleaning, analysing & modeling data.

Built on top of NumPy

Compact, Fast & Easy data manipulation.

pandas.Series (1 dimensional)

Series - Fixed length ordered dict

In [28]: from pandas  import Series

In [29]: s = Series([1, 2, 3, 4])

In [30]: s
Out[30]: 
0    1
1    2
2    3
3    4
dtype: int64
        

Series

In [17]: s = Series([34, -4, 45, 89], index=['a', 'b', 'c', 'd'])

In [18]: s['b']
Out[18]: -4

In [19]: s > 0
Out[19]: 
a     True
b    False
c     True
d     True
dtype: bool

In [20]: s[s > 0]
Out[20]: 
a    34
c    45
d    89
dtype: int64
        

Vectorization

Batch operations on data without for loops.

In [21]: l = [34, 67, 89]                                                                                                                              

In [22]: l * 2
Out[22]: [34, 67, 89, 34, 67, 89]

In [23]: s = Series([34, -4, 45, 89])

In [24]: s * 2
Out[24]: 
0     68
1     -8
2     90
3    178
dtype: int64
        

pandas.DataFrame (2 dimensional)

In [30]: from pandas import DataFrame

In [31]: df = DataFrame([ [71, -2, -93, -34], [44, 5, 34, 7] ])

In [32]: df
Out[32]: 
                0  1   2   3
0  71 -2 -93 -34
1  44  5  34   7
        

DataFrame

In [34]: df[2]
Out[34]: 
0   -93
1    34
Name: 2, dtype: int64

In [35]: df.ix[1]
Out[35]: 
0    44
1     5
2    34
3     7
Name: 1, dtype: int64

In [36]: df.ix[1][2]
Out[36]: 34

In [37]: df[2].ix[1]
Out[37]: 34
        

DataFrame..


    

pandas.Panel (3 dimensional)

In [64]: from pandas import Series, DataFrame, Panel

In [65]: p = Panel( [ [[1, 4, 1], [4, 66, 7]], [[23, 45, 56], [23, 4, 6]] ])

In [66]: p
Out[66]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 2 (major_axis) x 3 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 1
Minor_axis axis: 0 to 2

In [67]: p[0]
Out[67]: 
               0   1  2
0  1   4  1
1  4  66  7
        

Lets get into real world…

Resources

[Docs] http://pandas.pydata.org/

[Book] Python for data analysis.

Questions?