Why Pandas?
Vectorization
Pandas data structures - Series, DataFrame, Panel
Plain Python has no extensive data analysis/modeling tools.
Python - lists, dict
Don't reinvent the wheel, even if it is easy in Python.
Pandas - De facto data analysis toolkit.
Useful for cleaning, analysing & modeling data.
Built on top of NumPy
Compact, Fast & Easy data manipulation.
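As a rough illustration of the points above, here is a minimal sketch of cleaning and summarising a tiny table. The column names and values are made up for the example and are not from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and values are made up for illustration.
raw = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "temp": [31.0, np.nan, 29.0, 35.0],
})

# Cleaning: fill the missing reading with the column mean.
clean = raw.fillna({"temp": raw["temp"].mean()})

# Analysing: average temperature per city.
mean_by_city = clean.groupby("city")["temp"].mean()

# The column data can be pulled out as a NumPy array.
temps = clean["temp"].to_numpy()
```

The whole pipeline runs without an explicit loop, and the last line shows the NumPy array underneath the column.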
Series - Fixed-length, ordered dict
In [28]: from pandas import Series

In [29]: s = Series([1, 2, 3, 4])

In [30]: s
Out[30]:
0    1
1    2
2    3
3    4
dtype: int64
In [17]: s = Series([34, -4, 45, 89], index=['a', 'b', 'c', 'd'])

In [18]: s['b']
Out[18]: -4

In [19]: s > 0
Out[19]:
a     True
b    False
c     True
d     True
dtype: bool

In [20]: s[s > 0]
Out[20]:
a    34
c    45
d    89
dtype: int64
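The "ordered dict" comparison can be made concrete with a short sketch. It assumes the same labels and values as the session above, only built from a plain dict instead of a list plus an index.

```python
from pandas import Series

# Build a Series directly from a dict; keys become the index.
s = Series({"a": 34, "b": -4, "c": 45, "d": 89})

has_b = "b" in s          # membership tests the index, like dict keys
value_b = s["b"]          # lookup by label
labels = list(s.index)    # labels keep their order
as_dict = s.to_dict()     # round-trip back to a plain dict
```

Unlike a dict, the Series also supports the boolean filtering shown above (`s[s > 0]`).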
Batch operations on data without for loops.
In [21]: l = [34, 67, 89]

In [22]: l * 2
Out[22]: [34, 67, 89, 34, 67, 89]

In [23]: s = Series([34, -4, 45, 89])

In [24]: s * 2
Out[24]:
0     68
1     -8
2     90
3    178
dtype: int64
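To put the two approaches side by side, the sketch below doubles the same values once with an explicit Python loop and once with a vectorized Series operation. The variable names are illustrative.

```python
from pandas import Series

values = [34, -4, 45, 89]

# Plain Python: element-wise work needs an explicit loop
# (written here as a list comprehension).
doubled_list = [v * 2 for v in values]

# pandas: the same operation applied to the whole Series at once.
s = Series(values)
doubled_series = s * 2

# Vectorized operations compose: double only the positive values.
positive_doubled = s[s > 0] * 2
```

Both routes give the same numbers; the Series version simply leaves the looping to pandas and NumPy.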
In [30]: from pandas import DataFrame

In [31]: df = DataFrame([ [71, -2, -93, -34], [44, 5, 34, 7] ])

In [32]: df
Out[32]:
    0  1   2   3
0  71 -2 -93 -34
1  44  5  34   7
In [34]: df[2]
Out[34]:
0   -93
1    34
Name: 2, dtype: int64

In [35]: df.loc[1]
Out[35]:
0    44
1     5
2    34
3     7
Name: 1, dtype: int64

In [36]: df.loc[1][2]
Out[36]: 34

In [37]: df[2].loc[1]
Out[37]: 34
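In current pandas, row selection is usually written with `.loc` (label-based) and `.iloc` (position-based); the older `.ix` indexer was removed in pandas 1.0. The sketch below uses the same numbers as the DataFrame above, with hypothetical row and column labels added so the two styles are easy to tell apart.

```python
from pandas import DataFrame

# Row and column labels here are hypothetical, chosen for illustration.
df = DataFrame(
    [[71, -2, -93, -34], [44, 5, 34, 7]],
    index=["x", "y"],
    columns=["a", "b", "c", "d"],
)

row_by_label = df.loc["y"]         # row labelled "y"
row_by_position = df.iloc[1]       # second row, regardless of label
cell_by_label = df.loc["y", "c"]   # single cell: row "y", column "c"
cell_by_position = df.iloc[1, 2]   # same cell by integer position
```

Selecting a cell in one call, as in `df.loc["y", "c"]`, avoids creating the intermediate row that chained indexing produces.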
In [64]: from pandas import Series, DataFrame, Panel

In [65]: p = Panel( [ [[1, 4, 1], [4, 66, 7]], [[23, 45, 56], [23, 4, 6]] ])

In [66]: p
Out[66]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 2 (major_axis) x 3 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 1
Minor_axis axis: 0 to 2

In [67]: p[0]
Out[67]:
   0   1  2
0  1   4  1
1  4  66  7
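Panel was deprecated and then removed in pandas 0.25, so the session above only runs on older versions. One possible way to hold the same 3-D data on a current release is a DataFrame with a two-level row index. This is a sketch, and the level names `item` and `major` are assumptions chosen to mirror the Panel axes.

```python
import pandas as pd

# Same numbers as the Panel example, one DataFrame per item.
items = {
    0: pd.DataFrame([[1, 4, 1], [4, 66, 7]]),
    1: pd.DataFrame([[23, 45, 56], [23, 4, 6]]),
}

# Stack the items into one DataFrame with a two-level row index.
stacked = pd.concat(items, names=["item", "major"])

first_item = stacked.loc[0]          # the 2 x 3 frame for item 0
one_value = stacked.loc[(1, 0), 2]   # item 1, major 0, minor 2
```

Here `stacked.loc[0]` plays the role of `p[0]` in the Panel session.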
[Docs] http://pandas.pydata.org/
[Book] Python for Data Analysis, Wes McKinney.