Pandas(Python for data science)

Pandas

The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Use the following import convention:

>>> import pandas as pd

Pandas Data Structures

Series

A one-dimensional labeled array capable of holding any data type

>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame

A two-dimensional labeled data structure with columns of potentially different types

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],

'Capital': ['Brussels', 'New Delhi', 'Brasília'],

'Population': [11190846, 1303171035, 207847528]}

>>> df = pd.DataFrame(data,

columns=['Country', 'Capital', 'Population'])

I/O

Read and Write to CSV

>>> pd.read_csv('file.csv', header=None, nrows=5)

>>> df.to_csv('myDataFrame.csv')

Read and Write to Excel

>>> pd.read_excel('file.xlsx')

>>> pd.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')

Read multiple sheets from the same file

>>> xlsx = pd.ExcelFile('file.xls')

>>> df = pd.read_excel(xlsx, 'Sheet1')

Selection

Getting

>>> s['b'] Get one element

-5

>>> df[1:] Get subset of a DataFrame

Country Capital Population

1 India New Delhi 1303171035

2 Brazil Brasília 207847528

Selecting, Boolean Indexing & Setting

By Position

>>> df.iloc([0],[0]) Select single value by row & column

'Belgium'

>>> df.iat([0],[0])

'Belgium'

By Label

>>> df.loc([0], ['Country']) Select single value by row & column labels

'Belgium'

>>> df.at([0], ['Country'])

'Belgium'

By Label/Position

>>> df.ix[2] Select single row of

Country Brazil subset of rows

Capital Brasília

Population 207847528

>>> df.ix[:,'Capital'] Select a single column of

0 Brussels subset of columns

1 New Delhi

2 Brasília

>>> df.ix[1,'Capital'] Select rows and columns

'New Delhi'

Boolean Indexing

>>> s[~(s > 1)] Series s where value is not >1

>>> s[(s < -1) | (s > 2)] s where value is <-1 or >2

>>> df[df['Population']>1200000000] Use filter to adjust DataFrame

Setting

>>> s['a'] = 6 Set index a of Series s to 6

Read and Write to SQL Query or Database Table

>>> from sqlalchemy import create_engine

>>> engine = create_engine('sqlite:///:memory:')

>>> pd.read_sql("SELECT * FROM my_table;", engine)

>>> pd.read_sql_table('my_table', engine)

>>> pd.read_sql_query("SELECT * FROM my_table;", engine)

read_sql()is a convenience wrapper around read_sql_table() and read_sql_query()

>>> pd.to_sql('myDf', engine)

Dropping

>>> s.drop(['a', 'c']) Drop values from rows (axis=0)

>>> df.drop('Country', axis=1) Drop values from columns(axis=1

Sort & Rank

>>> df.sort_index() Sort by labels along an axis

>>> df.sort_values(by='Country') Sort by the values along an axis

>>> df.rank() Assign ranks to entries

Retrieving Series/DataFrame Information

>>> df.shape (rows,columns)

>>> df.index Describe index

>>> df.columns Describe DataFrame columns

>>> df.info() Info on DataFrame

>>> df.count() Number of non-NA values Getting Also see NumPy Arrays Selecting, Boolean Indexing & SettingBasic Information

Summary

>>> df.sum() Sum of values

>>> df.cumsum() Cummulative sum of values

>>> df.min()/df.max() Minimum/maximum values

>>> df.idxmin()/df.idxmax() Minimum/Maximum index value

>>> df.describe() Summary statistics

>>> df.mean() Mean of values

>>> df.median() Median of values

>>> f = lambda x: x*2 Applying Functions

>>> df.apply(f) Apply function

>>> df.applymap(f) Apply function element-wise

Data Alignment

Internal Data Alignment

NA values are introduced in the indices that don’t overlap:

>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

>>> s + s3

a 10.0

b NaN

c 5.0

d 7.0

Arithmetic Operations with Fill Methods

You can also do the internal data alignment yourself with the help of the fill methods:

>>> s.add(s3, fill_value=0)

a 10.0

b -5.0

c 5.0

d 7.0

>>> s.sub(s3, fill_value=2)

>>> s.div(s3, fill_value=4)

>>> s.mul(s3, fill_value=3)

👉Example problems on Pandas 👈

Pandas(Python for data science)

Post a Comment

HTML Chapter 12

About Us

Contact form

Pandas(Python for data science)

You may like these posts

Post a Comment

Contact form