PANDAS MISSING VALUES

BATHULA PRAVEEN (BP)


The most time-consuming part of a data science project is data cleaning and preparation. However, there are many powerful tools to expedite this process. One of them is Pandas, a widely used data analysis library for Python.

Handling missing values is an essential part of the data cleaning and preparation process because almost all real-life data comes with some missing values. In this post, I will explain how to detect missing values and handle them properly and efficiently using Pandas.

Note: Pandas 1.0 introduced a new missing-value marker, <NA> (pd.NA), which can represent missing values in integer columns.

np.nan is a float, so if you use it in a column of integers, the column is upcast to a floating-point data type, as you can see in “column_a” of the dataframe we created. However, <NA> can be used with integers without causing upcasting. Let’s add one more column to the dataframe using <NA>, which requires explicitly requesting the dtype Int64Dtype().
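The upcasting behaviour described above can be sketched as follows. This is a minimal illustration, not the original example program: the values are made up, and the name “column_d” for the nullable-integer column is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the values and the name "column_d" are illustrative only.
df = pd.DataFrame({
    "column_a": [1, 2, np.nan, 4],  # np.nan is a float, so the column becomes float64
})

# Requesting the nullable integer dtype keeps the column integer-typed
# and stores the missing entry as <NA>.
df["column_d"] = pd.Series([1, 2, pd.NA, 4], dtype=pd.Int64Dtype())

print(df.dtypes)
print(df)
```

The string alias dtype="Int64" (with a capital I) should be equivalent to pd.Int64Dtype() here.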

Finding Missing Values

Pandas provides the isnull() and isna() functions to detect missing values. Both of them do the same thing.

df.isna() returns a dataframe of the same shape with boolean values, where True marks a missing value.

df.isna().any() returns a boolean value for each column. If there is at least one missing value in that column, the result is True.

df.isna().sum() returns the number of missing values in each column.
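The three calls above can be tried on a small dataframe. The data below is hypothetical and only serves to show the shape of each result.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each of the first two columns.
df = pd.DataFrame({
    "column_a": [1, 2, np.nan, 4],
    "column_b": ["x", None, "y", "z"],
    "column_c": [1.5, 2.5, 3.5, 4.5],
})

mask = df.isna()                # boolean dataframe, True where a value is missing
has_missing = df.isna().any()   # one boolean per column
missing_counts = df.isna().sum()  # number of missing values per column

print(mask)
print(has_missing)
print(missing_counts)
```

Replacing isna() with isnull() in any of these lines should give the same result.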

Handling Missing Values

Not all missing values come in a nice and clean np.nan or None format. For example, the “?” and “- -” entries in column_c of our dataframe do not give us any valuable information or insight, so essentially they are missing values. However, Pandas does not detect these characters as missing values.

If we know which characters are used as missing values in the dataset, we can handle them while creating the dataframe by using the na_values parameter.
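As a sketch of this approach, the snippet below passes na_values to pd.read_csv. The CSV content is made up and read from an in-memory string so the example is self-contained; with a real dataset you would pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content in which "?" and "- -" stand for missing values.
csv_text = """column_a,column_c
1,0.5
2,?
3,- -
4,1.2
"""

# na_values tells the reader which extra strings to treat as missing.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?", "- -"])

print(df)
print(df.isna().sum())
```

Because the placeholder strings are converted to missing values while reading, column_c can be parsed as a numeric column instead of being left as text.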

