Pandas in Python

So, I came across this nifty Python package called pandas, which is widely used for data science work. I find it quite resourceful, and it can be used to do plenty of things. So, welcome to the panda world!


So, what have I been using it for recently? Here's a list:

1. To read csv and json files

A common way to read CSV and JSON files is through pandas, after which the data can be plotted, inspected, or analyzed as one pleases. The methods used here are read_csv and read_json, for CSV files and JSON files respectively.

import pandas as pd

df = pd.read_csv('data.csv')
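The snippet above only shows read_csv, so here is a small sketch of both readers side by side. The column names and values are made up for illustration, and io.StringIO stands in for real files on disk so the example runs without a data.csv:

```python
import io

import pandas as pd

# Hypothetical sample data; in practice these would be files such as data.csv.
csv_text = "Duration,Pulse\n60,110\n45,117\n"
json_text = '[{"Duration": 60, "Pulse": 110}, {"Duration": 45, "Pulse": 117}]'

# read_csv and read_json both accept a path or a file-like object.
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

Both calls give back a DataFrame, so everything after the read step works the same whichever format the data came in.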

2. To view data

I mostly use pandas to skim through the data frame and get a quick overview. The method used here is head(), which returns the column headers along with the specified number of rows from the top (5 rows if no number is given).

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(15))

3. Data cleaning or fixing data

A dataset can contain empty cells, duplicate data, data in the wrong format, and wrong data. Pandas helps in dealing with this kind of data. We will go through them one by one.

1. Empty cell

So, rows that contain empty cells can be removed from the dataset using the dropna() method. By default it returns a new DataFrame and leaves the original unchanged.

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())

Empty cells can also be replaced with a value using the fillna() method. One way to use fillna() is shown here; with inplace=True, the original DataFrame is changed directly:

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace=True)
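Filling every empty cell with the same number is not always what you want. Here is a rough sketch of filling just one column with that column's mean. The column names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical data with one empty cell in each column.
df = pd.DataFrame({
    "Duration": [60, 45, None],
    "Pulse": [110, None, 120],
})

# Passing a dict to fillna() fills only the listed columns.
pulse_mean = df["Pulse"].mean()  # mean() skips empty cells
filled = df.fillna({"Pulse": pulse_mean})

print(filled)
```

The "Duration" column still has its empty cell afterwards, because only "Pulse" was named in the dict.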

2. Duplicate data

Duplicate rows can be removed from a dataset. To identify duplicates, we use the duplicated() method, which returns a boolean value for each row.

# Returns True for every row that duplicates an earlier row, otherwise False
print(df.duplicated())

# To remove all duplicates
df.drop_duplicates(inplace=True)
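To see what these two methods actually do, here is a tiny self-contained sketch. The data frame is made up for illustration, and its second row repeats the first:

```python
import pandas as pd

# Hypothetical data: row 1 is an exact copy of row 0.
df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Pulse": [110, 110, 117],
})

# The first occurrence is kept as False; only the repeat is flagged True.
mask = df.duplicated()
print(mask)

# Without inplace=True, drop_duplicates() returns a new DataFrame.
deduped = df.drop_duplicates()
print(deduped)
```

Note that the first copy of a repeated row is not treated as a duplicate, so it survives the cleanup.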

3. Wrong data or wrong format

For wrong data or data in the wrong format, we can either remove the rows that contain the bad values or replace them with values in the correct format.
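Here is one possible way to do both, as a minimal sketch. The column names, the sample values, and the cutoff of 120 are all assumptions made for illustration, so adjust them to your own data:

```python
import pandas as pd

# Hypothetical data: one implausible duration and one unparseable date.
df = pd.DataFrame({
    "Date": ["2020-12-01", "2020-12-02", "not a date"],
    "Duration": [60, 450, 45],
})

# Wrong data: replace values above an assumed limit of 120 with the limit.
df.loc[df["Duration"] > 120, "Duration"] = 120

# Wrong format: convert to datetime; values that cannot be parsed become NaT.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Remove the rows whose date could not be parsed.
df = df.dropna(subset=["Date"])

print(df)
```

Whether to replace or remove really depends on the dataset: replacing keeps the row but changes a value, while removing keeps only the rows you trust.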

Well, that's all for pandas, folks! See you in the next blog. Adios!