Pandas Tutorial 2025

Pandas

Pandas is a popular Python library for analyzing data. If you are interested in a specific topic, you can jump straight to it. If you want to learn Pandas from scratch, start reading from the beginning. Pandas works alongside other Python libraries; to learn about Seaborn, Scikit-learn, NumPy, or Matplotlib, please visit the tutorial for that library. You can use your own editor to test the code. This tutorial uses Visual Studio Code and Pandas 2.2.2.

Installing Pandas

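It is a good idea to install Pandas inside a virtual environment. The venv module used in this tutorial is included with Python 3, so the virtualenv package is optional. If you would like to use virtualenv instead, install it with pip: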

pip install virtualenv

If you are using pip3, use pip3 instead of pip.

Create a virtual environment in your Python project folder with the command below:

python -m venv new_env

If you are using python3, use python3 instead of python. We named the virtual environment "new_env" but you can choose another name.

Activate the environment. On macOS and Linux, run the command below (on Windows, run new_env\Scripts\activate instead):

source new_env/bin/activate

If you are using pip, run the command below:

pip install pandas

If you are using conda, run the command below:

conda install -c conda-forge pandas

To check the version of Pandas library:

import pandas as pd
print(pd.__version__)

If you don't want to install Pandas, you can try an online Jupyter Notebook. It's easy and convenient. However, a hosted notebook may come with an older version of Pandas, so newer methods may not work. Check pd.__version__ if something fails.

If you want to run the code below without any installation, you can use Google Colab as well.

Pandas DataFrames

After importing the Pandas library, you can create a DataFrame. A DataFrame holds tabular data, that is, data organized into rows and columns. You can create, read, edit, delete, and analyze DataFrames with Pandas.

import pandas as pd

Let's create a pandas DataFrame:

df1 = pd.DataFrame({
'Product ID': [1, 2, 3, 4],
'Product Name': ['t-shirt', 't-shirt', 'skirt', 'skirt'],
'Color': ['blue', 'green', 'red', 'black'],
'Price': [10, 15, 17, 20]
})

In the example above, each dictionary key is a column name and each list holds the values of that column. There are other ways to create a DataFrame. For example, you can write the data row by row and pass the column names afterwards with the columns argument:

df2 = pd.DataFrame([
[1, 't-shirt', 'blue', 10],
[2, 'skirt', 'blue', 15],
[3, 'jeans', 'blue', 12],
[4, 'dress', 'blue', 20]],
columns = ['Product ID', 'Product Name', 'Color', 'Price'] )
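A third option, sketched below, is to pass a list of dictionaries, where each dictionary is one row. The names df3 and df4 are made up for this illustration; the check at the end suggests that the row-by-row and column-by-column forms describe the same table.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names.
rows = [
    {"Product ID": 1, "Product Name": "t-shirt", "Color": "blue", "Price": 10},
    {"Product ID": 2, "Product Name": "skirt", "Color": "blue", "Price": 15},
]
df3 = pd.DataFrame(rows)

# The same data written column by column, as in the first example.
df4 = pd.DataFrame({
    "Product ID": [1, 2],
    "Product Name": ["t-shirt", "skirt"],
    "Color": ["blue", "blue"],
    "Price": [10, 15],
})

print(df3.shape)        # (number of rows, number of columns)
print(df3.equals(df4))  # True when both hold the same data
```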

You can also use csv files:

df = pd.read_csv('data.csv')

You can read the first 5 rows:

df.head()

The tail() method returns the last 5 rows. The info() method prints a summary of the data, including the column names, the data types, and the number of non-empty values:

df.info()
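If you do not have a data.csv file at hand, the sketch below reads the same kind of data from a string instead. The CSV content is invented for this illustration; io.StringIO only makes the string behave like a file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = """id,Food,Price
1,pizza,10
2,pasta,8
3,lunch box,12
4,pizza,8
5,pasta,6
6,lunch box,9
7,dessert,7
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))  # first 3 rows instead of the default 5
print(df.tail(2))  # last 2 rows
df.info()          # column names, non-null counts, and data types
```

Both head() and tail() accept the number of rows to return as an optional argument.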

Removing empty cells in Pandas

The DataFrame below shows a restaurant's orders. It displays the id, the food, and the price (in USD) of the orders:

import pandas as pd
food_order = pd.DataFrame([
[1, "pizza", 10],
[2, "pasta", 8],
[3, "lunch box", 12],
[4, "pizza", 8],
[5, "pasta", 6],
[6, "lunch box", 9],
[7, "dessert", 7]
], columns= ["id", "Food", "Price"])

Let's add another column called "Delivery Method" to our DataFrame:

import numpy as np
food_order["Delivery Method"] = ["Delivery", np.nan, "Click & Collect", np.nan, "Delivery", np.nan, np.nan]
print(food_order)

[Image: adding a new column in Pandas]

NaN marks an empty cell. The np.nan value from the NumPy library is used to create it.

You can either remove rows with empty cells or replace them with a value.

You can remove empty cells:

food_order = food_order.dropna()
print(food_order)

[Image: the DataFrame after removing rows with empty cells in pandas]

The method above removes every row that contains at least one empty value, so only the orders with the ids 1, 3, and 5 remain. dropna() returns a new DataFrame, which is why the result is assigned back to food_order.
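dropna() can also be limited to certain columns. The sketch below uses a small made-up orders table (not the one from the tutorial) to compare the default behavior with the subset argument, and counts the empty cells first with isna().

```python
import numpy as np
import pandas as pd

# A made-up table with empty cells in two different columns.
orders = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Food": ["pizza", "pasta", None, "soup"],
    "Delivery Method": ["Delivery", np.nan, np.nan, "Click & Collect"],
})

# Count the empty cells per column before deciding what to remove.
print(orders.isna().sum())

# Default: drop a row if any of its cells is empty.
print(orders.dropna())

# Only look at the "Food" column when deciding which rows to drop.
print(orders.dropna(subset=["Food"]))
```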

Fill empty cells with a value

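If you want to change the original data without assigning the result back, add the "inplace=True" argument. Instead of removing empty cells, you can also replace them with a value. If you ran the dropna() example above, recreate food_order first; otherwise there are no empty cells left to fill.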

food_order.fillna("Not available", inplace=True)
print(food_order)

[Image: filling empty cells with a value in pandas]
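A single fill value does not always suit every column. As a minimal sketch with made-up data, fillna() also accepts a dictionary that maps column names to fill values, and without inplace=True it returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Made-up data with an empty price and two empty delivery methods.
orders = pd.DataFrame({
    "id": [1, 2, 3],
    "Price": [10.0, np.nan, 12.0],
    "Delivery Method": ["Delivery", np.nan, np.nan],
})

# One fill value per column; the result is a new DataFrame.
filled = orders.fillna({"Price": 0, "Delivery Method": "Not available"})
print(filled)

# The original still contains its empty cells.
print(orders["Price"].isna().sum())
```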

Removing duplicates

If there are duplicate rows in your DataFrame, you can remove duplicates using drop_duplicates:

food_order.drop_duplicates(inplace=True)
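By default, two rows count as duplicates only when every column matches. The sketch below, using made-up orders, shows that default next to the subset argument, which compares only the listed columns.

```python
import pandas as pd

# Made-up orders: the two rows with id 2 are identical.
orders = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "Food": ["pizza", "pasta", "pasta", "pizza"],
    "Price": [10, 8, 8, 9],
})

# Compare all columns: only the repeated id 2 row is removed.
unique_rows = orders.drop_duplicates()
print(unique_rows)

# Compare only "Food"; keep="first" keeps the first row of each food.
unique_foods = orders.drop_duplicates(subset=["Food"], keep="first")
print(unique_foods)
```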

Pandas - Plotting

import pandas as pd
food_order = pd.DataFrame([
[1, "pizza", 12, 10],
[2, "pasta", 13, 8],
[3, "lunch box", 12, 12],
[4, "salad", 14, 8],
[5, "soup", 13, 6],
[6, "burger", 15, 9],
[7, "dessert", 16, 7]
], columns= ["id", "Food", "Time", "Price"])

Pandas uses the plot() method to create charts. You can specify the plot type with the kind argument, for example a bar chart:

food_order.plot(kind="bar", x="Food", y="Price")

You need to import the Matplotlib library to display the plot.

import matplotlib.pyplot as plt

A scatter plot needs to know which columns to draw on the x and y axes, so pass them as the x and y arguments:

food_order.plot(kind="scatter", x="Price", y="Time")
plt.show()

[Image: plotting a DataFrame in pandas]

You can see the relationship between price and time variables.
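plot() returns a Matplotlib Axes object, which you can use to adjust the chart. The sketch below sets a title and axis labels and saves the chart to a file. The label texts, the Agg backend, and the file name orders.png are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw to a file instead of opening a window
import matplotlib.pyplot as plt
import pandas as pd

food_order = pd.DataFrame({
    "Time": [12, 13, 12, 14, 13, 15, 16],
    "Price": [10, 8, 12, 8, 6, 9, 7],
})

# plot() returns the Axes object that holds the chart.
ax = food_order.plot(kind="scatter", x="Price", y="Time", title="Orders")
ax.set_xlabel("Price (USD)")
ax.set_ylabel("Time (hour)")

# "orders.png" is a hypothetical output file name.
plt.savefig("orders.png")
```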

Histogram

You can create a histogram for only one column:

food_order["Price"].plot(kind="hist")
plt.show()

[Image: pandas histogram example]

Each histogram shows how the values of one column are distributed.

How to get specific columns and rows in Pandas

import pandas as pd
food_order = pd.DataFrame([
[1, "pizza", 12, 10],
[2, "pasta", 13, 8],
[3, "lunch box", 12, 12],
[4, "salad", 14, 8],
[5, "soup", 13, 6],
[6, "burger", 15, 9],
[7, "dessert", 16, 7]
], columns= ["id", "Food", "Time", "Price"])

There are different ways to get a specific column or columns. One of them is to pass a list of column names in square brackets:

food_and_price = food_order[["Food", "Price"]]
print(food_and_price)

[Image: displaying selected columns of the DataFrame]

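You can also get a specific row or rows by position with iloc. The slice 3:7 below selects the rows at positions 3 to 6, that is, the fourth to the seventh row: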

selected_rows = food_order.iloc[3:7]
print(selected_rows)

[Image: displaying selected rows of the DataFrame]
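iloc selects by position, while loc selects by index label. The short sketch below, on the first four orders only, illustrates the difference: an iloc slice excludes its end, a loc slice includes it.

```python
import pandas as pd

food_order = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Food": ["pizza", "pasta", "lunch box", "salad"],
    "Price": [10, 8, 12, 8],
})

# iloc uses integer positions; the end of the slice is excluded.
by_position = food_order.iloc[1:3]        # rows at positions 1 and 2
print(by_position)

# loc uses index labels; the end of the slice is included.
by_label = food_order.loc[1:3, ["Food"]]  # rows labeled 1, 2, and 3
print(by_label)

# A single cell by position: first row, second column.
first_food = food_order.iloc[0, 1]
print(first_food)
```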

You can also select rows with logical conditions. Put each condition in parentheses and combine them with & (and):

late_and_expensive_orders = food_order[(food_order.Price > 8) & (food_order.Time > 14)]
print(late_and_expensive_orders)

[Image: applying a logical condition to a pandas DataFrame]

According to the statement above, the price of the orders should be higher than $8, and the time should be after 14.
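Conditions can be combined in other ways as well. As a sketch on the same food_order data, the example below uses | for "or", isin() for membership in a list, and query() to write a filter as a string. The variable names are made up for this illustration.

```python
import pandas as pd

food_order = pd.DataFrame([
    [1, "pizza", 12, 10],
    [2, "pasta", 13, 8],
    [3, "lunch box", 12, 12],
    [4, "salad", 14, 8],
    [5, "soup", 13, 6],
    [6, "burger", 15, 9],
    [7, "dessert", 16, 7],
], columns=["id", "Food", "Time", "Price"])

# | means "or"; each condition needs its own parentheses.
cheap_or_late = food_order[(food_order["Price"] < 7) | (food_order["Time"] > 14)]
print(cheap_or_late)

# isin() keeps the rows whose value appears in the list.
pizza_or_pasta = food_order[food_order["Food"].isin(["pizza", "pasta"])]
print(pizza_or_pasta)

# query() expresses a filter as a string.
late_and_expensive = food_order.query("Price > 8 and Time > 14")
print(late_and_expensive)
```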

How to modify DataFrames in Pandas

How to rename columns

You can use the rename() function and change the name of the specified column:

food_order.rename(columns={"Time": "pm"}, inplace=True)
print(food_order)

[Image: applying the rename function to a pandas DataFrame]

pm is the new name of the Time column.

How to reshape a Pandas DataFrame

import pandas as pd
school_grades = pd.DataFrame([
["Maria", "Biology", 70],
["Jack", "Math", 80],
["Dennis", "Chemistry", 72],
["Melissa", "Math", 80],
["Paul", "Biology", 60],
["Jack", "Biology", 90],
["Maria", "Math", 70]
], columns= ["Name", "Course", "Grade"])

You can use the pivot() method to reshape the DataFrame so that each course becomes its own column:

final_grades = school_grades.pivot(columns="Course", index="Name", values="Grade")

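[Image: building a pivot table using pandas]

You need to specify three arguments for the pivot() method: columns (the column whose values become the new column headers), index (the column whose values become the row labels), and values (the column that fills the cells). Combinations that do not appear in the data, such as Paul and Math, are shown as NaN.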
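pivot() expects every index and column combination to appear at most once. The sketch below adds a made-up repeated entry to show what happens otherwise, and uses pivot_table() with an averaging function as one possible way around it.

```python
import pandas as pd

school_grades = pd.DataFrame([
    ["Maria", "Biology", 70],
    ["Jack", "Math", 80],
    ["Jack", "Biology", 90],
    ["Jack", "Biology", 70],  # repeated Name/Course pair, made up for this sketch
], columns=["Name", "Course", "Grade"])

# pivot() cannot decide which of the two Jack/Biology grades to show.
pivot_failed = False
try:
    school_grades.pivot(index="Name", columns="Course", values="Grade")
except ValueError as error:
    pivot_failed = True
    print("pivot failed:", error)

# pivot_table() combines repeated entries, here by averaging them.
table = school_grades.pivot_table(
    index="Name", columns="Course", values="Grade", aggfunc="mean"
)
print(table)
```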

Aggregate in Pandas

Adding a column in Pandas

import pandas as pd
food_order = pd.DataFrame([
[1, "pizza", 12, 10],
[2, "pasta", 13, 8],
[3, "lunch box", 12.30, 12],
[4, "salad", 14, 8],
[5, "soup", 13, 6],
[6, "burger", 15, 9],
[7, "dessert", 16, 7]
], columns= ["id", "Food", "Time", "Price"])

food_order DataFrame will be used for the examples below.

Adding a column to an existing DataFrame:

food_order["Amount"] = [1, 3, 2, 2, 2, 3, 1]
print(food_order)

[Image: adding a new column in pandas]

Lambda Functions in Pandas

You can use lambda functions to add a new column:

food_order["Order"] = food_order.apply(lambda row: "Budget" if row["Price"] < 10 else "Expensive", axis=1)
print(food_order)

[Image: applying a lambda function to a DataFrame]

The function above adds a new column called "Order". The "Order" column shows whether the price is expensive or budget.
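For a simple two-way rule like this one, a column-wide alternative is NumPy's where() function, which avoids calling a Python function for every row. The sketch below applies it to three of the orders and compares the result with the lambda version.

```python
import numpy as np
import pandas as pd

food_order = pd.DataFrame({
    "Food": ["pizza", "pasta", "lunch box"],
    "Price": [10, 8, 12],
})

# np.where(condition, value_if_true, value_if_false) works on the whole column.
food_order["Order"] = np.where(food_order["Price"] < 10, "Budget", "Expensive")
print(food_order)

# The row-by-row lambda from the text, for comparison.
with_apply = food_order.apply(
    lambda row: "Budget" if row["Price"] < 10 else "Expensive", axis=1
)
print(with_apply.tolist() == food_order["Order"].tolist())
```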

Let's change the DataFrame and test the groupby method:

import pandas as pd
food_order = pd.DataFrame([
[1, "pizza", 12, 10],
[2, "pasta", 13, 8],
[3, "lunch box", 12, 12],
[4, "pizza", 14, 8],
[5, "pasta", 13, 6],
[6, "lunch box", 15, 9],
[7, "dessert", 16, 7]
], columns= ["id", "Food", "Time", "Price"])

new_table = food_order.groupby(["Food"])["Price"].mean().reset_index()
print(new_table)

The example above groups the rows by Food and calculates the average Price of each group with the mean() method. reset_index() turns Food back into a regular column, so the result has two columns (Food and Price). You can find the result below:

[Image: applying the pandas groupby and mean methods]
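A group can be summarized with more than one calculation at a time. As a sketch on the same data, agg() takes a list of function names and creates one column per function.

```python
import pandas as pd

food_order = pd.DataFrame([
    [1, "pizza", 12, 10],
    [2, "pasta", 13, 8],
    [3, "lunch box", 12, 12],
    [4, "pizza", 14, 8],
    [5, "pasta", 13, 6],
    [6, "lunch box", 15, 9],
    [7, "dessert", 16, 7],
], columns=["id", "Food", "Time", "Price"])

# One row per food, one column per calculation.
summary = food_order.groupby("Food")["Price"].agg(["mean", "min", "max", "count"])
summary = summary.reset_index()
print(summary)
```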

How to merge tables in Pandas

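There are different ways to combine DataFrames in Pandas. A common one is the merge() function:

new_table = pd.merge(table_one, table_two)

Here table_one and table_two stand for your own DataFrames. Without further arguments, merge() joins on every column name the two tables share and keeps only the matching rows. If the tables have no column name in common, the call raises an error, so it is often clearer to name the key column yourself.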
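The sketch below illustrates both situations with three small made-up tables (students, clubs, and members are not part of the tutorial data): a merge that relies on a shared column name, and one where the key columns are named differently and have to be given with left_on and right_on.

```python
import pandas as pd

# Made-up tables for this illustration.
students = pd.DataFrame({"Name": ["Jenny", "Dennis"], "Grade": [70, 80]})
clubs = pd.DataFrame({"Name": ["Jenny", "Jack"], "Club": ["Chess", "Drama"]})
members = pd.DataFrame({"Student": ["Dennis"], "Club": ["Drama"]})

# No "on" argument: merge() joins on the column both tables share ("Name").
shared_key = pd.merge(students, clubs)
print(shared_key)

# Differently named key columns need left_on and right_on.
named_keys = pd.merge(students, members, left_on="Name", right_on="Student")
print(named_keys)
```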

import pandas as pd
school_grades1 = pd.DataFrame([
["Jenny", "Biology", 70],
["Dennis", "Math", 80],
["Jack", "Chemistry", 72],
], columns= ["Name", "Course", "Grade"])
school_grades2 = pd.DataFrame([
["Dennis", "Biology", 80],
["Jenny", "Math", 40],
["Jack", "Geography", 32],
["Maria", "Chemistry", 42]
], columns= ["Name", "Course", "Grade"])

Left, right, inner, outer merge in Pandas

You can also merge on a specific column. Let's merge the two DataFrames above:

all_grades = pd.merge(school_grades1, school_grades2, how="left", on="Name")

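The left merge above includes all rows from the first table (school_grades1) but only those rows from the second table (school_grades2) whose Name matches a row in the first table. Because both tables have Course and Grade columns, Pandas adds the suffixes _x and _y to tell them apart.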

You can find the result below:

[Image: the result of the left merge]
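When both tables share column names other than the key, the suffixes argument lets you label them yourself. In the sketch below the suffix names _first and _second are arbitrary choices.

```python
import pandas as pd

school_grades1 = pd.DataFrame([
    ["Jenny", "Biology", 70],
    ["Dennis", "Math", 80],
    ["Jack", "Chemistry", 72],
], columns=["Name", "Course", "Grade"])
school_grades2 = pd.DataFrame([
    ["Dennis", "Biology", 80],
    ["Jenny", "Math", 40],
    ["Jack", "Geography", 32],
    ["Maria", "Chemistry", 42],
], columns=["Name", "Course", "Grade"])

# "_first" and "_second" are arbitrary suffix names for this sketch.
all_grades = pd.merge(
    school_grades1, school_grades2,
    how="left", on="Name", suffixes=("_first", "_second"),
)
print(all_grades)
```

Maria appears only in the second table, so the left merge leaves her out.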

You can use the same syntax for right, outer, and inner merges.
A right merge mirrors the left merge: it keeps all rows from the second table and only the matching rows from the first. An inner merge includes only the rows that match in both tables. An outer merge includes all rows from both tables, even if they don't match. Let's try an outer merge and an inner merge:

outer_grades = pd.merge(school_grades1, school_grades2, how="outer", on="Name")
print(outer_grades)

[Image: outer merge in pandas]

inner_grades = pd.merge(school_grades1, school_grades2, how="inner", on="Name")
print(inner_grades)

[Image: inner merge in pandas]
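To see where each row of a merge came from, merge() accepts indicator=True, which adds a _merge column with the values left_only, right_only, or both. A minimal sketch on the two grade tables:

```python
import pandas as pd

school_grades1 = pd.DataFrame([
    ["Jenny", "Biology", 70],
    ["Dennis", "Math", 80],
    ["Jack", "Chemistry", 72],
], columns=["Name", "Course", "Grade"])
school_grades2 = pd.DataFrame([
    ["Dennis", "Biology", 80],
    ["Jenny", "Math", 40],
    ["Jack", "Geography", 32],
    ["Maria", "Chemistry", 42],
], columns=["Name", "Course", "Grade"])

# indicator=True adds a "_merge" column describing the origin of each row.
merged = pd.merge(
    school_grades1, school_grades2, how="outer", on="Name", indicator=True
)
print(merged)

# Rows that exist only in the second table.
right_only = merged[merged["_merge"] == "right_only"]
print(right_only["Name"].tolist())
```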

Concat function

If all the columns are the same, you can also use the concat() function, which stacks the rows of the second table below the rows of the first:

pd.concat([school_grades1, school_grades2])

[Image: the concat method in the pandas library]
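concat() keeps the original row labels of each table, so labels can repeat in the result. As a sketch, ignore_index=True renumbers the rows from 0 instead.

```python
import pandas as pd

school_grades1 = pd.DataFrame([
    ["Jenny", "Biology", 70],
    ["Dennis", "Math", 80],
    ["Jack", "Chemistry", 72],
], columns=["Name", "Course", "Grade"])
school_grades2 = pd.DataFrame([
    ["Dennis", "Biology", 80],
    ["Jenny", "Math", 40],
    ["Jack", "Geography", 32],
    ["Maria", "Chemistry", 42],
], columns=["Name", "Course", "Grade"])

# Default: each table keeps its own row labels (0, 1, 2 and 0, 1, 2, 3).
stacked = pd.concat([school_grades1, school_grades2])
print(stacked.index.tolist())

# ignore_index=True gives the combined table a fresh index.
renumbered = pd.concat([school_grades1, school_grades2], ignore_index=True)
print(renumbered.index.tolist())
```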