🔄 Quick Recap (Day 20)
You learned to manipulate numerical data with NumPy arrays: creation, operations, indexing, and statistics.
🎯 What You’ll Learn Today
What Pandas is and why DataFrames are useful.
How to create a DataFrame from dictionaries and CSV files.
Basic operations: selecting, filtering, and sorting data.
Simple aggregations: groupby, mean, sum, and describe.
📖 Why Pandas?
Pandas builds on NumPy to provide DataFrame objects—two-dimensional tables with labeled rows and columns:
Ease of use: Read/write CSV, Excel, SQL, and more with simple functions.
Powerful indexing: Label-based and integer-based selection.
Rich functionality: Grouping, reshaping, time series support.
Use Pandas to clean, explore, and analyze tabular data quickly.
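To make "labeled rows and columns" concrete, here is a minimal sketch that wraps a small NumPy array in a DataFrame. The row labels ('alice', 'bob') and the values are illustrative assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Illustrative values only: two rows, two columns.
values = np.array([[25, 85], [30, 90]])

# A NumPy array is addressed purely by position.
print(values[1, 0])  # 30

# The same data as a DataFrame, with hypothetical labels attached.
df = pd.DataFrame(values, index=['alice', 'bob'], columns=['age', 'score'])

# Label-based selection: row 'bob', column 'age'.
print(df.loc['bob', 'age'])  # 30

# Integer-based selection still works: second row, first column.
print(df.iloc[1, 0])  # 30
```

The data is identical in both cases; the DataFrame simply adds names, which tends to make analysis code easier to read.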
📖 Creating DataFrames
Install Pandas if needed:
pip install pandas
Import the library:
import pandas as pd
From a dictionary:
data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'score': [85, 90, 95]}
df = pd.DataFrame(data)
print(df)
From CSV (example file data.csv):
df_csv = pd.read_csv('data.csv')
print(df_csv.head())  # first 5 rows
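If you do not have a data.csv file handy, the sketch below reads the same kind of content from an in-memory string instead. The CSV text is a hypothetical stand-in for the file; read_csv accepts a file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = "name,age,score\nAlice,25,85\nBob,30,90\nCharlie,35,95\n"

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)    # (number of rows, number of columns)
print(df_csv.dtypes)   # column types inferred from the text
print(df_csv.head(2))  # first 2 rows instead of the default 5
```

Checking shape and dtypes right after loading is a quick way to confirm the file was parsed the way you expected.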
📖 Selecting & Filtering Data
Select column:
ages = df['age']
Select row by label:
row0 = df.loc[0]
Select by position:
row1 = df.iloc[1]
Filter rows:
high_scores = df[df['score'] > 90]
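With the default 0, 1, 2 index, loc and iloc look interchangeable. The sketch below gives the same data a hypothetical string index ('a', 'b', 'c') so the difference shows up, and also combines two filter conditions.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'score': [85, 90, 95]}
# Hypothetical string labels replace the default 0, 1, 2 index.
df = pd.DataFrame(data, index=['a', 'b', 'c'])

by_label = df.loc['b']     # row whose label is 'b'
by_position = df.iloc[1]   # second row, whatever its label
print(by_label.equals(by_position))  # True

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
subset = df[(df['age'] >= 30) & (df['score'] > 90)]
print(subset)

# Select several columns at once by passing a list of names.
print(df[['name', 'score']])
```

Here df.loc[1] would raise a KeyError because no row carries the label 1, while df.iloc[1] still returns the second row.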
📖 Sorting & Aggregations
Sort by column:
df_sorted = df.sort_values('age')
Group and aggregate:
grouped = df.groupby('age')['score'].mean()
Summary statistics:
print(df.describe())
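In the three-row example every age is unique, so grouping by age returns each score unchanged. Grouping becomes more useful when the key repeats. The sketch below assumes a hypothetical 'team' column added purely for illustration.

```python
import pandas as pd

# Hypothetical data: 'team' repeats, so each group holds two rows.
df = pd.DataFrame({
    'team': ['red', 'blue', 'red', 'blue'],
    'score': [85, 90, 95, 70],
})

# One entry per team, holding the mean and the sum of that team's scores.
mean_by_team = df.groupby('team')['score'].mean()
sum_by_team = df.groupby('team')['score'].sum()
print(mean_by_team)
print(sum_by_team)

# Sort descending and keep only the highest-scoring row.
top = df.sort_values('score', ascending=False).head(1)
print(top)
```

The grouped results are Series indexed by team name, so mean_by_team['red'] looks up a single team's average.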
🧙‍♂️ Take the Wand and Try Yourself
Create pandas_practice.py.
DataFrame creation:
Build a DataFrame for sales data: {'month': ['Jan', 'Feb', 'Mar'], 'sales': [100, 150, 130]}.
CSV I/O:
Save the DataFrame to sales.csv (df.to_csv('sales.csv', index=False)).
Read it back into a new DataFrame.
Analysis:
Filter months with sales > 120.
Sort by sales descending.
Print summary statistics using describe().
Solution Example (pandas_practice.py):
import pandas as pd
data = {'month':['Jan','Feb','Mar'], 'sales':[100,150,130]}
df = pd.DataFrame(data)
df.to_csv('sales.csv', index=False)
# Read CSV
df2 = pd.read_csv('sales.csv')
print(df2)
# Filter and sort
print(df2[df2['sales'] > 120])
print(df2.sort_values('sales', ascending=False))
# Summary
print(df2['sales'].describe())
Expected output:
  month  sales
0   Jan    100
1   Feb    150
2   Mar    130
  month  sales
1   Feb    150
2   Mar    130
  month  sales
1   Feb    150
2   Mar    130
0   Jan    100
count      3.000000
mean     126.666667
std       25.166115
min      100.000000
25%      115.000000
50%      130.000000
75%      140.000000
max      150.000000
Name: sales, dtype: float64
Run:
python pandas_practice.py
Once you see matching outputs and summary stats, you’ve conquered Pandas DataFrames!
Up next: Day 22: Data Visualization — bring your data to life with charts.