🔄 Quick Recap (Day 20)

  • You learned to manipulate numerical data with NumPy arrays: creation, operations, indexing, and statistics.

🎯 What You’ll Learn Today

  1. What Pandas is and why DataFrames are useful.

  2. How to create a DataFrame from dictionaries and CSV files.

  3. Basic operations: selecting, filtering, and sorting data.

  4. Simple aggregations: groupby, mean, sum, and describe.

📖 Why Pandas?

Pandas builds on NumPy to provide DataFrame objects: two-dimensional tables with labeled rows and columns. Its main strengths:

  • Ease of use: Read/write CSV, Excel, SQL, and more with simple functions.

  • Powerful indexing: Label-based and integer-based selection.

  • Rich functionality: Grouping, reshaping, time series support.

Use Pandas to clean, explore, and analyze tabular data quickly.

📖 Creating DataFrames

  1. Install Pandas if needed:

    pip install pandas
  2. Import the library:

    import pandas as pd
  3. From a dictionary:

    data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'score': [85, 90, 95]}
    df = pd.DataFrame(data)
    print(df)
  4. From CSV (example file data.csv):

    df_csv = pd.read_csv('data.csv')
    print(df_csv.head())  # first 5 rows
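
If you don't have a data.csv on disk yet, you can still try read_csv. The sketch below feeds it CSV text from memory through io.StringIO. The contents of csv_text are made up for illustration and simply mirror the dictionary example above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """name,age,score
Alice,25,85
Bob,30,90
Charlie,35,95
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.head(2))  # first 2 rows instead of the default 5
print(df_csv.shape)    # (3, 3): three rows, three columns
print(df_csv.dtypes)   # column types that Pandas inferred
```

Checking shape and dtypes right after loading is a quick way to confirm that the file was parsed the way you expected.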

📖 Selecting & Filtering Data

  • Select column:

    ages = df['age']
  • Select row by label (the default index uses the labels 0, 1, 2):

    row0 = df.loc[0]
  • Select row by position:

    row1 = df.iloc[1]
  • Filter rows:

    high_scores = df[df['score'] > 90]
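
With the default index, loc and iloc look interchangeable, so here is a small sketch where they differ. The index labels 'a', 'b', 'c' are an assumption added for illustration. The last lines show one way to combine two filter conditions.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'score': [85, 90, 95]},
    index=['a', 'b', 'c'],  # custom labels, chosen only for this example
)

by_label = df.loc['b']     # the row whose index label is 'b'
by_position = df.iloc[1]   # the second row, whatever its label is

# loc also takes lists of row labels and column names
subset = df.loc[['a', 'c'], ['name', 'score']]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_high = df[(df['age'] >= 30) & (df['score'] >= 90)]
print(older_high)
```

Here both by_label and by_position return Bob's row, but only because 'b' happens to sit in the second position.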

📖 Sorting & Aggregations

  • Sort by column:

    df_sorted = df.sort_values('age')
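  • Group and aggregate (every age in this sample is unique, so each group holds one row; grouping pays off when the key column repeats):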

    grouped = df.groupby('age')['score'].mean()
  • Summary statistics:

    print(df.describe())
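
Grouping is easier to see when the key column has repeated values. The sketch below uses a hypothetical team column (not part of the sample data above) so that each group contains more than one row.

```python
import pandas as pd

# Hypothetical data: 'team' repeats, so groups have several rows each
df_teams = pd.DataFrame({
    'team': ['red', 'blue', 'red', 'blue'],
    'score': [85, 90, 95, 70],
})

# One aggregate per group
mean_by_team = df_teams.groupby('team')['score'].mean()
print(mean_by_team)  # blue 80.0, red 90.0

# Several aggregates at once with agg
summary = df_teams.groupby('team')['score'].agg(['mean', 'sum', 'count'])
print(summary)

# Sort from highest to lowest score
top_first = df_teams.sort_values('score', ascending=False)
print(top_first)
```

The result of groupby(...).mean() is a Series indexed by the group keys, so mean_by_team['red'] gives the average for that team.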

🧙‍♂️ Take the Wand and Try Yourself

  1. Create pandas_practice.py.

  2. DataFrame creation:

    • Build a DataFrame for sales data: {'month':['Jan','Feb','Mar'], 'sales':[100,150,130]}.

  3. CSV I/O:

    • Save the DataFrame to sales.csv (df.to_csv('sales.csv', index=False)).

    • Read it back into a new DataFrame.

  4. Analysis:

    • Filter months with sales > 120.

    • Sort by sales descending.

    • Print summary statistics using describe().

Solution Example (pandas_practice.py):

import pandas as pd

data = {'month': ['Jan', 'Feb', 'Mar'], 'sales': [100, 150, 130]}
df = pd.DataFrame(data)
df.to_csv('sales.csv', index=False)

# Read CSV
df2 = pd.read_csv('sales.csv')
print(df2)

# Filter and sort
print(df2[df2['sales'] > 120])
print(df2.sort_values('sales', ascending=False))

# Summary
print(df2['sales'].describe())

Expected output:

  month  sales
0   Jan    100
1   Feb    150
2   Mar    130
  month  sales
1   Feb    150
2   Mar    130
  month  sales
1   Feb    150
2   Mar    130
0   Jan    100
count      3.000000
mean     126.666667
std       25.166115
min      100.000000
25%      115.000000
50%      130.000000
75%      140.000000
max      150.000000
Name: sales, dtype: float64

Run:

python pandas_practice.py

Once you see matching outputs and summary stats, you’ve conquered Pandas DataFrames!
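
One detail in the summary is worth a cross-check: the std value. Pandas reports the sample standard deviation (ddof=1) by default, while NumPy's np.std defaults to the population version (ddof=0), so the two give different numbers for the same data. A minimal sketch, using the same three sales figures:

```python
import numpy as np
import pandas as pd

sales = pd.Series([100, 150, 130], name='sales')

sample_std = sales.std()                         # Pandas default: ddof=1
population_std = np.std(sales.to_numpy())        # NumPy default: ddof=0
matching_std = np.std(sales.to_numpy(), ddof=1)  # NumPy asked for ddof=1

print(round(sample_std, 2))      # about 25.17, as in describe()
print(round(population_std, 2))  # about 20.55
print(round(matching_std, 2))    # about 25.17 again

# The quartiles in describe() come from linear interpolation
print(sales.quantile(0.25))      # 115.0
```

If your own numbers differ between the two libraries, the ddof setting is a likely place to look first.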

Up next: Day 22: Data Visualization — bring your data to life with charts.
