Data manipulation and analysis are crucial tasks in Python. The pandas library provides powerful data structures for handling and analyzing data, with the DataFrame being one of the most important structures. This section covers the basics of working with DataFrames, including how to read from and write to CSV and Excel files, and various DataFrame operations.
11.1 What is a DataFrame?
A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). Think of it as a table or spreadsheet where you can store and manipulate data. Each column in a DataFrame can have a different data type (e.g., integers, floats, strings).
The pandas library in Python provides the DataFrame class for creating and working with DataFrames.
11.2 Creating a DataFrame
You can create a DataFrame in several ways, such as from a dictionary, a list of lists, or directly from a CSV file.
11.2.1 From a Dictionary
Here's how to create a DataFrame from a dictionary where the keys are column names and the values are lists of data:
import pandas as pd
# Create a DataFrame from a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
11.2.2 From a List of Lists
You can also create a DataFrame from a list of lists, specifying column names:
# Create a DataFrame from a list of lists
data = [
['Alice', 25, 'New York'],
['Bob', 30, 'Los Angeles'],
['Charlie', 35, 'Chicago']
]
columns = ['Name', 'Age', 'City']
df = pd.DataFrame(data, columns=columns)
print(df)
11.3 Reading from and Writing to CSV Files
DataFrames can be easily read from and written to CSV files using pandas.
11.3.1 Reading from a CSV File
To read a CSV file into a DataFrame, use the pd.read_csv() function:
# Read a CSV file into a DataFrame
df = pd.read_csv('data.csv')
print(df)
11.3.2 Writing a DataFrame to a CSV File
To write a DataFrame to a CSV file, use the df.to_csv() method:
# Write the DataFrame to a CSV file
df.to_csv('output.csv', index=False) # index=False excludes row numbers from the CSV file
11.4 Reading from and Writing to Excel Files
The pandas library also supports reading from and writing to Excel files. You will need the openpyxl library to handle Excel files (.xlsx format).
11.4.1 Reading from an Excel File
To read an Excel file into a DataFrame, use the pd.read_excel() function. Ensure you have the openpyxl library installed:
# Install openpyxl if not already installed
# pip install openpyxl
# Read an Excel file into a DataFrame
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
print(df)
11.4.2 Writing a DataFrame to an Excel File
To write a DataFrame to an Excel file, use the df.to_excel() method. You can specify the sheet name and whether to include the index:
# Write the DataFrame to an Excel file
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)
11.5 DataFrame Operations
DataFrames support various operations to manipulate and analyze data. Here are some common operations:
11.5.1 Accessing Data
Access rows and columns using labels and indices:
# Access a single column
print(df['Name'])
# Access multiple columns
print(df[['Name', 'City']])
# Access a single row by index
print(df.iloc[1])
# Access a row by label
print(df.loc[1])
11.5.2 Adding and Removing Columns
To add a new column, simply assign values to a new column name. To remove a column, use the drop() method:
# Add a new column
df['Country'] = ['USA', 'USA', 'USA']
print(df)
# Remove a column
df = df.drop('Country', axis=1)
print(df)
11.5.3 Modifying Data
Modify data within a DataFrame using indexing:
# Modify a specific cell
df.at[0, 'City'] = 'San Francisco'
print(df)
11.5.4 Filtering Data
Filter rows based on conditions:
# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
11.5.5 Getting Column Names and Indexes
Retrieve column names and row indexes:
# Get column names
print(df.columns)
# Get row indexes
print(df.index)
Comments
Post a Comment