Pandas
Pandas - What and Why?
Pandas is a popular library for doing data manipulation in Python
Works with tabular data, i.e. rows and columns
Commonly used with Matplotlib, NumPy, SciPy and many others
DataFrames
DataFrames are the main data type in Pandas. A DataFrame holds data as a set of named columns, each of which is a Series.
To create a DataFrame from hard-coded values, pass a dictionary to the DataFrame() constructor. Each key should correspond to a column name, and each value to a list holding that column's entries, one per row.
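As a minimal sketch of the dictionary approach described above (the column names and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical example data: each key is a column name,
# each value is the list of entries for that column.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 28],
})

print(df)        # shows the table with an automatic 0, 1, 2 row index
print(df.shape)  # (number of rows, number of columns)
```

All the lists should have the same length, since each one supplies one entry per row.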
Loading Data From File
Pandas lets us create DataFrames from CSV files using the read_csv() function. By default, the first line of the file is used for the column names.
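A small sketch of read_csv() in use. To keep the example self-contained it reads from an in-memory string through io.StringIO, and the CSV contents are invented for illustration. With a real file you would normally pass its path instead, e.g. pd.read_csv("people.csv"), where people.csv is a hypothetical file name.

```python
import io
import pandas as pd

# Hypothetical CSV contents; the first line supplies the column names.
csv_text = "name,age\nAda,36\nGrace,45\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # the age column is parsed as integers
```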
Selecting Columns
Sometimes we may want to work with only a single column from a DataFrame. We can use square-bracket indexing, as we have seen with lists and dictionaries, to get a column by its name. The result is a Series.
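For instance, assuming a small made-up DataFrame with name and age columns, a single column can be pulled out like this:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 28],
})

# Index by column name, much like looking up a key in a dictionary.
ages = df["age"]

print(type(ages))   # a pandas Series
print(ages.mean())  # Series support common summary methods
```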
Selecting Rows
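We can also select a specific subset of rows from a DataFrame using indexing similar to lists. We use the iloc indexer, which takes square brackets rather than parentheses, to select rows by their integer position. Slices work as they do for lists, so the end position is excluded.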
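A short sketch of positional row selection, again using made-up data:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus", "Guido"],
    "age": [36, 45, 28, 40],
})

# A single position returns that row as a Series.
first_row = df.iloc[0]

# A slice returns a DataFrame; like list slicing, the end is excluded,
# so this selects the rows at positions 1 and 2.
middle_rows = df.iloc[1:3]

print(first_row)
print(middle_rows)
```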
Filtering Rows
Sometimes the rows we want may not be grouped together, or we might not know their indices. In these cases we can often write a condition on the column values and use it to keep only the rows that match.
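One common way to do this is boolean indexing: comparing a column to a value gives a Series of True/False values, and passing that Series in square brackets keeps only the rows marked True. The data and the age threshold below are illustrative assumptions.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 28],
})

# The comparison produces one True/False value per row.
is_over_30 = df["age"] > 30

# Indexing with that boolean Series keeps only the matching rows.
over_30 = df[is_over_30]

print(over_30)
```

Conditions can be combined with & (and) and | (or), with each condition wrapped in parentheses, for example df[(df["age"] > 30) & (df["age"] < 40)].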