Task 2
Data preprocessing
Lucky for us, the column names Runkeeper provides are informative, and we don't need to rename any columns.
However, the info() method does reveal missing values. The reasons for them vary. Some heart rate information is missing because I didn't always use a cardio sensor. The Notes column is an optional field that I sometimes left blank. Also, I only used the Route Name column once, and never used the Friend's Tagged column.
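As a rough illustration of how info() surfaces missing values, here is a minimal sketch on a small made-up DataFrame. The rows and the heart rate column name ("Average Heart Rate (bpm)") are assumptions for the example, not the real Runkeeper export.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Runkeeper export; values are invented.
df_activities = pd.DataFrame({
    "Type": ["Running", "Cycling", "Other", "Running"],
    # Assumed column name: heart rate is missing when no sensor was worn.
    "Average Heart Rate (bpm)": [150.0, np.nan, np.nan, 162.0],
    # Optional free-text field, often left blank.
    "Notes": [None, "Windy", None, None],
    # Never filled in for this toy data.
    "Friend's Tagged": [None, None, None, None],
})

# info() prints each column's dtype and non-null count;
# a non-null count below the number of rows signals missing values.
df_activities.info()

# count() returns the same non-null counts as a Series, handy for checks.
non_null_counts = df_activities.count()
print(non_null_counts)
```

Any column whose non-null count is lower than the total number of rows contains missing values.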
We'll fill in missing values in the heart rate column to avoid misleading results later, but right now, our first data preprocessing steps will be to:
Remove columns not useful for our analysis.
Replace the "Other" activity type with "Unicycling" because that was always the "Other" activity.
Count missing values.
Implement the following data preprocessing tasks:
Delete unnecessary columns from df_activities with the drop() method, setting the columns parameter to the cols_to_drop list.
Calculate the activity type counts using the value_counts() method on the Type column.
Rename the 'Other' values to 'Unicycling' in the Type column using str.replace().
Count the missing values in each column using isnull().sum().
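The four tasks above could be sketched roughly as follows. This is only an illustration: the sample rows, the contents of cols_to_drop, and the "Distance (km)" and "Average Heart Rate (bpm)" column names are assumptions, since the actual data and list are defined elsewhere in the project.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Runkeeper export; values are invented.
df_activities = pd.DataFrame({
    "Type": ["Running", "Other", "Cycling", "Running", "Other"],
    "Distance (km)": [10.2, 3.1, 25.0, 8.4, 2.7],
    "Average Heart Rate (bpm)": [150.0, np.nan, 140.0, np.nan, np.nan],
    "Notes": [None, "Wobbly", None, None, None],
    "Route Name": [None, None, None, None, None],
    "Friend's Tagged": [None, None, None, None, None],
})

# Assumed contents; the real cols_to_drop list may include other columns.
cols_to_drop = ["Notes", "Route Name", "Friend's Tagged"]

# 1. Delete unnecessary columns.
df_activities = df_activities.drop(columns=cols_to_drop)

# 2. Count activities by type (before renaming, so 'Other' still appears).
type_counts = df_activities["Type"].value_counts()
print(type_counts)

# 3. Rename 'Other' to 'Unicycling'; regex=False treats the pattern literally.
df_activities["Type"] = df_activities["Type"].str.replace(
    "Other", "Unicycling", regex=False
)

# 4. Count missing values in each remaining column.
missing_counts = df_activities.isnull().sum()
print(missing_counts)
```

Note that drop() returns a new DataFrame by default, so the result is assigned back to df_activities.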
Helpful links:
drop() function documentation
str.replace() function documentation
isnull() function documentation