Task 2
Data preprocessing
Lucky for us, the column names Runkeeper provides are informative, and we don't need to rename any columns.
However, the info() method reveals missing values. What are the reasons for these missing values? It depends. Some heart rate information is missing because I didn't always use a cardio sensor. The Notes column is an optional field that I sometimes left blank. Also, I only used the Route Name column once, and never used the Friend's Tagged column.
Later, we'll fill in missing values in the heart rate column to avoid misleading results, but right now, our first data preprocessing steps will be to:
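As a minimal sketch of how info() surfaces missing values, the snippet below builds a tiny stand-in DataFrame. The rows are made up for illustration, and the heart rate column name "Average Heart Rate (bpm)" is an assumption, since the text only refers to "the heart rate column".

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Runkeeper export; values are made up.
df_activities = pd.DataFrame({
    "Type": ["Running", "Cycling", "Other"],
    "Average Heart Rate (bpm)": [150.0, np.nan, np.nan],  # assumed column name
    "Notes": ["Morning run", None, None],
})

# info() prints each column with its non-null count and dtype;
# a non-null count below the number of rows signals missing values.
df_activities.info()

# count() returns the same non-null counts as a Series, which is easier to inspect in code.
non_null = df_activities.count()
```

Comparing each entry of non_null with len(df_activities) shows which columns have gaps.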
Remove columns not useful for our analysis.
Replace the "Other" activity type with "Unicycling" because that was always the "Other" activity.
Count missing values.
Implement the following data preprocessing tasks:
Delete unnecessary columns from df_activities with the drop() method, setting the columns parameter to the cols_to_drop list.
Calculate the activity type counts using the value_counts() method on the Type column.
Rename the 'Other' values to 'Unicycling' in the Type column using str.replace().
Count the missing values in each column using isnull().sum().
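The four tasks above might be sketched as follows. This is one possible approach, not the reference solution: the sample rows are made up, the contents of cols_to_drop are an assumption (the text does not list them), and the heart rate column name is likewise assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Runkeeper export; values are made up.
df_activities = pd.DataFrame({
    "Type": ["Running", "Other", "Cycling", "Running"],
    "Average Heart Rate (bpm)": [150.0, np.nan, 128.0, np.nan],  # assumed column name
    "Notes": [None, "Fun ride", None, None],
    "Route Name": [None, None, None, None],
    "Friend's Tagged": [None, None, None, None],
})

# Assumed list of columns to remove; adjust to the columns you do not need.
cols_to_drop = ["Notes", "Route Name", "Friend's Tagged"]

# 1. Delete unnecessary columns.
df_activities = df_activities.drop(columns=cols_to_drop)

# 2. Count how often each activity type occurs.
type_counts = df_activities["Type"].value_counts()

# 3. Rename 'Other' to 'Unicycling' (literal match, no regex needed).
df_activities["Type"] = df_activities["Type"].str.replace("Other", "Unicycling", regex=False)

# 4. Count missing values per column.
missing_counts = df_activities.isnull().sum()

print(type_counts)
print(missing_counts)
```

Note that drop() returns a new DataFrame by default, so the result is assigned back to df_activities rather than relying on in-place modification.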
Helpful links:
drop() function documentation
str.replace() function documentation
isnull() function documentation