Task 2

Data preprocessing

Lucky for us, the column names Runkeeper provides are informative, and we don't need to rename any columns.

However, the info() method reveals missing values in several columns. Why are these values missing? It depends on the column. Some heart rate information is missing because I didn't always use a cardio sensor. The Notes column is an optional field that I sometimes left blank. I used the Route Name column only once, and I never used the Friend's Tagged column.
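A quick way to tell these cases apart is to separate columns that are entirely empty (a field that was never used) from columns that are only partly filled (optional or sensor-dependent fields). The sketch below uses a small hypothetical DataFrame standing in for the Runkeeper export; its rows and exact column set are illustrative assumptions, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of a Runkeeper export (real columns and values may differ)
df = pd.DataFrame({
    "Type": ["Running", "Cycling", "Other", "Running"],
    "Average Heart Rate (bpm)": [152.0, np.nan, np.nan, 148.0],
    "Notes": [np.nan, "hilly", np.nan, np.nan],
    "Route Name": [np.nan, np.nan, "Lake loop", np.nan],
    "Friend's Tagged": [np.nan, np.nan, np.nan, np.nan],
})

# The "Non-Null Count" column printed by info() shows which columns have gaps
df.info()

# Boolean Series indexed by column name
all_missing = df.isnull().all()   # True if every value in the column is missing
any_missing = df.isnull().any()   # True if at least one value is missing

# Columns that were never filled in at all
empty_cols = all_missing[all_missing].index.tolist()
# Columns that are filled in only some of the time
partial_cols = any_missing[any_missing & ~all_missing].index.tolist()

print(empty_cols)
print(partial_cols)
```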

Later, we'll fill in missing values in the heart rate column to avoid misleading results. For now, our first data preprocessing steps will be to:

  • Remove columns not useful for our analysis.

  • Replace the "Other" activity type with "Unicycling", because "Other" was always used for unicycling.

  • Count missing values.

Implement the following data preprocessing tasks:

  • Delete unnecessary columns from df_activities with the drop() method, setting the columns parameter to the cols_to_drop list.

  • Calculate the activity type counts using the value_counts() method on the Type column.

  • Rename the 'Other' values to 'Unicycling' in the Type column using str.replace().

  • Count the missing values in each column using isnull().sum().
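The four tasks above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: df_activities is replaced by a small hypothetical DataFrame, and because this section does not define cols_to_drop, the list used here is a hypothetical example rather than the project's actual list.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_activities (the real data comes from a Runkeeper export)
df_activities = pd.DataFrame({
    "Type": ["Running", "Other", "Cycling", "Running", "Other"],
    "Distance (km)": [5.0, 1.2, 20.5, 10.0, 0.8],
    "Average Heart Rate (bpm)": [150.0, np.nan, 130.0, np.nan, np.nan],
    "Notes": [np.nan, "windy", np.nan, np.nan, np.nan],
    "Route Name": [np.nan, np.nan, "Park loop", np.nan, np.nan],
    "Friend's Tagged": [np.nan] * 5,
})

# Hypothetical list of columns that are not useful for the analysis
cols_to_drop = ["Notes", "Route Name", "Friend's Tagged"]

# 1. Delete unnecessary columns
df_activities.drop(columns=cols_to_drop, inplace=True)

# 2. Count each activity type (computed before the rename, so 'Other' still appears)
type_counts = df_activities["Type"].value_counts()
print(type_counts)

# 3. Replace 'Other' with 'Unicycling' in the Type column
df_activities["Type"] = df_activities["Type"].str.replace("Other", "Unicycling")

# 4. Count missing values in each remaining column
missing = df_activities.isnull().sum()
print(missing)
```

Note that str.replace() substitutes substrings, so a value such as "Others" would also be changed. If the column could contain such values, Series.replace({"Other": "Unicycling"}) matches whole values only.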
