Task 2
Lucky for us, the column names Runkeeper provides are informative, and we don't need to rename any columns.
But, we do notice missing values using the info() method. What are the reasons for these missing values? It depends. Some heart rate information is missing because I didn't always use a cardio sensor. In the case of the Notes column, it is an optional field that I sometimes left blank. Also, I only used the Route Name column once, and never used the Friend's Tagged column.
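To make the role of info() concrete, here is a minimal sketch on a tiny made-up DataFrame. The rows and the "Average Heart Rate (bpm)" column name are assumptions for illustration only; they are not taken from the real Runkeeper export.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for the Runkeeper export (made-up rows).
df_activities = pd.DataFrame({
    "Type": ["Running", "Cycling", "Other"],
    "Average Heart Rate (bpm)": [159.0, None, None],  # assumed column name
    "Notes": [None, "Windy ride", None],
})

# info() writes its summary to stdout by default; capturing it in a buffer
# lets us keep the text around as a string.
buffer = io.StringIO()
df_activities.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# count() returns the same non-null counts that info() reports per column.
non_null = df_activities.count()
print(non_null)
```

Any column whose non-null count is lower than the number of rows contains missing values.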
We'll fill in missing values in the heart rate column to avoid misleading results later, but right now, our first data preprocessing steps will be to:
Remove columns not useful for our analysis.
Replace the "Other" activity type with "Unicycling" because that was always the "Other" activity.
Count missing values.
Delete unnecessary columns from df_activities with the drop() method, setting the columns parameter to the cols_to_drop list.
Calculate the activity type counts using the value_counts() method on the Type column.
Rename the 'Other' values to 'Unicycling' in the Type column using str.replace().
Count the missing values in each column using isnull().sum().
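The four steps above can be sketched as follows. This is an illustration rather than the project solution: the sample rows, the "Distance (km)" and "Average Heart Rate (bpm)" column names, and the contents of cols_to_drop are all assumptions, since the real list of columns to drop is not given here.

```python
import pandas as pd

# Hypothetical sample data standing in for the real export (made-up values).
df_activities = pd.DataFrame({
    "Type": ["Running", "Cycling", "Other", "Running", "Other"],
    "Distance (km)": [10.4, 26.0, 3.1, 5.2, 2.8],
    "Average Heart Rate (bpm)": [159.0, None, None, 151.0, None],
    "Notes": [None, "Windy ride", None, None, None],
    "Route Name": [None, None, None, None, None],
    "Friend's Tagged": [None, None, None, None, None],
})

# Assumed list of columns that are not useful for the analysis.
cols_to_drop = ["Notes", "Route Name", "Friend's Tagged"]

# 1. Delete the unnecessary columns.
df_activities = df_activities.drop(columns=cols_to_drop)

# 2. Count how often each activity type occurs.
type_counts = df_activities["Type"].value_counts()
print(type_counts)

# 3. Rename 'Other' to 'Unicycling'; regex=False requests a literal match.
df_activities["Type"] = df_activities["Type"].str.replace(
    "Other", "Unicycling", regex=False
)

# 4. Count the missing values in each remaining column.
missing_counts = df_activities.isnull().sum()
print(missing_counts)
```

Note that drop() returns a new DataFrame by default, so the result is assigned back to df_activities.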
drop() function
str.replace() function
isnull() function