Daily Percentage Change in Covid Numbers
This is the first in a series of statistical charts I am going to plot as part of "Towards Building Covid 19 Tracker".
Daily change is the difference between today's numbers and the previous day's numbers.
Percentage change is (DailyChange * 100 / previous day's numbers).
With the Pandas library in Python, one can do this whole calculation with a single function call, pct_change. Here is how you do it:
df.pct_change()*100 # where df = data frame
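As a quick sanity check, here is a minimal sketch that compares a hand calculation with pct_change for two consecutive active_count readings from the sample data. The variable names (counts, manual_pct, pandas_pct) are only for illustration. Note that pandas divides the change by the previous value, not by today's value.

```python
import pandas as pd

# Two consecutive active_count readings taken from the sample data
counts = pd.Series([46008, 47480])

# Manual calculation: (today - previous) / previous * 100
manual_pct = (counts[1] - counts[0]) / counts[0] * 100

# Same figure via pandas; the first entry is NaN because it has no previous value
pandas_pct = counts.pct_change() * 100

print(round(manual_pct, 2), round(pandas_pct[1], 2))
```

Both numbers should agree, which confirms what the single call is doing under the hood.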
Let's look at our sample data:
date,active_count,cure_count,death_count,migrated,source
12/05/20 07:35:04,46008,22454,2293,1,https://www.mohfw.gov.in/
13/05/20 08:56:46,47480,24385,2415,1,https://www.mohfw.gov.in/
14/05/20 14:10:50,49219,26234,2549,1,https://www.mohfw.gov.in/
15/05/20 09:32:04,51401,27919,2649,1,https://www.mohfw.gov.in/
For the daily percentage change, we call it as follows:
df = self.dataf.iloc[:,1:-1].pct_change()*100 # skip the first (date) and last (source) columns
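To try this outside the tracker class, here is a rough, self-contained sketch that loads the sample rows from a string instead of a file. The names csv_text, dataf, pct_df and date_df are assumptions made for this example, and the date format string is inferred from the sample rows (day/month/year).

```python
import io

import pandas as pd

# Sample rows copied from above; in the real tracker these would come from a file
csv_text = """date,active_count,cure_count,death_count,migrated,source
12/05/20 07:35:04,46008,22454,2293,1,https://www.mohfw.gov.in/
13/05/20 08:56:46,47480,24385,2415,1,https://www.mohfw.gov.in/
14/05/20 14:10:50,49219,26234,2549,1,https://www.mohfw.gov.in/
15/05/20 09:32:04,51401,27919,2649,1,https://www.mohfw.gov.in/
"""

dataf = pd.read_csv(io.StringIO(csv_text))

# Parse the date column (assumed to be day/month/year) and keep it as its own series
date_df = pd.to_datetime(dataf["date"], format="%d/%m/%y %H:%M:%S")

# Drop the first (date) and last (source) columns, then compute the percentage change
pct_df = dataf.iloc[:, 1:-1].pct_change() * 100

print(pct_df.round(2))
```

The first row of the result is NaN for every column, since there is no earlier day to compare against.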
Once we have this, plotting it with Matplotlib needs some manual hand-holding, since our data is now split between a data frame of pct values and a series of date values.
We want to have this plot
There are four parts to this plot. This is an indication to create a Figure using subplots.
We start by getting handles to the figure and the subplot as follows:
fig, ax = plt.subplots()
Then we start adding information to the figure using the subplot handle.
a. PCT values vs Recorded Date
ax.plot(date_df, vals) # plot date series on x-axis & pct values on y
b. Grid to track Datapoints over Date
ax.grid(True) # enable grid for the plot
c. Highlighting Data points
To highlight data points, Matplotlib provides format strings. We use '-o', where "o" marks each data point and "-" draws a line between the points.
So instead of the code in step "a", we use the following code:
ax.plot(date_df, vals, '-o', label=txt) # we will come back to the label
d. Plotting Values at the data points
Plotting values at a data point is not straightforward. In order to achieve this, we need to use the annotate function. We pass annotate two parameters, namely:
- Value
- XY coordinate
In our case, the value is the pct value at the data point, and the xy coordinate is the combination of the date at that index and the pct value. Here is how to do that:
plt.annotate(f"{round(val,2)}%",xy=[date_df[idx], val])
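Putting steps a to d together, a rough end-to-end sketch could look like the following. It is not the tracker's actual code: the names csv_text, dataf and columns_to_plot are assumptions for this example, the column name is used as the legend label (txt), and the NaN values in the first row are skipped when annotating. The Agg backend is selected only so that the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """date,active_count,cure_count,death_count,migrated,source
12/05/20 07:35:04,46008,22454,2293,1,https://www.mohfw.gov.in/
13/05/20 08:56:46,47480,24385,2415,1,https://www.mohfw.gov.in/
14/05/20 14:10:50,49219,26234,2549,1,https://www.mohfw.gov.in/
15/05/20 09:32:04,51401,27919,2649,1,https://www.mohfw.gov.in/
"""

dataf = pd.read_csv(io.StringIO(csv_text))
date_df = pd.to_datetime(dataf["date"], format="%d/%m/%y %H:%M:%S")
pct_df = dataf.iloc[:, 1:-1].pct_change() * 100

# Hypothetical choice of columns to chart
columns_to_plot = ["active_count", "cure_count", "death_count"]

fig, ax = plt.subplots()
for txt in columns_to_plot:
    vals = pct_df[txt]
    # Steps a and c: line with a marker at every data point, labelled for the legend
    ax.plot(date_df, vals, "-o", label=txt)
    # Step d: write the rounded pct value next to each point, skipping NaN
    for idx, val in enumerate(vals):
        if pd.notna(val):
            ax.annotate(f"{round(val, 2)}%", xy=(date_df[idx], val))

ax.grid(True)        # step b: grid to track data points over dates
ax.legend()          # show the labels passed to ax.plot
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

From here, fig.savefig with a file name of your choice (or plt.show() with an interactive backend) displays the result.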
That's it. This is all that is needed.
I will add a GitHub link to the complete code soon.