COVID-19 data analysis
Let us revisit the CDC COVID-19 Case Surveillance Data. It contains well over 3 million entries of individual, de-identified patient data. Since this is a large file, I suggest you load it with vroom and keep cache=TRUE in the chunk options.
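library(tidyverse) # dplyr, ggplot2 and the %>% pipe
library(vroom)     # fast reading of large delimited files
library(janitor)   # clean_names()
# URL link to CDC to download data
url <- "https://data.cdc.gov/api/views/vbim-akqf/rows.csv?accessType=DOWNLOAD"
covid_data <- vroom(url) %>%
  clean_names()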
Given the data we have, I would like you to produce two graphs that show the death rate (%). Notes: do we need to clean the missing values (to be discussed)? Both graphs need double faceting; use facet_grid and check the R documentation.
1. by age group, sex, and whether the patient had co-morbidities or not.
Columns used: sex, age_group, medcond_yn, death_yn
# Drop records where any variable of interest is missing or unknown
covid_data_deathrate1 <- covid_data %>%
  filter(!(death_yn %in% c("Missing", "Unknown")),
         !(medcond_yn %in% c("Missing", "Unknown")),
         !is.na(age_group),
         !(age_group %in% c("Missing", "Unknown")),
         !is.na(sex),
         !(sex %in% c("Missing", "Unknown"))) %>%
  # Count deaths and survivals within each age group, sex and co-morbidity status
  count(age_group, sex, medcond_yn, death_yn) %>%
  # Death rate = share of "Yes" outcomes within each group, in percent
  group_by(age_group, sex, medcond_yn) %>%
  mutate(death_rate = n * 100 / sum(n)) %>%
  filter(death_yn == "Yes") %>%
  mutate(comorb = ifelse(medcond_yn == "No", "Without Comorbidities", "With Comorbidities"),
         death_rate = round(death_rate, 1)) %>%
  ungroup()
covid_data_deathrate1 %>%
  ggplot(aes(x = age_group, y = death_rate, fill = sex)) +
  geom_col() +
  geom_text(aes(label = death_rate), hjust = -0.25) +
  coord_flip() +
  facet_grid(rows = vars(comorb), cols = vars(sex)) +
  # Apply theme_bw() first: a complete theme added later would reset the tweaks below
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30),
        legend.position = "none") +
  labs(title = "COVID-19 death rate (%) by age group, sex and presence of co-morbidities",
       x = "", y = "", caption = "Source: CDC")
2. by age group, sex, and whether the patient was admitted to Intensive Care Unit (ICU) or not.
Columns used: sex, age_group, icu_yn, death_yn
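One possible way to approach this second graph is sketched below; it is not necessarily the intended solution. It assumes that icu_yn is coded like medcond_yn ("Yes", "No", "Missing", "Unknown"), which should be checked against the data. The names toy_covid, icu_death_rate and plot_icu_death_rate are hypothetical, and the toy rows are made-up values used only to exercise the code, not CDC figures. Unlike the first graph's code, this sketch also drops NA values in death_yn and icu_yn, and it keeps groups with zero deaths (shown as 0).

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for covid_data: made-up rows, NOT real CDC values
toy_covid <- tibble::tibble(
  age_group = c(rep("20 - 29 Years", 4), rep("80+ Years", 4)),
  sex       = c(rep("Female", 4), rep("Male", 4)),
  icu_yn    = c(rep("No", 4), rep("Yes", 3), "Missing"),
  death_yn  = c("No", "No", "No", "Yes", "Yes", "Yes", "No", "Yes")
)

# Death rate (%) by age group, sex and ICU admission.
# Assumption: "Yes"/"No" are the only informative codes in icu_yn and death_yn.
icu_death_rate <- function(data) {
  data %>%
    filter(death_yn %in% c("Yes", "No"),
           icu_yn %in% c("Yes", "No"),
           !is.na(age_group), !(age_group %in% c("Missing", "Unknown")),
           !is.na(sex), !(sex %in% c("Missing", "Unknown"))) %>%
    group_by(age_group, sex, icu_yn) %>%
    # mean() of a logical vector is the share of TRUE values
    summarise(death_rate = round(100 * mean(death_yn == "Yes"), 1),
              .groups = "drop") %>%
    mutate(icu = if_else(icu_yn == "Yes", "Admitted to ICU", "Not admitted to ICU"))
}

# Same layout as the first graph: sex in columns, ICU status in rows
plot_icu_death_rate <- function(rates) {
  ggplot(rates, aes(x = age_group, y = death_rate, fill = sex)) +
    geom_col() +
    geom_text(aes(label = death_rate), hjust = -0.25) +
    coord_flip() +
    facet_grid(rows = vars(icu), cols = vars(sex)) +
    theme_bw() +
    theme(legend.position = "none") +
    labs(title = "COVID-19 death rate (%) by age group, sex and ICU admission",
         x = "", y = "", caption = "Source: CDC")
}

toy_rates <- icu_death_rate(toy_covid)
toy_plot <- plot_icu_death_rate(toy_rates)
```

With the real data, the same two functions would be called as plot_icu_death_rate(icu_death_rate(covid_data)).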
