Corona Virus- Data Viz

(Credit to WHO)

病毒資料集:

冠狀病毒 or 新冠性病毒 or 新冠肺炎 AKA 武漢肺炎現在正在猖狂的流行,基本上全球都淪陷了,在這段期間大家要小心啊......

資料來源:
https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset

資料集是來自WHO(CHO),而John Hopkins University 也將其應用於製作更清楚的dashboard
以利大家參考。

視覺化的表現後,可以讓病毒在全球的蹤跡一覽無遺,也能理解病毒擴散的歷史,
當然也是本篇文章的主題啦,針對武漢肺炎的視覺化練習,Start!

Data exploration

library(tidyverse)
library(highcharter)
library(lubridate)
library(GridExtra)

可樂娜資料集裡有編號 / 日期/ 行政區 / 國家/ 確診(binary)/ 死亡(binary) / 康復(binary)
corona <- read_csv("covid_19_data.csv")
head(corona)
## # A tibble: 6 x 8
##     SNo ObservationDate `Province/State` `Country/Region` `Last Update`
##   <dbl> <chr>           <chr>            <chr>            <chr>        
## 1     1 01/22/2020      Anhui            Mainland China   1/22/2020 17~
## 2     2 01/22/2020      Beijing          Mainland China   1/22/2020 17~
## 3     3 01/22/2020      Chongqing        Mainland China   1/22/2020 17~
## 4     4 01/22/2020      Fujian           Mainland China   1/22/2020 17~
## 5     5 01/22/2020      Gansu            Mainland China   1/22/2020 17~
## 6     6 01/22/2020      Guangdong        Mainland China   1/22/2020 17~
## # ... with 3 more variables: Confirmed <dbl>, Deaths <dbl>,
## #   Recovered <dbl>

為了繪圖的函數,先把國家名字改一下,Mainland China==>China /
 US ==> Unitedstates of America。

但發現Taiwan的病例竟然有1057個很不正常,仔細一看發現實際上的案例變成了
Taipei and environs......可憐哪,某國家主政者整天只會搞些小手段
corona$`Country/Region`[corona$`Country/Region`  == "Mainland China"] = "China"
corona$`Country/Region`[corona$`Country/Region`  == "US"] = "United States of America"

#political sxxt for Taipei and environ
corona %>% group_by(`Country/Region`) %>%
          summarise(Tconfirmed = sum(Confirmed)) %>%
          filter(`Country/Region` == 'Taiwan' | `Country/Region` =='Taipei and environs') ->
Taiwan

Taiwan
## # A tibble: 2 x 2
##   `Country/Region`    Tconfirmed
##   <chr>                    <dbl>
## 1 Taipei and environs         47
## 2 Taiwan                    1057

有沒有人願意說明一下1057是哪裡來的....我先把它當成China來的....
然後把Taipei and environs變成Taiwan
corona$`Country/Region`[corona$`Country/Region`  == "Taiwan"] = "China"
corona$`Country/Region`[corona$`Country/Region`  == "Taipei and environs"] = "Taiwan"

整理一下以國家為組別,並計算目前總感染情況,確診人數 / 康復人數 / 死亡人數等
#Count sum of confirmed/recover/rate of recover
corona %>% 
  janitor::clean_names() %>% 
  filter(!country_region %in% 'Others') %>% 
  group_by(country_region) %>% 
  summarise(total_confirmed = sum(confirmed),
            total_recovered = sum(recovered),
            total_death = sum(deaths)) %>% 
  filter(total_confirmed > 0) %>% 
  mutate(log_total_confirmed = log(total_confirmed)) -> countries_confirmed
head(countries_confirmed)
## # A tibble: 6 x 5
##   country_region total_confirmed total_recovered total_death
##   <chr>                    <dbl>           <dbl>       <dbl>
## 1 Afghanistan                 26               0           0
## 2 Albania                     12               0           0
## 3 Algeria                    131               0           0
## 4 Andorra                      9               0           0
## 5 Argentina                   54               0           3
## 6 Armenia                     10               0           0
## # ... with 1 more variable: log_total_confirmed <dbl>

Data Viz

#plot T-confirmed in world
highchart() %>%
  hc_add_series_map(worldgeojson, countries_confirmed, value = 'total_confirmed', joinBy = c('name','country_region'))  %>% 
  #hc_colors(c("darkorange", "darkgray")) %>% 
  hc_colorAxis(stops = color_stops()) %>% 
  hc_title(text = "Countries with nCov exposure") %>% 
  hc_subtitle(text = 'with Total Confirmed - Actual Figures')



#Top 10 country of confirmed 
countries_confirmed %>%
  group_by(country_region) %>%
  arrange(desc(total_confirmed)) %>% 
  head(10) %>%
  hchart("bar",hcaes(x = country_region,  y =total_confirmed)) %>%
  hc_add_theme(hc_theme_sandsignika())
## Warning: `parse_quosure()` is deprecated as of rlang 0.2.0.
## Please use `parse_quo()` instead.
## This warning is displayed once per session.



以日期為組別
#With Date
corona %>%
  group_by(ObservationDate) %>%
  summarise(total_confirmed = sum(Confirmed),
            total_recovered = sum(Recovered),
            total_deaths = sum(Deaths))->Date_confirmed
Date_confirmed$Date <- as.Date(Date_confirmed$ObservationDate, format = "%m/%d/%y")
head(Date_confirmed)
## # A tibble: 6 x 5
##   ObservationDate total_confirmed total_recovered total_deaths Date      
##   <chr>                     <dbl>           <dbl>        <dbl> <date>    
## 1 01/22/2020                  555              28           17 2020-01-22
## 2 01/23/2020                  653              30           18 2020-01-23
## 3 01/24/2020                  941              36           26 2020-01-24
## 4 01/25/2020                 1438              39           42 2020-01-25
## 5 01/26/2020                 2118              52           56 2020-01-26
## 6 01/27/2020                 2927              61           82 2020-01-27

直接看日期與確診/康復/死亡的歷史
#confirmed

C <- ggplot(Date_confirmed, aes(Date, total_confirmed))+
  geom_line(size = 2, alpha = 0.8,color = "#FC4E07")+
  geom_point(shape=21, color="black", fill="blue", size=6) +
  scale_x_date(date_labels = "%b %d", date_breaks = "7 days")
    
#revovered
R <- ggplot(Date_confirmed, aes(Date, total_recovered))+
  geom_line(size = 2, alpha = 0.8,color = "#FC4E07")+
  geom_point(shape=21, color="black", fill="blue", size=6) +
  scale_x_date(date_labels = "%b %d", date_breaks = "7 days")

#deaths
D <- ggplot(Date_confirmed, aes(Date, total_deaths))+
  geom_line(size = 2, alpha = 0.8,color = "#FC4E07")+
  geom_point(shape=21, color="black", fill="blue", size=6) +
  scale_x_date(date_labels = "%b %d", date_breaks = "7 days")

grid.arrange(C, R, D,ncol=1)  


目前大概想得到是這些,R還有很多專為這次疫情產生的package以後可以慢慢介紹
在此願所有防疫人員平平安安並祈禱疫情趕快過去

PS: 這裡的圖可以直接顯示出每個點上的結果
https://rpubs.com/TimoBoll/583802



更新:
目前John Hopkins已將Taipei and the environ更正為Taiwan
感謝外交部

留言

這個網誌中的熱門文章

Word Vector & Word embedding 初探 - with n-Gram & GLOVE Model

文字探勘之關鍵字萃取 : TF-IDF , text-rank , RAKE

多元迴歸分析- subsets and shrinkage