Corona Virus- Data Viz
(Credit to WHO)
資料來源:
https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
資料集是來自WHO(CHO),而John Hopkins University 也將其應用於製作更清楚的dashboard
以利大家參考。
視覺化的表現後,可以讓病毒在全球的蹤跡一覽無遺,也能理解病毒擴散的歷史,
當然也是本篇文章的主題啦,針對武漢肺炎的視覺化練習,Start!
可樂娜資料集裡有編號 / 日期/ 行政區 / 國家/ 確診(binary)/ 死亡(binary) / 康復(binary)
為了繪圖的函數,先把國家名字改一下,Mainland China==>China /
US ==> Unitedstates of America。
但發現Taiwan的病例竟然有1057個很不正常,仔細一看發現實際上的案例變成了
Taipei and environs......可憐哪,某國家主政者整天只會搞些小手段
有沒有人願意說明一下1057是哪裡來的....我先把它當成China來的....
然後把Taipei and environs變成Taiwan
整理一下以國家為組別,並計算目前總感染情況,確診人數 / 康復人數 / 死亡人數等
以日期為組別
直接看日期與確診/康復/死亡的歷史
目前大概想得到是這些,R還有很多專為這次疫情產生的package以後可以慢慢介紹
在此願所有防疫人員平平安安並祈禱疫情趕快過去
PS: 這裡的圖可以直接顯示出每個點上的結果
https://rpubs.com/TimoBoll/583802
更新:
目前John Hopkins已將Taipei and the environ更正為Taiwan
感謝外交部
病毒資料集:
冠狀病毒 or 新冠性病毒 or 新冠肺炎 AKA 武漢肺炎現在正在猖狂的流行,基本上全球都淪陷了,在這段期間大家要小心啊......資料來源:
https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
資料集是來自WHO(
以利大家參考。
視覺化的表現後,可以讓病毒在全球的蹤跡一覽無遺,也能理解病毒擴散的歷史,
當然也是本篇文章的主題啦,針對武漢肺炎的視覺化練習,Start!
Data exploration
library(tidyverse)
library(highcharter)
library(lubridate)
library(GridExtra)
可樂娜資料集裡有編號 / 日期/ 行政區 / 國家/ 確診(binary)/ 死亡(binary) / 康復(binary)
corona <- read_csv("covid_19_data.csv")
head(corona)
## # A tibble: 6 x 8
## SNo ObservationDate `Province/State` `Country/Region` `Last Update`
## <dbl> <chr> <chr> <chr> <chr>
## 1 1 01/22/2020 Anhui Mainland China 1/22/2020 17~
## 2 2 01/22/2020 Beijing Mainland China 1/22/2020 17~
## 3 3 01/22/2020 Chongqing Mainland China 1/22/2020 17~
## 4 4 01/22/2020 Fujian Mainland China 1/22/2020 17~
## 5 5 01/22/2020 Gansu Mainland China 1/22/2020 17~
## 6 6 01/22/2020 Guangdong Mainland China 1/22/2020 17~
## # ... with 3 more variables: Confirmed <dbl>, Deaths <dbl>,
## # Recovered <dbl>
為了繪圖的函數,先把國家名字改一下,Mainland China==>China /
US ==> Unitedstates of America。
但發現Taiwan的病例竟然有1057個很不正常,仔細一看發現實際上的案例變成了
Taipei and environs......可憐哪,某國家主政者整天只會搞些小手段
corona$`Country/Region`[corona$`Country/Region` == "Mainland China"] = "China"
corona$`Country/Region`[corona$`Country/Region` == "US"] = "United States of America"
#political sxxt for Taipei and environ
corona %>% group_by(`Country/Region`) %>%
summarise(Tconfirmed = sum(Confirmed)) %>%
filter(`Country/Region` == 'Taiwan' | `Country/Region` =='Taipei and environs') ->
Taiwan
Taiwan
## # A tibble: 2 x 2
## `Country/Region` Tconfirmed
## <chr> <dbl>
## 1 Taipei and environs 47
## 2 Taiwan 1057
有沒有人願意說明一下1057是哪裡來的....我先把它當成China來的....
然後把Taipei and environs變成Taiwan
corona$`Country/Region`[corona$`Country/Region` == "Taiwan"] = "China"
corona$`Country/Region`[corona$`Country/Region` == "Taipei and environs"] = "Taiwan"
整理一下以國家為組別,並計算目前總感染情況,確診人數 / 康復人數 / 死亡人數等
#Count sum of confirmed/recover/rate of recover
corona %>%
janitor::clean_names() %>%
filter(!country_region %in% 'Others') %>%
group_by(country_region) %>%
summarise(total_confirmed = sum(confirmed),
total_recovered = sum(recovered),
total_death = sum(deaths)) %>%
filter(total_confirmed > 0) %>%
mutate(log_total_confirmed = log(total_confirmed)) -> countries_confirmed
head(countries_confirmed)
## # A tibble: 6 x 5
## country_region total_confirmed total_recovered total_death
## <chr> <dbl> <dbl> <dbl>
## 1 Afghanistan 26 0 0
## 2 Albania 12 0 0
## 3 Algeria 131 0 0
## 4 Andorra 9 0 0
## 5 Argentina 54 0 3
## 6 Armenia 10 0 0
## # ... with 1 more variable: log_total_confirmed <dbl>
Data Viz
#plot T-confirmed in world
highchart() %>%
hc_add_series_map(worldgeojson, countries_confirmed, value = 'total_confirmed', joinBy = c('name','country_region')) %>%
#hc_colors(c("darkorange", "darkgray")) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "Countries with nCov exposure") %>%
hc_subtitle(text = 'with Total Confirmed - Actual Figures')
#Top 10 country of confirmed
countries_confirmed %>%
group_by(country_region) %>%
arrange(desc(total_confirmed)) %>%
head(10) %>%
hchart("bar",hcaes(x = country_region, y =total_confirmed)) %>%
hc_add_theme(hc_theme_sandsignika())
## Warning: `parse_quosure()` is deprecated as of rlang 0.2.0.
## Please use `parse_quo()` instead.
## This warning is displayed once per session.
以日期為組別
#With Date
corona %>%
group_by(ObservationDate) %>%
summarise(total_confirmed = sum(Confirmed),
total_recovered = sum(Recovered),
total_deaths = sum(Deaths))->Date_confirmed
Date_confirmed$Date <- as.Date(Date_confirmed$ObservationDate, format = "%m/%d/%y")
head(Date_confirmed)
## # A tibble: 6 x 5
## ObservationDate total_confirmed total_recovered total_deaths Date
## <chr> <dbl> <dbl> <dbl> <date>
## 1 01/22/2020 555 28 17 2020-01-22
## 2 01/23/2020 653 30 18 2020-01-23
## 3 01/24/2020 941 36 26 2020-01-24
## 4 01/25/2020 1438 39 42 2020-01-25
## 5 01/26/2020 2118 52 56 2020-01-26
## 6 01/27/2020 2927 61 82 2020-01-27
直接看日期與確診/康復/死亡的歷史
#confirmed
C <- ggplot(Date_confirmed, aes(Date, total_confirmed))+
geom_line(size = 2, alpha = 0.8,color = "#FC4E07")+
geom_point(shape=21, color="black", fill="blue", size=6) +
scale_x_date(date_labels = "%b %d", date_breaks = "7 days")
#revovered
R <- ggplot(Date_confirmed, aes(Date, total_recovered))+
geom_line(size = 2, alpha = 0.8,color = "#FC4E07")+
geom_point(shape=21, color="black", fill="blue", size=6) +
scale_x_date(date_labels = "%b %d", date_breaks = "7 days")
#deaths
D <- ggplot(Date_confirmed, aes(Date, total_deaths))+
geom_line(size = 2, alpha = 0.8,color = "#FC4E07")+
geom_point(shape=21, color="black", fill="blue", size=6) +
scale_x_date(date_labels = "%b %d", date_breaks = "7 days")
grid.arrange(C, R, D,ncol=1)
目前大概想得到是這些,R還有很多專為這次疫情產生的package以後可以慢慢介紹
在此願所有防疫人員平平安安並祈禱疫情趕快過去
PS: 這裡的圖可以直接顯示出每個點上的結果
https://rpubs.com/TimoBoll/583802
更新:
目前John Hopkins已將Taipei and the environ更正為Taiwan
感謝外交部
留言
張貼留言