 Everyone in Germany is speaking about the 2nd wave of Covid-19. Has it already arrived, is it still coming?

The number of new Covid-19 cases is rising for quite a few weeks According to Johns Hopkins University there were more than 1,000 new cases per day in the last days.

That was also the case when there was only a local hotspot in Gütersloh and Warendorf occurred.

But this time there isn’t one hotspot. The infected people live all over Germany.

So I want to visualize the spreading.

## The data

Pavel Mayer produces every day a very detailed table with Covid-19 data per Landkreis. (Landkreis is an sub-area of a Bundesland such as a county of a state in the U.S.)

He uses data provided by German Robert-Koch-Institut (RKI) and adds some data from other sources and tries to clean up the data.

I’m using this data for my visualization. I can’t get exactly the same numbers as Pavel does. I’m not sure what the reason is for the difference. As I’m using the date a case was reported. Maybe he uses the date of infection.

Update 2020-08-14: The reason for the difference between my calculated values and Pavel’s is a different date: I’m calculating the number of infections which are detected on a special day (MeldedatumKlar). Pavel refers his calculated infections to the day they are reported by the RKI (Datenstand). Note: There may be a delay of several days between testing and getting the positive result to the local health department on one hand and sending this data to the RKI. They still use FAX in 2020!

So let’s start:

 ``````1 2 3 4 5 6 7 `````` ``````# Load Libraries suppressMessages(library(tidyverse)) suppressMessages(library(lubridate)) suppressMessages(library(zoo)) # Suppress summarise info options(dplyr.summarise.inform = FALSE) ``````
 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 `````` ``````# Fetch the data url_landkreise_full <- "https://pavelmayer.de/covid/risks/full-data.csv" if(file.exists("data_landkreise_detail.Rda")){ load("data_landkreise_detail.Rda") } else{ data_landkreise_detail <- read_csv(url_landkreise_full, col_types = cols( Bundesland = col_character(), Landkreis = col_character(), Altersgruppe = col_character(), Geschlecht = col_character(), IdLandkreis = col_character(), Datenstand = col_character(), Altersgruppe2 = col_character(), LandkreisName = col_character(), LandkreisTyp = col_character(), NeuerFallKlar = col_character(), RefdatumKlar = col_character(), MeldedatumKlar = col_character(), NeuerTodesfallKlar = col_character(), missingSinceDay = col_integer(), missingCasesInOldRecord = col_integer(), poppedUpOnDay = col_integer() ) ) save(data_landkreise_detail, file = "data_landkreise_detail.Rda") } ``````

The date when an infection is reported is coded in `MeldedatumKlar`. So let’s transform it to a date column:

 ``````1 2 `````` ``````# Set locale to German because there's the German notation of the weekday used Sys.setlocale(category = "LC_ALL", locale = "de_DE.UTF-8") ``````
 ``````1 `````` ``````##  "de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/C" ``````
 ``````1 2 3 4 5 6 7 `````` ``````# change MeldedatumKlar to date data_landkreise_detail_converted <- data_landkreise_detail %>% mutate( MeldedatumKlar = as.Date( strptime(MeldedatumKlar, format = "%a, %d.%m.%Y %H:%M") ) ) ``````

We want to visualize the infections during the last 7 days. So let’s compute them.

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 `````` ``````data_landkreise_per_day <- data_landkreise_detail_converted %>% arrange(MeldedatumKlar) %>% group_by(Landkreis, MeldedatumKlar) %>% summarize( infected = sum(AnzahlFall) ) %>% ungroup() %>% complete(MeldedatumKlar, Landkreis, fill = list(infected = 0)) %>% arrange(MeldedatumKlar) %>% group_by(Landkreis) %>% mutate( infected_7 = rollsum(infected, 7, fill = NA, align = "right") ) ``````

In Germany there is a rule that each Landkreis with more than 50 newly infected people per 100,000 residents during the last 7 days must take care of the situation such as restrictions for meeting other people or closing parks, bars and restaurants etc. So we need the number of residents per Landkreis.

Pavel provides this data already in his dataset. He also provides an id (`IdLandkreis`) for each landkreis we will need later. So let’s extract these two fields out of the original data and merge them into the summarized data.frame:

 ``````1 2 3 4 5 6 7 8 9 `````` ``````landkreise <- data_landkreise_detail_converted %>% select(Landkreis, IdLandkreis, Bevoelkerung) %>% unique() data_landkreise_per_day_per_100k <- data_landkreise_per_day %>% left_join(landkreise) %>% mutate( infected_7_per_100k = infected_7/ Bevoelkerung * 100000 ) ``````
 ``````1 `````` ``````## Joining, by = "Landkreis" ``````

## A first map

So now we want to plot this data into a map.

ggplot2 provides some maps such as `map_data("world")`. Unfortunatly German Landkreise aren’t (yet) available.

So we need to fecth them on our own:

 ``````1 2 3 4 5 6 `````` ``````# Don't load the whole raster library because it overwrites the select method from dplyr. # library(raster) suppressMessages(library(rgeos)) suppressMessages(library(maptools)) suppressMessages(library(scales)) suppressMessages(library(ggmap)) ``````
 ``````1 2 3 4 5 6 7 `````` ``````# Fetch data for Landkreise landkreise.shp <- raster::getData("GADM", country = "DEU", level = 2) landkreise.shp.f <- fortify(landkreise.shp, region = "CC_2") landkreise.shp.f <- landkreise.shp.f # Fetch data for Bundesländer laender.shp <- raster::getData("GADM", country = "DEU", level = 1) ``````

Now we need to subset our data to a specific date. Let’s have a look at 2020-03-27:

 ``````1 2 3 4 5 6 7 8 `````` ``````plot_date <- ymd("2020-03-27") mydata <- data_landkreise_per_day_per_100k %>% filter(MeldedatumKlar == plot_date) %>% rename(id = IdLandkreis) ## merge shape file with data merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id") ``````

Now we can plot the data.

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 `````` ``````single_map <- function(merge.shp.coef, landkreise.shp.f){ ggplot() + geom_polygon(data = merge.shp.coef, aes(x = long, y = lat, group = group, fill = infected_7_per_100k), color = "black", size = 0.25) + geom_polygon(data = laender.shp, aes(x = long, y = lat, group = group), alpha = 0, color = "black", size = 1.0) + coord_map() + scale_fill_gradient2(midpoint = 15, low = "green", mid = "yellow", high = "red", space = "Lab", limits=c(0, 50), oob=squish, name = "Number of infections per 100k (50 is 50+)") + theme_nothing(legend = TRUE) + labs( title = paste0("Infections per 100.000 Residents during last 7 days at ", format(plot_date, format = "%m/%d/%Y")) ) } ``````
 ``````1 `````` ``````single_map(merge.shp.coef, landkreise.shp.f) ``````
 ``````1 `````` ``````## Regions defined for each Polygons `````` ## Adjustments for Berlin and Göttingen

Wow, great. But if we take a closer look we see three gray parts:

First, Berlin is gray. The reason is that in Pavel’s data Berlin is split into several parts. So we have to combine them for plotting.

The other two gray parts are Göttingen in Lower Saxony. I’m not sure were the error is. Wikipedia lists `03159` as id for Landkreis Göttingen as Pavel does. R fetches two Landkreise: `03152` and `03156` (Göttingen and Osterorde am Harz https://gadm.org/maps/DEU/niedersachsen_2.html).

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 `````` ``````# Fix for Göttingen landkreise.shp.f <- landkreise.shp.f %>% mutate( id = if_else(id == "03152", "03159", id), # fix for Göttingen id = if_else(id == "03156", "03159", id) # fix for Göttingen ) # Fix for Berlin data_landkreise_per_day_per_100k <- data_landkreise_per_day_per_100k %>% mutate( Landkreis = if_else(grepl("Berlin", Landkreis), "Berlin", Landkreis) ) %>% group_by(MeldedatumKlar, Landkreis) %>% summarize( infected = sum(infected, na.rm = TRUE), infected_7 = sum(infected_7, na.rm = TRUE), Bevoelkerung = sum(Bevoelkerung), IdLandkreis = first(IdLandkreis) ) %>% mutate( IdLandkreis = if_else(Landkreis == "Berlin", "11000", IdLandkreis), infected_7_per_100k = infected_7/ Bevoelkerung * 100000 ) # Merge data again plot_date <- ymd("2020-03-27") mydata <- data_landkreise_per_day_per_100k %>% filter(MeldedatumKlar == plot_date) %>% rename(id = IdLandkreis) ## merge shape file with data merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id") ``````
 ``````1 `````` ``````single_map(merge.shp.coef, landkreise.shp.f) ``````
 ``````1 `````` ``````## Regions defined for each Polygons `````` Okay, no more blind spots.

## Animation for the last 4 days

Let’s use `gganimate` to plot different days into one map and generate an animated gif.

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 `````` ``````library(gganimate) plot_date <- max(data_landkreise_per_day_per_100k\$MeldedatumKlar) days_of_animation <- 3 mydata <- data_landkreise_per_day_per_100k %>% filter(MeldedatumKlar >= plot_date - days(days_of_animation)) %>% rename(id = IdLandkreis) ## merge shape file with data merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id") animated_map <- single_map(merge.shp.coef, landkreise.shp.f) + transition_manual(MeldedatumKlar) + labs(title = "Date: {current_frame}") ``````
 ``````1 `````` ``````## Regions defined for each Polygons ``````
 ``````1 `````` ``````anim_save("map-3-days.gif", animated_map) ``````
 ``````1 `````` ``````## nframes and fps adjusted to match transition `````` ## Animate the whole pandemic

The more days are animated the more memory is used to compute the gif. I tried to animate all data since the start of the pandemic but my 16G of memory weren’t enough.

So I used only every forth day and waited several minutes:

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 `````` ``````library(gganimate) plot_dates <- seq(min(data_landkreise_per_day_per_100k\$MeldedatumKlar), max(data_landkreise_per_day_per_100k\$MeldedatumKlar), by = 4) mydata <- data_landkreise_per_day_per_100k %>% filter(MeldedatumKlar %in% plot_dates) %>% rename(id = IdLandkreis) ## merge shape file with data merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id") animated_map <- single_map(merge.shp.coef, landkreise.shp.f) + transition_manual(MeldedatumKlar) + labs(title = "Date: {current_frame}") ``````
 ``````1 `````` ``````## Regions defined for each Polygons ``````
 ``````1 `````` ``````# anim_save("map-all-year.gif", animated_map) `````` ## Credits

First I’d like to thank Pavel for his work he did creating his daily table.

I learned much about creating maps with ggplot2 following the steps of Thilo Klein’s page