Joining attribute data with geofi data

Markus Kainu

2024-02-05

This vignettes provides few examples on how to join attribute data from common sources of attribute data. Here we are using data from THL Sotkanet and Paavo (Open data by postal code area).

Installation

geofi can be installed from CRAN using

# install from CRAN
install.packages("geofi")

# Install development version from GitHub
remotes::install_github("ropengov/geofi")

Municipalities

Municipality data provided by get_municipalities()-function contains 77 indicators variables from each of 309 municipalities. Variables can be used either for aggregating data or as keys for joining attribute data.

Population data from Sotkanet

In this first example we join municipality level indicators of Swedish-speaking population at year end from Sotkanet population data,

library(geofi)
muni <- get_municipalities(year = 2023)

library(sotkanet)
library(dplyr)
sotkadata_swedish_speaking_pop <- GetDataSotkanet(indicators = 2433, years = 2000:2022) %>% 
  filter(region.category == "KUNTA") %>% 
  mutate(municipality_code = as.integer(region.code))

This is not obvious to all, but have the municipality names in Finnish among other regional breakdowns which allows us to combine the data with spatial data using municipality_name_fi-variable.

map_data <- right_join(muni, 
                       sotkadata_swedish_speaking_pop, 
                       by = c("municipality_code" = "municipality_code"))

Now we can plot a map showing Share of Swedish-speakers of the population, % and Share of foreign citizens of the population, % on two panels sharing a scale.


library(ggplot2)
map_data %>% 
  ggplot(aes(fill = primary.value)) + 
  geom_sf() + 
  labs(title = unique(sotkadata_swedish_speaking_pop$indicator.title.fi)) +
  theme(legend.position = "top")

Zipcode level

You can download data from Paavo (Open data by postal code area) using pxweb-package. In this example we will download preformatted population data in csv format directly from Statistics Finland and process it to match with spatial zipcode data.

px_data <- read.csv("https://pxdata.stat.fi:443/PxWeb/sq/43d3d0aa-636e-4a4b-bbe1-decae45fc2b4", 
                    header = TRUE, sep = ";", fileEncoding = "Latin1")
px_data$posti_alue <- sub(" .+$", "", px_data$Postinumeroalue)

Before we can join the data, we must extract the numerical postal code from postal_code_area-variable.

# Lets join with spatial data and plot the area of each zipcode
zipcodes19 <- get_zipcodes(year = 2019) 
zipcodes_map <- left_join(zipcodes19, 
                          px_data)
ggplot(zipcodes_map) + 
  geom_sf(aes(fill = X2022), 
          color  = alpha("white", 1/3)) +
  labs(title = "Total number of inhabitants, 2022", 
       fill = NULL)