---
title: "geobounds: Accessing Global Administrative Boundary Data in R"
bibliography: REFERENCES.bib
link-citations: true
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{geobounds: Accessing Global Administrative Boundary Data in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
[**Attribution**](https://www.geoboundaries.org/index.html#usage) **is required
when using geoBoundaries.**
## Introduction
The **geobounds** package provides a straightforward interface for downloading
and working with global political and administrative boundary data from the
[geoBoundaries](https://www.geoboundaries.org/) project
[@10.1371/journal.pone.0231866].
These datasets are openly licensed ([CC BY
4.0](https://creativecommons.org/licenses/by/4.0/)) and cover countries
worldwide across multiple administrative levels. The package supports different
geoBoundaries release types: gbOpen, gbHumanitarian, and gbAuthoritative, which
vary in validation levels and licensing. With **geobounds**, you can easily
fetch boundary geometries as **sf** objects, explore metadata, cache datasets
locally, and seamlessly integrate the boundaries into your spatial workflows.
## Understanding the data
The geoBoundaries database undergoes a rigorous quality assurance process,
including manual review and hand-digitization of physical maps where necessary.
Its primary goal is to provide the highest possible level of spatial accuracy
for scientific and academic applications.
This precision comes at a cost: some files can be quite large and may take
longer to download. For visualization or general mapping purposes, we recommend
using the simplified datasets available by setting `simplified = TRUE`.
``` r
library(geobounds)
library(ggplot2)
library(dplyr)
# Compare the full-resolution and the simplified boundaries
norway <- gb_get_adm0("NOR") |>
  mutate(res = "Full resolution")
print(object.size(norway), units = "Mb")
#> 26.5 Mb
norway_simp <- gb_get_adm0(country = "NOR", simplified = TRUE) |>
  mutate(res = "Simplified")
print(object.size(norway_simp), units = "Mb")
#> 1.5 Mb
norway_all <- bind_rows(norway, norway_simp)
# Plot with ggplot2
ggplot(norway_all) +
  geom_sf(fill = "#BA0C2F", color = "#00205B") +
  facet_wrap(vars(res)) +
  theme_minimal() +
  labs(caption = "Source: www.geoboundaries.org")
```
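To build intuition for why simplified files are so much smaller, the following minimal sketch simplifies a synthetic polygon with `sf::st_simplify()` and compares vertex counts. It is only an illustration: the wavy ring, the tolerance value and the variable names are assumptions made for this example, and the simplification procedure used by geoBoundaries may differ.

``` r
library(sf)

# Synthetic "coastline": a circle with small radial wiggles (planar, no CRS)
theta <- seq(0, 2 * pi, length.out = 2001)
radius <- 1 + 0.01 * sin(40 * theta)
ring <- cbind(radius * cos(theta), radius * sin(theta))
ring[nrow(ring), ] <- ring[1, ] # close the ring exactly

detailed <- st_sfc(st_polygon(list(ring)))

# Drop vertices that deviate less than the tolerance (in coordinate units)
simple <- st_simplify(detailed, dTolerance = 0.02)

# Fewer vertices means a smaller object in memory and on disk
n_detailed <- nrow(st_coordinates(detailed))
n_simple <- nrow(st_coordinates(simple))
c(detailed = n_detailed, simplified = n_simple)
print(object.size(detailed), units = "Kb")
print(object.size(simple), units = "Kb")
```

The overall shape survives while most vertices are discarded, which is usually an acceptable trade-off for small-scale maps but not for precise area or distance calculations.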
### Individual country files
The geoBoundaries API provides [individual country
files](https://www.geoboundaries.org/countryDownloads.html), whose aim is to
represent every nation "as they would represent themselves", with no special
identification of disputed areas.
These files are downloaded with `gb_get()` and the `?gb_get_adm` family of
functions. Because each country is represented independently, borders are not
guaranteed to align: neighboring countries may overlap in disputed areas or
leave gaps between them.
``` r
india_pak <- gb_get_adm0(c("India", "Pakistan"))
# Disputed area: Kashmir
ggplot(india_pak) +
  geom_sf(aes(fill = shapeName), alpha = 0.5) +
  scale_fill_manual(values = c("#FF671F", "#00401A")) +
  labs(
    fill = "Country",
    title = "Map of India & Pakistan",
    subtitle = "Note the overlap in the Kashmir region",
    caption = "Source: www.geoboundaries.org"
  )
```
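An overlap like this can also be measured rather than only plotted. The sketch below uses two synthetic squares in place of real boundaries so that it runs without any download; the `square()` helper and the object names are hypothetical and exist only for this example.

``` r
library(sf)

# Hypothetical helper: an axis-aligned square polygon (planar, no CRS)
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two synthetic "countries" whose claims overlap on the strip 1 <= x <= 2
claims <- st_sfc(square(0, 0, 2), square(1, 0, 2))

# Geometry shared by both claims, and its area in squared coordinate units
overlap <- st_intersection(claims[1], claims[2])
overlap_area <- st_area(overlap)
overlap_area
```

The shared strip is 1 unit wide and 2 units tall, so the reported area is 2. With real boundaries in longitude/latitude, **sf** would instead return the area with units attached (square metres).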
Note that individual data files are governed by the license or licenses
identified within the metadata for each respective boundary.
``` r
gb_get_metadata(c("India", "Pakistan"), adm_lvl = "ADM0") |>
  select(boundaryName, boundaryLicense, boundarySource)
#> # A tibble: 2 × 3
#>   boundaryName boundaryLicense                                      boundarySource
#>   <chr>        <chr>                                                <chr>
#> 1 India        CC0 1.0 Universal (CC0 1.0) Public Domain Dedication geoBoundaries, Wikimedia Comm…
#> 2 Pakistan     Open Data Commons Open Database License 1.0          OpenStreetMap, Wambacher
```
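Since attribution is required, it can be convenient to assemble a credit line from these metadata columns. The following is a minimal sketch that uses a stand-in data frame with placeholder values in place of the real output of `gb_get_metadata()`; only the column names are taken from the output above, and the caption layout is an assumption.

``` r
# Stand-in for the metadata table (placeholder values, not real licenses)
meta <- data.frame(
  boundaryName = c("Country A", "Country B"),
  boundaryLicense = c("License X", "License Y"),
  boundarySource = c("Source 1", "Source 2")
)

# One credit line per boundary: "name: source (license)"
credits <- sprintf(
  "%s: %s (%s)",
  meta$boundaryName, meta$boundarySource, meta$boundaryLicense
)

# Combine with the general source note, e.g. for labs(caption = caption)
caption <- paste(c("Source: www.geoboundaries.org", credits), collapse = "\n")
cat(caption)
```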
### Composite files
If you would prefer data where disputed areas are explicitly handled (by
removing overlaps and filling gaps), please use `gb_get_world()`. This function
downloads global composite datasets for administrative boundaries, also known as
CGAZ (Comprehensive Global Administrative Zones). There are three important
distinctions between CGAZ and individual country downloads:
1. Extensive simplification is performed to ensure that file sizes are small
enough to be used in most traditional desktop software.
2. Disputed areas are removed and replaced with polygons following US
Department of State definitions.
3. Gaps between borders have been filled.
``` r
cgaz_india_pak <- gb_get_world(c("India", "Pakistan"))
ggplot(cgaz_india_pak) +
  geom_sf(aes(fill = shapeName), alpha = 0.5) +
  scale_fill_manual(values = c("#FF671F", "#00401A")) +
  labs(
    fill = "Country",
    title = "Map of India & Pakistan",
    subtitle = "CGAZ does not overlap",
    caption = "Source: www.geoboundaries.org"
  )
```
## Caching and performance
The package provides a built-in mechanism to cache files locally, so that
repeated requests for the same country and level reuse the cached file instead
of downloading it again. For example:
``` r
# Current folder
current <- gb_detect_cache_dir()
#> ℹ 'C:\Users\diego\AppData\Local\Temp\Rtmp6BaCrq'
current
#> [1] "C:\\Users\\diego\\AppData\\Local\\Temp\\Rtmp6BaCrq"
# Change to new
newdir <- file.path(tempdir(), "/geoboundvignette")
gb_set_cache_dir(newdir)
#> ✔ geobounds cache dir is 'C:\Users\diego\AppData\Local\Temp\Rtmp6BaCrq//geoboundvignette'.
#> ℹ To install your `cache_dir` path for use in future sessions run this function with `install = TRUE`.
# Download
example <- gb_get_adm0("Vatican City", quiet = FALSE)
#> ℹ Downloading file from
#> → Cache dir is 'C:\Users\diego\AppData\Local\Temp\Rtmp6BaCrq//geoboundvignette/gbOpen'
# Restore cache dir
gb_set_cache_dir(current)
#> ✔ geobounds cache dir is 'C:\Users\diego\AppData\Local\Temp\Rtmp6BaCrq'.
#> ℹ To install your `cache_dir` path for use in future sessions run this function with `install = TRUE`.
current == gb_detect_cache_dir()
#> ℹ 'C:\Users\diego\AppData\Local\Temp\Rtmp6BaCrq'
#> [1] TRUE
```
To clear the cache, use `gb_clear_cache()`.
To use a different cache directory for a single call, pass the `cache_dir`
argument that each function accepts.
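The general idea behind this kind of cache is "download only if the file is not already on disk". The sketch below illustrates that pattern in base R; it is not the package's implementation, and `cached_fetch()`, `slow_fetch()` and the `.rds` storage format are hypothetical names and choices made for this example.

``` r
# Hypothetical helper: return the cached value if present, else fetch and store
cached_fetch <- function(key, cache_dir, fetch) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path)) # cache hit: no download needed
  }
  value <- fetch(key) # cache miss: do the expensive work once
  saveRDS(value, path)
  value
}

# Stand-in for a slow download; counts how often it is really called
calls <- 0
slow_fetch <- function(key) {
  calls <<- calls + 1
  paste("boundary for", key)
}

demo_dir <- file.path(tempdir(), "cache-sketch")
unlink(demo_dir, recursive = TRUE) # start from an empty cache

first <- cached_fetch("VAT", demo_dir, slow_fetch)
second <- cached_fetch("VAT", demo_dir, slow_fetch)
calls
```

The second request is served from disk, so `slow_fetch()` runs only once. Because the default cache shown above lives in a temporary directory, it does not persist between R sessions unless a permanent directory is configured.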
## Use in spatial analysis pipelines
Because the boundaries are returned as **sf** objects, you can easily use them
in combination with other spatial data:
- Clip raster data to administrative units
- Compute zonal statistics
- Create choropleth maps
- Perform spatial joins with survey or tabular data
In this example we create a choropleth map that combines the metadata of the
individual country files with the boundary geometries from CGAZ:
``` r
# Metadata
latam_meta <- gb_get_metadata(adm_lvl = "ADM0") |>
  select(boundaryISO, boundaryName, Continent, worldBankIncomeGroup) |>
  filter(Continent == "Latin America and the Caribbean") |>
  glimpse()
#> Rows: 47
#> Columns: 4
#> $ boundaryISO "ABW", "AIA", "ARG", "ATG", "BES", "BHS", "BLM", "BLZ", "BOL", "BRA…
#> $ boundaryName "Aruba", "Anguilla", "Argentina", "Antigua and Barbuda", "Bonaire S…
#> $ Continent "Latin America and the Caribbean", "Latin America and the Caribbean…
#> $ worldBankIncomeGroup "High-income Countries", "No income group available", "High-income …
# Adjust factors
latam_meta$income_factor <- factor(latam_meta$worldBankIncomeGroup,
  levels = c(
    "High-income Countries",
    "Upper-middle-income Countries",
    "Lower-middle-income Countries",
    "Low-income Countries"
  )
)
# Get the shapes from CGAZ
latam_sf <- gb_get_world(adm_lvl = "ADM0") |>
  inner_join(latam_meta,
    by = c("shapeGroup" = "boundaryISO")
  )
ggplot(latam_sf) +
  geom_sf(aes(fill = income_factor)) +
  scale_fill_brewer(palette = "Greens", direction = -1) +
  guides(fill = guide_legend(position = "bottom", nrow = 2)) +
  coord_sf(
    crs = "+proj=laea +lon_0=-75 +lat_0=-15"
  ) +
  labs(
    title = "World Bank Income Group",
    subtitle = "Latin America and the Caribbean",
    fill = "",
    caption = "Source: www.geoboundaries.org"
  )
```
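Of the tasks listed in this section, the spatial join with survey data can be sketched in a few lines. The example below uses synthetic regions and points so that it runs offline; the `unit_square()` helper, the region names and the survey table are hypothetical, and with real data the polygons would come from one of the `gb_get_*()` functions.

``` r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is (xmin, 0)
unit_square <- function(xmin) {
  st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1), c(xmin, 1), c(xmin, 0)
  )))
}

# Two synthetic administrative units side by side (planar, no CRS)
regions <- st_sf(
  shapeName = c("West", "East"),
  geometry = st_sfc(unit_square(0), unit_square(1))
)

# Synthetic survey responses with point locations
survey <- st_as_sf(
  data.frame(id = 1:4, x = c(0.2, 0.7, 1.3, 1.9), y = 0.5),
  coords = c("x", "y")
)

# Attach to each response the unit that contains it
joined <- st_join(survey, regions, join = st_within)

# Number of responses per administrative unit
table(joined$shapeName)
```

With real data, both layers need the same coordinate reference system before joining, for example via `st_transform()`.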
## Summary
The **geobounds** package makes it easy to fetch, manage and visualize
administrative boundary data worldwide in a reproducible and efficient way.
Whether you're doing mapping, spatial analysis, survey integration, or
geospatial modelling, it gives you a high-quality boundary dataset with minimal
overhead.
## References