Geospatial Workflow
This workflow demonstrates:
Data acquisition with IPUMS.jl
Preprocessing and educational category labeling
Integration of aggregated census metrics with geospatial geometries
Choropleth visualization with colorbar and layout tuning
The scripts on this page mirror the runnable workflow under src/workflows/geospatial_census/src/. The goal is to keep the code easy to read and easy to run: load data, summarize it by region, join it to geometry, and plot maps.
Requirement Coverage
This workflow explicitly covers all required tutorial goals:
Data acquisition: load census microdata and metadata through IPUMS.jl (parse_ddi, load_ipums_extract) and load geospatial boundaries (load_ipums_nhgis).
Preprocessing: filter and clean geometry rows, convert join keys, map coded education values to readable labels, and aggregate counts.
Normalization: compute min-max normalized counts across regions before choropleth coloring.
Integration: merge aggregated census summaries with geospatial geometries using key-based joins.
Visualization: produce choropleth maps with multiple color schemes, titles, layout tuning, and a final colorbar for interpretability.
Reproducibility: provide runnable Julia code, fixed script order, and config-driven local file paths.
1. Load Required Packages
using CairoMakie, Chain, CSV, DataFrames, GeoDataFrames, GeoInterfaceMakie, GeoMakie, StatsBase
import IPUMS: load_ipums_extract, load_ipums_nhgis, parse_ddi
2. Load IPUMS Census Data
This step reads the metadata dictionary (.xml) and the fixed-width microdata extract (.dat). Both paths are configured in config.toml when you run the script pipeline.
ddi = parse_ddi("poland_data/ipumsi_00001.xml")
df = load_ipums_extract(ddi, "poland_data/ipumsi_00001.dat")
3. Metadata Exploration
This section documents data acquisition quality checks. You inspect dataset-level metadata first, then column-level descriptions to verify definitions before deriving indicators.
md_df = metadata(df)
for md in keys(md_df)
    println("$(md):\n----------------\n\n $(md_df[md])\n\n")
end
Then inspect variable descriptions:
for colname in names(df)
    println("$(colname):\n----------------\n")
    try
        println(" $(colmetadata(df, colname, "description"))\n")
    catch
        println(" description metadata not available\n")
    end
end
4. Load and Filter Shapefile for Poland
The boundary file is filtered to Poland, rows without geometry are removed, and the regional key is cast to Int64 so it can be joined to aggregated census results.
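The cast matters because a string key such as "616002" never compares equal to the integer 616002, so mismatched key types leave nothing to match on. A minimal sketch of that check in plain Julia follows. The region codes are made up, and the helper name unmatched_keys is hypothetical, not part of IPUMS.jl or DataFrames.

```julia
# Made-up region codes; the real values come from the shapefile and the extract.
shape_keys = ["616002", "616004", "616006"]   # keys read as strings
census_keys = [616002, 616004, 616008]        # keys stored as integers

# A string is never equal to an integer, so the raw keys share nothing.
raw_overlap = any(k -> k in census_keys, shape_keys)
println("raw keys overlap: ", raw_overlap)

# Parse the string keys so both sides have the same type.
shape_keys_int = parse.(Int64, shape_keys)

# Hypothetical helper: keys present on one side of a join but not the other.
unmatched_keys(a, b) = (only_left = setdiff(a, b), only_right = setdiff(b, a))

report = unmatched_keys(census_keys, shape_keys_int)
println("census only: ", report.only_left)
println("shapefile only: ", report.only_right)
```

Running a check like this before the join makes it easy to spot regions that would end up with a missing count or a missing geometry.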
geodf = load_ipums_nhgis("shapefiles/ENUTS2_2013.shp").geodataframe
filter!(x -> x.CNTRY_NAME == "Poland", geodf)
dropmissing!(geodf, :geometry)
geodf[!, :ENUTS2] = parse.(Int64, geodf[!, :ENUTS2])
5. Aggregate Educational Attainment by Region
This step maps raw education codes into readable categories (PRIMARY, SECONDARY, UNIVERSITY), then computes regional counts for each category.
It also prepares data for comparative mapping by creating grouped regional counts that are later normalized.
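The recoding rule amounts to a lookup from raw code to label. A minimal sketch in plain Julia, separate from the workflow code, uses the same code lists as this workflow. The function name education_label and the sample codes are illustrative, and codes outside the three lists are returned as missing here instead of being kept as numbers.

```julia
# Code lists copied from this workflow.
const PRIMARY = (12, 20)
const SECONDARY = (40, 41, 42, 43, 50)
const UNIVERSITY = (70, 71, 72, 73)

# Illustrative helper: map one raw EDUCPL code to a label, or `missing`
# when the code is not in any of the three lists.
function education_label(code)
    code in PRIMARY && return "PRIMARY"
    code in SECONDARY && return "SECONDARY"
    code in UNIVERSITY && return "UNIVERSITY"
    return missing
end

# Made-up sample of raw codes, including two that fall outside every list.
raw_codes = [12, 41, 70, 99, 20, 0]
labels = education_label.(raw_codes)

# Count rows per label, skipping unmapped codes.
label_counts = Dict{String, Int}()
for label in skipmissing(labels)
    label_counts[label] = get(label_counts, label, 0) + 1
end
println(label_counts)
```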
using Chain
edu_df = @chain df begin
    groupby(_[:, [:ENUTS2_2013, :EDUCPL]], [:ENUTS2_2013, :EDUCPL])
    combine(nrow => :Count)
end
primary = [12, 20]
secondary = [40, 41, 42, 43, 50]
university = [70, 71, 72, 73]
edu_df.EDUCPL = convert(Vector{Any}, edu_df.EDUCPL)
replace!(x -> in(x, primary) ? "PRIMARY" : x, edu_df.EDUCPL)
replace!(x -> in(x, secondary) ? "SECONDARY" : x, edu_df.EDUCPL)
replace!(x -> in(x, university) ? "UNIVERSITY" : x, edu_df.EDUCPL)
edu_counts = @chain edu_df begin
    filter!(row -> !isa(row.EDUCPL, Real), _)
    groupby([:ENUTS2_2013, :EDUCPL])
    combine(:Count => sum => :Count)
    groupby([:EDUCPL])
end
6. Visualize Educational Attainment Across Regions
The workflow keeps both map styles:
rough exploration with multiple color schemes (:Purples, :Greens, :Blues, :Reds)
final cleaned map with :Wistia and a colorbar
Inside the plotting loop, counts are normalized to the 0-1 range. This improves comparability between education categories and across regions, independent of absolute category sizes.
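Min-max normalization maps the smallest count to 0 and the largest to 1 using (x - min) / (max - min). A minimal sketch in plain Julia follows. The helper name minmax_normalize and the sample counts are illustrative, and the handling of missing values and of a zero range are added assumptions that the workflow code does not include.

```julia
# Illustrative helper: rescale values to the 0-1 range.
# `missing` entries are ignored when finding the extremes and stay `missing`.
function minmax_normalize(values)
    lo, hi = extrema(skipmissing(values))
    # Assumption: when every value is equal, return 0.0 instead of dividing by zero.
    hi == lo && return [ismissing(v) ? missing : 0.0 for v in values]
    return [ismissing(v) ? missing : (v - lo) / (hi - lo) for v in values]
end

counts = [1200, 4800, 3000, missing]   # made-up regional counts
normalized = minmax_normalize(counts)
println(normalized)
```

Because each education category is normalized separately, a value of 1 marks the region with the largest count within that category, not the largest count overall.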
# Base region map
fig_regions = Figure(size = (1200, 1400), fontsize = 20)
ax_regions = CairoMakie.Axis(fig_regions[1, 1])
poly!(ax_regions, geodf.geometry, color = 1:16, colormap = :Reds, strokecolor = :black, strokewidth = 3)
Label(fig_regions[:, :, Top()], "Voivodeships of Poland", fontsize = 50)
hidedecorations!(ax_regions)
# Multi-color comparison map
fig_rough = Figure(size = (1200, 1400), fontsize = 20)
axs_rough = [Axis(fig_rough[x, y]) for x in 1:2 for y in 1:2]
colors = [:Purples, :Greens, :Blues, :Reds]
poly!(axs_rough[1], geodf.geometry, color = :white, strokecolor = :black, strokewidth = 3)
hidedecorations!(axs_rough[1])
axs_rough[1].title = "Voivodeships of Poland"
Label(fig_rough[:, :, Top()], "Normalized Education Counts across Poland", fontsize = 50, padding = (0, 0, 30, 0))
for (idx, counts) in enumerate(edu_counts)
    geo_counts = outerjoin(counts, geodf; on = [:ENUTS2_2013 => :ENUTS2])
    # Drop rows without geometry first so the colors line up with the plotted polygons
    dropmissing!(geo_counts, :geometry)
    norm_counts = (geo_counts.Count .- minimum(geo_counts.Count)) / (maximum(geo_counts.Count) .- minimum(geo_counts.Count))
    cmap = cgrad(colors[idx + 1], norm_counts)
    ax = axs_rough[idx + 1]
    poly!(ax, geo_counts.geometry, color = cmap[norm_counts], strokecolor = :black, strokewidth = 3)
    ax.title = "$(counts.EDUCPL |> first)"
    hidedecorations!(ax)
end
# Final cleaned visualization with a colorbar
fig = Figure(size = (1200, 1400), fontsize = 20)
axs = [Axis(fig[x, y]) for x in 1:2 for y in 1:2]
poly!(axs[1], geodf.geometry, color = :white, strokecolor = :black, strokewidth = 2)
axs[1].title = "Voivodeships of Poland"
hidedecorations!(axs[1])
Label(fig[:, :, Top()], "Normalized Educational Attainment across Poland", fontsize = 50, padding = (0,0,30,0))
for (idx, counts) in enumerate(edu_counts)
    geo_counts = outerjoin(counts, geodf; on = [:ENUTS2_2013 => :ENUTS2])
    # Drop rows without geometry first so the colors line up with the plotted polygons
    dropmissing!(geo_counts, :geometry)
    norm_counts = (geo_counts.Count .- minimum(geo_counts.Count)) / (maximum(geo_counts.Count) .- minimum(geo_counts.Count))
    cmap = cgrad(:Wistia, norm_counts)
    ax = axs[idx + 1]
    poly!(ax, geo_counts.geometry, color = cmap[norm_counts], strokecolor = :black, strokewidth = 2)
    ax.title = "$(counts.EDUCPL |> first)"
    hidedecorations!(ax)
end
Colorbar(fig[:, 3], limits = (0, 1), colormap = :Wistia)
# OUTPUT_DIR and OUTPUT_FIGURE are not defined on this page; the scripted workflow is expected to set them
save(joinpath(OUTPUT_DIR, "voivodeships_poland.png"), fig_regions)
save(joinpath(OUTPUT_DIR, "education_rough_multicolors.png"), fig_rough)
save(joinpath(OUTPUT_DIR, OUTPUT_FIGURE), fig)
Outputs produced by the scripted workflow:
output/voivodeships_poland.png
output/education_rough_multicolors.png
output/education_choropleth_poland.png
Reproducible Scripted Run
This repository installs IPUMS.jl from the JuliaHealth GitHub URL in its setup helper so the workflow uses a known package source.
From the workflow directory:
julia --project=. add_ipums.jl
julia --project=. -e "using Pkg; Pkg.instantiate()"
copy config.toml.example config.toml
julia --project=. run.jl
On macOS/Linux, replace copy with cp.
No database is required for this workflow. Inputs are local files configured in config.toml:
IPUMS DDI metadata (.xml)
IPUMS extract data (.dat)
NHGIS shapefile (.shp and companion files)
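Reading such a file needs only Julia's TOML standard library. The sketch below parses an inline example, not the real config.toml. The table name paths and the keys ddi, extract, and shapefile are hypothetical, so check config.toml.example for the names this workflow actually uses. The file paths are the ones shown earlier on this page.

```julia
using TOML

# Hypothetical layout; the real key names live in config.toml.example.
example_config = """
[paths]
ddi = "poland_data/ipumsi_00001.xml"
extract = "poland_data/ipumsi_00001.dat"
shapefile = "shapefiles/ENUTS2_2013.shp"
"""

config = TOML.parse(example_config)
paths = config["paths"]

# Report each configured input and whether it exists on disk.
for key in ("ddi", "extract", "shapefile")
    status = isfile(paths[key]) ? "" : "  (not found)"
    println(rpad(key, 10), paths[key], status)
end
```

To read a file on disk, TOML.parsefile("config.toml") returns the same kind of dictionary.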
The scripted workflow is implemented in:
src/workflows/geospatial_census/src/01_load_data.jl
src/workflows/geospatial_census/src/02_preprocess.jl
src/workflows/geospatial_census/src/03_visualize.jl
This keeps acquisition, preprocessing, integration, and visualization steps explicit and reproducible.
Official Documentation Used for Validation
GeoMakie docs: https://geo.makie.org/stable/
CairoMakie backend docs: https://docs.makie.org/stable/explanations/backends/cairomakie
IPUMS.jl docs: https://juliahealth.org/IPUMS.jl/dev/