IPUMS
Documentation for IPUMS.
IPUMS.jl Demo
IPUMS.jl is an in-development package built on OpenAPI.jl for accessing IPUMS data via their API.
Downloading Data from NHGIS
Set up the API client with your API key from the IPUMS NHGIS website, then point to an extract definition file:
using IPUMS
api = IPUMSAPI("https://api.ipums.org/", Dict("Authorization" => "Your_Key"))
test_extract_definition = "test/testdata/example_extract_request.json"
Submit an extract request:
res = extract_submit(api, "nhgis", test_extract_definition)
Check the extract's status:
metadata, defn, msg = extract_info(api, res.number, "nhgis")
List all previously submitted extracts:
extract_list(api, "nhgis")
Download the result to a local path:
extract_download(api, res.number, "nhgis"; output_path = "file_downloads/")
Loading CPS Extract Data
Parse the DDI metadata file and load the extract into a DataFrame using sample data from the IPUMS.jl repo.
ddi = parse_ddi("test/testdata/cps_00157.xml")
df = load_ipums_extract(ddi, "test/testdata/cps_00157.dat")
The resulting DataFrame looks like this:
3×8 DataFrame
 Row │ YEAR   SERIAL  MONTH  ASECWTH  STATEFIP  PERNUM  ASECWT   INCTOT
     │ Int64  Int64   Int64  Float64  Int64     Int64   Float64  Int64
─────┼────────────────────────────────────────────────────────────────────
   1 │  1962      80      3  1475.59        55       1  1475.59       4883
   2 │  1962      80      3  1475.59        55       2  1470.72       5800
   3 │  1962      80      3  1475.59        55       3  1578.75  999999998
IPUMS.DDIInfo — Type
DDIInfo(
filepath::String,
conditions::String = "",
citation::String = "",
ipums_project::String = "",
extract_notes::String = "",
extract_date::String = "",
variable_info::Vector{DDIVariable} = DDIVariable[],
_xml_doc::EzXML.Document = EzXML.XMLDocument(),
_ns::String = "",
data_summary::DataFrame = DataFrame()
)
A struct representing the metadata taken from an IPUMS extract. An IPUMS extract contains both file-level metadata (such as the date of export) and variable-level metadata (such as the name and data type of a variable).
The DDIInfo object is not usually constructed directly. The parse_ddi() function creates a DDIInfo object after successfully parsing the DDI file from an IPUMS extract.
The DDIInfo object holds file-level metadata. Its variable_info field contains a vector of DDIVariable objects, each of which holds the metadata for an individual IPUMS variable.
Keyword Arguments
filepath::String - File system path to the DDI (.xml) file.
conditions::String - IPUMS legal specification on the proper use of IPUMS data.
citation::String - Information for citing IPUMS data.
ipums_project::String - Identifier for the IPUMS source of the extract data, such as IPUMS CPS or IPUMS USA.
extract_notes::String - Additional clarifying information or user notes about the extract.
extract_date::String - Date on which the extract was produced.
variable_info::Vector{DDIVariable} - A vector of DDIVariable objects, which contain metadata on each variable (column) in the data file.
_xml_doc::EzXML.Document - An internal attribute that holds the parsed DDI DOM.
_ns::String - An internal attribute that holds any namespaces used in the XML DOM.
data_summary::DataFrame - A DataFrame that holds summary information about the variables in the dataset, including variable names, data types, variable descriptions, and categorical information.
Returns
A DDIInfo object that contains both file-level and variable-level metadata extracted from an IPUMS DDI (.xml) file.
Example
julia> IPUMS.DDIInfo(filepath = "test_ddi.xml")
IPUMS.DDIInfo("test_ddi.xml", "", "", "", "", "", IPUMS.DDIVariable[], EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x00000000034466d0>)), "", 0×0 DataFrame)
References
Information about each file-level field is taken from:
https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/fieldleveldocumentationfiles/schemas/codebookxsd/elements/stdyDscr.html
IPUMS.DDIVariable — Type
DDIVariable(
name::String = "",
position_start::Int64 = 9999,
position_end::Int64 = 9999,
position_width::Int64 = 9999,
labl::String = "",
desc::String = "",
dcml::Int64 = 9999,
var_dtype::DataType = String,
var_interval::String = "",
category_labels::Union{Vector{Pair{Int64, String}}, Nothing} = nothing,
coder_instructions::Union{String, Nothing} = nothing
)
A struct representing the metadata for an individual variable (column) in an IPUMS extract file. This struct is used when parsing the IPUMS data file, which may be in fixed-width format; hence the position_ fields. The default value for missing string fields is an empty string "", and the default value for missing integer fields is 9999.
Keyword Arguments
name::String - Name of the variable, matching the column name in the IPUMS extract file. This name is limited to 8 characters.
position_start::Int64 - The starting position (in columns) of the variable in a fixed-width file.
position_end::Int64 - The ending position (in columns) of the variable in a fixed-width file.
position_width::Int64 - The width (in columns) of the variable in a fixed-width file.
labl::String - A short description of the variable. The labl is often used to describe the variable in a DataFrame or other display.
desc::String - A longer description of the variable, including information about its use.
dcml::Int64 - The number of decimal places in the variable.
var_dtype::DataType - The Julia data type of the variable.
var_interval::String - Whether a numeric variable is discrete or continuous.
category_labels::Union{Vector{Pair{Int64, String}}, Nothing} - If the variable is categorical, a vector of key => value pairs, where the key is a numeric code and the value is the category label, for example (1 => "category 1"). If the variable is not categorical, this attribute is nothing.
coder_instructions::Union{String, Nothing} - Any additional information about how the variable was coded and how it should be treated.
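The position fields and dcml described above are what make fixed-width parsing possible. The following is a minimal sketch in plain Julia (it does not call IPUMS.jl) that slices one record using 1-based, inclusive positions, as in the YEAR example below, where position_start = 1, position_end = 4, and position_width = 4. The column layout is hypothetical, chosen only so the record reproduces the first YEAR, SERIAL, and ASECWTH values in the demo table, and treating dcml as implied decimal places is an assumption about the file encoding.

```julia
# Hypothetical layout: NOT read from cps_00157.xml. Positions are 1-based and
# inclusive, mirroring the position_start/position_end fields of DDIVariable.
layout = [
    (name = "YEAR",    position_start = 1,  position_end = 4,  dcml = 0),
    (name = "SERIAL",  position_start = 5,  position_end = 9,  dcml = 0),
    (name = "ASECWTH", position_start = 10, position_end = 16, dcml = 2),
]

# Slice each field out of one record, parse it as an integer, and shift the
# decimal point left by `dcml` places (assumed to be implied decimals).
function parse_record(line::AbstractString, layout)
    row = Dict{String,Any}()
    for var in layout
        raw = strip(line[var.position_start:var.position_end])
        value = parse(Int, raw)
        row[var.name] = var.dcml == 0 ? value : value / 10^var.dcml
    end
    return row
end

row = parse_record("1962000800147559", layout)
```

Here "0147559" in columns 10 to 16 becomes 1475.59 because dcml = 2.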
Returns
A DDIVariable object specifying the metadata for a single variable.
Example
julia> IPUMS.DDIVariable(
name = "YEAR",
position_start = 1,
position_end = 4,
position_width = 4,
labl = "Survey year",
desc = "YEAR reports the year in which the survey was conducted. YEARP is repeated on person records.",
dcml = 0,
var_dtype = Int64,
var_interval = "continuous",
category_labels = nothing,
coder_instructions = nothing
)
IPUMS.DDIVariable("YEAR", 1, 4, 4, "Survey year", "YEAR reports the year in which the survey was conducted. YEARP is repeated on person records.", 0, Int64, "continuous", nothing, nothing)
References
Information about each variable field is taken from:
https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/fieldleveldocumentationfiles/schemas/codebookxsd/elements/var.html
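The category_labels field pairs numeric codes with labels. Here is a minimal sketch in plain Julia, without IPUMS.jl, of how a vector in that documented Vector{Pair{Int64, String}} shape could turn codes into labels. The helper label_codes and the sample labels are hypothetical. Codes with no label become missing, and a nothing value (a non-categorical variable) passes the codes through unchanged.

```julia
# Hypothetical labels in the documented Vector{Pair{Int64, String}} shape.
labels = [1 => "category 1", 2 => "category 2"]

# Translate numeric codes to labels; codes without a label become `missing`.
function label_codes(codes::AbstractVector{<:Integer},
                     labels::AbstractVector{<:Pair{<:Integer,<:AbstractString}})
    lookup = Dict(labels)                          # code => label, built once
    return [get(lookup, code, missing) for code in codes]
end

# Non-categorical variables have category_labels === nothing; return codes unchanged.
label_codes(codes::AbstractVector, ::Nothing) = codes

labeled = label_codes([2, 1, 7], labels)
```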
IPUMS.DataExtract — Type
DataExtract(;
extractDefinition=nothing,
number=nothing,
status=nothing,
downloadLinks=nothing,
)
This function creates a DataExtract object, which describes an IPUMS data extract: its definition, extract number, status, and download links.
Arguments
extractDefinition::DataExtractDefinition - (Optional) Definition of the extracted data.
number::Int64 - (Optional) The extract number.
status::String - (Optional) Status of the data extraction (e.g., "complete").
downloadLinks::DataExtractDownloadLinks - (Optional) Download links for the extracted data.
Returns
This function returns a DataExtract object.
Examples
julia> IPUMS.DataExtract(extractDefinition = IPUMS.DataExtractDefinition(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
geogLevels = ["place_00498"]),
"1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
geogLevels = ["state"])),
timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
"A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
dataFormat = "csv_no_header",
timeSeriesTableLayout = "time_by_row_layout",
shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
geographicExtents = ["united states"],
description = "abc",
version = 2,
collection = "nhgis"),
number = 2,
status = "complete",
downloadLinks = IPUMS.DataExtractDownloadLinks(codebookPreview = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
tableData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
gisData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip")
)
# Output
{
"extractDefinition": {
"datasets": {
"1790_cPop": {
"dataTables": [
"NT1"
],
"geogLevels": [
"place_00498"
]
},
"1800_cPop": {
"dataTables": [
"NT3"
],
"geogLevels": [
"state"
]
}
},
"timeSeriesTables": {
"A00": {
"geogLevels": [
"state"
]
},
"A03": {
"geogLevels": [
"state"
]
}
},
"dataFormat": "csv_no_header",
"timeSeriesTableLayout": "time_by_row_layout",
"shapefiles": [
"https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
],
"geographicExtents": [
"united states"
],
"description": "abc",
"version": 2,
"collection": "nhgis"
},
"number": 2,
"status": "complete",
"downloadLinks": {
"codebookPreview": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
"tableData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
"gisData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
}
}
References
To learn more about the DataExtract type, please visit the IPUMS API DataExtract
IPUMS.DataExtractDefinition — Type
DataExtractDefinition(;
datasets=nothing,
timeSeriesTables=nothing,
dataFormat=nothing,
timeSeriesTableLayout=nothing,
breakdownAndDataTypeLayout=nothing,
shapefiles=nothing,
geographicExtents=nothing,
description=nothing,
version=nothing,
collection=nothing,
)
This function creates a definition object that specifies the data to extract from an IPUMS dataset.
Arguments
datasets::Dict{String, Dataset} - (Optional) A dictionary mapping dataset names to Dataset objects.
timeSeriesTables::Dict{String, TimeSeriesTable} - (Optional) A dictionary mapping time series table names to TimeSeriesTable objects.
dataFormat::String - (Optional) The requested format of the data.
timeSeriesTableLayout::String - (Optional) The layout of the time series table data.
breakdownAndDataTypeLayout::String - (Optional) The layout of the dataset data when multiple data types or breakdown combinations are present.
shapefiles::Vector{String} - (Optional) A list of selected shapefiles.
geographicExtents::Vector{String} - (Optional) A list of geographic instances to use as extents for all datasets in this request.
description::String - (Optional) A short description of your extract.
version::Int64 - (Optional) The version of the IPUMS API to use (e.g., 2).
collection::String - (Optional) The IPUMS collection to query for the extract (for example, "nhgis" or "usa", corresponding to IPUMS NHGIS or IPUMS USA).
Returns
This function returns a new DataExtractDefinition object describing the data to extract from an IPUMS dataset.
Examples
julia> IPUMS.DataExtractDefinition(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
geogLevels = ["place_00498"]),
"1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
geogLevels = ["state"])),
timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
"A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
dataFormat = "csv_no_header",
timeSeriesTableLayout = "time_by_row_layout",
shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
geographicExtents = ["united states"],
description = "abc",
version = 2,
collection = "nhgis"
)
# Output
{
"datasets": {
"1790_cPop": {
"dataTables": [
"NT1"
],
"geogLevels": [
"place_00498"
]
},
"1800_cPop": {
"dataTables": [
"NT3"
],
"geogLevels": [
"state"
]
}
},
"timeSeriesTables": {
"A00": {
"geogLevels": [
"state"
]
},
"A03": {
"geogLevels": [
"state"
]
}
},
"dataFormat": "csv_no_header",
"timeSeriesTableLayout": "time_by_row_layout",
"shapefiles": [
"https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
],
"geographicExtents": [
"united states"
],
"description": "abc",
"version": 2,
"collection": "nhgis"
}
References
To learn more about the DataExtractDefinition object, consult the IPUMS Developer Docs
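The Dataset arguments dataTables and geogLevels are not marked (Optional), so a quick pre-submission check can help when writing extract definitions by hand, as in the JSON file used in the demo. The sketch below is a hypothetical helper that works on plain Dicts shaped like the datasets section of the JSON output above. It is not part of IPUMS.jl, and the IPUMS API remains the authority on what counts as a valid request.

```julia
# Hypothetical pre-submission check on Dicts shaped like the "datasets" JSON above.
# Assumption: each dataset entry needs non-empty "dataTables" and "geogLevels".
function check_datasets(datasets::AbstractDict)
    problems = String[]
    for (name, spec) in datasets
        for field in ("dataTables", "geogLevels")
            entries = get(spec, field, String[])   # treat a missing key as empty
            isempty(entries) && push!(problems, "$name: \"$field\" is missing or empty")
        end
    end
    return sort!(problems)                         # stable order for reporting
end

datasets = Dict(
    "1790_cPop" => Dict("dataTables" => ["NT1"], "geogLevels" => ["place_00498"]),
    "1800_cPop" => Dict("dataTables" => ["NT3"]),  # geogLevels omitted on purpose
)
check_datasets(datasets)
```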
IPUMS.DataExtractDownloadLinks — Type
DataExtractDownloadLinks(;
codebookPreview=nothing,
tableData=nothing,
gisData=nothing,
)
This function creates an object holding the download links for an extract's codebook preview, CSV data, and shapefile.
Arguments
codebookPreview::String - (Optional) HTTP link to a preview of the codebook used to encode the census data.
tableData::String - (Optional) HTTP link to the NHGIS CSV data file for download.
gisData::String - (Optional) HTTP link to the NHGIS shapefile for download.
Returns
The function returns a DataExtractDownloadLinks object containing the links for download.
Examples
julia> IPUMS.DataExtractDownloadLinks(codebookPreview = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
tableData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
gisData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip")
# Output
{
"codebookPreview": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
"tableData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
"gisData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
}
References
For additional information on Dataset creation and download, consult the IPUMS Developer Docs
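If you fetch the files behind these links yourself, each one needs a local file name. The hypothetical helper below, in plain Julia, maps each link to a path under an output directory by reusing the last URL segment. It is only an illustration; extract_download from the demo may name its files differently.

```julia
# Hypothetical helper: map each download link to a path under `output_dir`,
# reusing the file name at the end of the URL.
function local_download_paths(links::AbstractDict{<:AbstractString,<:AbstractString},
                              output_dir::AbstractString)
    paths = Dict{String,String}()
    for (kind, url) in links
        filename = last(split(url, '/'))          # e.g. "nhgis0007_csv.zip"
        paths[kind] = joinpath(output_dir, filename)
    end
    return paths
end

links = Dict(
    "codebookPreview" => "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
    "tableData"       => "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
    "gisData"         => "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip",
)
paths = local_download_paths(links, "file_downloads")
```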
IPUMS.DataExtractPost — Type
DataExtractPost(;
datasets=nothing,
timeSeriesTables=nothing,
dataFormat=nothing,
timeSeriesTableLayout=nothing,
breakdownAndDataTypeLayout=nothing,
shapefiles=nothing,
geographicExtents=nothing,
description=nothing,
)
This function prepares a POST object for delivery to an IPUMS POST endpoint.
Arguments
datasets::Dict{String, Dataset} - (Optional) A dictionary mapping dataset names to Dataset objects.
timeSeriesTables::Dict{String, TimeSeriesTable} - (Optional) A dictionary mapping time series table names to TimeSeriesTable objects.
dataFormat::String - (Optional) The requested data format.
timeSeriesTableLayout::String - (Optional) The layout of your time series table data.
breakdownAndDataTypeLayout::String - (Optional) The layout of your dataset data when multiple data types or breakdown combinations are present.
shapefiles::Vector{String} - (Optional) A list of selected shapefiles.
geographicExtents::Vector{String} - (Optional) A list of geographic instances to use as extents for all datasets in this request.
description::String - (Optional) A short description of the extract.
Returns
This function returns a DataExtractPost object for delivery to an IPUMS POST endpoint.
Examples
julia> IPUMS.DataExtractPost(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
geogLevels = ["place_00498"]),
"1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
geogLevels = ["state"])),
timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
"A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
dataFormat = "csv_no_header",
timeSeriesTableLayout = "time_by_row_layout",
shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
geographicExtents = ["united states"],
description = "abc")
# Output
{
"datasets": {
"1790_cPop": {
"dataTables": [
"NT1"
],
"geogLevels": [
"place_00498"
]
},
"1800_cPop": {
"dataTables": [
"NT3"
],
"geogLevels": [
"state"
]
}
},
"timeSeriesTables": {
"A00": {
"geogLevels": [
"state"
]
},
"A03": {
"geogLevels": [
"state"
]
}
},
"dataFormat": "csv_no_header",
"timeSeriesTableLayout": "time_by_row_layout",
"shapefiles": [
"https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
],
"geographicExtents": [
"united states"
],
"description": "abc"
}
Reference
For additional information on the DataExtractPost object, please refer to the IPUMS Developer Docs
IPUMS.DataExtractPostResponse — Type
DataExtractPostResponse(;
extractDefinition=nothing,
number=nothing,
status=nothing,
downloadLinks=nothing,
)
This function creates an object holding the IPUMS API's response to a DataExtractPost request.
Arguments
extractDefinition::DataExtractDefinition - (Optional) The DataExtractDefinition from the original POST request.
number::Int64 - (Optional) The extract ID number.
status::String - (Optional) The status of the data extraction (e.g., "queued").
downloadLinks::DataExtractDownloadLinks - (Optional) The download links for the data.
Returns
This function returns a DataExtractPostResponse object containing the response from the IPUMS API to the DataExtractPost request.
Examples
julia> IPUMS.DataExtractPostResponse(extractDefinition = IPUMS.DataExtractDefinition(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
geogLevels = ["place_00498"]),
"1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
geogLevels = ["state"])),
timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
"A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
dataFormat = "csv_no_header",
timeSeriesTableLayout = "time_by_row_layout",
shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
geographicExtents = ["united states"],
description = "abc",
version = 2,
collection = "nhgis"),
number = 90,
status = "queued",
downloadLinks = IPUMS.DataExtractDownloadLinks(codebookPreview = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
tableData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
gisData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"))
# Output
{
"extractDefinition": {
"datasets": {
"1790_cPop": {
"dataTables": [
"NT1"
],
"geogLevels": [
"place_00498"
]
},
"1800_cPop": {
"dataTables": [
"NT3"
],
"geogLevels": [
"state"
]
}
},
"timeSeriesTables": {
"A00": {
"geogLevels": [
"state"
]
},
"A03": {
"geogLevels": [
"state"
]
}
},
"dataFormat": "csv_no_header",
"timeSeriesTableLayout": "time_by_row_layout",
"shapefiles": [
"https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
],
"geographicExtents": [
"united states"
],
"description": "abc",
"version": 2,
"collection": "nhgis"
},
"number": 90,
"status": "queued",
"downloadLinks": {
"codebookPreview": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
"tableData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
"gisData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
}
}
References
To learn more about DataExtractPostResponse visit the IPUMS Developer Docs
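In the example above, a newly submitted extract is reported as "queued", so a client typically checks the status repeatedly until processing finishes. The sketch below is a generic polling loop in plain Julia. wait_for_status is hypothetical, and the terminal status strings ("complete"/"completed" for success, "failed"/"canceled" for failure) are assumptions to adjust to what the API actually returns. The get_status argument could wrap a call such as extract_info, but which part of its return value carries the status is not documented here.

```julia
# Hypothetical polling helper; the status strings below are assumptions.
function wait_for_status(get_status::Function;
                         done = ("complete", "completed"),
                         failed = ("failed", "canceled"),
                         interval::Real = 10.0,
                         max_tries::Integer = 60)
    for attempt in 1:max_tries
        status = get_status()                    # e.g. a closure around extract_info
        status in done && return status          # success: stop polling
        status in failed && error("Extract ended with status \"$(status)\".")
        attempt < max_tries && sleep(interval)   # wait before checking again
    end
    error("Extract still not finished after $(max_tries) status checks.")
end
```

For testing, pass a stub closure that steps through a fixed list of statuses, with interval = 0 so no time is spent sleeping.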
IPUMS.DataTableFull — Type
DataTableFull(;
name=nothing,
nhgisCode=nothing,
description=nothing,
universe=nothing,
sequence=nothing,
datasetName=nothing,
nVariables=nothing,
)
Constructor that stores information about a given data table within a particular IPUMS dataset.
Attributes
name::String - (Optional) The unique identifier for the data table within its dataset.
nhgisCode::String - (Optional) The code for this data table that will appear in extracts.
description::String - (Optional) A short description of the data table.
universe::String - (Optional) The statistical population (set of entities) measured by this data table (e.g., persons, families, occupied housing units).
sequence::Int64 - (Optional) The order in which this data table appears in the metadata API and extracts.
datasetName::String - (Optional) The name of the dataset that contains this data table.
nVariables::Vector{DataTableFullVariablesInner} - (Optional) A list of the variables within the table.
Returns
This returns a DataTableFull object containing the table's name, NHGIS code, description, universe, sequence, dataset name, and variables.
Examples
julia> IPUMS.DataTableFull(name = "NT1",
nhgisCode = "AAA",
description = "Total Population",
universe= "Persons",
sequence = 1,
datasetName = "1790_cPop",
nVariables = [IPUMS.DataTableFullVariablesInner(name = "NT001",
nhgisCode = "AAA001")])
# Output
{
"name": "NT1",
"nhgisCode": "AAA",
"description": "Total Population",
"universe": "Persons",
"sequence": 1,
"datasetName": "1790_cPop",
"nVariables": [
{
"name": "NT001",
"nhgisCode": "AAA001"
}
]
}
Reference
To find out more about the DataTableFull type, visit the IPUMS Developer Docs
IPUMS.DataTableFullVariablesInner — Type
DataTableFullVariablesInner(;
name=nothing,
nhgisCode=nothing,
)
Inner constructor representing a variable within a DataTableFull object.
Attributes
name::String - (Optional) The unique identifier for the variable within its data table.
nhgisCode::String - (Optional) The code for this variable that will appear in extracts.
Returns
This returns a DataTableFullVariablesInner object describing one variable of a data table.
Examples
julia> IPUMS.DataTableFullVariablesInner(name = "NT1",
nhgisCode = "AAA")
# Output
{
"name": "NT1",
"nhgisCode": "AAA"
}
Reference
To find out more about DataTableFullVariablesInner visit the IPUMS Developer Docs
IPUMS.DataTableSimple — Type
DataTableSimple(;
name=nothing,
nhgisCode=nothing,
description=nothing,
sequence=nothing,
)
Builds an object containing a summary representation of an IPUMS data table.
Arguments
name::String - (Optional) The unique identifier for the data table within its dataset.
nhgisCode::String - (Optional) The code for this data table that will appear in extracts.
description::String - (Optional) A short description of the data table.
sequence::Int64 - (Optional) The order in which this data table appears in the metadata API and extracts.
Examples
julia> IPUMS.DataTableSimple(name = "NT1",
nhgisCode = "AAA",
description = "Total Population",
sequence = 1)
# Output
{
"name": "NT1",
"nhgisCode": "AAA",
"description": "Total Population",
"sequence": 1
}
References
For more information about the DataTableSimple object, consult the IPUMS Developer Docs
IPUMS.Dataset — Type
Dataset(;
dataTables=nothing,
geogLevels=nothing,
breakdownValues=nothing,
years=nothing,
)
This function creates a new Dataset record specifying data tables, geographic levels, breakdown values, and years.
Arguments
dataTables::Vector{String} - A list of data tables for this Dataset.
geogLevels::Vector{String} - A list of geographic levels for this Dataset (e.g., "county", "state").
breakdownValues::Vector{String} - (Optional) Breakdown values for this Dataset.
years::Vector{String} - (Optional) A list of years, if data for multiple years are present.
Returns
This function returns a new Dataset object including geographic and temporal information.
Examples
julia> IPUMS.Dataset(dataTables = ["NT1"],
geogLevels = ["state"],
breakdownValues = ["bs32.ge00"],
years = ["1790"])
# Output
{
"dataTables": [
"NT1"
],
"geogLevels": [
"state"
],
"breakdownValues": [
"bs32.ge00"
],
"years": [
"1790"
]
}
References
To learn more about the Dataset type, visit the links:
IPUMS.DatasetFull — Type
DatasetFull(;
name=nothing,
description=nothing,
group=nothing,
sequence=nothing,
dataTables=nothing,
geogLevels=nothing,
hasMultipleDataTypes=nothing,
breakdowns=nothing,
)
This function creates a DatasetFull object that describes an IPUMS dataset in full, including its data tables, geographic levels, and breakdowns. This metadata is used when downloading data from the IPUMS website.
Arguments
name::String - (Optional) The dataset identifier.
description::String - (Optional) A short description of the dataset.
group::String - (Optional) The group of datasets to which this dataset belongs.
sequence::Int64 - (Optional) The order in which the dataset appears in the metadata API and extracts.
dataTables::Vector{DataTableSimple} - (Optional) The list of data tables available for the dataset.
geogLevels::Vector{DatasetFullGeogLevelsInner} - (Optional) A list of the geographic levels available for the dataset.
hasMultipleDataTypes::Bool - (Optional) Whether multiple data types exist for the dataset.
breakdowns::DatasetFullBreakdowns - (Optional) The breakdowns available for the dataset.
Returns
The function returns a new DatasetFull object.
Examples
julia> IPUMS.DatasetFull(name = "2010_SF1a",
description = "SF 1a - P & H Tables [Blocks & Larger Areas]",
group = "2010 Census",
sequence = 4802,
dataTables = [IPUMS.DataTableSimple(name = "P1",
nhgisCode = "H7V",
description = "Total Population",
sequence = 1)],
geogLevels = [ IPUMS.DatasetFullGeogLevelsInner(name = "nation",
description = "Nation",
hasGeogExtentSelection = false)],
hasMultipleDataTypes = false,
breakdowns = IPUMS.DatasetFullBreakdowns(name = "bs32",
type = "Spatial",
description = "Geographic Subarea (2010 Census and American Community Survey)",
breakdownValues = [IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "bs32.ge00",
description = "Total area")]))
# Output
{
"name": "2010_SF1a",
"description": "SF 1a - P & H Tables [Blocks & Larger Areas]",
"group": "2010 Census",
"sequence": 4802,
"dataTables": [
{
"name": "P1",
"nhgisCode": "H7V",
"description": "Total Population",
"sequence": 1
}
],
"geogLevels": [
{
"name": "nation",
"description": "Nation",
"hasGeogExtentSelection": false
}
],
"hasMultipleDataTypes": false,
"breakdowns": {
"name": "bs32",
"type": "Spatial",
"description": "Geographic Subarea (2010 Census and American Community Survey)",
"breakdownValues": [
{
"name": "bs32.ge00",
"description": "Total area"
}
]
}
}
References
To learn more about the DatasetFull type visit the IPUMS Developer Docs
IPUMS.DatasetFullBreakdowns — Type
DatasetFullBreakdowns(;
name=nothing,
type=nothing,
description=nothing,
breakdownValues=nothing,
years=nothing,
geographicInstances=nothing,
)
This function creates a DatasetFullBreakdowns object describing a breakdown available for a dataset, such as a geographic subarea.
Arguments
name::String - (Optional) The breakdown identifier.
type::String - (Optional) The type of breakdown (e.g., "Spatial").
description::String - (Optional) A short description of the breakdown.
breakdownValues::Vector{DatasetFullBreakdownsBreakdownValuesInner} - (Optional) List of values available for this breakdown.
years::Vector{String} - (Optional) List of years, if data for multiple years are present.
geographicInstances::Vector{DatasetFullBreakdownsBreakdownValuesInner} - (Optional) List of geographic extents.
Returns
The function returns a new DatasetFullBreakdowns object.
Examples
julia> IPUMS.DatasetFullBreakdowns(name = "bs32",
type = "Spatial",
description = "Geographic Subarea (2010 Census and American Community Survey)",
breakdownValues = [IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "bs32.ge00",
description = "Total area")],
years = ["2010"],
geographicInstances = [IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "010",
description = "Alabama")])
# Output
{
"name": "bs32",
"type": "Spatial",
"description": "Geographic Subarea (2010 Census and American Community Survey)",
"breakdownValues": [
{
"name": "bs32.ge00",
"description": "Total area"
}
],
"years": [
"2010"
],
"geographicInstances": [
{
"name": "010",
"description": "Alabama"
}
]
}
References
To find out more about the Dataset type, visit the IPUMS Developer Docs.
IPUMS.DatasetFullBreakdownsBreakdownValuesInner — Type
DatasetFullBreakdownsBreakdownValuesInner(;
name=nothing,
description=nothing,
)
Inner constructor representing a single name/description entry, such as a breakdown value, within a DatasetFull object.
Arguments
name::String - (Optional) The identifier of the entry.
description::String - (Optional) A short description of the entry.
Returns
This function returns a DatasetFullBreakdownsBreakdownValuesInner object. This object is used when constructing DatasetFullBreakdowns objects (for both breakdownValues and geographicInstances), which are in turn part of a DatasetFull object.
Examples
julia> IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "1790_cPop",
description = "1790 Census: Population Data [US, States & Counties]")
# Output
{
"name": "1790_cPop",
"description": "1790 Census: Population Data [US, States & Counties]"
}
References
To find out more about the Dataset type visit the IPUMS Developer Docs
IPUMS.DatasetFullGeogLevelsInner — Type
DatasetFullGeogLevelsInner(;
name=nothing,
description=nothing,
hasGeogExtentSelection=nothing,
)
This function creates an object describing a geographic level available for a dataset.
Arguments
name::String - (Optional) The identifier of the geographic level.
description::String - (Optional) A short description of the geographic level.
hasGeogExtentSelection::Bool - (Optional) Whether a geographic extent can be selected for this geographic level.
Returns
The function returns a DatasetFullGeogLevelsInner object with geographic information related to a dataset.
Examples
julia> IPUMS.DatasetFullGeogLevelsInner(name = "1790_cPop",
description = "1790 Census: Population Data [US, States & Counties]",
hasGeogExtentSelection = true)
# Output
{
"name": "1790_cPop",
"description": "1790 Census: Population Data [US, States & Counties]",
"hasGeogExtentSelection": true
}
References
To learn more about the DatasetFullGeogLevelsInner visit the IPUMS Developer Docs
IPUMS.DatasetSimple — Type
DatasetSimple(;
name=nothing,
description=nothing,
group=nothing,
sequence=nothing,
)
This function creates a dataset reference with the provided name, description, group, and sequence.
Arguments
name::String - (Optional) The dataset identifier.
description::String - (Optional) A short description of the dataset.
group::String - (Optional) The group of datasets to which this dataset belongs.
sequence::Int64 - (Optional) The order in which the dataset appears in the metadata API and extracts.
Returns
This function returns a DatasetSimple object with the properties specified by the function arguments.
Examples
julia> IPUMS.DatasetSimple(name = "1790_cPop",
description = "1790 Census: Population Data [US, States & Counties]",
group = "1790 Census",
sequence = 101)
# Output
{
"name": "1790_cPop",
"description": "1790 Census: Population Data [US, States & Counties]",
"group": "1790 Census",
"sequence": 101
}
References
To find out more about the Dataset type visit IPUMS API Dataset
IPUMS.Error — Type
Error(;
type=nothing,
status=nothing,
detail=nothing,
)
This function creates an Error object holding the error returned by the IPUMS API in response to a request.
Arguments
type::Int64 - The error code for the message.
status::String - The error message returned.
detail::String - Additional explanation of the cause of the error.
Returns
This function returns an Error object describing why a request to the IPUMS API failed.
Examples
julia> IPUMS.Error(type = 400,
status = "SemanticValidationError",
detail = "Geographic extents Extent selection is not required for selected geog levels. Please remove the 'geographic_extents' section of you request." )
# Output
{
"type": 400,
"status": "SemanticValidationError",
"detail": "Geographic extents Extent selection is not required for selected geog levels. Please remove the 'geographic_extents' section of you request."
}
References
To learn more about errors, please see:
IPUMS.IPUMSAPI — Method
IPUMSAPI(url::String, headers::Dict)
Create an IPUMS API object for querying the IPUMS server.
NOTE: If you need more control over how the connection to the server is made, consider using the OpenAPI.jl package to create a Client object that you can pass into IPUMSAPI() directly.
Arguments
url::String - The URL of the server. This is the base URL used for all API calls.
headers::Dict - A dictionary of HTTP headers sent with all API calls.
Returns
api::IPUMSAPI – An API object that can be passed to all IPUMS.jl methods that require an API object.
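The demo passes the API key inline in the headers dictionary. The following is a minimal sketch of reading it from an environment variable instead, so the key stays out of source files. The helper ipums_headers and the variable name IPUMS_API_KEY are assumptions, not part of IPUMS.jl. The returned Dict uses the same "Authorization" header as the demo.

```julia
# Hypothetical helper: build the headers Dict for IPUMSAPI from an environment
# variable (IPUMS_API_KEY is an assumed name) instead of hard-coding the key.
function ipums_headers(; env_var::AbstractString = "IPUMS_API_KEY")
    key = get(ENV, env_var, "")
    isempty(key) && error("Set the $(env_var) environment variable to your IPUMS API key.")
    return Dict("Authorization" => key)
end

# Usage, following the demo at the top of this page:
# api = IPUMSAPI("https://api.ipums.org/", ipums_headers())
```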
IPUMS.IPUMSSource — Type
IPUMSSource(
proj_name::String,
url_name::String,
collection_type::String,
code_for_api::String = "",
api_support::Bool = false,
home_url::String = ""
)
A struct representing a data source that IPUMS provides.
Arguments
proj_name::String - Name of the IPUMS project.
url_name::String - Name of the project as used in that project's website URL.
collection_type::String - Either "microdata" or "aggregate data", indicating the type of data this collection provides.
Keyword Arguments
code_for_api::String - The name of the project used when interacting with the IPUMS API (for collections that the API supports). (Default: "")
api_support::Bool - Whether the collection is supported by the IPUMS API. (Default: false)
home_url::String - URL for the project's homepage. (Default: "")
Returns
An IPUMSSource object with the specified fields.
Example
julia> IPUMSSource(
proj_name = "IPUMS USA",
url_name = "usa",
collection_type = "microdata",
api_support = true
)
IPUMS.IPUMSSource("IPUMS USA", "usa", "microdata", "", true, "")
IPUMS.Shapefile — Type
Shapefile(;
name=nothing,
year=nothing,
geographicLevel=nothing,
extent=nothing,
basis=nothing,
sequence=nothing,
)
This function creates a reference to an NHGIS shapefile for an IPUMS dataset.
Attributes
- name::String - (Optional) The unique identifier of the shapefile.
- year::String - (Optional) The survey year in which the file's represented areas were used for tabulations.
- geographicLevel::String - (Optional) The geographic level of the shapefile.
- extent::String - (Optional) The geographic extent covered by the shapefile.
- basis::String - (Optional) The derivation source of the shapefile.
- sequence::Int64 - (Optional) The order in which the shapefile appears in the metadata API.
Returns
This function returns a Shapefile object containing the attributes specified in the function arguments.
Examples
julia> IPUMS.Shapefile(name = "base.tl2000.nongen.us_state_1790",
year = "1790",
geographicLevel = "state",
extent = "united states",
basis = "2000 tiger/line +",
sequence = 1)
# Output
{
"name": "base.tl2000.nongen.us_state_1790",
"year": "1790",
"geographicLevel": "state",
"extent": "united states",
"basis": "2000 tiger/line +",
"sequence": 1
}
References
Additional information about this object is available in the IPUMS Developer Docs
IPUMS.TimeSeriesTable — Type
TimeSeriesTable(;
geogLevels=nothing,
years=nothing,
)
This function creates a time series table record with the given geographic levels and years.
Arguments
- geogLevels::Vector{String} - A vector of geographic levels (e.g. "state", "county") for the time series table.
- years::Vector{String} - (Optional) A vector of years for this time series table.
Returns
This function returns a TimeSeriesTable record giving the geographic levels and the years to which the data refer.
Examples
julia> IPUMS.TimeSeriesTable(geogLevels=["state"],
years =["1790"])
# Output
{
"geogLevels": [
"state"
],
"years": [
"1790"
]
}
References
To find additional information on the Time Series Table, please refer to:
IPUMS.TimeSeriesTableFull — Type
TimeSeriesTableFull(;
name=nothing,
description=nothing,
geographicIntegration=nothing,
sequence=nothing,
timeSeries=nothing,
geogLevels=nothing,
)
This function returns an object containing the attributes for downloading a time series table.
Arguments
- name::String - (Optional) The unique variable identifier for the time series table (e.g. "A00", "OWNERSHP").
- description::String - (Optional) A short description of the time series variable referred to in name.
- geographicIntegration::String - (Optional) Specifies how the variable accounts for changes in geographic boundaries over time (e.g. "Nominal").
- sequence::Float32 - (Optional) The order of appearance of the dataset in the metadata API and extract.
- timeSeries::Vector{TimeSeriesTableFullTimeSeriesInner} - (Optional) A list of time series records corresponding to the variable specified in name.
- geogLevels::Vector{TimeSeriesTableFullTimeSeriesInner} - (Optional) A list of geographic levels available for this time series table.
Returns
This function returns a TimeSeriesTableFull object containing the variable name, description, time series, and geographic information of the data.
Examples
julia> IPUMS.TimeSeriesTableFull(name="A00",
description= "Total Population",
geographicIntegration= "Nominal",
sequence= 0.01,
timeSeries=[IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "AA",
description = "Persons: Total",
sequence = 1 )],
geogLevels= [ IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "state",
description = "State",
sequence = 4 ),
IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "county",
description = "State--County",
sequence = 25 )])
# Output
{
"name": "A00",
"description": "Total Population",
"geographicIntegration": "Nominal",
"sequence": 0.01,
"timeSeries": [
{
"name": "AA",
"description": "Persons: Total",
"sequence": 1
}
],
"geogLevels": [
{
"name": "state",
"description": "State",
"sequence": 4
},
{
"name": "county",
"description": "State--County",
"sequence": 25
}
]
}
References
For additional information please refer to the following sources:
IPUMS.TimeSeriesTableFullTimeSeriesInner — Type
TimeSeriesTableFullTimeSeriesInner(;
name=nothing,
description=nothing,
sequence=nothing,
)
This function creates a reference to an IPUMS time series table.
Arguments
- name::String - (Optional) The unique identifier of the time series table.
- description::String - (Optional) A short description of the time series table.
- sequence::Int64 - (Optional) The order in which the time series table will appear in the metadata API and extracts.
Returns
This function returns a TimeSeriesTableFullTimeSeriesInner object that contains information about a desired Time Series table.
Examples
julia> IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "1790_cPop",
description = "1790 Census: Population Data [US, States & Counties]",
sequence = 101)
# Output
{
"name": "1790_cPop",
"description": "1790 Census: Population Data [US, States & Counties]",
"sequence": 101
}
References
Additional information about the TimeSeriesTableFullTimeSeriesInner object is available in the IPUMS Developer Docs
IPUMS.TimeSeriesTableSimple — Type
TimeSeriesTableSimple(;
name=nothing,
description=nothing,
geographicIntegration=nothing,
sequence=nothing,
timeSeries=nothing,
geogLevels=nothing,
)
This function creates a new simple time series table record to support downloading the corresponding data.
Arguments
- name::String - (Optional) The unique variable identifier for the time series table (e.g. "A00", "OWNERSHP").
- description::String - (Optional) A short description of the time series variable referred to in name.
- geographicIntegration::String - (Optional) Specifies how the variable value accounts for changes in geographic boundaries over time (e.g. "Nominal").
- sequence::Float32 - (Optional) The order of appearance of the dataset in the metadata API and extract.
- timeSeries::Vector{String} - (Optional) A list of time series records corresponding to the variable specified in name.
- geogLevels::Vector{String} - (Optional) A list of geographic levels available for this time series table.
Returns
This function returns a new TimeSeriesTableSimple object.
Examples
julia> IPUMS.TimeSeriesTableSimple(name = "A00",
description = "Total Population",
geographicIntegration = "Nominal",
sequence = 0.01,
timeSeries = ["1790", "1800"],
geogLevels = ["state", "county"] )
# Output
{
"name": "A00",
"description": "Total Population",
"geographicIntegration": "Nominal",
"sequence": 0.01,
"timeSeries": [
"1790",
"1800"
],
"geogLevels": [
"state",
"county"
]
}
References
To find out more about the TimeSeriesTableSimple type visit the IPUMS Developer Docs
IPUMS._check_that_file_exists — Method
_check_that_file_exists(filepath::String)
This is an internal function that checks whether the provided file exists.
Arguments
- filepath::String - A file path that the user wishes to parse. The file must be an existing XML file.
Returns
The function returns nothing if the file exists. If the file does not exist, then the function raises an ArgumentError.
IPUMS._check_that_file_is_dat — Method
_check_that_file_is_dat(filepath::String)
This is an internal function that checks whether the provided file is a DAT file. All IPUMS extract data files should be in DAT format.
Arguments
- filepath::String - A file path that the user wishes to import. The file must be a DAT file.
Returns
The function returns nothing if the file is a DAT file. If the file is not a DAT file, then the function raises an ArgumentError.
IPUMS._check_that_file_is_xml — Method
_check_that_file_is_xml(filepath::String)
This is an internal function that checks whether the provided file is an XML file. All DDI files should be in XML format.
Arguments
- filepath::String - A file path that the user wishes to parse. The file must be an XML file.
Returns
The function returns nothing if the file is an XML file. If the file is not an XML file, then the function raises an ArgumentError.
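The three internal checks above share a contract: return nothing on success and throw an ArgumentError otherwise. The sketch below folds them into one hypothetical helper, check_file, built only on Base functions (isfile, splitext). Comparing extensions case-insensitively is an assumption of this sketch. The documentation does not say how IPUMS.jl compares extensions.

```julia
"""
    check_file(filepath, extension)

Hypothetical helper combining the existence and extension checks: returns
`nothing` when `filepath` exists and ends with `extension` (compared
case-insensitively, an assumption of this sketch), else throws ArgumentError.
"""
function check_file(filepath::AbstractString, extension::AbstractString)
    isfile(filepath) || throw(ArgumentError("file not found: $filepath"))
    _, ext = splitext(filepath)                  # e.g. ("cps_00157", ".xml")
    lowercase(ext) == lowercase(extension) ||
        throw(ArgumentError("expected a $extension file, got: $filepath"))
    return nothing
end
```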
IPUMS._get_var_metadata_from_ddi! — Method
_get_var_metadata_from_ddi!(ddi::DDIInfo)
This is an internal function and not meant for the public API. It iterates over the variable nodes in the DDI XML file, collects the data from each variable node in a DDIVariable object, and saves a vector of those DDIVariable objects in the DDIInfo object.
Arguments
- ddi::DDIInfo - A DDIInfo object that will retain all of the parsed metadata.
Returns
The function returns the original DDIInfo object with its attributes updated.
IPUMS._parse_column_float! — Method
_parse_column_float!(col, data, col_start, col_end, line_len, decimals)
Internal function that parses a single floating-point column from all rows of a memory-mapped fixed-width IPUMS data file. Float values in IPUMS files are encoded as integers (e.g. "12345" with decimals=2 represents 123.45). This function parses the integer from raw bytes and divides by 10^decimals to recover the float value.
Arguments
- col::Vector{Union{Missing, Float64}} - A pre-allocated column vector to hold the parsed float values.
- data::Vector{UInt8} - The memory-mapped file contents as a byte array.
- col_start::Int - The starting byte position of the field within a line.
- col_end::Int - The ending byte position of the field within a line.
- line_len::Int - The number of bytes per line (including the newline character).
- decimals::Int - The number of implied decimal places in the encoded integer.
Returns
This function does not return any output. Instead it modifies the provided column vector in-place.
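The decoding step described above is a single division by 10^decimals. The sketch below isolates that arithmetic in a hypothetical function, decode_implied_decimals, with missing values passed through unchanged. It leaves out the byte-level parsing that the real function performs.

```julia
# Hypothetical helper isolating the implied-decimal arithmetic:
# an encoded integer `raw` with `decimals` implied places represents raw / 10^decimals.
decode_implied_decimals(raw::Integer, decimals::Integer) = raw / 10.0^decimals
decode_implied_decimals(::Missing, ::Integer) = missing   # blank fields stay missing

weights = decode_implied_decimals.([147559, missing], 2)   # broadcast over a column
```

With 2 implied decimals, the encoded value 147559 decodes to 1475.59. This matches the ASECWTH value shown in the CPS demo table.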
IPUMS._parse_column_int! — Method
_parse_column_int!(col, data, col_start, col_end, line_len)
Internal function that parses a single integer column from all rows of a memory-mapped fixed-width IPUMS data file. Each field is located using arithmetic byte offsets and parsed directly from the raw bytes.
Arguments
- col::Vector{Union{Missing, Int64}} - A pre-allocated column vector to hold the parsed integer values.
- data::Vector{UInt8} - The memory-mapped file contents as a byte array.
- col_start::Int - The starting byte position of the field within a line.
- col_end::Int - The ending byte position of the field within a line.
- line_len::Int - The number of bytes per line (including the newline character).
Returns
This function does not return any output. Instead it modifies the provided column vector in-place.
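The offset arithmetic works like this: in a fixed-width file, row r starts (r - 1) * line_len bytes into the buffer, so its field spans offset + col_start through offset + col_end. The sketch below, parse_int_column!, is hypothetical. For brevity it swaps in Base's tryparse, which allocates a String per field. IPUMS.jl parses directly from the raw bytes instead. Here, blank fields become missing.

```julia
"""
    parse_int_column!(col, data, col_start, col_end, line_len)

Hypothetical, simplified sketch of the per-row offset arithmetic.
Uses tryparse (allocating) instead of byte-level parsing.
"""
function parse_int_column!(col::Vector{Union{Missing, Int}}, data::AbstractVector{UInt8},
                           col_start::Int, col_end::Int, line_len::Int)
    for row in eachindex(col)
        offset = (row - 1) * line_len                  # bytes before this row
        field = String(data[offset + col_start:offset + col_end])
        col[row] = something(tryparse(Int, field), missing)  # blank -> missing
    end
    return nothing
end

# Two 8-byte lines (7 characters plus '\n'): YEAR in bytes 1-4, SERIAL in bytes 5-7.
data = Vector{UInt8}("1962 80\n1962   \n")
years = Vector{Union{Missing, Int}}(missing, 2)
parse_int_column!(years, data, 1, 4, 8)
serials = Vector{Union{Missing, Int}}(missing, 2)
parse_int_column!(serials, data, 5, 7, 8)
```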
IPUMS._parse_column_string! — Method
_parse_column_string!(col, data, col_start, col_end, line_len)
Internal function that parses a single string column from all rows of a memory-mapped fixed-width IPUMS data file. Each field is located using arithmetic byte offsets, stripped of leading and trailing spaces, and converted to a Julia String.
Arguments
- col::Vector{Union{Missing, String}} - A pre-allocated column vector to hold the parsed string values.
- data::Vector{UInt8} - The memory-mapped file contents as a byte array.
- col_start::Int - The starting byte position of the field within a line.
- col_end::Int - The ending byte position of the field within a line.
- line_len::Int - The number of bytes per line (including the newline character).
Returns
This function does not return any output. Instead it modifies the provided column vector in-place.
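Stripping spaces can happen on the byte range itself, before any String is created. The hypothetical parse_string_field below moves the start and stop bounds inward past ASCII spaces and copies only the remaining bytes. Returning missing for an all-blank field is an assumption of this sketch. The documentation above does not state how IPUMS.jl handles blank string fields.

```julia
"""
    parse_string_field(data, start, stop)

Hypothetical sketch: trim ASCII spaces by narrowing the byte range, then copy
the remaining bytes into a String. All-blank fields return `missing` (assumed).
"""
function parse_string_field(data::AbstractVector{UInt8}, start::Int, stop::Int)
    while start <= stop && data[start] == UInt8(' ')   # skip leading spaces
        start += 1
    end
    while stop >= start && data[stop] == UInt8(' ')    # skip trailing spaces
        stop -= 1
    end
    return start > stop ? missing : String(data[start:stop])
end
```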
IPUMS._parse_int_bytes — Method
_parse_int_bytes(data, start, stop)
Internal function that parses an integer value directly from a range of bytes in a memory-mapped file. This avoids allocating any String or SubString objects during parsing.
Arguments
- data::Vector{UInt8} - The memory-mapped file contents as a byte array.
- start::Int - The starting byte position of the field.
- stop::Int - The ending byte position of the field.
Returns
Returns the parsed Int64 value, or missing if the field contains only whitespace.
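An allocation-free integer parse accumulates digits straight from the bytes, value = 10 * value + digit, without building a String. The hypothetical parse_int_field below follows the documented behavior (an Int, or missing for an all-blank field). Two details are assumptions of this sketch, not necessarily IPUMS.jl's rules: spaces are skipped wherever they appear, and a single leading minus sign is accepted.

```julia
"""
    parse_int_field(data, start, stop)

Hypothetical allocation-free parser for an ASCII integer in data[start:stop].
Returns an Int, or `missing` if the field contains only spaces.
"""
function parse_int_field(data::AbstractVector{UInt8}, start::Int, stop::Int)
    value = 0
    negative = false
    seen_digit = false
    for i in start:stop
        b = data[i]
        if b == UInt8(' ')
            continue                                   # padding is skipped
        elseif b == UInt8('-') && !seen_digit && !negative
            negative = true                            # one leading minus sign
        elseif UInt8('0') <= b <= UInt8('9')
            value = 10 * value + (b - UInt8('0'))      # append the digit
            seen_digit = true
        else
            throw(ArgumentError("unexpected byte $(repr(Char(b))) at position $i"))
        end
    end
    seen_digit || return missing                       # blank field
    return negative ? -value : value
end
```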
IPUMS._read_ddi_and_parse_extract_level_metadata! — Method
_read_ddi_and_parse_extract_level_metadata!(ddi::DDIInfo)
This is an internal function and not meant for the public API. This function parses the DDI XML file and captures the file-level metadata.
Arguments
- ddi::DDIInfo - A DDIInfo object that will retain all of the parsed metadata.
Returns
The function returns the original DDIInfo object with its attributes updated.
IPUMS._string_to_num — Method
_string_to_num(x::SubString{String})
This is an internal function and not meant for the public API. This function takes a text string and returns only the numeric portion of the string. For example, if the input is "Codes999999", the function returns an Int64 with the value 999999.
Arguments
- x::SubString{String} - A string that may contain some numeric data mixed with text.
Returns
This function returns the numeric part of the string, coded as an Int64 datatype.
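A compact way to keep only the digits is to filter the string with isdigit and parse the result. The hypothetical string_to_num below does this and accepts any AbstractString, including the SubString the internal function takes. Two details are assumptions of this sketch: digits from separate runs are concatenated (so "a1b2" gives 12), and an input with no digits raises an ArgumentError.

```julia
"""
    string_to_num(x)

Hypothetical sketch: keep only the decimal digits of `x` and parse them as Int64.
"""
function string_to_num(x::AbstractString)
    digits_only = filter(isdigit, x)             # "Codes999999" -> "999999"
    isempty(digits_only) && throw(ArgumentError("no digits found in $(repr(x))"))
    return parse(Int64, digits_only)
end
```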
IPUMS.basepath — Method
The default API base path for APIs in IPUMSAPI. This can be used to construct the OpenAPI.Clients.Client instance.
IPUMS.extract_download — Method
function extract_download(
api::IPUMSAPI,
extract_number::Int,
collection::String;
output_path::String = pwd(),
codebook_name::String = nothing,
table_data_name::String = nothing,
gis_data_name::String = nothing,
codebook::Bool = true,
table_data::Bool = true,
gis_data::Bool = true
)
Download files associated with a given IPUMS data extract.
Arguments
- api::IPUMSAPI – An IPUMSAPI object to establish connection details.
- extract_number::Int – The extract ID assigned to the IPUMS data extract.
- collection::String – The IPUMS collection to query for the extract (options include "nhgis", "usa", etc., corresponding to the IPUMS NHGIS or IPUMS USA databases).
Keyword Arguments
- output_path::String – The path (location on your computer) where all downloaded files are saved (Default: current working directory).
- codebook_name::String – The name to give the codebook file (Default: nothing).
- table_data_name::String – The name to give the table data file (Default: nothing).
- gis_data_name::String – The name to give the GIS data file (Default: nothing).
- codebook::Bool – true to download the codebook file for the extract; false otherwise (Default: true).
- table_data::Bool – true to download the table data file for the extract; false otherwise (Default: true).
- gis_data::Bool – true to download the GIS data file for the extract; false otherwise (Default: true).
Returns
The path (location on your computer) to which the files were downloaded.
Examples
julia> extract_download(api, 1, "nhgis"; output_path = "file_downloads/", codebook = false, gis_data_name = "GIS_1", table_data_name = "DATA_1")
[ Info: Table data for Extract 1 downloaded to file_downloads/DATA_1.zip.
[ Info: GIS data for Extract 1 downloaded to file_downloads/DATA_1.zip.
"file_downloads/"
julia> extract_download(api, 2, "nhgis", output_path="file_downloads/")
┌ Warning: Extract 2 has expired and the associated data cannot be downloaded any longer. If you would like to download the data for this extract, please resubmit the extract request associated with this extract again to create a new extract with the same data from this extract.
└ @ IPUMS ~/FOSS/IPUMS.jl/src/apis/api_IPUMSAPI.jl:213
IPUMS.extract_info — Method
extract_info(
api::IPUMSAPI,
extract_number::Int,
collection::String;
version::String = "2"
)
Get information about a specific data extract.
Arguments
- api::IPUMSAPI – An IPUMSAPI object to establish connection details.
- extract_number::Int – The extract ID assigned to the IPUMS data extract.
- collection::String – The IPUMS collection to query for the extract (options include "nhgis", "usa", etc., corresponding to the IPUMS NHGIS or IPUMS USA databases).
Keyword Arguments
- version::String – The version of the IPUMS API to use (Default: "2").
Returns
metadata::Dict{String, Any} – A dictionary containing metadata about the queried data extract:
- number – The IPUMS data extract ID.
- timeSeriesTableLayout – Layout of the time series tables. Can be one of the following:
  - "time_by_column_layout" (wide format, default): rows correspond to geographic units, columns correspond to different times in the time series
  - "time_by_row_layout" (long format): rows correspond to a single geographic unit at a single point in time
  - "time_by_file_layout": data for different times are provided in separate files
- geographicExtents – Vector of geographic extents to use for all of the datasets in the extract definition.
- status – The current status of the IPUMS data extract (such as "completed" for a finished request). Potential values include:
  - "queued"
  - "started"
  - "produced"
  - "canceled"
  - "failed"
  - "completed"
- description – The description associated with the data extract.
- timeSeriesTables – Vector of time series tables for use in the extract definition.
- version – The version of the API used to handle this request.
- dataFormat – The format of the extract data file:
  - "csv_no_header" (default) includes only a minimal header in the first row
  - "csv_header" includes a second, more descriptive header row
  - "fixed_width" provides data in a fixed-width format
- breakdownAndDataTypeLayout – The layout of any datasets that have multiple data types or breakdown values. Potential values are:
  - "single_file" (default) keeps all data types and breakdown values in one file
  - "separate_files" splits each data type or breakdown value into its own file
- shapefiles – The shapefiles requested for this extract.
- downloadUrls – URLs from which to download the data for the requested extract.
- datasets – The datasets used in this extract.
- collection – The collection being queried.
NOTE: To be ready for download, an extract must have a completed status. However, some completed extracts may still be unavailable for download, because extracts expire and are removed from IPUMS servers after a set period of time (72 hours for microdata collections, 2 weeks for IPUMS NHGIS). If an extract has expired, this function emits a warning.
defn::IPUMS.DataExtractDefinition – The associated data extract definition that was used to generate this extract.
msg::OpenAPI.Clients.ApiResponse – The response message from the IPUMS API.
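Because an extract can only be downloaded once its status is "completed", a script usually polls until it gets there. The sketch below is a hypothetical, generic wait_for_status helper that repeatedly calls a status function you supply. Its defaults (10 s between checks, at most 60 checks) are arbitrary choices. Treating "failed" and "canceled" as terminal follows the status list above.

```julia
"""
    wait_for_status(get_status; target = "completed", failed = ("failed", "canceled"),
                    interval = 10.0, max_tries = 60)

Hypothetical polling helper: call `get_status()` until it returns `target`,
raising an error on a terminal failure status or after `max_tries` checks.
"""
function wait_for_status(get_status::Function; target::AbstractString = "completed",
                         failed = ("failed", "canceled"), interval::Real = 10.0,
                         max_tries::Integer = 60)
    for attempt in 1:max_tries
        status = get_status()
        status == target && return status
        status in failed && error("extract finished with status \"$status\"")
        attempt < max_tries && sleep(interval)     # wait before the next check
    end
    error("extract did not reach \"$target\" after $max_tries checks")
end
```

With an API object set up as in the demo, a call might look like this (untested): wait_for_status(() -> extract_info(api, res.number, "nhgis")[1]["status"]). The call relies on the first returned value being the metadata dictionary with its "status" entry.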
Examples
julia> metadata, defn, msg = extract_info(api, 1, "nhgis"; version = "2");
julia> metadata
Dict{String, Any} with 13 entries:
"number" => 1
"timeSeriesTab… => "time_by_file_layout"
"geographicExt… => ["010"]
"status" => "completed"
"description" => "example extract request"
"timeSeriesTab… => Dict{String, TimeSeriesTable}…
"version" => 2
"dataFormat" => "csv_no_header"
"breakdownAndD… => "single_file"
"shapefiles" => ["us_state_1790_tl2000"]
"downloadUrls" => Dict("codebookPreview"=>"http…
"datasets" => Dict{String, Dataset}("2000_S…
"collection" => "nhgis"
IPUMS.extract_list — Method
extract_list(
api::IPUMSAPI,
collection::String;
version::String = "2",
extracts::Int64 = 10,
_mediaType=nothing
)
Get a list of recent data extracts.
NOTE: This function emits warnings when returned extracts are expired.
Arguments
- api::IPUMSAPI – An IPUMSAPI object to establish connection details.
- collection::String – The IPUMS collection to query for the extract (options include "nhgis", "usa", etc., corresponding to the IPUMS NHGIS or IPUMS USA databases).
Keyword Arguments
- version::String – The version of the IPUMS API to use (Default: "2").
- extracts::Int64 – The number of most recent extracts to return, starting from the newest (Default: 10).
Returns
Vector{DataExtract} – A vector of DataExtract objects, each containing the extract number (number), its IPUMS status (status), the definition used to generate the extract (extractDefinition), and links to download the extract's data (downloadLinks).
Examples
julia> res = extract_list(api, "nhgis")
┌ Warning: Extract 1 has expired and the associated data cannot be downloaded any longer. If you would like to download the data for this extract, please resubmit the extract request associated with this extract again to create a new extract with the same data from this extract.
└ @ IPUMS
2-element Vector{IPUMS.DataExtract}:
{
"extractDefinition": {
#=
...
Extract definition details here
...
=#
},
"number": 2,
"status": "completed",
"downloadLinks": {
"codebookPreview": "nhgis0002_csv_PREVIEW.zip",
"tableData": "nhgis0002_csv.zip",
"gisData": "nhgis0002_shape.zip"
}
}
{
"extractDefinition": {
#=
...
Extract definition details here
...
=#
},
"number": 1,
"status": "completed",
"downloadLinks": {}
}
TIP: To record all expired data extracts, loop through the returned extracts and check whether each one's downloadLinks field is empty. An empty downloadLinks field means the extract has expired.
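The loop described in the TIP can be sketched as a filter over the returned extracts. To keep the sketch self-contained, ExtractStub is a hypothetical stand-in holding only the two fields the TIP relies on, with downloadLinks modeled as a Dict. On real IPUMS.DataExtract objects, downloadLinks may be represented differently (for example as nothing or as a generated struct). Adjust the emptiness check accordingly.

```julia
# Hypothetical stand-in for IPUMS.DataExtract with only the fields the TIP uses.
struct ExtractStub
    number::Int
    downloadLinks::Union{Nothing, Dict{String, String}}
end

"""Return the numbers of extracts whose downloadLinks are absent or empty (expired)."""
function expired_extract_numbers(extracts)
    return [e.number for e in extracts
            if isnothing(e.downloadLinks) || isempty(e.downloadLinks)]
end

extracts = [ExtractStub(2, Dict("tableData" => "nhgis0002_csv.zip")),
            ExtractStub(1, Dict{String, String}()),
            ExtractStub(3, nothing)]
```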
IPUMS.extract_submit — Function
extract_submit(
api::IPUMSAPI,
collection::String,
extract_definition::String = nothing;
version::String = "2",
_mediaType=nothing
)
Submit an extract definition to IPUMS so that IPUMS generates a data extract with the requested data.
Arguments
- api::IPUMSAPI – An IPUMSAPI object to establish connection details.
- collection::String – The IPUMS collection to query for the extract (options include "nhgis", "usa", etc., corresponding to the IPUMS NHGIS or IPUMS USA databases).
- extract_definition::String – The location of a file storing the extract definition you want to submit.
Keyword Arguments
- version::String – The version of the IPUMS API to use (Default: "2").
Returns
DataExtractPostResponse – Upon a successful submission, this object contains a copy of the submitted extract definition, the extract ID, its status, and any relevant download links.
Examples
julia> res = extract_submit(api, "nhgis", my_extract_definition_file)
{
"extractDefinition": {
#=
...
Extract definition details here
...
=#
},
"number": 4,
"status": "queued",
"downloadLinks": {}
}
julia> res = extract_submit(api, "nhgis", "fake_file.json")
[ Info: The value you provided for the argument `extract_definition` ("fake_file.json") is not a valid filepath. Please update the path to your data extract.
┌ Error: ArgumentError("invalid JSON at byte position 1 while parsing type JSON3.False: InvalidChar\nfake_file.json\n")
└ @ IPUMS
┌ Error: The extract definition submission request was not successful. Please review your extract definition and try again.
└ @ IPUMS
IPUMS.ipums_data_collections — Method
ipums_data_collections()
List IPUMS data collections with their corresponding codes used by the IPUMS API. Unlisted data collections are not yet supported by the IPUMS API.
Returns
DataFrame with four columns containing the full collection name, the type of data the collection provides, the collection code used by the IPUMS API, and the status of API support for the collection.
Example
julia> ipums_data_collections()
Row │ collection_name collection_type code_for_api api_support
│ String String String Bool
─────┼─────────────────────────────────────────────────────────────────
1 │ IPUMS USA microdata true
2 │ IPUMS CPS microdata true
3 │ IPUMS International microdata ipumsi true
...
IPUMS.load_ipums_extract — Method
load_ipums_extract(ddi::DDIInfo, extract_filepath::String)
This function takes a parsed DDIInfo object and the file path to an IPUMS DAT extract file, and returns a DataFrame containing all of the data.
Arguments
- ddi::DDIInfo - A DDIInfo object, which is the result of parsing a DDI metadata file.
- extract_filepath::String - The file path to an IPUMS extract DAT file.
Returns
This function returns a DataFrame that contains all of the data from the IPUMS extract file. The metadata fields of the DataFrame also contain the metadata parsed from the DDI file.
Examples
Let's assume we have an extract DDI file named my_extract.xml, and an extract file called my_extract.dat.
julia> ddi = parse_ddi("my_extract.xml");
julia> df = load_ipums_extract(ddi, "my_extract.dat");
IPUMS.metadata_nhgis_data_tables_get — Method
List all data_tables
Params:
- version::String (required)
- page_number::Int64
- page_size::Int64
Return: DataTableFull, OpenAPI.Clients.ApiResponse
IPUMS.metadata_nhgis_datasets_dataset_data_tables_data_table_get — Method
Detailed data table view
Params:
- dataset::String (required)
- data_table::String (required)
- version::String (required)
Return: DataTableFull, OpenAPI.Clients.ApiResponse
IPUMS.metadata_nhgis_datasets_dataset_get — Method
Detailed dataset view
Params:
- dataset::String (required)
- version::String (required)
Return: DatasetFull, OpenAPI.Clients.ApiResponse
IPUMS.metadata_nhgis_datasets_get — Method
List all datasets
Params:
- version::String (required)
- page_number::Int64
- page_size::Int64
Return: Vector{DatasetSimple}, OpenAPI.Clients.ApiResponse
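The list endpoints above take page_number and page_size, so retrieving everything means requesting pages until the results run out. The hypothetical collect_all_pages below accepts any function fetch_page(page_number, page_size) that returns a vector of results. It assumes pages are numbered from 1 and treats a short or empty page as the last one. Both are assumptions to check against the IPUMS API documentation. Wrapping one of the generated metadata methods into fetch_page is left to the caller, since their exact call signatures are not spelled out here.

```julia
"""
    collect_all_pages(fetch_page; page_size = 100, max_pages = 1_000)

Hypothetical pagination helper: call `fetch_page(page_number, page_size)`
for page_number = 1, 2, ... and concatenate the results, stopping at the
first page shorter than `page_size` (assumed to be the last page).
"""
function collect_all_pages(fetch_page::Function; page_size::Integer = 100,
                           max_pages::Integer = 1_000)
    results = Any[]
    for page_number in 1:max_pages
        page = fetch_page(page_number, page_size)
        append!(results, page)
        length(page) < page_size && break          # short or empty page: done
    end
    return results
end
```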
IPUMS.metadata_nhgis_shapefiles_get — Method
List all the shapefiles.
Params:
- version::String (required)
- page_number::Int64
- page_size::Int64
Return: Vector{Shapefile}, OpenAPI.Clients.ApiResponse
IPUMS.metadata_nhgis_time_series_tables_get — Method
List all time series tables
Params:
- version::String (required)
- page_number::Int64
- page_size::Int64
Return: Vector{TimeSeriesTableSimple}, OpenAPI.Clients.ApiResponse
IPUMS.metadata_nhgis_time_series_tables_time_series_table_get — Method
Detailed time series table view
Params:
- timeseriestable::String (required)
- version::String (required)
Return: Vector{TimeSeriesTableFull}, OpenAPI.Clients.ApiResponse
IPUMS.parse_ddi — Method
parse_ddi(filepath::String)
Parses a valid IPUMS DDI XML file and returns a DDIInfo object containing the IPUMS extract metadata.
Arguments
- filepath::String – A string containing the path to the IPUMS DDI XML file.
Returns
A DDIInfo object that contains all of the file-level and variable-level metadata for the IPUMS extract.
See the documentation for DDIInfo for more information about this object.
Examples
Let's assume we have an extract DDI file named my_extract.xml
julia> typeof(parse_ddi("my_extract.xml"))
IPUMS.DDIInfo
OpenAPI.from_json — Method
This is a pirated method that supports the extract_list method in returning additional information about page_size, page_number, and generated URLs.
TODO: Review whether to replace extract_list's OpenAPI implementation with a manual implementation. This would involve dynamically building a URL based on the collection, page size, and page number, and then executing the query.