IPUMS

Documentation for IPUMS.

IPUMS.jl Demo

IPUMS.jl is an in-development package built on OpenAPI.jl for accessing IPUMS data via their API.

Downloading Data from NHGIS

Set up the API client using your key from the NHGIS IPUMS website, then point to an extract definition file.

using IPUMS

api = IPUMSAPI("https://api.ipums.org/", Dict("Authorization" => "Your_Key"))

Submit an extract request:

test_extract_definition = "test/testdata/example_extract_request.json"
res = extract_submit(api, "nhgis", test_extract_definition)

Check the extract's status:

metadata, defn, msg = extract_info(api, res.number, "nhgis")
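
A newly submitted extract may still be queued, so the status check is often repeated until the extract finishes. The following is a minimal sketch of such a polling loop and is not part of IPUMS.jl: `wait_for_extract` and `check_status` are hypothetical names, and the status strings "queued" and "complete" are assumed from the examples on this page. A stub stands in for the API so the block runs offline.

```julia
# Poll a status-returning callable until it reports "complete".
# `check_status` is a hypothetical zero-argument function supplied by the caller.
function wait_for_extract(check_status; interval = 1.0, max_tries = 10)
    for attempt in 1:max_tries
        status = check_status()
        status == "complete" && return attempt   # number of checks that were needed
        sleep(interval)                           # wait before asking again
    end
    error("extract not ready after $max_tries checks")
end

# Stub that mimics an extract moving from "queued" to "complete".
statuses = ["queued", "queued", "complete"]
calls = Ref(0)
fake_status() = (calls[] += 1; statuses[min(calls[], length(statuses))])

tries = wait_for_extract(fake_status; interval = 0.0)
println("ready after ", tries, " checks")
```

With a real client, `check_status` could wrap a call to extract_info. Which of the returned values carries the status string is not specified here, so that wiring is left to the reader.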

List all previously submitted extracts:

extract_list(api, "nhgis")

Download the result to a local path:

extract_download(api, res.number, "nhgis"; output_path = "file_downloads/")

Loading CPS Extract Data

Parse the DDI metadata file and load the extract into a DataFrame using sample data from the IPUMS.jl repo.

ddi = parse_ddi("test/testdata/cps_00157.xml")
df = load_ipums_extract(ddi, "test/testdata/cps_00157.dat")

The resulting DataFrame looks like this:

3×8 DataFrame
 Row │ YEAR   SERIAL  MONTH  ASECWTH  STATEFIP  PERNUM  ASECWT   INCTOT
     │ Int64  Int64   Int64  Float64  Int64     Int64   Float64  Int64
─────┼─────────────────────────────────────────────────────────────────────
   1 │  1962      80      3  1475.59        55       1  1475.59       4883
   2 │  1962      80      3  1475.59        55       2  1470.72       5800
   3 │  1962      80      3  1475.59        55       3  1578.75  999999998

IPUMS.DDIInfo - Type
DDIInfo(
    filepath::String, 
    conditions::String = "", 
    citation::String = "", 
    ipums_project::String = "",
    extract_notes::String = "", 
    extract_date::String = "",
    variable_info::Vector{DDIVariable} = DDIVariable[],
    _xml_doc::EzXML.Document = EzXML.XMLDocument(),
    _ns::String = "",
    data_summary::DataFrame = DataFrame()
)

A struct representing the metadata taken from an IPUMS extract. An IPUMS extract contains both file-level metadata (such as the date of export) and variable-level metadata (such as the name and data type of a variable).

The DDIInfo object is not generally constructed directly. The parse_ddi() function creates a DDIInfo object after successfully parsing a DDI file from an IPUMS extract.

The DDIInfo object contains file-level metadata. The variable_info field of the DDIInfo object contains a vector of DDIVariable objects, each of which holds metadata about an individual IPUMS variable.

Keyword Arguments

  • filepath::String - File system path to the DDI (.xml) file.
  • conditions::String - IPUMS legal specification on the proper use of IPUMS data.
  • citation::String - Information for the citation of IPUMS data.
  • ipums_project::String - Identifier for the IPUMS source of the extract data, such as IPUMS CPS, or IPUMS USA, etc.
  • extract_notes::String - Additional clarifying information or user notes about the extract.
  • extract_date::String - Date on which the extract was produced.
  • variable_info::Vector{DDIVariable} - a vector of DDIVariable objects, which contain metadata on each variable or column in the data file.
  • _xml_doc::EzXML.Document - An internal attribute that contains an internal representation of the DDI DOM for parsing.
  • _ns::String - An internal attribute to hold any namespaces used in the XML DOM.
  • data_summary::DataFrame - Contains a dataframe that holds summary information about the variables in the dataset, including variable names, data types, variable descriptions, and categorical information.

Returns

  • DDIInfo object that contains both file-level and variable-level metadata extracted from an IPUMS DDI (.xml) file.
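
The relationship between the file-level object and its variable_info vector can be sketched as follows. This is an illustration only: `VarMeta`, `FileMeta`, `find_variable`, and `decode` are hypothetical, simplified stand-ins rather than the package's own types or functions, and the state labels are sample values chosen for the example.

```julia
# Hypothetical, simplified stand-ins for DDIVariable and DDIInfo.
struct VarMeta
    name::String
    labl::String
    category_labels::Union{Vector{Pair{Int64, String}}, Nothing}
end

struct FileMeta
    ipums_project::String
    variable_info::Vector{VarMeta}
end

# Return the metadata entry for `name`, or `nothing` if it is absent.
function find_variable(meta::FileMeta, name::AbstractString)
    i = findfirst(v -> v.name == name, meta.variable_info)
    return i === nothing ? nothing : meta.variable_info[i]
end

# Translate a numeric code into its label; non-categorical variables pass through.
function decode(v::VarMeta, code::Integer)
    v.category_labels === nothing && return code
    return get(Dict(v.category_labels), code, code)
end

meta = FileMeta("IPUMS CPS", [
    VarMeta("YEAR", "Survey year", nothing),
    VarMeta("STATEFIP", "State (FIPS code)", [27 => "Minnesota", 55 => "Wisconsin"]),
])

state = decode(find_variable(meta, "STATEFIP"), 55)
year = decode(find_variable(meta, "YEAR"), 1962)
println(state, " ", year)
```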

Example

julia> IPUMS.DDIInfo(filepath = "test_ddi.xml")

IPUMS.DDIInfo("test_ddi.xml", "", "", "", "", "", IPUMS.DDIVariable[], EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x00000000034466d0>)), "", 0×0 DataFrame)

References

Information about each variable field is taken from:

https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/fieldleveldocumentationfiles/schemas/codebookxsd/elements/var.html

IPUMS.DDIVariable - Type
DDIVariable(
    name::String = "", 
    position_start::Int64 = 9999,
    position_end::Int64 = 9999,
    position_width::Int64 = 9999,
    labl::String = "", 
    desc::String = "", 
    dcml::Int64 = 9999, 
    var_dtype::DataType = String,
    var_interval::String = "",
    category_labels::Union{Vector{Pair{Int64, String}}, Nothing} = nothing,
    coder_instructions::Union{String, Nothing} = nothing
)

A struct representing individual variable/column metadata from an IPUMS extract file. This struct is used for parsing the IPUMS data file, which may be in fixed-width format, hence the position_ fields. The default value for missing strings is an empty string "", while the default value for missing integer values is 9999.

Keyword Arguments

  • name::String - Name of the variable, as per the column name of the IPUMS extract file. This name is limited to 8 characters.
  • position_start::Int64 - The starting position (in columns) of a variable in a fixed width file format.
  • position_end::Int64 - The ending position (in columns) of a variable in a fixed width file format.
  • position_width::Int64 - The length (in columns) of a variable in a fixed width file format.
  • labl::String - A short description of the variable. Often the labl is used to display a description of the variable in a dataframe or display.
  • desc::String - A longer description of the variable, including information about the use of the variable.
  • dcml::Int64 - Identifies the number of decimal places in the variable.
  • var_dtype::DataType - Identifies the Julia data type of the variable.
  • var_interval::String - Identifies if a numeric variable is discrete or continuous.
  • category_labels::Union{Vector{Pair{Int64, String}}, Nothing} - If a variable is categorical, then this is a vector of (key, value) pairs, where the key is a numerical index and the value is the category label, for example (1 => "category 1"). If a variable is not categorical, then this attribute has a value of nothing.
  • coder_instructions::Union{String, Nothing} - Contains any additional information about how the variable was coded and how it should be treated.
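
How the position_ and dcml fields could drive fixed-width parsing is sketched below. This is not the package's implementation: `FieldSpec` and `read_field` are hypothetical names, the column positions are invented for the example, and the sketch assumes that decimals are implied (stored without a decimal point and shifted by dcml places).

```julia
# Hypothetical stand-in for the position and decimal fields of DDIVariable.
struct FieldSpec
    name::String
    position_start::Int
    position_end::Int
    dcml::Int
end

# Slice one field out of a fixed-width record and convert it to a number.
function read_field(line::AbstractString, spec::FieldSpec)
    raw = strip(line[spec.position_start:spec.position_end])
    value = parse(Int, raw)
    # Assumption: the decimal point is implied, so shift by `dcml` places.
    return spec.dcml > 0 ? value / 10.0^spec.dcml : value
end

# Invented layout: 4 columns for YEAR, 5 for SERIAL, 7 for ASECWT.
specs = [
    FieldSpec("YEAR", 1, 4, 0),
    FieldSpec("SERIAL", 5, 9, 0),
    FieldSpec("ASECWT", 10, 16, 2),
]

line = "1962000800147559"
record = Dict(s.name => read_field(line, s) for s in specs)
println(record)
```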

Returns

  • DDIVariable object specifying the metadata for each variable.

Example

julia> IPUMS.DDIVariable(
    name = "YEAR",
    position_start = 1,
    position_end = 4,
    position_width = 4, 
    labl = "Survey year",
    desc = "YEAR reports the year in which the survey was conducted.  YEARP is repeated on person records.",
    dcml = 0,
    var_dtype = Int64,
    var_interval = "continuous",
    category_labels = nothing,
    coder_instructions = nothing
    )

IPUMS.DDIVariable("YEAR", 1, 4, 4, "Survey year", "YEAR reports the year in which the survey was conducted.  YEARP is repeated on person records.", 0, Int64, "continuous", nothing, nothing)

References

Information about each variable field is taken from:

https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/fieldleveldocumentationfiles/schemas/codebookxsd/elements/stdyDscr.html

IPUMS.DataExtract - Type
DataExtract(;
    extractDefinition=nothing,
    number=nothing,
    status=nothing,
    downloadLinks=nothing,
)

This function prepares a data extract request for submission to the IPUMS API.

Arguments

  • extractDefinition::DataExtractDefinition - (Optional) Definition of the extracted data.
  • number::Int64 - (Optional) The extract ID number.
  • status::String - (Optional) Status of the data extraction (e.g. "complete").
  • downloadLinks::DataExtractDownloadLinks - (Optional) Download links for the extracted data.

Returns

This function returns a DataExtract object.

Examples


julia> IPUMS.DataExtract(extractDefinition = IPUMS.DataExtractDefinition(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
                                                                        geogLevels = ["place_00498"]),
                                                   "1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
                                                                        geogLevels = ["state"])),
                                             timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
                                                                     "A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
                                             dataFormat = "csv_no_header",
                                             timeSeriesTableLayout = "time_by_row_layout",
                                             shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
                                             geographicExtents = ["united states"],
                                             description = "abc",
                                             version = 2,
                                             collection = "nhgis"),
                         number = 2,
                         status = "complete",
                         downloadLinks = IPUMS.DataExtractDownloadLinks(codebookPreview = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
                                         tableData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
                                         gisData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip")

                        )

# Output

{                                                                                                                                                  
  "extractDefinition": {                                                                                                                           
    "datasets": {                                                                                                                                  
      "1790_cPop": {                                                                                                                               
        "dataTables": [                                                                                                                            
          "NT1"                                                                                                                                    
        ],                                                                                                                                         
        "geogLevels": [                                                                                                                            
          "place_00498"                                                                                                                            
        ]                                                                                                                                          
      },                                                                                                                                           
      "1800_cPop": {                                                                                                                               
        "dataTables": [                                                                                                                            
          "NT3"                                                                                                                                    
        ],                                                                                                                                         
        "geogLevels": [                                                                                                                            
          "state"                                                                                                                                  
        ]                                                                                                                                          
      }
    },
    "timeSeriesTables": {
      "A00": {
        "geogLevels": [
          "state"
        ]
      },
      "A03": {
        "geogLevels": [
          "state"
        ]
      }
    },
    "dataFormat": "csv_no_header",
    "timeSeriesTableLayout": "time_by_row_layout",
    "shapefiles": [
      "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
    ],
    "geographicExtents": [
      "united states"
    ],
    "description": "abc",
    "version": 2,
    "collection": "nhgis"
  },
  "number": 2,
  "status": "complete",
  "downloadLinks": {
    "codebookPreview": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
    "tableData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
    "gisData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
  }
}

References

To learn more about the DataExtract type, please visit the IPUMS API DataExtract documentation.

IPUMS.DataExtractDefinition - Type
DataExtractDefinition(;
    datasets=nothing,
    timeSeriesTables=nothing,
    dataFormat=nothing,
    timeSeriesTableLayout=nothing,
    breakdownAndDataTypeLayout=nothing,
    shapefiles=nothing,
    geographicExtents=nothing,
    description=nothing,
    version=nothing,
    collection=nothing,
)

This function creates a definition object that is used for extracting data from an IPUMS dataset.

Arguments

  • datasets::Dict{String, Dataset} - (Optional) A dictionary mapping dataset names to Dataset objects.
  • timeSeriesTables::Dict{String, TimeSeriesTable} - (Optional) A dictionary mapping time series table names to TimeSeriesTable objects.
  • dataFormat::String - (Optional) The requested format of the data
  • timeSeriesTableLayout::String - (Optional) The layout of the time series table data
  • breakdownAndDataTypeLayout::String - (Optional) The layout of the dataset data when multiple data types or breakdown combos are present.
  • shapefiles::Vector{String} - (Optional) A list of selected shapefiles.
  • geographicExtents::Vector{String} - (Optional) A list of geographic_instances to use as extents for all datasets on this request.
  • description::String - (Optional) A short description of your extract.
  • version::String - (Optional) The version of the IPUMS API to use (Default: "2").
  • collection::String - (Optional) The IPUMS collection to be queried for the extract (options could include "nhgis", "usa", etc., corresponding to the IPUMS NHGIS or IPUMS USA databases).

Returns

The function returns a new DataExtractDefinition object used to extract data from an IPUMS dataset.

Examples

julia> IPUMS.DataExtractDefinition(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
                                                                        geogLevels = ["place_00498"]),
                                                   "1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
                                                                        geogLevels = ["state"])),
                           timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
                                                   "A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
                           dataFormat = "csv_no_header",
                           timeSeriesTableLayout = "time_by_row_layout",
                           shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
                           geographicExtents = ["united states"],
                           description = "abc",
                           version = 2,
                           collection = "nhgis"
                           )
# Output

{                                                                                                                                                  
  "datasets": {                                                                                                                                    
    "1790_cPop": {                                                                                                                                 
      "dataTables": [                                                                                                                              
        "NT1"                                                                                                                                      
      ],                                                                                                                                           
      "geogLevels": [                                                                                                                              
        "place_00498"                                                                                                                              
      ]                                                                                                                                            
    },                                                                                                                                             
    "1800_cPop": {                                                                                                                                 
      "dataTables": [                                                                                                                              
        "NT3"                                                                                                                                      
      ],                                                                                                                                           
      "geogLevels": [                                                                                                                              
        "state"                                                                                                                                    
      ]
    }
  },
  "timeSeriesTables": {
    "A00": {
      "geogLevels": [
        "state"
      ]
    },
    "A03": {
      "geogLevels": [
        "state"
      ]
    }
  },
  "dataFormat": "csv_no_header",
  "timeSeriesTableLayout": "time_by_row_layout",
  "shapefiles": [
    "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
  ],
  "geographicExtents": [
    "united states"
  ],
  "description": "abc",
  "version": 2,
  "collection": "nhgis"
}

References

To learn more about the DataExtractDefinition object, consult the IPUMS Developer Docs.

IPUMS.DataExtractDownloadLinks - Type
DataExtractDownloadLinks(;
    codebookPreview=nothing,
    tableData=nothing,
    gisData=nothing,
)

This function provides the download links for a census extract's codebook, CSV data, and Shapefile.

Arguments

  • codebookPreview::String - (Optional) HTTP link to a preview of the codebook used to encode the census data.
  • tableData::String - (Optional) HTTP link to the NHGIS CSV data file for download.
  • gisData::String - (Optional) HTTP link to the NHGIS Shapefile for download.

Returns

The function returns a DataExtractDownloadLinks object containing the links for download.
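
As an illustration of how these links might be used, the sketch below derives a local file name for each link from the last segment of its URL. It uses a plain Dict in place of a DataExtractDownloadLinks object, `local_paths` is a hypothetical helper, and nothing here is claimed to match how extract_download names its files.

```julia
# Plain Dict standing in for the three DataExtractDownloadLinks fields.
links = Dict(
    "codebookPreview" => "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
    "tableData" => "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
    "gisData" => "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip",
)

# Hypothetical helper: choose a local path for each link, skipping absent ones.
function local_paths(links::AbstractDict, output_path::AbstractString)
    paths = Dict{String, String}()
    for (kind, url) in links
        url === nothing && continue            # every field is optional
        paths[kind] = joinpath(output_path, basename(url))
    end
    return paths
end

paths = local_paths(links, "file_downloads")
for (kind, path) in paths
    println(kind, " -> ", path)
end
```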

Examples

julia> IPUMS.DataExtractDownloadLinks(codebookPreview = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
                           tableData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
                           gisData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip")

# Output

{
  "codebookPreview": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
  "tableData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
  "gisData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
}

References

For additional information on Dataset creation and download, consult the IPUMS Developer Docs.

IPUMS.DataExtractPost - Type
DataExtractPost(;
    datasets=nothing,
    timeSeriesTables=nothing,
    dataFormat=nothing,
    timeSeriesTableLayout=nothing,
    breakdownAndDataTypeLayout=nothing,
    shapefiles=nothing,
    geographicExtents=nothing,
    description=nothing,
)

This function prepares a POST object for delivery to an IPUMS POST endpoint.

Arguments

  • datasets::Dict{String, Dataset} - (Optional) A dictionary mapping dataset names to Dataset objects.
  • timeSeriesTables::Dict{String, TimeSeriesTable} - (Optional) A dictionary mapping time series table names to TimeSeriesTable objects.
  • dataFormat::String - (Optional) The requested format of the data.
  • timeSeriesTableLayout::String - (Optional) The layout of your time series table data.
  • breakdownAndDataTypeLayout::String - (Optional) The layout of your dataset data when multiple data types or breakdown combos are present
  • shapefiles::Vector{String} - (Optional) A list of selected shapefiles
  • geographicExtents::Vector{String} - (Optional) A list of geographic_instances to use as extents for all datasets on this request
  • description::String - (Optional) A short description of the extract.

Returns

This function returns a DataExtractPost object for delivery to an IPUMS POST endpoint.
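
Judging by the two signatures on this page, DataExtractPost takes the same fields as DataExtractDefinition except version and collection. The sketch below shows that relationship with plain Dict stand-ins. `to_post_body` is a hypothetical helper, and whether the package performs any such conversion internally is not stated here.

```julia
# Plain Dict standing in for a DataExtractDefinition (only a few fields shown).
definition = Dict{String, Any}(
    "dataFormat" => "csv_no_header",
    "description" => "abc",
    "version" => 2,
    "collection" => "nhgis",
)

# Hypothetical helper: keep only the fields that DataExtractPost also accepts.
function to_post_body(definition::AbstractDict)
    dropped = ("version", "collection")
    return filter(p -> !(first(p) in dropped), definition)
end

post_body = to_post_body(definition)
println(sort(collect(keys(post_body))))
```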

Examples

julia> IPUMS.DataExtractPost(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
                                                                        geogLevels = ["place_00498"]),
                                                   "1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
                                                                        geogLevels = ["state"])),
                           timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
                                                   "A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
                           dataFormat = "csv_no_header",
                           timeSeriesTableLayout = "time_by_row_layout",
                           shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
                           geographicExtents = ["united states"],
                           description = "abc")

# Output

{                                                                                                                                                  
  "datasets": {                                                                                                                                    
    "1790_cPop": {                                                                                                                                 
      "dataTables": [                                                                                                                              
        "NT1"                                                                                                                                      
      ],                                                                                                                                           
      "geogLevels": [                                                                                                                              
        "place_00498"                                                                                                                              
      ]                                                                                                                                            
    },                                                                                                                                             
    "1800_cPop": {                                                                                                                                 
      "dataTables": [                                                                                                                              
        "NT3"                                                                                                                                      
      ],                                                                                                                                           
      "geogLevels": [
        "state"
      ]
    }
  },
  "timeSeriesTables": {
    "A00": {
      "geogLevels": [
        "state"
      ]
    },
    "A03": {
      "geogLevels": [
        "state"
      ]
    }
  },
  "dataFormat": "csv_no_header",
  "timeSeriesTableLayout": "time_by_row_layout",
  "shapefiles": [
    "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
  ],
  "geographicExtents": [
    "united states"
  ],
  "description": "abc"
}

References

For additional information on the DataExtractPost object, please refer to the IPUMS Developer Docs.

IPUMS.DataExtractPostResponse - Type
DataExtractPostResponse(;
    extractDefinition=nothing,
    number=nothing,
    status=nothing,
    downloadLinks=nothing,
)

This function returns a response to a DataExtractPost request.

Arguments

  • extractDefinition::DataExtractDefinition - (Optional) A DataExtractDefinition from the original POST request
  • number::Int64 - (Optional) The extract ID number
  • status::String - (Optional) The status of data extraction
  • downloadLinks::DataExtractDownloadLinks - (Optional) The download links for the data

Returns

This function returns a DataExtractPostResponse object containing the response from the IPUMS API to the DataExtractPost request.

Examples

julia> IPUMS.DataExtractPostResponse(extractDefinition = IPUMS.DataExtractDefinition(datasets = Dict("1790_cPop" => IPUMS.Dataset(dataTables = ["NT1"],
                                                                                                                                  geogLevels = ["place_00498"]),
                                                                                                     "1800_cPop" => IPUMS.Dataset(dataTables = ["NT3"],
                                                                                                                                  geogLevels = ["state"])),
                                                                                     timeSeriesTables = Dict("A00" => IPUMS.TimeSeriesTable(geogLevels = ["state"]),
                                                                                                             "A03" => IPUMS.TimeSeriesTable(geogLevels = ["state"]) ),
                                                                                     dataFormat = "csv_no_header",
                                                                                     timeSeriesTableLayout = "time_by_row_layout",
                                                                                     shapefiles = ["https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"] ,
                                                                                     geographicExtents = ["united states"],
                                                                                     description = "abc",
                                                                                     version = 2,
                                                                                     collection = "nhgis"),
                                                                                     number = 90,
                                                                                     status = "queued",
                                                                                     downloadLinks = IPUMS.DataExtractDownloadLinks(codebookPreview = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
                                                                                                                                    tableData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
                                                                                                                                    gisData = "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"))

# Output

{                                                                                                                                                  
  "extractDefinition": {                                                                                                                           
    "datasets": {                                                                                                                                  
      "1790_cPop": {                                                                                                                               
        "dataTables": [                                                                                                                            
          "NT1"                                                                                                                                    
        ],                                                                                                                                         
        "geogLevels": [                                                                                                                            
          "place_00498"                                                                                                                            
        ]                                                                                                                                          
      },                                                                                                                                           
      "1800_cPop": {                                                                                                                               
        "dataTables": [                                                                                                                            
          "NT3"                                                                                                                                    
        ],                                                                                                                                         
        "geogLevels": [                                                                                                                            
          "state"                                                                                                                                  
        ]                                                                                                                                          
      }                                                                                                                                            
    },                                                                                                                                             
    "timeSeriesTables": {
      "A00": {
        "geogLevels": [
          "state"
        ]
      },
      "A03": {
        "geogLevels": [
          "state"
        ]
      }
    },
    "dataFormat": "csv_no_header",
    "timeSeriesTableLayout": "time_by_row_layout",
    "shapefiles": [
      "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
    ],
    "geographicExtents": [
      "united states"
    ],
    "description": "abc",
    "version": 2,
    "collection": "nhgis"
  },
  "number": 90,
  "status": "queued",
  "downloadLinks": {
    "codebookPreview": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv_PREVIEW.zip",
    "tableData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_csv.zip",
    "gisData": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/1234567/nhgis0007_shape.zip"
  }
}

References

To learn more about DataExtractPostResponse, visit the IPUMS Developer Docs.

source
IPUMS.DataTableFullType
DataTableFull(;
    name=nothing,
    nhgisCode=nothing,
    description=nothing,
    universe=nothing,
    sequence=nothing,
    datasetName=nothing,
    nVariables=nothing,
)

Constructor to store information about a given table within a particular IPUMS dataset.

Attributes

  • name::String - (Optional) The unique identifier for the data table within its dataset.
  • nhgisCode::String - (Optional) The code for this data table that will appear in the extract.
  • description::String - (Optional) A short description of the data table.
  • universe::String - (Optional) The statistical population (set of entities) measured by this data table (e.g., persons, families, occupied housing units, etc.).
  • sequence::Int64 - (Optional) The order in which this data table will appear in the metadata API and extracts.
  • datasetName::String - (Optional) The name of the dataset that the data come from.
  • nVariables::Int64 - (Optional) A list of variables within the table.

Returns

This returns a DataTableFull object holding the table's name, nhgisCode, description, universe, sequence, dataset name, and variables.

Examples


julia> IPUMS.DataTableFull(name = "NT1",
                           nhgisCode = "AAA",
                           description = "Total Population",
                           universe= "Persons",
                           sequence = 1,
                           datasetName = "1790_cPop",
                           nVariables = [IPUMS.DataTableFullVariablesInner(name = "NT001",
                                         nhgisCode = "AAA001")])

# Output

{
  "name": "NT1",
  "nhgisCode": "AAA",
  "description": "Total Population",
  "universe": "Persons",
  "sequence": 1,
  "datasetName": "1790_cPop",
  "nVariables": [
    {
      "name": "NT001",
      "nhgisCode": "AAA001"
    }
  ]
}

Reference

To find out more about DataTableFull, visit the IPUMS Developer Docs.

source
IPUMS.DataTableFullVariablesInnerType
DataTableFullVariablesInner(;
    name=nothing,
    nhgisCode=nothing,
)

Inner constructor representing the variables within a DataTableFull object.

Attributes

  • name::String - (Optional) The unique identifier for the variable within its data table.
  • nhgisCode::String - (Optional) The code for this variable that will appear in the extract.

Returns

This returns a DataTableFullVariablesInner object describing a single variable of a data table.

Examples


julia> IPUMS.DataTableFullVariablesInner(name = "NT1",
                                         nhgisCode = "AAA")

# Output

{
  "name": "NT1",
  "nhgisCode": "AAA"
}

Reference

To find out more about DataTableFullVariablesInner, visit the IPUMS Developer Docs.

source
IPUMS.DataTableSimpleType
DataTableSimple(;
    name=nothing,
    nhgisCode=nothing,
    description=nothing,
    sequence=nothing,
)

Builds an object that contains the representation of an IPUMS table.

Arguments

  • name::String - (Optional) The unique identifier for the data table within its dataset.
  • nhgisCode::String - (Optional) The code for this data table that will appear in the extract.
  • description::String - (Optional) A short description of the data table.
  • sequence::Int64 - (Optional) The order in which this data table will appear in the metadata API and extracts.

Examples

julia> IPUMS.DataTableSimple(name = "NT1",
                             nhgisCode = "AAA",
                             description = "Total Population",
                             sequence = 1)

# Output

{
  "name": "NT1",
  "nhgisCode": "AAA",
  "description": "Total Population",
  "sequence": 1
}

References

For more information about the DataTableSimple object, consult the IPUMS Developer Docs.

source
IPUMS.DatasetType
Dataset(;
    dataTables=nothing,
    geogLevels=nothing,
    breakdownValues=nothing,
    years=nothing,
)

This function creates a new Dataset record from the given data tables, geographic levels, breakdown values, and years.

Arguments

  • dataTables::Vector{String} - A list of available data tables for this Dataset.
  • geogLevels::Vector{String} - A list of geographic levels available for the Dataset (e.g., "county", "state").
  • breakdownValues::Vector{String} - (Optional) The breakdown values available for this Dataset.
  • years::Vector{String} - (Optional) A list of years, used when the data span multiple years.

Returns

This function returns a new Dataset object including geographic and temporal information.

Examples

julia> IPUMS.Dataset(dataTables = ["1790_cPop"],
                     geogLevels = ["state"],
                     breakdownValues =["bs32.ge00"],
                     years = ["1790"])

# Output

{
  "dataTables": [
    "1790_cPop"
  ],
  "geogLevels": [
    "state"
  ],
  "breakdownValues": [
    "bs32.ge00"
  ],
  "years": [
    "1790"
  ]
}

References

To learn more about the Dataset type, visit the IPUMS Developer Docs.

source
IPUMS.DatasetFullType
DatasetFull(;
    name=nothing,
    description=nothing,
    group=nothing,
    sequence=nothing,
    dataTables=nothing,
    geogLevels=nothing,
    hasMultipleDataTypes=nothing,
    breakdowns=nothing,
)

This function creates a DatasetFull record from a dataset's description, data tables, geographic levels, and breakdowns. The record is used when downloading data from the IPUMS website.

Arguments

  • name::String - (Optional) The dataset identifier.
  • description::String - (Optional) A short description of the dataset.
  • group::String - (Optional) The group of datasets to which this dataset belongs.
  • sequence::Int64 - (Optional) The order in which the dataset appears in the metadata API and extracts.
  • dataTables::Vector{DataTableSimple} - (Optional) A list of the data tables available for the dataset.
  • geogLevels::Vector{DatasetFullGeogLevelsInner} - (Optional) A list of the geographic levels available for the dataset.
  • hasMultipleDataTypes::Bool - (Optional) A Boolean indicating whether multiple data types exist for the dataset.
  • breakdowns::DatasetFullBreakdowns - (Optional) The breakdowns available for the dataset.

Returns

The function returns a new DatasetFull object.

Examples

julia> IPUMS.DatasetFull(name = "2010_SF1a",
                         description = "SF 1a - P & H Tables [Blocks & Larger Areas]",
                         group = "2010 Census",
                         sequence = 4802,
                         dataTables = [IPUMS.DataTableSimple(name = "P1",
                                                             nhgisCode = "H7V",
                                                             description = "Total Population",
                                                             sequence = 1)],
                         geogLevels = [ IPUMS.DatasetFullGeogLevelsInner(name = "nation",
                                        description = "Nation",
                                        hasGeogExtentSelection = false)],
                         hasMultipleDataTypes = false,
                         breakdowns = IPUMS.DatasetFullBreakdowns(name = "bs32",
                                                                  type = "Spatial",
                                                                  description = "Geographic Subarea (2010 Census and American Community Survey)",
                                                                  breakdownValues = [IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "bs32.ge00",
                                                                                                           description = "Total area")]))

# Output

{
  "name": "2010_SF1a",
  "description": "SF 1a - P & H Tables [Blocks & Larger Areas]",
  "group": "2010 Census",
  "sequence": 4802,
  "dataTables": [
    {
      "name": "P1",
      "nhgisCode": "H7V",
      "description": "Total Population",
      "sequence": 1
    }
  ],
  "geogLevels": [
    {
      "name": "nation",
      "description": "Nation",
      "hasGeogExtentSelection": false
    }
  ],
  "hasMultipleDataTypes": false,
  "breakdowns": {
    "name": "bs32",
    "type": "Spatial",
    "description": "Geographic Subarea (2010 Census and American Community Survey)",
    "breakdownValues": [
      {
        "name": "bs32.ge00",
        "description": "Total area"
      }
    ]
  }
}

References

To learn more about the DatasetFull type, visit the IPUMS Developer Docs.

source
IPUMS.DatasetFullBreakdownsType
DatasetFullBreakdowns(;
    name=nothing,
    type=nothing,
    description=nothing,
    breakdownValues=nothing,
    years=nothing,
    geographicInstances=nothing,
)

This function creates a DatasetFullBreakdowns record describing a breakdown of a dataset.

Arguments

  • name::String - (Optional) The breakdown identifier (e.g., "bs32").
  • type::String - (Optional) The type of the breakdown (e.g., "Spatial").
  • description::String - (Optional) A short description of the breakdown.
  • breakdownValues::Vector{DatasetFullBreakdownsBreakdownValuesInner} - (Optional) A list of the breakdown values available for the dataset.
  • years::Vector{String} - (Optional) A list of years, used when the data span multiple years.
  • geographicInstances::Vector{DatasetFullBreakdownsBreakdownValuesInner} - (Optional) A list of geographical extents.

Returns

The function returns a new DatasetFullBreakdowns object.

Examples

julia> IPUMS.DatasetFullBreakdowns(name = "bs32",
                                   type = "Spatial",
                                   description = "Geographic Subarea (2010 Census and American Community Survey)",
                                   breakdownValues = [IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "bs32.ge00",
                                                                                                           description = "Total area")],
                                   years = ["2010"],
                                   geographicInstances = [IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "010",
                                                                                                           description = "Alabama")])

# Output

{
  "name": "bs32",
  "type": "Spatial",
  "description": "Geographic Subarea (2010 Census and American Community Survey)",
  "breakdownValues": [
    {
      "name": "bs32.ge00",
      "description": "Total area"
    }
  ],
  "years": [
    "2010"
  ],
  "geographicInstances": [
    {
      "name": "010",
      "description": "Alabama"
    }
  ]
}

References

To find out more about the Dataset type visit the IPUMS Developer Docs.

source
IPUMS.DatasetFullBreakdownsBreakdownValuesInnerType
DatasetFullBreakdownsBreakdownValuesInner(;
    name=nothing,
    description=nothing,
)

Inner constructor representing the variables within a DatasetFull object.

Arguments

  • name::String - (Optional) The dataset identifier
  • description::String - (Optional) a short description of the dataset

Returns

This function returns a DatasetFullBreakdownsBreakdownValuesInner object, which is used when constructing a DatasetFull object.

Examples

julia> IPUMS.DatasetFullBreakdownsBreakdownValuesInner(name = "1790_cPop",
                                                       description = "1790 Census: Population Data [US, States & Counties]")

# Output

{
  "name": "1790_cPop",
  "description": "1790 Census: Population Data [US, States & Counties]"
}

References

To find out more about the Dataset type, visit the IPUMS Developer Docs.

source
IPUMS.DatasetFullGeogLevelsInnerType
DatasetFullGeogLevelsInner(;
    name=nothing,
    description=nothing,
    hasGeogExtentSelection=nothing,
)

This function gives the geographical information about data and its description.

Arguments

  • name::String - (Optional) The dataset identifier
  • description::String - (Optional) A short description of the dataset
  • hasGeogExtentSelection::Bool - (Optional) A Boolean indicating whether the dataset has a geographical extent selection.

Returns

The function returns a DatasetFullGeogLevelsInner object with geographic information related to a dataset.

Examples

julia> IPUMS.DatasetFullGeogLevelsInner(name = "1790_cPop",
                                        description = "1790 Census: Population Data [US, States & Counties]",
                                        hasGeogExtentSelection = true)

# Output

{
  "name": "1790_cPop",
  "description": "1790 Census: Population Data [US, States & Counties]",
  "hasGeogExtentSelection": true
}

References

To learn more about DatasetFullGeogLevelsInner, visit the IPUMS Developer Docs.

source
IPUMS.DatasetSimpleType
DatasetSimple(;
    name=nothing,
    description=nothing,
    group=nothing,
    sequence=nothing,
)

This function creates a dataset reference with a provided name, description, group, and sequence.

Arguments

  • name::String - (Optional) The dataset identifier.
  • description::String - (Optional) A short description of the dataset.
  • group::String - (Optional) The group of datasets to which this dataset belongs.
  • sequence::Int64 - (Optional) The order in which the dataset will appear in the metadata API and extracts.

Returns

This function returns a DatasetSimple object with the properties specified by the function arguments.

Examples

julia> IPUMS.DatasetSimple(name = "1790_cPop",
                           description = "1790 Census: Population Data [US, States & Counties]",
                           group = "1790 Census",
                           sequence =  101)

# Output

{
  "name": "1790_cPop",
  "description": "1790 Census: Population Data [US, States & Counties]",
  "group": "1790 Census",
  "sequence": 101
}

References

To find out more about the Dataset type visit IPUMS API Dataset

source
IPUMS.ErrorType
Error(;
    type=nothing,
    status=nothing,
    detail=nothing,
)

This function returns the error message from a dataset request to the IPUMS API.

Arguments

  • type::Int64 - The error code for the message
  • status::String - The actual error message returned
  • detail::String - Additional explanation about the cause of the error

Returns

This function returns an Error object containing the reasons for the failure of a request to the IPUMS API.

Examples


julia> IPUMS.Error(type = 400,
                   status = "SemanticValidationError",
                   detail = "Geographic extents Extent selection is not required for selected geog levels. Please remove the 'geographic_extents' section of you request." )

# Output

{
  "type": 400,
  "status": "SemanticValidationError",
  "detail": "Geographic extents Extent selection is not required for selected geog levels. Please remove the 'geographic_extents' section of you request."
}

References

To learn more about errors, please see the IPUMS Developer Docs.

source
IPUMS.IPUMSAPIMethod
IPUMSAPI(url::String, headers::Dict)

Create an IPUMS API object for querying the IPUMS server.

NOTE: If you need more control over how the connection to the server is made, consider using the OpenAPI.jl package to create a Client object that you can pass into IPUMSAPI() directly.

Arguments

  • url::String – The URL of the server. This is the base URL that will be used for all API calls.

  • headers::Dict – A dictionary of HTTP headers to be sent with all API calls.

Returns

api::IPUMSAPI – An API object that can be passed to all IPUMS.jl methods that require an API object.
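Example

A minimal sketch of preparing the headers argument, assuming the API key is sent in an "Authorization" header as in the demo at the top of this documentation. The environment variable name IPUMS_API_KEY is a hypothetical choice made for this illustration and is not defined by IPUMS.jl.

```julia
# Read the key from the environment instead of hard-coding it.
# IPUMS_API_KEY is a hypothetical variable name; "Your_Key" is a placeholder.
api_key = get(ENV, "IPUMS_API_KEY", "Your_Key")

# Headers that will be sent with every API call.
headers = Dict("Authorization" => api_key)

# With the package loaded (`using IPUMS`), the client is then created as:
# api = IPUMSAPI("https://api.ipums.org/", headers)
```

Keeping the key out of source files means the same script can be shared without exposing credentials.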

source
IPUMS.IPUMSSourceType
IPUMSSource(
    proj_name::String, 
    url_name::String, 
    collection_type::String, 
    code_for_api::String = "", 
    api_support::Bool = false, 
    home_url::String = ""
)

A struct representing sources that IPUMS provides.

Arguments

  • proj_name::String - Name of the IPUMS project.
  • url_name::String - Name of the project as used in that project's website URL.
  • collection_type::String - Either "microdata" or "aggregate data" indicating the type of data this collection provides.

Keyword Arguments

  • code_for_api::String - The name of the project used when interacting with the IPUMS API (for collections that are supported by the API). (Default: "")
  • api_support::Bool - Logical indicating whether the collection is supported by the IPUMS API. (Default: false)
  • home_url::String - URL for the project's homepage. (Default: "")

Returns

  • IPUMSSource object specifying the previous parameters

Example

julia> IPUMSSource(
    proj_name = "IPUMS USA",
    url_name = "usa",
    collection_type = "microdata",
    api_support = true
)

IPUMS.IPUMSSource("IPUMS USA", "usa", "microdata", "", true, "")
source
IPUMS.ShapefileType
Shapefile(;
    name=nothing,
    year=nothing,
    geographicLevel=nothing,
    extent=nothing,
    basis=nothing,
    sequence=nothing,
)

This function creates a reference to an NHGIS shapefile for an IPUMS dataset.

Attributes

  • name::String - (Optional) The unique identifier of the shapefile.
  • year::String - (Optional) The survey year in which the file's represented areas were used for tabulations.
  • geographicLevel::String - (Optional) The geographic level of the shapefile.
  • extent::String - (Optional) The geographic extent which is covered by the shapefile.
  • basis::String - (Optional) The derivation source of the shapefile.
  • sequence::Int64 - (Optional) The order in which the shapefile appears in the metadata API.

Returns

This function returns a Shapefile object containing the attributes specified in the function arguments.

Examples

julia> IPUMS.Shapefile(name = "base.tl2000.nongen.us_state_1790",
                       year = "1790",
                       geographicLevel = "state",
                       extent = "united states",
                       basis = "2000 tiger/line +",
                       sequence =  1)

# Output

{
  "name": "base.tl2000.nongen.us_state_1790",
  "year": "1790",
  "geographicLevel": "state",
  "extent": "united states",
  "basis": "2000 tiger/line +",
  "sequence": 1
}

References

Additional information about this object is available in the IPUMS Developer Docs

source
IPUMS.TimeSeriesTableType
TimeSeriesTable(;
    geogLevels=nothing,
    years=nothing,
)

This function creates a table record with a given geographical level and year information.

Arguments

  • geogLevels::Vector{String} - A vector containing geographical levels (e.g., "state", "county") for the Time Series Table.
  • years::Vector{String} - (Optional) A list of years for this Time Series Table.

Returns

This function returns a Time Series Table record giving the geographical level and the years to which the data are referring.

Examples

julia> IPUMS.TimeSeriesTable(geogLevels=["state"],
                             years =["1790"])

# Output

{
  "geogLevels": [
    "state"
  ],
  "years": [
    "1790"
  ]
}

References

To find additional information on the Time Series Table, please refer to the IPUMS Developer Docs.

source
IPUMS.TimeSeriesTableFullType
TimeSeriesTableFull(;
    name=nothing,
    description=nothing,
    geographicIntegration=nothing,
    sequence=nothing,
    timeSeries=nothing,
    geogLevels=nothing,
)

This function returns an object containing the attributes for downloading a Time Series Table.

Arguments

  • name::String - (Optional) The unique variable identifier for the time series table (e.g., "A00", "OWNERSHP").
  • description::String - (Optional) A short description of the time series variable referred to in name.
  • geographicIntegration::String - (Optional) Specifies how the variable accounts for changes in geographic boundaries over time (e.g., "Nominal").
  • sequence::Float32 - (Optional) The order in which the dataset appears in the metadata API and extracts.
  • timeSeries::Vector{TimeSeriesTableFullTimeSeriesInner} - (Optional) A list of time series records corresponding to the variable specified in name.
  • geogLevels::Vector{TimeSeriesTableFullTimeSeriesInner} - (Optional) A list of geographic levels available for this time series table.

Returns

This function returns a TimeSeriesTableFull object containing the variable name, description, time series, and geographical information of the data.

Examples

julia> IPUMS.TimeSeriesTableFull(name="A00",
                                description= "Total Population", 
                                geographicIntegration= "Nominal", 
                                sequence= 0.01, 
                                timeSeries=[IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "AA",
                                                                                     description = "Persons: Total",
                                                                                     sequence = 1 )], 
                                geogLevels= [ IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "state",
                                                                                       description = "State",
                                                                                       sequence = 4 ), 
                                              IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "county",
                                                                                       description = "State--County",
                                                                                       sequence = 25 )])
# Output
{
  "name": "A00",
  "description": "Total Population",
  "geographicIntegration": "Nominal",
  "sequence": 0.01,
  "timeSeries": [
    {
      "name": "AA",
      "description": "Persons: Total",
      "sequence": 1
    }
  ],
  "geogLevels": [
    {
      "name": "state",
      "description": "State",
      "sequence": 4
    },
    {
      "name": "county",
      "description": "State--County",
      "sequence": 25
    }
  ]
}

References

For additional information, please refer to the IPUMS Developer Docs.

source
IPUMS.TimeSeriesTableFullTimeSeriesInnerType
TimeSeriesTableFullTimeSeriesInner(;
    name=nothing,
    description=nothing,
    sequence=nothing,
)

This function creates a reference to an IPUMS Time Series table.

Arguments

  • name::String - (Optional) The unique identifier of the time series table.
  • description::String - (Optional) A short description of the time series table.
  • sequence::Int64 - (Optional) The order in which the time series table will appear in the metadata API and extracts.

Returns

This function returns a TimeSeriesTableFullTimeSeriesInner object that contains information about a desired Time Series table.

Examples

julia> IPUMS.TimeSeriesTableFullTimeSeriesInner(name = "1790_cPop",
                                                description =  "1790 Census: Population Data [US, States & Counties]",
                                                sequence = 101)
# Output

{
  "name": "1790_cPop",
  "description": "1790 Census: Population Data [US, States & Counties]",
  "sequence": 101
}

References

Additional information about the TimeSeriesTableFullTimeSeriesInner object is available in the IPUMS Developer Docs

source
IPUMS.TimeSeriesTableSimpleType
TimeSeriesTableSimple(;
    name=nothing,
    description=nothing,
    geographicIntegration=nothing,
    sequence=nothing,
    timeSeries=nothing,
    geogLevels=nothing,
)

This function creates a new Simple Time Series Table record to support downloading of the corresponding data.

Arguments

  • name::String - (Optional) The unique variable identifier for the time series table (e.g., "A00", "OWNERSHP").
  • description::String - (Optional) A short description of the time series variable referred to in name.
  • geographicIntegration::String - (Optional) Specifies how the variable accounts for changes in geographic boundaries over time (e.g., "Nominal").
  • sequence::Float32 - (Optional) The order in which the dataset appears in the metadata API and extracts.
  • timeSeries::Vector{String} - (Optional) A list of time series records corresponding to the variable specified in name.
  • geogLevels::Vector{String} - (Optional) A list of geographic levels available for this time series table.

Returns

This function returns a new TimeSeriesTableSimple object.

Examples

julia> IPUMS.TimeSeriesTableSimple(name = "A00",
                                   description = "Total Population",
                                   geographicIntegration = "Nominal",
                                   sequence = 0.01,
                                   timeSeries = ["1790", "1800"],
                                   geogLevels = ["state", "county"] )

# Output

{
  "name": "A00",
  "description": "Total Population",
  "geographicIntegration": "Nominal",
  "sequence": 0.01,
  "timeSeries": [
    "1790",
    "1800"
  ],
  "geogLevels": [
    "state",
    "county"
  ]
}

References

To find out more about the TimeSeriesTableSimple type, visit the IPUMS Developer Docs.

source
IPUMS._check_that_file_existsMethod
_check_that_file_exists(filepath::String)

This is an internal function that checks whether the provided file exists.

Arguments

  • filepath::String - A file path that the user wishes to parse. The file must be an existing XML file.

Returns

The function returns nothing if the file exists. If the file does not exist, then the function raises an ArgumentError.

source
IPUMS._check_that_file_is_datMethod
_check_that_file_is_dat(filepath::String)

This is an internal function that checks whether the provided file is a DAT file. All IPUMS extract data files should be in DAT format.

Arguments

  • filepath::String - A file path that the user wishes to import. The file must be a DAT file.

Returns

The function returns nothing if the file is a DAT file. If the file is not a DAT file, then the function raises an ArgumentError.

source
IPUMS._check_that_file_is_xmlMethod
_check_that_file_is_xml(filepath::String)

This is an internal function that checks whether the provided file is an XML file. All DDI files should be in XML format.

Arguments

  • filepath::String - A file path that the user wishes to parse. The file must be an XML file.

Returns

The function returns nothing if the file is an XML file. If the file is not an XML file, then the function raises an ArgumentError.
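Example

The validation pattern described above (return nothing on success, raise an ArgumentError otherwise) can be sketched as follows. The name check_extension_sketch and the case-insensitive comparison are assumptions made for illustration; they are not the package's actual implementation.

```julia
# Hypothetical stand-in for an extension check; not the code used by IPUMS.jl.
function check_extension_sketch(filepath::String, ext::String)
    # splitext returns (root, extension), e.g. ("cps_00157", ".xml").
    actual = lowercase(splitext(filepath)[2])
    actual == ext || throw(ArgumentError("expected a $ext file, got: $filepath"))
    return nothing
end

# Passes silently because the path ends in .xml.
check_extension_sketch("test/testdata/cps_00157.xml", ".xml")

# A mismatched extension would raise an ArgumentError:
# check_extension_sketch("test/testdata/cps_00157.dat", ".xml")
```

Raising early with a descriptive message keeps a wrong file from reaching the parser.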

source
IPUMS._get_var_metadata_from_ddi!Method
_get_var_metadata_from_ddi!(ddi::DDIInfo)

This is an internal function and not meant for the public API. This function iterates over the variable nodes in the DDI XML file. The data from each variable node is collected in a DDIVariable object, and a vector of those DDIVariable objects is saved in the DDIInfo object.

Arguments

  • ddi::DDIInfo - A DDIInfo object that will retain all of the parsed metadata.

Returns

The function returns the original DDIInfo object with updated data in the attributes.

source
IPUMS._parse_column_float!Method
_parse_column_float!(col, data, col_start, col_end, line_len, decimals)

Internal function that parses a single floating-point column from all rows of a memory-mapped fixed-width IPUMS data file. Float values in IPUMS files are encoded as integers (e.g. "12345" with decimals=2 represents 123.45). This function parses the integer from raw bytes and divides by 10^decimals to recover the float value.

Arguments

  • col::Vector{Union{Missing, Float64}} - A pre-allocated column vector to hold the parsed float values.
  • data::Vector{UInt8} - The memory-mapped file contents as a byte array.
  • col_start::Int - The starting byte position of the field within a line.
  • col_end::Int - The ending byte position of the field within a line.
  • line_len::Int - The number of bytes per line (including the newline character).
  • decimals::Int - The number of implied decimal places in the encoded integer.

Returns

This function does not return any output. Instead it modifies the provided column vector in-place.
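Example

The implied-decimal decoding described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function name is hypothetical, byte positions are assumed to be 1-based and inclusive, and the sketch allocates a temporary String per field for readability.

```julia
# Hypothetical sketch: decode a fixed-width column whose values carry implied decimals.
function parse_column_float_sketch!(col, data, col_start, col_end, line_len, decimals)
    scale = 10.0^decimals                      # e.g. decimals = 2 gives 100.0
    for row in eachindex(col)
        offset = (row - 1) * line_len          # first byte of this row, minus one
        field = String(data[offset + col_start:offset + col_end])
        digits = strip(field)
        # Blank fields become missing; otherwise rescale the encoded integer.
        col[row] = isempty(digits) ? missing : parse(Int64, digits) / scale
    end
    return nothing
end

# Two rows of 5 data bytes plus a newline, so line_len = 6.
data = Vector{UInt8}("12345\n  678\n")
col = Vector{Union{Missing, Float64}}(missing, 2)
parse_column_float_sketch!(col, data, 1, 5, 6, 2)
col   # [123.45, 6.78]
```

Dividing once by a precomputed scale keeps the per-row work to a single integer parse and one floating-point division.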

source
IPUMS._parse_column_int!Method
_parse_column_int!(col, data, col_start, col_end, line_len)

Internal function that parses a single integer column from all rows of a memory-mapped fixed-width IPUMS data file. Each field is located using arithmetic byte offsets and parsed directly from the raw bytes.

Arguments

  • col::Vector{Union{Missing, Int64}} - A pre-allocated column vector to hold the parsed integer values.
  • data::Vector{UInt8} - The memory-mapped file contents as a byte array.
  • col_start::Int - The starting byte position of the field within a line.
  • col_end::Int - The ending byte position of the field within a line.
  • line_len::Int - The number of bytes per line (including the newline character).

Returns

This function does not return any output. Instead it modifies the provided column vector in-place.
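Example

The byte-offset arithmetic mentioned above can be illustrated with a small sketch. It assumes every line has the same length and that col_start and col_end are 1-based, inclusive positions within a line; the helper name field_range is hypothetical.

```julia
# Hypothetical helper: byte range of one field in a given row of a fixed-width file.
field_range(row, col_start, col_end, line_len) =
    ((row - 1) * line_len + col_start):((row - 1) * line_len + col_end)

# Two rows of 7 data bytes plus a newline, so line_len = 8.
data = Vector{UInt8}("1962 80\n1963 81\n")
line_len = 8

year_row2 = parse(Int64, String(data[field_range(2, 1, 4, line_len)]))    # 1963
serial_row2 = parse(Int64, String(data[field_range(2, 6, 7, line_len)]))  # 81
```

Because each field's position follows directly from the row number, no line splitting or scanning for newlines is needed.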

source
IPUMS._parse_column_string!Method
_parse_column_string!(col, data, col_start, col_end, line_len)

Internal function that parses a single string column from all rows of a memory-mapped fixed-width IPUMS data file. Each field is located using arithmetic byte offsets, stripped of leading and trailing spaces, and converted to a Julia String.

Arguments

  • col::Vector{Union{Missing, String}} - A pre-allocated column vector to hold the parsed string values.
  • data::Vector{UInt8} - The memory-mapped file contents as a byte array.
  • col_start::Int - The starting byte position of the field within a line.
  • col_end::Int - The ending byte position of the field within a line.
  • line_len::Int - The number of bytes per line (including the newline character).

Returns

This function does not return any output. Instead it modifies the provided column vector in-place.

source
IPUMS._parse_int_bytesMethod
_parse_int_bytes(data, start, stop)

Internal function that parses an integer value directly from a range of bytes in a memory-mapped file. This avoids allocating any String or SubString objects during parsing.

Arguments

  • data::Vector{UInt8} - The memory-mapped file contents as a byte array.
  • start::Int - The starting byte position of the field.
  • stop::Int - The ending byte position of the field.

Returns

Returns the parsed Int64 value, or missing if the field contains only whitespace.
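Example

A minimal sketch of allocation-free integer parsing over a byte range, in the spirit of the description above. The function name and the handling of a leading minus sign are assumptions for illustration; the package's actual logic may differ.

```julia
# Hypothetical sketch: accumulate digits directly from bytes, without building a String.
function parse_int_bytes_sketch(data::Vector{UInt8}, start::Int, stop::Int)
    value = 0
    negative = false
    seen_digit = false
    for i in start:stop
        b = data[i]
        if b == UInt8('-')
            negative = true
        elseif UInt8('0') <= b <= UInt8('9')
            value = 10 * value + Int(b - UInt8('0'))
            seen_digit = true
        end
        # Any other byte (such as a padding space) is skipped.
    end
    seen_digit || return missing               # the field held no digits
    return negative ? -value : value
end

data = Vector{UInt8}("  42 -7    ")
parse_int_bytes_sketch(data, 1, 4)    # 42
parse_int_bytes_sketch(data, 5, 7)    # -7
parse_int_bytes_sketch(data, 8, 11)   # missing
```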

source
IPUMS._read_ddi_and_parse_extract_level_metadata!Method
_read_ddi_and_parse_extract_level_metadata!(ddi::DDIInfo)

This is an internal function and not meant for the public API. This function parses the DDI XML file and captures the file-level metadata.

Arguments

  • ddi::DDIInfo - A DDIInfo object that will retain all of the parsed metadata.

Returns

The function returns the original DDIInfo object with updated data in the attributes.

source
IPUMS._string_to_numMethod
_string_to_num(x::SubString{String})

This is an internal function and not meant for the public API. This function takes a text string and returns only the numeric portion of the string. For example, if the input is "Codes999999", the function will return an Int64 with the value 999999.

Arguments

  • x::SubString{String} - A string that may contain some numeric data mixed with text.

Returns

This function returns the numeric part of the string, coded as an Int64 datatype.
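
One way to get this behavior, shown here only as a sketch, is to keep the digit characters and parse what remains. The name `string_to_num_sketch` is hypothetical and the package may do this differently. Note that this version throws an `ArgumentError` when the input contains no digits.

```julia
# Hypothetical sketch; not the IPUMS.jl implementation.
function string_to_num_sketch(x::AbstractString)
    # Keep only the characters that are decimal digits.
    digits_only = filter(isdigit, x)
    return parse(Int64, digits_only)
end

n = string_to_num_sketch("Codes999999")
```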

source
IPUMS.basepathMethod

The default API base path for APIs in IPUMSAPI. This can be used to construct the OpenAPI.Clients.Client instance.

source
IPUMS.extract_downloadMethod
function extract_download(
    api::IPUMSAPI, 
    extract_number::Int, 
    collection::String; 
    output_path::String = pwd(), 
    codebook_name::String = nothing, 
    table_data_name::String = nothing, 
    gis_data_name::String = nothing, 
    codebook::Bool = true, 
    table_data::Bool = true, 
    gis_data::Bool = true
)

Download files associated with a given IPUMS data extract.

Arguments

  • api::IPUMSAPI – An IPUMSAPI object to establish connection details.

  • extract_number::Int – extract ID assigned to the IPUMS data extract.

  • collection::String – What IPUMS collection to be queried for the extract (options could include "nhgis", "usa", etc. corresponding to IPUMS NHGIS or IPUMS USA databases).

Keyword Arguments

  • output_path::String – The path (location on computer) to output all downloaded files (Default: current working directory).

  • codebook_name::String – What the name of the codebook file should be (Default: nothing).

  • table_data_name::String – What the name of the table data file should be (Default: nothing).

  • gis_data_name::String – What the name of the GIS file should be (Default: nothing).

  • codebook::Bool – true to download the codebook file for the extract; false to not download it (Default: true)

  • table_data::Bool – true to download the table data file for the extract; false to not download it (Default: true)

  • gis_data::Bool – true to download the GIS file for the extract; false to not download it (Default: true)

Returns

The path (location on computer) where the files were downloaded to.

Examples

julia> extract_download(api, 1, "nhgis"; output_path = "file_downloads/", codebook = false, gis_data_name = "GIS_1", table_data_name = "DATA_1")
[ Info: Table data for Extract 1 downloaded to file_downloads/DATA_1.zip.
[ Info: GIS data for Extract 1 downloaded to file_downloads/GIS_1.zip.
"file_downloads/"

julia> extract_download(api, 2, "nhgis", output_path="file_downloads/")
┌ Warning: Extract 2 has expired and the associated data cannot be downloaded any longer. If you would like to download the data for this extract, please resubmit the extract request associated with this extract again to create a new extract with the same data from this extract.
└ @ IPUMS ~/FOSS/IPUMS.jl/src/apis/api_IPUMSAPI.jl:213
source
IPUMS.extract_infoMethod
extract_info(
    api::IPUMSAPI,
    extract_number::Int,
    collection::String;
    version::String = "2"
)

Get information about a specific data extract.

Arguments

  • api::IPUMSAPI – An IPUMSAPI object to establish connection details.

  • extract_number::Int – extract ID assigned to the IPUMS data extract.

  • collection::String – What IPUMS collection to be queried for the extract (options could include "nhgis", "usa", etc. corresponding to IPUMS NHGIS or IPUMS USA databases).

Keyword Arguments

  • version::String – What version of the IPUMS API to use (Default: "2")

Returns

metadata::Dict{String, Any} – A dictionary containing metadata about the queried data extract:

  • number – The IPUMS data extract ID

  • timeSeriesTableLayout – Layout of the time series tables. Can be one of the following:

    • "time_by_column_layout" (wide format, default): rows correspond to geographic units, columns correspond to different times in the time series

    • "time_by_row_layout" (long format): rows correspond to a single geographic unit at a single point in time

    • "time_by_file_layout": data for different times are provided in separate files

  • geographicExtents – Vector of geographic extents to use for all of the datasets in the extract definition.

  • status – The current status of the IPUMS data extract (such as "completed" for a request being done). Potential results include:

    • "queued"

    • "started"

    • "produced"

    • "canceled"

    • "failed"

    • "completed"

  • description – The associated description about the data extract.

  • timeSeriesTables – Vector of time series tables for use in the extract definition.

  • version – What version of the API is being used for handling this request.

  • dataFormat – The desired format of the extract data file.

    • "csv_no_header" (default) includes only a minimal header in the first row

    • "csv_header" includes a second, more descriptive header row.

    • "fixed_width" provides data in a fixed width format

  • breakdownAndDataTypeLayout – The desired layout of any datasets that have multiple data types or breakdown values. Potential values can be:

    • "single_file" (default) keeps all data types and breakdown values in one file

    • "separate_files" splits each data type or breakdown value into its own file

  • shapefiles – Report what shapefiles were requested and used in this extract.

  • downloadUrls – URLs to download the data from the requested extract.

  • datasets – What datasets were used in this extract.

  • collection – What collection is being queried.

NOTE: To be ready to download, an extract must have a completed status. However, some completed requests may still be unavailable for download, because extracts expire and are removed from IPUMS servers after a set period of time (72 hours for microdata collections, 2 weeks for IPUMS NHGIS). If an extract has expired, this function emits a warning.

defn::IPUMS.DataExtractDefinition – The associated data extract definition that was used to generate this extract.

msg::OpenAPI.Clients.ApiResponse – The response message from the IPUMS API.
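
The readiness rule in the note above can be written as a small check on the returned dictionary. This is a sketch under assumptions: the helper name `is_downloadable` is hypothetical, and it assumes that an expired extract is returned with an empty "downloadUrls" entry. The example dictionaries are hand-built stand-ins rather than real API responses.

```julia
# Hypothetical helper; assumes expired extracts carry an empty "downloadUrls" entry.
function is_downloadable(metadata::Dict{String, Any})
    completed = get(metadata, "status", "") == "completed"
    has_urls = !isempty(get(metadata, "downloadUrls", Dict{String, Any}()))
    return completed && has_urls
end

# Hand-built stand-ins for the metadata returned by extract_info.
ready = Dict{String, Any}(
    "status" => "completed",
    "downloadUrls" => Dict("tableData" => "nhgis0002_csv.zip"),
)
expired = Dict{String, Any}("status" => "completed", "downloadUrls" => Dict{String, Any}())
queued = Dict{String, Any}("status" => "queued", "downloadUrls" => Dict{String, Any}())

results = (is_downloadable(ready), is_downloadable(expired), is_downloadable(queued))
```

Only the first dictionary passes the check; the other two fail because they have no download URLs.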

Examples

julia> metadata, defn, msg = extract_info(api, 1, "nhgis"; version = "2");

julia> metadata
Dict{String, Any} with 13 entries:
  "number"        => 1
  "timeSeriesTab… => "time_by_file_layout"
  "geographicExt… => ["010"]
  "status"        => "completed"
  "description"   => "example extract request"
  "timeSeriesTab… => Dict{String, TimeSeriesTable}…
  "version"       => 2
  "dataFormat"    => "csv_no_header"
  "breakdownAndD… => "single_file"
  "shapefiles"    => ["us_state_1790_tl2000"]
  "downloadUrls"  => Dict("codebookPreview"=>"http…
  "datasets"      => Dict{String, Dataset}("2000_S…
  "collection"    => "nhgis"
source
IPUMS.extract_listMethod
extract_list(
    api::IPUMSAPI, 
    collection::String;
    version::String = "2", 
    extracts::Int64 = 10, 
    _mediaType=nothing
)

Get a list of recent data extracts.

NOTE: This function emits warnings when returned extracts are expired.

Arguments

  • api::IPUMSAPI – An IPUMSAPI object to establish connection details.

  • collection::String – What IPUMS collection to be queried for the extract (options could include "nhgis", "usa", etc. corresponding to IPUMS NHGIS or IPUMS USA databases).

Keyword Arguments

  • version::String – What version of the IPUMS API to use (Default: "2").

  • extracts::Int64 – The number of extracts to return, starting from the most recent one (Default: 10).

Returns

  • Vector{DataExtract} – a vector of DataExtract objects, each containing the extract number (number), its IPUMS status (status), the definition used to generate the extract (extractDefinition), and links to download the extract's data (downloadLinks).

Examples

julia> res = extract_list(api, "nhgis")
┌ Warning: Extract 1 has expired and the associated data cannot be downloaded any longer. If you would like to download the data for this extract, please resubmit the extract request associated with this extract again to create a new extract with the same data from this extract.
└ @ IPUMS 
2-element Vector{IPUMS.DataExtract}:
 {
  "extractDefinition": {
    #=
    ...
    Extract definition details here
    ...
    =#
  },
  "number": 2,
  "status": "completed",
  "downloadLinks": {
    "codebookPreview": "nhgis0002_csv_PREVIEW.zip",
    "tableData": "nhgis0002_csv.zip",
    "gisData": "nhgis0002_shape.zip"
  }
}

 {
  "extractDefinition": {
    #=
    ...
    Extract definition details here
    ...
    =#
  },
  "number": 1,
  "status": "completed",
  "downloadLinks": {}
}

TIP: If you want to record all the data extracts that are expired, you can loop through each of the returned extracts and check if the downloadLinks field is empty. If it is, that means it is expired.
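
The tip above could be sketched as follows. `ExtractStub` is a hypothetical stand-in that mirrors only the three fields named in this entry, so the example runs without an API connection; with real results you would iterate over the vector returned by extract_list instead. The extra status check is an assumption added here because a newly queued extract also has no download links yet.

```julia
# Hypothetical stand-in for IPUMS.DataExtract, holding only the fields used here.
struct ExtractStub
    number::Int
    status::String
    downloadLinks::Dict{String, String}
end

extracts = [
    ExtractStub(2, "completed", Dict("tableData" => "nhgis0002_csv.zip")),
    ExtractStub(1, "completed", Dict{String, String}()),
]

# Treat an extract as expired when it is completed but has no download links left.
expired_numbers = [e.number for e in extracts
                   if e.status == "completed" && isempty(e.downloadLinks)]
```

Here `expired_numbers` contains only extract 1, matching the warning shown in the example output.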

source
IPUMS.extract_submitFunction
extract_submit(
    api::IPUMSAPI, 
    collection::String, 
    extract_definition::String = nothing; 
    version::String = "2", 
    _mediaType=nothing
)

Submit an extract definition to IPUMS, which then generates a data extract containing the requested data.

Arguments

  • api::IPUMSAPI – An IPUMSAPI object to establish connection details.

  • collection::String – What IPUMS collection to be queried for the extract (options could include "nhgis", "usa", etc. corresponding to IPUMS NHGIS or IPUMS USA databases).

  • extract_definition::String – The location of a file storing the extract definition you want to submit.

Keyword Arguments

  • version::String – What version of the IPUMS API to use (Default: "2").

Returns

  • DataExtractPostResponse – Upon a successful submission, this object will contain a copy of the extract definition submitted, the extract ID, its status, and any relevant download links.

Examples

julia> res = extract_submit(api, "nhgis", my_extract_definition_file)
{
  "extractDefinition": {
    #=
    ...
    Extract definition details here
    ...
    =#
  },
  "number": 4,
  "status": "queued",
  "downloadLinks": {}
}

julia> res = extract_submit(api, "nhgis", "fake_file.json")
[ Info: The value you provided for the argument `extract_definition` ("fake_file.json") is not a valid filepath. Please update the path to your data extract.
┌ Error: ArgumentError("invalid JSON at byte position 1 while parsing type JSON3.False: InvalidChar\nfake_file.json\n")
└ @ IPUMS 
┌ Error: The extract definition submission request was not successful. Please review your extract definition and try again.
└ @ IPUMS 
source
IPUMS.ipums_data_collectionsMethod

ipums_data_collections()

List IPUMS data collections with their corresponding codes used by the IPUMS API. Unlisted data collections are not yet supported by the IPUMS API.

Returns

  • DataFrame with four columns containing the full collection name, the type of data the collection provides, the collection code used by the IPUMS API, and the status of API support for the collection.

Example

julia> ipums_data_collections()

 Row │ collection_name      collection_type  code_for_api  api_support 
     │ String               String           String        Bool        
─────┼─────────────────────────────────────────────────────────────────
   1 │ IPUMS USA            microdata        usa                  true
   2 │ IPUMS CPS            microdata        cps                  true
   3 │ IPUMS International  microdata        ipumsi               true
   ...
source
IPUMS.load_ipums_extractMethod
load_ipums_extract(ddi::DDIInfo, extract_filepath::String)

This function takes a parsed DDIInfo object and the file path to an IPUMS
DAT extract file, and returns a DataFrame containing all of the data.

Arguments

  • ddi::DDIInfo - A DDIInfo object, which is the result of parsing a DDI metadata file.
  • extract_filepath::String - The file path to an IPUMS extract DAT file.

Returns

This function returns a Julia DataFrame that contains all of the data from 
the IPUMS extract file. The metadata fields of the DataFrame also 
contain the metadata parsed from the DDI file.

Examples

Let's assume we have an extract DDI file named my_extract.xml, and an extract file called my_extract.dat.

julia> ddi = parse_ddi("my_extract.xml");
julia> df = load_ipums_extract(ddi, "my_extract.dat");
source
IPUMS.parse_ddiMethod
parse_ddi(filepath::String)

Parses a valid IPUMS DDI XML file and returns a DDIInfo object containing the IPUMS extract metadata.

Arguments

  • filepath::String – A string containing the path to the IPUMS DDI XML file.

Returns

A DDIInfo object that contains all of the file-level and variable-level metadata for the IPUMS extract.

Please check the documentation for DDIInfo for more information about this specific object.

Examples

Let's assume we have an extract DDI file named my_extract.xml.

julia> typeof(parse_ddi("my_extract.xml"))
IPUMS.DDIInfo
source
OpenAPI.from_jsonMethod

This is a pirated method that supports the extract_list method in returning additional information about page_size, page_number, and generated URLs.

TODO: Review whether to replace extract_list's OpenAPI implementation with a manual implementation. This would involve dynamically building a URL from the collection someone wants to use, the page size, and the page number, and then executing the query.
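
A manual implementation of this kind might start from a URL builder like the sketch below. Everything in it is an assumption made for illustration: the function name `extract_list_url_sketch`, the `/extracts` path, and the query parameter names `pageSize` and `pageNumber` are not confirmed by this documentation and should be checked against the IPUMS API reference. The sketch only builds the string and does not send a request.

```julia
# Hypothetical sketch; the path and query parameter names are assumptions.
function extract_list_url_sketch(collection::String;
                                 version::String = "2",
                                 page_size::Int = 10,
                                 page_number::Int = 1,
                                 base::String = "https://api.ipums.org")
    # Join the key=value pairs into a single query string.
    query = join(["collection=$collection",
                  "version=$version",
                  "pageSize=$page_size",
                  "pageNumber=$page_number"], "&")
    return "$base/extracts?$query"
end

url = extract_list_url_sketch("nhgis"; page_size = 5)
```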

source