| Title: | Wrapper for Statistics Portugal API |
|---|---|
| Description: | An R6-based client to facilitate interaction with the Statistics Portugal (Instituto Nacional de Estatistica - INE) API (<https://www.ine.pt/xportal/xmain?xpid=INE&xpgid=ine_api&INST=322751522&xlang=en>). |
| Authors: | Carlos Matos [aut, cre, cph] (ORCID: <https://orcid.org/0000-0003-1134-0396>) |
| Maintainer: | Carlos Matos <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.1 |
| Built: | 2026-05-30 09:31:32 UTC |
| Source: | https://github.com/c-matos/ineptr2 |
An R6 class providing access to the Statistics Portugal (INE) API. Holds configuration state (language, caching preferences) and provides methods for retrieving data, metadata, and the indicator catalog.
See INEClient-fields for configurable fields (language,
caching, timeouts, etc.).
get_data(indicator, row_limit, ...): Retrieve tidy data for an indicator, with automatic chunking and optional caching.
download_data(indicator, row_limit, ...): Download data to the file cache without loading into memory.
load_raw_data(indicator): Load previously downloaded raw JSON data from the file cache.
preview_chunks(indicator, row_limit, ...): Preview how many API chunks a download would require.
get_metadata(indicator): Get cleaned metadata for an indicator.
info(indicator): Print a summary of an indicator's key properties.
get_dim_info(indicator): Get dimension descriptions.
get_dim_values(indicator, dims): Get possible values for all dimensions.
is_valid(indicator): Check if an indicator exists.
is_updated(indicator, last_updated, metadata): Check if an indicator has been updated since last download.
get_catalog(): Download and parse the full indicator catalog (~10 min).
download_catalog(): Download the catalog to the file cache.
list_cached(): List indicators present in the file cache.
clear_cache(indicator): Clear cached files.
lang: Language code ("PT" or "EN").
use_cache: Whether caching is enabled.
cache_dir: Cache directory path, or NULL for default.
row_limit: Default maximum output rows per API request.
max_retries: Maximum retry attempts for chunk downloads.
progress_interval: Print progress every N chunks during downloads.
timeout: Timeout in seconds for API requests.
new()
Create a new INE API client.
INEClient$new( lang = "PT", use_cache = FALSE, cache_dir = NULL, row_limit = 1000000L, max_retries = 3L, progress_interval = 10L, timeout = 300 )
lang: Language code, "PT" (default) or "EN".
use_cache: Logical. Whether to cache API responses. Default FALSE.
cache_dir: Character or NULL. Cache directory path. If NULL (default), uses tools::R_user_dir("ineptr2", "cache").
row_limit: Integer. Default maximum output rows per API request. Default 1000000L.
max_retries: Integer. Maximum retry attempts for failed chunk downloads. Default 3L.
progress_interval: Integer. Print a progress message every N chunks during downloads. Default 10L.
timeout: Numeric. Timeout in seconds for API requests (metadata and data endpoints). Default 300 (5 minutes). The catalog endpoint uses a separate, longer timeout.
A new INEClient object.
get_data()
Retrieve tidy data for an indicator.
INEClient$get_data(indicator, row_limit = NULL, ...)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
row_limit: Integer or NULL. Maximum output rows per API request before splitting into multiple calls. If NULL (default), uses the client's row_limit field. See Details.
...: Dimension filters. Each argument should be named dimN (where N is the dimension number) with a character vector of values. Omitted dimensions include all values.
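The dimN naming rule for filters can be illustrated with a small base R sketch. The helper `collect_dim_filters()` is hypothetical and is not part of the package; only the naming convention and the "character vector of values" requirement come from the argument description above.

```r
# Hypothetical helper (not the package's internal code): collect the
# filters passed through `...` and check that each one is named dimN.
collect_dim_filters <- function(...) {
  filters <- list(...)
  if (length(filters) == 0L) {
    return(list())  # no filters: every dimension keeps all of its values
  }
  nms <- names(filters)
  if (is.null(nms) || any(!grepl("^dim[0-9]+$", nms))) {
    stop("Every filter must be named dimN, e.g. dim1, dim2.")
  }
  # Store each filter as a character vector of unique values
  lapply(filters, function(v) unique(as.character(v)))
}

filters <- collect_dim_filters(dim1 = "S7A2024", dim2 = c("11", "17"))
str(filters)
```

An unnamed or misnamed argument, such as `collect_dim_filters("11")`, raises an error in this sketch rather than being silently ignored.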
The INE API limits each request to 1 000 000 output rows, counted
as the product of unique values across all dimensions. When the
estimated row count exceeds row_limit, the request is automatically
split into smaller chunks by iterating over one or more dimensions.
If requests are timing out, try lowering row_limit (or increasing
the client's timeout field) to produce more, smaller chunks.
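The row estimate and the splitting idea can be sketched in base R. The dimension sizes below are made-up illustrative counts, and `plan_chunks()` is a hypothetical helper showing one simple strategy (iterate over a single dimension); the package may split differently.

```r
# Illustrative counts of unique values per dimension (not real INE data)
dim_sizes <- c(dim1 = 20L, dim2 = 350L, dim3 = 400L)
row_limit <- 1000000

# Estimated output rows = product of unique values across all dimensions
estimated_rows <- prod(dim_sizes)  # 20 * 350 * 400 = 2,800,000

# Hypothetical strategy: iterate over the values of one dimension, so each
# request covers a single value of it and all values of the other dimensions.
plan_chunks <- function(dim_sizes, row_limit) {
  total <- prod(dim_sizes)
  if (total <= row_limit) {
    return(list(split_dim = NA_character_, n_chunks = 1L, rows_per_chunk = total))
  }
  # Try the smallest dimension first, which yields the fewest chunks
  for (d in names(sort(dim_sizes))) {
    rows_per_chunk <- total / dim_sizes[[d]]
    if (rows_per_chunk <= row_limit) {
      return(list(split_dim = d, n_chunks = dim_sizes[[d]],
                  rows_per_chunk = rows_per_chunk))
    }
  }
  stop("A single dimension is not enough; more than one must be iterated.")
}

plan <- plan_chunks(dim_sizes, row_limit)
str(plan)
```

With these counts the sketch iterates over dim1, giving 20 requests of 140 000 estimated rows each. Lowering `row_limit` in the same call would force a split over a larger dimension and therefore more, smaller chunks.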
When use_cache is enabled, processed data is stored as an RDS file.
Subsequent calls with the same or narrower dimension filters return
the cached result without hitting the API. Changing filters to
include values outside the cached set triggers a fresh download.
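The "same or narrower" rule for cache reuse amounts to a subset check per dimension. The following base R sketch shows one way to express it; `filters_covered()` and its representation of filters as named lists are assumptions for illustration, not the package's internals.

```r
# Hypothetical helper: TRUE when every requested value is already in the
# cached set. A dimension that is absent (NULL) means "all values".
filters_covered <- function(requested, cached) {
  dims <- union(names(requested), names(cached))
  all(vapply(dims, function(d) {
    req <- requested[[d]]
    cac <- cached[[d]]
    if (is.null(cac)) return(TRUE)   # cache holds every value of this dimension
    if (is.null(req)) return(FALSE)  # request wants all values, cache has a subset
    all(req %in% cac)
  }, logical(1)))
}

cached <- list(dim2 = c("11", "17", "18"))

filters_covered(list(dim2 = "11"), cached)            # narrower: reuse the cache
filters_covered(list(dim2 = c("11", "20")), cached)   # "20" is outside the cached set
filters_covered(list(), cached)                       # all values requested
```

In this sketch only the first call would be served from the cache; the other two would trigger a fresh download.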
A data frame with the indicator data.
download_data()
Download data for an indicator to the file cache without
loading it into memory. Caching is temporarily enabled for the
duration of the call regardless of the client's use_cache setting.
INEClient$download_data(indicator, row_limit = NULL, ...)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
row_limit: Integer or NULL. Maximum output rows per API request before splitting into multiple calls. If NULL (default), uses the client's row_limit field.
...: Dimension filters in the form dimN = value.
Invisibly, a list with indicator, cache_dir, total_chunks,
and complete, or invisible(NULL) on partial download failure
(resume by calling again).
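Because a partial failure returns invisible(NULL) and a repeated call resumes the download, a retry wrapper is one way to use this return value. The sketch below is hypothetical: `download_until_complete()` and `max_attempts` are not part of the package, and `download_fn` stands in for a call such as `function() ine$download_data("0010003")`. A stub replaces the real call so the loop can be run without network access.

```r
# Hypothetical wrapper: call the download function again until it reports
# a complete download, up to `max_attempts` times.
download_until_complete <- function(download_fn, max_attempts = 5L) {
  for (attempt in seq_len(max_attempts)) {
    result <- download_fn()
    if (!is.null(result) && isTRUE(result$complete)) {
      return(invisible(result))
    }
    message("Attempt ", attempt, " was incomplete; calling again to resume.")
  }
  stop("Download still incomplete after ", max_attempts, " attempts.")
}

# Stub that fails twice and then succeeds, mimicking the documented return values
calls <- 0L
fake_download <- function() {
  calls <<- calls + 1L
  if (calls < 3L) return(invisible(NULL))
  invisible(list(indicator = "0010003", complete = TRUE))
}

res <- download_until_complete(fake_download)
res$complete
```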
load_raw_data()
Load previously downloaded raw data from the file cache
as a list of parsed JSON responses. Use download_data() first to
populate the cache.
INEClient$load_raw_data(indicator)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
A list with responses (parsed JSON) and urls.
get_metadata()
Get cleaned metadata for an indicator.
INEClient$get_metadata(indicator)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
API response body as a list.
get_catalog()
Get the full INE indicator catalog.
This operation is very time-consuming (~10 minutes) as it downloads
the entire catalog from the INE API. Consider using download_catalog()
to cache the result for subsequent calls.
INEClient$get_catalog()
A data frame with one row per indicator.
download_catalog()
Download the INE indicator catalog to the file cache
without loading it into memory. This operation is time-consuming
(~10 minutes) as it downloads the entire catalog from the INE API.
Subsequent calls return the cached file immediately. Caching is
temporarily enabled for the duration of the call regardless of
the client's use_cache setting.
INEClient$download_catalog()
Invisibly, the cache file path.
info()
Print a summary of an indicator's key properties: code, name, periodicity and time range, last update date, and a per-dimension breakdown of unique values. Labels are displayed in the client's current language.
INEClient$info(indicator)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
Invisibly, a list with code, name, periodicity,
first_period, last_period, last_updated, and
dimensions (a data frame with dim_num, name, and
n_values columns).
get_dim_info()
Get dimension descriptions for an indicator.
INEClient$get_dim_info(indicator)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
A data frame with dim_num, abrv, and versao columns.
get_dim_values()
Get possible values for all dimensions of an indicator.
INEClient$get_dim_values(indicator, dims = NULL)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
dims: Integer vector of dimension numbers to include, or NULL (default) for all dimensions.
A tidy data frame with dimension values.
preview_chunks()
Preview how many API chunks a download would require, without fetching any data. Useful for estimating download time before committing to a large request.
INEClient$preview_chunks(indicator, row_limit = NULL, ...)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
row_limit: Integer or NULL. Maximum output rows per API request before splitting into multiple calls. If NULL (default), uses the client's row_limit field.
...: Dimension filters in the form dimN = value.
Invisibly, a list with chunks and estimated_rows.
is_valid()
Check if an indicator exists and is callable via the INE API.
INEClient$is_valid(indicator)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
TRUE if indicator exists, FALSE otherwise.
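Since indicator IDs are 7-digit strings, a local format check can catch obvious mistakes before any API call is made. The helper below is a hypothetical sketch and does not replace is_valid(): it only tests the shape of the ID, not whether the indicator exists.

```r
# Hypothetical pre-check: a single, non-missing string of exactly 7 digits.
looks_like_indicator_id <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x) && grepl("^[0-9]{7}$", x)
}

looks_like_indicator_id("0010003")  # well-formed
looks_like_indicator_id("10003")    # too short: leading zeros were dropped
looks_like_indicator_id(10003)      # numeric input cannot keep leading zeros
```

Passing IDs as strings matters because a numeric value such as 0010003 is stored as 10003, which loses the leading zeros.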
is_updated()
Check if an indicator has been updated since last download.
INEClient$is_updated(indicator, last_updated = NULL, metadata = NULL)
indicator: INE indicator ID as a 7-digit string. Example: "0010003".
last_updated: A Date object or a character string in "YYYY-MM-DD" format. If provided, takes precedence over cached metadata. If NULL (default), the function looks for cached metadata or the metadata argument.
metadata: A metadata list object as returned by get_metadata(). If provided and last_updated is NULL, extracts DataUltimaAtualizacao.
TRUE if updated, FALSE if not.
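The precedence between last_updated and metadata can be sketched as follows. This is a simplified, hypothetical helper: it ignores the cached-metadata lookup, and it assumes that DataUltimaAtualizacao is a single value that `as.Date()` can parse in "YYYY-MM-DD" form, which the documentation does not confirm.

```r
# Hypothetical sketch of the argument precedence (cache lookup omitted).
resolve_reference_date <- function(last_updated = NULL, metadata = NULL) {
  if (!is.null(last_updated)) {
    return(as.Date(last_updated))  # explicit date wins
  }
  if (!is.null(metadata)) {
    # Assumption: the field parses as a "YYYY-MM-DD" date
    return(as.Date(metadata$DataUltimaAtualizacao))
  }
  NULL  # nothing to compare against
}

# Illustrative metadata list with a made-up date
meta_stub <- list(DataUltimaAtualizacao = "2023-05-05")

resolve_reference_date("2024-01-01", meta_stub)  # uses last_updated
resolve_reference_date(metadata = meta_stub)     # falls back to metadata
```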
list_cached()
List indicators present in the file cache.
INEClient$list_cached()
A data frame with one row per cached indicator and columns
indicator, has_metadata, has_data, chunks_downloaded,
chunks_total, and download_complete. Returns a zero-row data
frame if no cache exists.
clear_cache()
Clear cached files.
INEClient$clear_cache(indicator = NULL)
indicator: Optional INE indicator ID. If NULL (default), clears all cached files.
Invisibly returns TRUE if files were removed, FALSE otherwise.
print()
Print a summary of the client configuration.
INEClient$print(...)
...: Ignored.
clone()
The objects of this class are cloneable with this method.
INEClient$clone(deep = FALSE)
deep: Whether to make a deep clone.
INEClient-fields for field descriptions.
    # -- Setup --
    ine <- INEClient$new()
    ine <- INEClient$new(lang = "EN", use_cache = TRUE)
    print(ine)

    # -- Metadata --
    meta <- ine$get_metadata("0010003")
    ine$info("0010003")
    dims <- ine$get_dim_info("0010003")
    vals <- ine$get_dim_values("0010003")

    # -- Data --
    df <- ine$get_data("0010003")
    df <- ine$get_data("0010003", dim1 = "S7A2024", dim2 = c("11", "17"))
    ine$preview_chunks("0008273")

    # -- Validation --
    ine$is_valid("0010003")
    ine$is_updated("0010003", last_updated = "2024-01-01")

    # -- Cache --
    ine$list_cached()
    ine$clear_cache()
Configuration fields for the INEClient class. All fields are
implemented as active bindings with validation. Set them with
ine$field <- value and read them with ine$field.
| Field | Description |
|---|---|
| lang | Character. Language code: "PT" (default) or "EN". |
| use_cache | Logical. Whether to cache API responses locally. Default FALSE. |
| cache_dir | Character or NULL. Cache directory path. If NULL (default), uses tools::R_user_dir("ineptr2", "cache"). |
| row_limit | Integer. Maximum output rows per API request before splitting into chunks. Must be between 1 and 1 000 000 (the API ceiling). Default 1000000L. |
| max_retries | Integer. Maximum retry attempts for failed chunk downloads. Default 3L. |
| progress_interval | Integer. Print a progress message every N chunks during downloads. Default 10L. |
| timeout | Numeric. Timeout in seconds for API requests (metadata and data endpoints). Default 300. |
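To show what "active bindings with validation" means in practice, here is a minimal base R sketch using `makeActiveBinding()`. It is not the package's R6 implementation: `make_config()` is a hypothetical stand-in that validates only row_limit, using the 1 to 1 000 000 range stated above.

```r
# Hypothetical stand-in for a validated field, built on a plain environment.
make_config <- function(row_limit = 1000000L) {
  env <- new.env()
  value <- NULL
  makeActiveBinding("row_limit", function(v) {
    if (missing(v)) return(value)  # read: cfg$row_limit
    if (!is.numeric(v) || length(v) != 1L || is.na(v) || v < 1 || v > 1000000) {
      stop("row_limit must be a single number between 1 and 1000000.")
    }
    value <<- as.integer(v)        # write: cfg$row_limit <- v
    invisible(value)
  }, env)
  env$row_limit <- row_limit       # the initial value is validated too
  env
}

cfg <- make_config()
cfg$row_limit <- 500000L
cfg$row_limit

# An out-of-range value is rejected and the previous value is kept
bad <- try(cfg$row_limit <- 2000000L, silent = TRUE)
cfg$row_limit
```

The same read and write syntax, `ine$field` and `ine$field <- value`, applies to the client's fields; the difference is that the client defines its bindings through R6.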
INEClient for methods.
    ine <- INEClient$new()
    ine$lang
    ine$lang <- "EN"
    ine$use_cache <- TRUE
    ine$cache_dir <- tempdir()
    ine$row_limit <- 500000L