| Title: | Read Linguistic Data in the Cross Linguistic Data Format (CLDF) |
|---|---|
| Description: | Cross-Linguistic Data Format (CLDF) is a framework for storing cross-linguistic data, ensuring compatibility and ease of data exchange between different linguistic datasets; see Forkel et al. (2018) <doi:10.1038/sdata.2018.205>. The 'rcldf' package facilitates the manipulation and analysis of these datasets by simplifying the loading, querying, and visualisation of CLDF datasets, making it easier to conduct comparative linguistic analyses, manage language data, and apply statistical methods directly within R. |
| Authors: | Simon J. Greenhill [aut, cre] |
| Maintainer: | Simon J. Greenhill <[email protected]> |
| License: | Apache License (>= 2.0) |
| Version: | 1.6.2 |
| Built: | 2026-05-20 23:19:12 UTC |
| Source: | https://github.com/simongreenhill/rcldf |
Adds a dataframe.
add_dataframe(table, filename, group)
table |
a metadata section from the CLDF metadata. |
filename |
the filename. |
group |
a grouping from the metadata. |
A dataframe
Extracts a CLDF table as a 'wide' dataframe by resolving all foreign key links
as.cldf.wide(object, table)
object |
the |
table |
the name of the table to extract. |
A tibble dataframe
md <- system.file("extdata/huon", "cldf-metadata.json", package = "rcldf")
cldfobj <- cldf(md)
forms <- as.cldf.wide(cldfobj, 'FormTable')
Reads a Cross-Linguistic Data Format dataset into an object.
read_cldf() is provided alongside cldf() for users who expect a reader-style name such as readr::read_csv.
cldf(
  mdpath,
  load_bib = FALSE,
  cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
read_cldf(
  mdpath,
  load_bib = FALSE,
  cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
mdpath |
the path to the directory or metadata JSON file. |
load_bib |
a boolean flag (TRUE/FALSE, default FALSE) to load the
sources.bib BibTeX file. |
cache_dir |
a directory to cache downloaded files to |
A cldf object
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
Determine whether the input is true, with missing values being interpreted as false.
coalesce_truth(x)
x |
logical, |
FALSE if x is anything but TRUE
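The behaviour described above (anything other than TRUE, including NA, is treated as FALSE) can be sketched in base R. The helper name `coalesce_truth_sketch` is hypothetical, and this is only an illustration of the idea, not necessarily how rcldf implements it.

```r
# Hypothetical helper: TRUE only where x is exactly TRUE; NA and FALSE become FALSE.
coalesce_truth_sketch <- function(x) {
  x %in% TRUE
}

coalesce_truth_sketch(c(TRUE, FALSE, NA))
```

Using `%in%` avoids the NA that a plain `x == TRUE` comparison would propagate.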
Returns a table of datasets available in cldf_meta
datasets()
A dataframe of available datasets.
Translate csvw datatypes to R types. This implementation currently targets readr::cols column specifications.
datatype_to_type(datatypes)
datatypes |
a list of csvw datatypes |
rcldf adds some overrides here to add e.g. anyURI etc.
a readr::cols specification - a list of collectors
cspec <- datatype_to_type(list("double", list(base="date", format="yyyy-MM-dd")))
readr::read_csv(readr::readr_example("challenge.csv"), col_types=cspec)
The CSVW Default Dialect specification described in CSV Dialect Description Format.
default_dialect
An object of class list of length 13.
a list specifying a default csv dialect
If neither the table nor the group has a tableSchema annotation,
then this default schema will be used.
default_schema(filename, dialect = default_dialect)
filename |
a csv file |
dialect |
specification of the csv's dialect (default: |
a table schema
Returns the cache dir.
get_cache_dir(cache_dir = NA)
cache_dir |
a directory to use |
A string of the cache dir
Returns a dataframe with details on the CLDF dataset in path.
get_details(path, cache_dir = NA)
path |
the path to resolve |
cache_dir |
a directory to cache downloaded files to |
A dataframe.
Returns the filesize in bytes of a directory.
get_dir_size(path)
path |
a directory to size |
A numeric of the file size in bytes
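As a rough illustration of what sizing a directory involves, the following base R sketch sums the sizes of all files found recursively under a path. The function name `dir_size_sketch` and the temporary files are hypothetical; rcldf's own implementation may differ.

```r
# Hypothetical helper: total size in bytes of all files below `path`.
dir_size_sketch <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.size(files))
}

# Build a small throwaway directory with two files of known size.
tmp <- tempfile("dirsize-")
dir.create(tmp)
writeBin(raw(10), file.path(tmp, "a.bin"))  # 10 bytes
writeBin(raw(5), file.path(tmp, "b.bin"))   # 5 bytes

dir_size_sketch(tmp)
```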
Get a filename from url value in metadata (handles .zip files)
get_filename(base_dir, url)
base_dir |
the base_dir |
url |
the url statement |
A string
Returns a table of the foreign keys in a CLDF dataset.
get_foreign_keys(cldf_obj)
cldf_obj |
a CLDF object |
a dataframe
o <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
get_foreign_keys(o)
Downloads and installs a CLDF dataset from a Zenodo endpoint
get_from_zenodo(zid, load_bib = FALSE, cache_dir = NULL)
zid |
Zenodo endpoint |
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
A cldf object
Identifies the separator characters specified by the CLDF metadata.
get_separators(metadata)
metadata |
|
A dataframe with three columns (name, separator, url).
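To show why separators matter, here is a minimal base R sketch of how a separator might be applied to a multi-valued column. The column values, the `;` separator and the helper name `split_column_sketch` are assumptions made for illustration only.

```r
# Hypothetical helper: split each cell of a character column on a literal separator.
split_column_sketch <- function(x, separator) {
  strsplit(x, separator, fixed = TRUE)
}

# Assumed example data: a Source column where multiple keys are joined by ";".
source_col <- c("smith2001;jones2010", "lee1999")
split_sources <- split_column_sketch(source_col, ";")
lengths(split_sources)
```

Passing `fixed = TRUE` treats the separator as a literal string, which matters for separators such as `|` or `.` that are special in regular expressions.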
Extracts a single table from a CLDF dataset.
get_table_from(
  table,
  mdpath,
  cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
table |
a CLDF table type |
mdpath |
a path to a CLDF file |
cache_dir |
a directory to cache downloaded files to |
a dataframe
md_json <- system.file("extdata/huon", "cldf-metadata.json", package = "rcldf")
df <- get_table_from("LanguageTable", md_json)
Convert a CLDF URL tablename to a short tablename
get_tablename(conformsto, url = NA)
conformsto |
the dc:conformsTo statement |
url |
the url statement |
A string
get_tablename("http://cldf.clld.org/v1.0/terms.rdf#ValueTable")
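One plausible way to shorten such a URL is to keep only the fragment after the `#`. The sketch below illustrates that idea in base R; `tablename_sketch` is a hypothetical name, and it ignores the `url` fallback argument that get_tablename() accepts.

```r
# Hypothetical helper: keep everything after the last "#" in the term URL.
tablename_sketch <- function(conformsto) {
  sub("^.*#", "", conformsto)
}

tablename_sketch("http://cldf.clld.org/v1.0/terms.rdf#ValueTable")
```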
Returns TRUE if url looks like a GitHub URL
is_github(url)
url |
A string |
A boolean TRUE/FALSE
is_github('https://github.com/SimonGreenhill/rcldf/')
Returns TRUE if url looks like a URL
is_url(url)
url |
A string |
A boolean TRUE/FALSE
is_url('http://simon.net.nz')
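A check of this kind is often a simple pattern match on the scheme. The following sketch assumes that "looks like a URL" means "starts with http://, https:// or ftp://"; both that rule and the name `is_url_sketch` are assumptions, and rcldf's actual test may be stricter or looser.

```r
# Hypothetical helper: TRUE when the string starts with a common URL scheme.
is_url_sketch <- function(url) {
  grepl("^(https?|ftp)://", url, ignore.case = TRUE)
}

is_url_sketch("http://simon.net.nz")
is_url_sketch("extdata/huon/cldf-metadata.json")
```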
Returns a dataframe of directories in the cache dir
list_cache_files(cache_dir = NULL)
cache_dir |
the cache directory to use. If NULL then R_user_dir will be used. |
A dataframe of the directories
Returns a CLDF dataset object of the latest CLTS version.
load_clts(load_bib = FALSE, cache_dir = NULL)
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
A cldf object
Returns a CLDF dataset object of the latest Concepticon version.
load_concepticon(load_bib = FALSE, cache_dir = NULL)
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
A cldf object
Looks up a dataset from the registry returned by datasets,
resolves the requested version, and downloads it from either Zenodo or
GitHub.
load_dataset(dataset, version = NULL, source = "Zenodo")
dataset |
a character string naming the dataset (must match the
|
version |
a character string specifying the version to load (e.g.
|
source |
a character string, either |
A cldf object.
See also datasets().
## Not run:
# load the latest version of a dataset
ds <- load_dataset("vanuatuvoices")
# load a specific version
ds <- load_dataset("vanuatuvoices", version = "v1.3")
# load from GitHub instead
ds <- load_dataset("vanuatuvoices", source = "GitHub")
## End(Not run)
Returns a CLDF dataset object of the latest D-PLACE version.
load_dplace(load_bib = FALSE, cache_dir = NULL)
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
A cldf object
Returns a CLDF dataset object of the latest Glottolog version.
load_glottolog(load_bib = FALSE, cache_dir = NULL)
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
A cldf object
Returns the cachekey for the given path.
make_cache_key(path)
path |
a path to generate the cachekey for. |
A string.
Converts null values to R's NA. Note that this is run by default on loading a dataset with cldf().
nullify(cldfobj, nulls = NULL)
cldfobj |
a CLDF Object |
nulls |
a dataframe of null values to replace (default=NULL). |
A cldf object
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
cldfobj <- nullify(cldfobj)
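The core idea of mapping declared null markers to NA can be sketched on a plain vector. The marker strings `"?"` and `""` and the helper `replace_nulls_sketch` are assumptions for illustration; in a real dataset the markers come from the metadata, and nullify() operates on a whole cldf object.

```r
# Hypothetical helper: replace any value listed in `nulls` with NA.
replace_nulls_sketch <- function(x, nulls) {
  x[x %in% nulls] <- NA
  x
}

replace_nulls_sketch(c("a", "?", "", "b"), nulls = c("?", ""))
```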
Creates a leaflet map showing all languages in the CLDF dataset that have geographic coordinates. Longitudes are standardized to a 0-360 range to ensure a continuous Pacific-centered view.
plot_languages(x, color_by = "ID")
x |
A cldf object. |
color_by |
Character string specifying the column in |
A leaflet map object.
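The 0-360 longitude standardisation mentioned in the description can be illustrated with R's modulo operator, which returns a non-negative result for a positive divisor. The variable names and sample longitudes here are made up, and the package may perform the conversion differently.

```r
# Assumed sample longitudes in the usual -180..180 convention.
lon <- c(-170, -10, 0, 145, 180)

# Shift negative longitudes into the 0-360 range so that points either side
# of the antimeridian end up next to each other on a Pacific-centred map.
lon_360 <- lon %% 360
lon_360
```

With this convention a language at -170 (just east of the date line) is placed at 190, adjacent to one at 180 instead of at the opposite edge of the map.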
Filters the dataset for a specific Parameter ID and maps the values across languages. This function automatically resolves whether the data is in a Form or Value table and joins it with geographic data.
plot_parameter(x, parameter = "1sg_a", color_by = "Value")
x |
A cldf object. |
parameter |
Character string. The ID of the parameter to plot (e.g., "1sg_a"). |
color_by |
Character string. The column to use for the color scale (e.g., "Value"). |
A leaflet map object.
Similar to plot_parameter, but instead of circles, this function renders the
actual phonetic forms (Value) as text labels directly on the map. Labels are
color-coded based on the color_by column (e.g., Cognacy).
plot_word(x, parameter = "1sg_a", color_by = "Cognacy")
x |
A cldf object. |
parameter |
Character string. The ID of the parameter (word) to plot. |
color_by |
Character string. Column used to categorize and color the text labels. |
A leaflet map object.
Summarises the CLDF file
## S3 method for class 'cldf'
print(x, ...)
x |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
No return value, called for side effects.
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
print(cldfobj)
Prints a CLDF schema
## S3 method for class 'cldf_schema'
print(x, ...)
x |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
No return value, called for side effects.
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
print(schema(cldfobj))
Reads and parses the BibTeX sources file from a CLDF dataset, making
bibliographic references available in bibtex format. By default, sources
are not loaded automatically when using cldf() as BibTeX parsing can be
time-consuming. Use this function to load them, or pass load_bib=TRUE to
cldf() when loading the dataset.
read_bib(object)
object |
A cldf object containing the dataset |
The cldf object, modified to include a sources list with
parsed BibTeX data
# Load a dataset with sources
ds <- cldf(system.file("extdata/huon", "cldf-metadata.json", package="rcldf"), load_bib=TRUE)
# Or load without sources first, then add them
ds_no_bib <- cldf(system.file("extdata/huon", "cldf-metadata.json", package="rcldf"))
ds <- read_bib(ds_no_bib)
# View the sources
ds$sources
Relabels a column in a dataset for merging.
relabel(column, table)
column |
the column name. |
table |
the tablename. |
A string of "column.table"
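The "column.table" label format can be reproduced with a one-line base R sketch. `relabel_sketch` is a hypothetical stand-in and the column and table names are examples only; the real function may handle edge cases that this does not.

```r
# Hypothetical helper: join a column name and a table name with a dot.
relabel_sketch <- function(column, table) {
  paste(column, table, sep = ".")
}

relabel_sketch("Name", "LanguageTable")
```

Suffixing the table name like this keeps identically named columns (such as Name or ID) distinguishable after tables are merged.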
Helper function to resolve the path (e.g. directory or md.json file)
resolve_path(path, cache_dir = NA)
path |
the path to resolve |
cache_dir |
a directory to cache downloaded files to |
A list of two items:
path - string containing the path to the metadata.json file
metadata - a csvwr metadata object
Extracts the CLDF dataset schema showing tables, columns, data types, and foreign key relationships.
schema(cldf_obj)
cldf_obj |
A CLDF object created with |
A schema object:
## Not run:
# Load a dataset
df <- cldf("path/to/dataset")
schema(df)
## End(Not run)
Note that this is run by default on loading a dataset with cldf()
separate(cldfobj, separators = NULL)
cldfobj |
a CLDF Object |
separators |
a dataframe of separator values to replace (default=NULL). |
A cldf object
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
cldfobj <- separate(cldfobj)
Sets the cache dir for the current session.
set_cache_dir(cache_dir = NA)
cache_dir |
a directory to use |
NULL. Sets an environment value.
Subset a CLDF object with Cascading Filters
subset_cldf(x, expr)
x |
A cldf object. |
expr |
A logical expression (e.g., Language_ID == 'kate') |
Summarises the CLDF file
## S3 method for class 'cldf'
summary(object, ...)
object |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
None
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
summary(cldfobj)
Helper function to filter a table tbl based on a logical expression e.
update_table(e, tbl)
e |
the expression. |
tbl |
the table. |
A filtered table.