Typically an Open SDG implementation is split into a site repository and a data repository. For this reason the Open SDG configuration is split into site configuration and data configuration. This document details the available settings for data configuration.
The data repository will automatically generate a website which summarizes the available endpoints in your build. The docs_* settings affect how that website is built.
The docs_branding setting controls the title that is displayed at the top of these website pages. The default, used if the setting is omitted, is shown below:
docs_branding: Build docs
Optional: The docs_intro setting adds an introductory paragraph to the homepage of the automatically-generated website. If omitted, no introductory paragraph will appear. Here is an example:
docs_intro: This is a list of examples of endpoints and output that are available on this service. Click each of the links below for more information on the available output.
Optional: The docs_indicator_url setting can be used to convert any indicator IDs in the automatically-generated website into links to your implementation's indicator pages. If omitted, the indicator IDs will not be hyperlinked.
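As a rough sketch only: the fragment below assumes the setting is named docs_indicator_url and that an "[id]" placeholder stands in for each indicator ID. Both assumptions, along with the hypothetical site URL, should be checked against your version of the sdg-build library.

```yaml
# Hypothetical site URL. "[id]" is assumed to be replaced by each indicator ID.
docs_indicator_url: https://example.org/my-site/[id]
```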
Optional: The indicator_downloads setting creates additional "download" buttons on each indicator page of your Open SDG implementation. Use this if there are additional per-indicator files (such as SDMX files) that you would like to make available for download.
This should be a list of objects, each having certain parameters. The available parameters are:
button_label: The label of the button to display. This can be a translation key.
source_pattern: A wildcard pattern used to identify the files you would like to make available for download.
output_folder: A folder to create for holding the files, where they will be available for download.
indicator_id_pattern: A regular expression to convert filenames into indicator IDs. The default is indicator_(.*), which would convert "indicator_1-1-1" into "1-1-1". For more help with regular expressions, look for online tools such as Regex 101.
The following example would ensure that all files matching
data/indicator_*.csv will be available for download in the build at
indicator_downloads:
  - button_label: csv
    source_pattern: tests/data/indicator_*.csv
    output_folder: data-csv
    indicator_id_pattern: indicator_(.*)
Optional: The indicator_options setting controls how your indicators are loaded. The available parameters are:
non_disaggregation_columns: This specifies a list of columns that should not be considered disaggregations.
Here are the defaults that are assumed if this is omitted:
indicator_options:
  non_disaggregation_columns:
    - Year
    - Units
    - Series
    - Value
    - GeoCode
    - Observation status
    - Unit multiplier
    - Unit measure
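If your data contains another column that should not be treated as a disaggregation, a configuration along these lines might work. This is a sketch: the "Footnote" column is hypothetical, and it assumes that a custom list replaces the defaults instead of extending them, which is why the defaults are repeated.

```yaml
indicator_options:
  non_disaggregation_columns:
    # Defaults, repeated on the assumption that a custom list replaces them.
    - Year
    - Units
    - Series
    - Value
    - GeoCode
    - Observation status
    - Unit multiplier
    - Unit measure
    # Hypothetical extra column that should not become a disaggregation.
    - Footnote
```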
Optional: The inputs setting identifies the source (or sources) of your data and metadata. It can be omitted if you are using the standard Open SDG approach of CSV data and YAML metadata. If you would like to use non-standard inputs (such as SDMX), you can list them here.
Each item must have a "class" which corresponds to classes in the /sdg/inputs folder of the sdg-build library. Further, each item can have any/all of the parameters that class uses. Below are full descriptions of all the possible inputs and their corresponding parameters:
InputCkan: Input data from a CKAN service. The available parameters are:
endpoint: The remote URL of the endpoint for fetching indicators.
indicator_id_map: Map of API ids (such as "resource ids") to indicator ids.
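A minimal sketch of an InputCkan entry might look like the following. The endpoint URL and the resource id are hypothetical placeholders; only the parameter names come from the description above.

```yaml
inputs:
  - class: InputCkan
    # Hypothetical CKAN endpoint.
    endpoint: https://ckan.example.org/api/action/datastore_search
    indicator_id_map:
      # Hypothetical resource id (left) mapped to an indicator id (right).
      00000000-0000-0000-0000-000000000001: 1-1-1
```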
InputCsvData: Input data from a folder of CSV files. The available parameters are:
path_pattern: A wildcard pattern used for identifying the source files.
InputCsvMeta: Input metadata from a folder of CSV files. The available parameters are:
path_pattern: Same as described above in other inputs.
metadata_mapping: A map of human-readable labels to machine keys or a path to a CSV file containing that mapping. This allows the CSV metadata files to use human-readable labels instead of machine keys, which makes management easier.
git: Whether to use Git (version control) information to populate "last updated" dates in the metadata. This is a convenience feature to save you from the manual work of keeping the "last updated" dates accurate.
git_data_dir: Only used if you are using the "git" option described above. Location of folder containing the data files.
git_data_filemask: Only used if you are using the "git" option described above. A pattern for data filenames, where "*" is the indicator id. Any indicator can override this setting by having a metadata field called "data_filename" with the name of the data file for that indicator.
For more technical information see the InputCsvMeta class definition.
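The InputCsvMeta parameters described above could be combined roughly as follows. The folder names, the mapping file, and the file mask are hypothetical and would need to match your own repository layout.

```yaml
inputs:
  - class: InputCsvMeta
    # Hypothetical folder of CSV metadata files.
    path_pattern: meta/*.csv
    # Hypothetical CSV file mapping human-readable labels to machine keys.
    metadata_mapping: metadata-mapping.csv
    # Use Git history to populate "last updated" dates.
    git: true
    git_data_dir: data
    # "*" stands for the indicator id in each data filename.
    git_data_filemask: indicator_*.csv
```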
InputExcelMeta: Input metadata from a folder of Excel files. The available parameters are the same as in InputCsvMeta.
For more technical information see the InputExcelMeta class definition.
InputSdmxJson: Input data from an SDMX-JSON file or endpoint. The available parameters are:
source: Remote URL of the SDMX source, or path to local SDMX file.
drop_dimensions: A list of SDMX dimensions/attributes to ignore.
drop_singleton_dimensions: Whether to drop dimensions/attributes that have only one variation.
dimension_map: Map of SDMX ids to human-readable names. For dimension names, the key is simply the dimension id. For dimension value names, the key is the dimension id and value id, separated by a pipe (|). This also includes attributes.
indicator_id_map: A map of SDMX series codes to indicator ids. Normally this is not needed, but sometimes the DSD may contain typos or mistakes, or may not contain any reference to the indicator ID numbers. This need not contain all indicator ids, only those that need it. If a particular series should be mapped to multiple indicators, its value can be a list of strings. Otherwise each value is a single string.
import_names: Whether to import names. Set to false to rely on global names.
import_translation_keys: Whether to import translation keys instead of text values. Set to true to import translation keys, which will be in the format of concept.[id]. If left false, text values are imported instead, taken from the first language in the DSD.
dsd: Remote URL of the SDMX DSD (data structure definition) or path to local file.
indicator_id_xpath: An xpath query to find the indicator id within each Series code.
indicator_name_xpath: An xpath query to find the indicator name within each Series code.
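To illustrate how several of these parameters fit together, here is a sketch of an InputSdmxJson entry. The URL, dimension ids, series codes, and indicator ids are all hypothetical; note how dimension_map keys use a pipe to separate a dimension id from a value id, and how indicator_id_map accepts either a string or a list of strings.

```yaml
inputs:
  - class: InputSdmxJson
    # Hypothetical SDMX-JSON endpoint.
    source: https://sdmx.example.org/data.json
    drop_singleton_dimensions: true
    drop_dimensions:
      - SOURCE_DETAIL        # hypothetical attribute id
    dimension_map:
      SEX: Sex               # dimension id -> name
      SEX|F: Female          # dimension id|value id -> name
      SEX|M: Male
    indicator_id_map:
      SERIES_A: 1-1-1        # one series mapped to one indicator
      SERIES_B:              # one series mapped to several indicators
        - 3-1-1
        - 3-1-2
    import_names: true
```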
InputSdmxMl_Multiple: Input data from multiple SDMX-ML files (which can be a mix of either "Structure" or "Structure Specific"). The available parameters are the same as in InputSdmxJson, along with these additional parameters:
path_pattern: Same as described above in other inputs.
For more technical information see the InputSdmxMl_Multiple class definition.
InputSdmxMl_Structure: Input data from an SDMX-ML Structure file. The available parameters are the same as in InputSdmxJson.
For more technical information see the InputSdmxMl_Structure class definition, and an example of using InputSdmxMl_Structure in Python code.
InputSdmxMl_StructureSpecific: Input data from an SDMX-ML Structure Specific (also known as "Compact") file. The available parameters are the same as in InputSdmxJson.
For more technical information see the InputSdmxMl_StructureSpecific class definition.
InputYamlMdMeta: Input metadata from a folder of YAML/Markdown files. The available parameters are the same as in InputCsvMeta.
Note that YAML/Markdown files should have a --- at the bottom. Any Markdown text below that line will be used as the page_content metadata field.
Defaults: As mentioned above, this inputs setting is optional. The defaults below show what is assumed if inputs is omitted entirely.
inputs:
  - class: InputCsvData
    path_pattern: data/*-*.csv
  - class: InputYamlMdMeta
    path_pattern: meta/*-*.md
    git: true
    git_data_dir: data
Optional: The languages setting corresponds exactly to the language setting in the site configuration. However, it is optional here. If you use this setting, your data will be translated and placed in language subfolders. For more information on how this translation works, see the documentation on translating metadata and translating data.
languages:
  - es
  - en
Optional: The map_layers setting allows the build to generate one or more GeoJSON files to be used by Open SDG maps. This should be a list of layers, each one containing certain parameters. The available parameters correspond to the sdg-build library's OutputGeoJson class and are described below:
geojson_file: A path to a GeoJSON file (remote or local) which contains all of the "geometries" for the regions to include. Each region should have an id and a name, in properties (see name_property and id_property).
name_property: The property in the geometry file which contains the region's name.
id_property: The property in the geometry file which contains the region's id.
id_column: The name of a column in the indicator data which corresponds to the id that is in the "id_property" of the geometry file. This serves to "join" the indicator data with the geometry file.
output_subfolder: A folder beneath 'geojson' to put the files. The full path will be:
filename_prefix: A prefix added before the indicator id to construct a filename for each GeoJSON file.
exclude_columns: A list of strings, each a column name in the indicator data that should not be included in the disaggregation. This is typically for any columns that mirror the region referenced by the id column.
id_replacements: An optional map of replacements to apply to the values in the id_column. This is typically used if another column exists which "mirrors" what would be in an id column, to avoid duplicate work. For example, a "Region" column may exist with the names of the regions as values. This parameter can be used to map those region names to geocodes, saving you the work of maintaining a separate id column.
Below is an example of a possible configuration which includes one layer:
map_layers:
  - geojson_file: https://geoportal1-ons.opendata.arcgis.com/datasets/4fcca2a47fed4bfaa1793015a18537ac_4.geojson
    name_property: rgn17nm
    id_property: rgn17cd
    output_subfolder: regions
    filename_prefix: indicator_
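The id_column and id_replacements parameters are not shown in that example. A sketch of how they might be used follows, assuming a hypothetical local geometry file and an indicator data column called "Region" that holds region names instead of geocodes. The property names, region names, and codes are all made up for illustration.

```yaml
map_layers:
  # Hypothetical local geometry file and property names.
  - geojson_file: boundaries/regions.geojson
    name_property: name
    id_property: code
    # Join on the "Region" column of the indicator data...
    id_column: Region
    # ...after mapping its region names to the ids used in the geometry file.
    id_replacements:
      North Region: R01
      South Region: R02
    output_subfolder: regions
    filename_prefix: indicator_
```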
Optional: The reporting_status_extra_fields setting allows the build to generate stats for reporting status by additional fields, beyond the default "status by goal" report. The example below shows how to generate reporting status by the un_custodian_agency field:
reporting_status_extra_fields:
  - un_custodian_agency
Optional: The schema_file setting identifies a file containing the schema (possible fields) for metadata. Currently this needs to be a prose.io config file, and defaults to '_prose.yml'.
Optional: The site_dir setting identifies a directory to hold the "built" files. The default is '_site'.
Optional: The src_dir setting controls the directory in which scripts should find source files. In most cases this can be left at the default (''), which points to the root of the data repository. It is available in case you need to place your source files in a subfolder.
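For reference, the three file and directory settings described above could be written out explicitly with their stated defaults, as in the sketch below. The setting names schema_file, site_dir, and src_dir are assumed here and should be confirmed against your version of sdg-build.

```yaml
# Assumed setting names, with the default values described above.
schema_file: _prose.yml   # prose.io config describing the metadata fields
site_dir: _site           # where the "built" files are written
src_dir: ''               # root of the data repository
```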
Optional: The translations setting identifies the source (or sources) of your translations. It can be omitted if your languages are already included in sdg-translations and you do not need any custom translations. If you are using other languages or need custom translations, you can list the sources here.
Each item must have a "class" which corresponds to classes in the /sdg/translations folder of the sdg-build library. Further, each item can have any/all of the parameters that class uses. Below are full descriptions of all the possible translations and their corresponding parameters:
TranslationInputCsv: Input translations from a folder of local CSV files. The available parameters are:
source: The folder containing the translation files. Defaults to "translations".
For more technical information see the TranslationInputCsv class definition.
TranslationInputSdgTranslations: Input translations from a Git repository structured like the sdg-translations project. The available parameters are:
tag: Specifies a particular tag (or branch or commit) to use in the Git repository.
branch: Specifies a particular branch (or tag or commit) to use in the Git repository. Alias for "tag".
source: Specifies the endpoint for the Git repository. Defaults to the sdg-translations project: 'https://github.com/open-sdg/sdg-translations.git'
For more technical information see the TranslationInputSdgTranslations class definition and an example of TranslationInputSdgTranslations configuration.
TranslationInputSdmx: Input translations from an SDMX DSD file. The available parameters are:
source: The location of the SDMX DSD file (either local or remote).
TranslationInputYaml: Input translations from a folder of local YAML files. The available parameters are the same as in TranslationInputCsv above.
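A configuration that combines several of these translation inputs might look like the sketch below. The tag value and the DSD file path are hypothetical; the class and parameter names are those described above.

```yaml
translations:
  - class: TranslationInputSdgTranslations
    source: https://github.com/open-sdg/sdg-translations.git
    # Hypothetical tag; a branch or commit could be given instead.
    tag: 1.0.0
  - class: TranslationInputSdmx
    # Hypothetical local DSD file; a remote URL could be used instead.
    source: sdmx/dsd.xml
  - class: TranslationInputCsv
    # Folder of local CSV translation files (the stated default).
    source: translations
```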
Defaults: As mentioned above, this translations setting is optional. The defaults below show what is assumed if translations is omitted entirely.
translations:
  - class: TranslationInputSdgTranslations
    source: https://github.com/open-sdg/sdg-translations.git
    branch: master
  - class: TranslationInputYaml
    source: translations