These guidelines are for anyone wishing to supply a dataset for an SDG indicator.
- The first row in the csv must be the column names. There can't be any extra rows with information, notes etc.
- The first column must be
Year, case sensitive with no extra white space.
- The last column must be
Value, case sensitive with no extra white space.
- If geo codes are included, the second from last column must be 'GeoCode', case sensitive with no extra white space
- Any number of categorical columns may go in between e.g. 'Sex', 'Age', 'Geography'. Ensure you use consistent category naming across the indicators.
Yearshould be integer, unquoted, e.g.
- You do not need to have each year present, and order does not matter.
Valueshould always be numeric (no notes, or < symbols etc), or blank (signifies missing).
Unitssee section below. This is a special field.
- If a cell is blank (missing) this means aggregated over. So in the example below the first row is disaggregated by Grade=A, then Grade=B, and the last one is missing, which means “Both”. The blanks in the Fruit field mean “All fruits”. So blank, in general, means “all”. There is no need for any text saying 'Total'.
- There must not be trailing white-space in any categorical field. So "A Word" is OK, but "A " is not. See further info below.
If we have it broken down by fruit OR grade then it would look like the plot below. The last line, with all grouping columns blank, is the national level indicator.
Then if we introduce two levels it would go to something like (values just for illustration):
The idea is that the plot would start out with the all blank row (at the bottom) and then as you add disaggregations to the plot it would filter to show the right data.
A full size demo data set can be downloaded here
Some indicators are reported using different units of measure. A special column name,
Units is used to capture this (see the UK's 9-2-1 as an example). The
Units column is interpreted as a special top-level disaggregation. You may only choose one at a time and it affects the chart axis labels. A Units column is not required if an indicator only includes one unit.
Indicators without headlines¶
A "headline" is a term for the all-blank row mentioned above, which contains no disaggregations. By default, this "headline" is what will display as soon as an indicator page loads.
However, Open SDG can also handle indicator data without a headline. This might happen, for example, if you have statistics for "Apples" and "Oranges", but no aggregated total for all fruits:
Open SDG will choose the smallest set of disaggregations to start with. So in the example above, Open SDG will start with "Oranges" selected.
In addition, by setting a special metadata field called
data_start_values, you can also control exactly what disaggregations are selected. For example by setting this in the metadata for an indicator...
data_start_values: - field: Fruit value: Apples - field: Grade value: A
...Open SDG will start with both "Apples" and "A" selected, instead of "Oranges".
You can double check for white space within an Excel file using a COUNTIF formula or by running a macro. We will add more information on this soon.
All column headers and disaggregation values can be "translation keys" to enable translation. As an optional shorthand, if the translation key is in the
data group, then the group can be omitted. For example, the translation key
data.female can be written as simply