Frequently Asked Questions
What is Caliper?
Caliper is a platform supporting the maintenance (creation, editing), dissemination and use of statistical classifications.
Caliper is centred around the idea that statistical classifications are open standards that are regularly passed between different systems (just like the statistical data they define). Therefore, all technologies used in the back end are open and oriented to data exchange over the web.
The main tools used in Caliper are:
- For editing/maintenance, the tool adopted is VocBench, an open source project developed by the University of Tor Vergata, Italy.
- For dissemination, Caliper serves classifications in different ways:
- as content browsable on the web. The tools used for this are:
- SKOSMOS, an open source project developed by the National Library of Finland
- Caliper-specific visualization realized in Drupal (the content management system used for the Caliper website)
- as files for download
- as web resources with URIs that can be used in computer applications
- as repositories that can be accessed online through a query interface. This allows data to be retrieved on demand (as opposed to downloading entire files with the whole classification)
If you are interested in other functionalities/tools, please do not hesitate to contact us.
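For developers, serving classifications "as web resources with URIs" means that a program can ask a resource URI for a machine-readable representation through HTTP content negotiation. The following is a minimal sketch in Python, not a description of the actual Caliper set-up: the URI is a placeholder (not a real Caliper address), the function name rdf_request is ours, and which media types the server offers is an assumption.

```python
from urllib.request import Request

# Placeholder URI (an assumption): replace it with the URI of a real resource.
EXAMPLE_URI = "https://example.org/caliper/cpc/v2.1/0111"


def rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) a GET request asking for an RDF serialization."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = rdf_request(EXAMPLE_URI)
print(request.get_method(), request.full_url, request.get_header("Accept"))
# To actually fetch the data, pass the request to urllib.request.urlopen().
```

The Accept header is what distinguishes a program asking for data from a browser asking for an HTML page at the same URI.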
Despite their importance, statistical classifications have not received much attention in the effort to modernize official statistics. Caliper tries to address that gap. We aim at making statistical classifications available in formats that are open, fully machine readable, and easily accessible by humans for consultation and reuse. Ultimately, Caliper aims at making statistical data more interoperable.
Is Caliper an official FAO website?
NO. This is experimental work carried out at FAO and supported by a grant from the Bill and Melinda Gates Foundation. The University of Tor Vergata (Rome, Italy) provides technical and scientific support to it.
What can Caliper do for me?
If you are a statistician, Caliper can help you in the following tasks:
- Look up classification items. For this, you can choose among three different ways:
- Look up correspondences among classifications.
- Download classifications or mappings in your format of choice
The whole list of classifications in Caliper is in section Classifications. For each classification, you will find all associated services.
If you're a developer, you are probably interested in ways to automatically access the resources in Caliper. Currently, the following are implemented:
- Access the SPARQL endpoint and test what you can retrieve. A number of sample queries are given for you to start testing.
- Access content in LOD style.
For documentation concerning the data model, see page Documentation.
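As a starting point for testing the SPARQL endpoint from a program, here is a minimal sketch in Python using only the standard library. The endpoint URL is a placeholder (use the address published on the Caliper site), the function names are ours, and the query assumes that items carry skos:inScheme, skos:notation and skos:prefLabel, which may not hold for every classification.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (an assumption): not the real Caliper address.
ENDPOINT = "https://example.org/caliper/sparql"


def concepts_query(scheme_uri: str, lang: str = "en", limit: int = 10) -> str:
    """Return a SPARQL query listing codes and labels of the items in a scheme."""
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?code ?label WHERE {{
  ?concept a skos:Concept ;
           skos:inScheme <{scheme_uri}> ;
           skos:notation ?code ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "{lang}")
}}
ORDER BY ?code
LIMIT {limit}
""".strip()


def sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request that passes the query in the 'query' URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = concepts_query("https://example.org/caliper/cpc/v2.1")
request = sparql_request(ENDPOINT, query)
print(query)
# To run the query, pass the request to urllib.request.urlopen() and parse
# the JSON response with the json module.
```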
Is Caliper the author of the classifications exposed?
NO. The classifications published in Caliper are maintained and published by dedicated institutions (sometimes in collaboration with FAO, but not necessarily). No classification or correspondence was developed within this project. Occasionally, we may have added multilingual content to the original classifications for testing purposes (e.g., the Spanish translation for CRS Purpose Codes).
What is the license of Caliper?
Caliper is not an official FAO website and the data it contains should be considered as experimental. For this reason, no license is explicitly given.
What is open data, and linked data?
According to the Open Definition, “Open data and content can be freely used, modified, and shared by anyone for any purpose”. The idea of open data has gained momentum with the rise of the Internet and, more strongly, with open data initiatives promoted by governments and other large institutions (cf. the catalogs of institutional open data initiatives of the UK and the USA). The general understanding is that open data is distributed with an open license (for example along the lines of Creative Commons), and expressed according to standardized (as opposed to proprietary) machine-readable formats. Moreover, it should be registered in appropriate catalogs so as to facilitate its discovery.
In essence, Linked Data is structured data that is interlinked with other data, so as to become more useful. Linked data aims at making data more easily consumable by machines, for example by means of semantic queries. It relies on existing web standards such as the HTTP protocol, the RDF data model, and the notion of global identifiers over the web (URIs).
Does FAO have a policy on open data / data sharing?
What languages are supported?
All. Caliper supports editing, display of and search for information in all languages, including those with non-Latin scripts (such as Russian and Chinese, two of the FAO languages) and with right-to-left orientation (such as Arabic, another FAO language). A few classifications in Caliper already include multilingual content, for example those about geographical areas (country names) and FCL (the FAOSTAT Commodity List).
May I use the classifications published on Caliper?
You're welcome to test our work. But please remember that the classifications presented on this web site do not replace in any way the versions distributed by their original maintainers.
Caliper hosts statistical classifications in RDF. What modelling is adopted?
Fundamentally, the statistical classifications available in Caliper are rendered as SKOS Concept Schemes. We use SKOS to express hierarchies, classification entries, labels, explanatory notes, definitions, and correspondences. In summary:
- items are skos:Concept, endowed with labels in different languages, definitions, notation, documentation notes (change, editorial, history...)
- the hierarchical structure of a classification is expressed by means of the standard SKOS properties skos:narrower, skos:broader.
- mappings between items in different classifications are expressed using the SKOS standard properties for semantic relations (skos:closeMatch and skos:exactMatch)
- subsets of a classification are defined by using SKOS collections.
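To make the list above concrete, here is a minimal sketch in Python that writes two items and their hierarchical relation as N-Triples, using only the standard library. It is an illustration, not the code used in Caliper: the namespace, the codes and the labels are made up, and the function names are ours.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/classification/"  # made-up namespace


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang=None):
    """Return a quoted literal, with a language tag when one is given."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def concept_triples(code, labels, parent=None):
    """Yield N-Triples lines for one classification item."""
    subject = uri(BASE + code)
    yield f"{subject} {uri(RDF_TYPE)} {uri(SKOS + 'Concept')} ."
    yield f"{subject} {uri(SKOS + 'notation')} {literal(code)} ."
    for lang, text in labels.items():
        yield f"{subject} {uri(SKOS + 'prefLabel')} {literal(text, lang)} ."
    if parent is not None:
        # The hierarchy is stated in both directions.
        yield f"{subject} {uri(SKOS + 'broader')} {uri(BASE + parent)} ."
        yield f"{uri(BASE + parent)} {uri(SKOS + 'narrower')} {subject} ."


# Made-up items, for illustration only.
for line in concept_triples("01", {"en": "Crops", "fr": "Cultures"}):
    print(line)
for line in concept_triples("011", {"en": "Cereals"}, parent="01"):
    print(line)
```

In practice an RDF library would take care of escaping and serialization; plain strings are used here only to keep the example free of dependencies.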
XKOS is used in addition to SKOS to express correspondences between classifications, and specific sets of items in them.
All resources you can find in Caliper are endowed with metadata, which may refer to:
- a classification as a standard (say, its intellectual content), or to
- the specific format in which it is encoded (say, CSV or RDF)
To see the difference, compare: "UNSD is the author of CPC2.1" with "Caliper is the author of the RDF version of CPC2.1".
In order to be able to express these two types of statements, we use elements from DCAT - dcat:Dataset (for the CPC authored by UNSD), and dcat:Distribution (for the CPC rendered as RDF by Caliper). See the graphics below:
Other vocabularies used: Dublin Core and the Web Ontology Language (OWL) to express certain pieces of metadata (i.e., title, creator, publisher, description, date of creation, date of last update, version, history notes...).
Does Caliper use XKOS, the extension of SKOS for statistical classifications?
XKOS, the Extended Knowledge Organization System, is an extension of SKOS designed to add the constructs needed to express statistical classifications. XKOS is promoted by DDI, the Data Documentation Initiative.
Caliper uses XKOS to:
- model correspondences between classifications and classifications' items (xkos:Correspondence and xkos:ConceptAssociation, respectively);
- distinguish hierarchical levels in a classification (xkos:ClassificationLevel)
Does Caliper offer persistent identifiers?
Persistent identifiers will be needed once Caliper is an official service of FAO. At the moment, since it is an experimental project, we do not guarantee the persistence of URIs in Caliper.
How does Caliper treat correspondences?
Correspondences are expressed in two ways:
- using properties from SKOS. These are: skos:exactMatch for 1-1 correspondences, skos:closeMatch for partial correspondences.
- using properties and entities defined in XKOS.
For more detail, see the documentation page for the RDF Modelling.
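As an illustration of the SKOS option, the sketch below (Python, standard library only) turns a list of code pairs into mapping triples. The rule used to decide between the two properties is our assumption for the example: a pair counts as 1-1 when its source code and its target code each appear in exactly one pair. The codes, namespaces and function names are made up.

```python
from collections import Counter

SKOS = "http://www.w3.org/2004/02/skos/core#"


def match_property(pairs):
    """Map each (source, target) pair to a SKOS mapping property name.

    Assumption: a pair is 1-1 when its source and its target each occur in
    exactly one pair; every other pair is treated as a partial correspondence.
    """
    sources = Counter(source for source, _ in pairs)
    targets = Counter(target for _, target in pairs)
    return {
        (s, t): "exactMatch" if sources[s] == 1 and targets[t] == 1 else "closeMatch"
        for s, t in pairs
    }


def mapping_triples(pairs, source_base, target_base):
    """Yield one N-Triples line per correspondence."""
    for (s, t), prop in match_property(pairs).items():
        yield f"<{source_base}{s}> <{SKOS}{prop}> <{target_base}{t}> ."


# Made-up correspondence table: A1 maps to one item, A2 is split over two.
pairs = [("A1", "X1"), ("A2", "X2"), ("A2", "X3")]
for line in mapping_triples(pairs, "https://example.org/a/", "https://example.org/x/"):
    print(line)
```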
Wasn't SDMX enough?
SDMX is an XML-based standard for disseminating statistical data, i.e., (numeric) data together with its dimensions. The equivalent of SDMX in RDF is the RDF Data Cube (W3C Recommendation, 2014), which implements the SDMX cube model as Linked Data.
XKOS is the RDF vocabulary specific to statistical classifications, endorsed by DDI. No XML-based equivalent of SDMX for statistical classifications exists.
I want to know more - is there any documentation, papers or presentations available?
Documentation specific to Caliper:
Documentation on tools used in Caliper:
- on VocBench, the editing software we use:
- Installation: http://vocbench.uniroma2.it/documentation/
- VocBench User Management. http://vocbench.uniroma2.it/doc/user/users.jsf
- Sheet2RDF. http://art.uniroma2.it/sheet2rdf/documentation/
- SKOS implementation: http://vocbench.uniroma2.it/doc/skos.jsf
- Loddy, for content negotiation: https://bitbucket.org/art-uniroma2/loddy/downloads/Loddy_DEMO_2016-03-01.zip
- GraphDB, the triple store behind VocBench: http://graphdb.ontotext.com/documentation/standard/
- On Fuseki, the SPARQL Endpoint available as back-end: https://jena.apache.org/documentation/fuseki2/
- on SKOSMOS, the browsing tool: https://github.com/NatLibFi/Skosmos/wiki
- on Drupal, the content management system powering this website and used to display correspondences between classifications. https://www.drupal.org/
Terminology - SKOS, SPARQL
SKOS stands for Simple Knowledge Organization System. It "is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data." (From Wikipedia).
SKOS defines classes and properties for representing:
- a "concept scheme" (a set of terms: a classification, a code list, a list of subject headings...)
- its terms / concepts (labels in different languages, definition, notation, editorial notes...),
- the relationships between concepts (generic or hierarchical)
- subsets of concepts (collections)
It is therefore suitable for representing classifications in a semantic, machine readable way.
SPARQL is the query language for RDF.
Terminology - Vocabulary
In everyday language, a vocabulary is a set of words, possibly used by a group, individual, or work, or in a field of knowledge (see the definitions given by the Merriam-Webster dictionary). Vocabularies are therefore fundamental in shaping people's universe of discourse, and have a special role in the field of information management, especially in the form of controlled vocabularies, i.e., selected lists of words used as "tags" or "classifiers" of information units (numeric or textual data). Because of their role in defining the entities to measure and in codifying data, statistical classifications can be considered as special types of vocabularies.
Vocabularies also play a very important role in the semantic web. The World Wide Web Consortium (W3C) defines them in this broad sense: "On the Semantic Web, vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and represent an area of concern. Vocabularies are used to classify the terms that can be used in a particular application, characterize possible relationships, and define possible constraints on using those terms. In practice, vocabularies can be very complex (with several thousands of terms) or very simple (describing one or two concepts only)."
Moreover, the W3C usefully distinguishes two types of vocabularies:
- value vocabularies or sets of controlled values used to categorize and classify things. These are also known as Knowledge Organization Systems (KOS) and include classifications, code lists, thesauri, even certain types of ISO standards that prescribe controlled lists of values;
- metadata element sets that prescribe what features or properties should be used to describe things. They are also called schemas, or description vocabularies. Examples are XML Schema and RDF Schema, formal languages to describe entities in XML and RDF respectively. Other examples include ontologies, application profiles, and UML models.
The statistical classifications that are the focus of Caliper fall under the first type. SKOS, the formal language we use to express statistical classifications in a machine-readable format, is an example of the second type. Specifically, SKOS is a vocabulary for RDF, tailored to express thesauri on the web.
BROWSING - Skosmos. What is the meaning of tab "Groups"?
The tab "Groups" displays concepts belonging to the SKOS structure skos:Collection. We are using this feature of SKOS to experiment with the possibility of marking fragments or subsets of classifications that are specific to some needs. See for example the group of concepts relevant to the FAO Fisheries and Aquaculture Department, in CPC v2.1:
NOTE: Currently, concepts in a skos:Collection are visualized in SKOSMOS as a flat list, although the hierarchical information remains available (see main panel, to the right).
BROWSING - Why do you have three browsing tools?
Because each tool has different features and advantages. Briefly:
Skosmos: offers a neat hierarchy-like visualization, and search by code and labels in different languages. It is oriented to displaying SKOS thesauri, and does not support OWL or SKOS-XL. Correspondences are shown together with the classification entry they refer to.
PMKI: it is the read-only version of VocBench, so it allows users to see everything present in the editor environment, including OWL ontologies and different SKOS concept schemes.
Drupal: being the content management system used to build the website, it ensures uniformity of the look-and-feel of the entire Caliper space. It is highly customizable. Correspondences may be visualized and searched independently of the classifications they refer to.
I have downloaded a csv from Caliper, but codes are missing leading zeroes!
You probably opened the CSV in Excel, which treats codes as numbers and drops their leading zeroes. If you open the file with a text editor, you will see all leading zeroes correctly in place. To see them in Excel too, follow the instructions given by the Office Support website.
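If you process the file with a script rather than a spreadsheet, the same caution applies: read codes as text, not as numbers. Below is a minimal sketch in Python using the standard csv module, which returns every field as a string. The sample content and the column names "code" and "label" are made up and stand in for a file downloaded from Caliper.

```python
import csv
import io

# Made-up sample standing in for a downloaded file.
SAMPLE = "code,label\n0111,Wheat\n0112,Maize\n"


def read_codes(handle):
    """Return the 'code' column as strings, exactly as written in the file."""
    return [row["code"] for row in csv.DictReader(handle)]


codes = read_codes(io.StringIO(SAMPLE))
print(codes)  # ['0111', '0112']

# For a real file:
# with open("classification.csv", newline="", encoding="utf-8") as handle:
#     codes = read_codes(handle)
```

Tools that guess column types on import will convert such codes to numbers unless told otherwise, so check how your tool of choice lets you force a column to be read as text.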
In what format should source data be, in order to be included in Caliper?
Caliper works well with all formats commonly used to store or pass classifications around, such as CSV, XLS, DB dump, or JSON. However, you should also consider *how* the data is organized internally (for example, how the classification hierarchy is rendered, or in how many files the classification is split) because while all formats will ultimately be converted, some may require more effort than others (as a general rule, the more ad-hoc your structure is, the more effort will be needed for conversion).
Geographical grouping also depends on years (e.g. EU members). What do you do about this?
True. International organizations based on membership do have a temporal dimension, as over time new members join and others leave. This is something we plan on including in Caliper. We expect to use the same mechanism already adopted for managing groups and alternative membership, such as in the case of the alternative membership of SDG country grouping and UNSD M49.
We also have another line of work related to the time dimension of geographical information, for "former countries" (countries no longer existing as political entities because of some change in their name, territorial extension etc.). We would like to include those countries in their temporal and geographical context, for example to be able to extract the composition of geographical groups at a given time.
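To give an idea of what extracting the composition of a group at a given time involves, here is a small sketch in Python. It is only an illustration of the idea and not the mechanism planned for Caliper: the Membership structure and the function members_on are hypothetical, and the two membership records are included just as examples.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Membership:
    """One member of a group, with the period during which it belongs to it."""
    group: str
    member: str
    start: date
    end: Optional[date] = None  # first day of non-membership; None = still a member


def members_on(memberships, group, day):
    """Return the sorted names of the members of a group on a given day."""
    return sorted(
        m.member
        for m in memberships
        if m.group == group and m.start <= day and (m.end is None or day < m.end)
    )


# Example records only, not an extract from Caliper.
records = [
    Membership("EU", "Croatia", date(2013, 7, 1)),
    Membership("EU", "United Kingdom", date(1973, 1, 1), date(2020, 2, 1)),
]
print(members_on(records, "EU", date(2015, 1, 1)))  # ['Croatia', 'United Kingdom']
print(members_on(records, "EU", date(2021, 1, 1)))  # ['Croatia']
```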
HS (Harmonized System) is not complete, why?
We have included only some parts of HS in Caliper, namely the items linked by other classifications in Caliper. The items included belong to HS versions 2007, 2012 and 2017, reusing existing correspondences available in FAO. The three fragments of HS can be browsed in Skosmos, shown as one "Hierarchy" (because it is internally managed as a single skos:ConceptScheme) and three "Groups" (or subsets, rendered as skos:Collection).
How are classifications maintained (edited) in Caliper?
The editing tool used in Caliper is VocBench (version 3). It is a powerful tool, able to support the editing of both classifications (as RDF vocabularies) and, where applicable, the OWL models (ontologies) they use. VocBench also fully supports editorial workflows, so that some users will only be able to add translations, for example, while others are allowed to approve changes and perform more complicated operations. Therefore, the level of knowledge of RDF required for the maintenance of classifications in VocBench very much depends on your role in the project. We have developed guidelines for editors, currently under testing within the FAO Statistics Division.
Do you implement Linked Data Platform Collections (LDPC) W3C recommendation (read/write) for Registers API ?
Is VocBench already fully integrated in terms of content maintenance and publication to Caliper?
Yes. The first step for the inclusion of a classification into Caliper is its conversion into RDF for further editing, visualization and use in VocBench. The conversion can be done either programmatically (in Caliper, we mostly use Python for this) or directly through a facility included in VocBench that can be operated through the GUI (cf. menu Tool -> Sheet2RDF). Quality checks can also be done from inside VocBench (menu Tool -> Integrity Constraint Validator). All projects are then finalized in VocBench (for example, to specify some metadata information) and from there passed on to all other services included in Caliper.
A user guide specifically addressing editors of statistical classifications has been developed within FAO (ESS). This is a work in progress and newer versions will be available in the future.
How much does Caliper cost?
All tools used in Caliper are free as in free beer and as in free speech (= formats and code open in the public domain, no software fees). The costs of Caliper reside in hosting, system administration, expertise involved in the conversion of classifications and their maintenance. All the tools adopted have a large community of users to ensure reliability.
What are the skills required to maintain Caliper?
Since Caliper provides different functionalities, different skills are involved, but an understanding of the RDF technology stack is essential to all. In some more detail:
- For the first conversion of classifications/mappings into an RDF linked data vocabulary, a deep knowledge of those technologies and modelling practices is a must, together with a deep understanding of the features and requirements of statistical classifications. This is also needed to plan for the improvement of available features or for new extensions. The actual conversion from any file into RDF for Caliper is achieved programmatically, either via Python scripts or using the Sheet2RDF tool available inside VocBench.
- Once converted into RDF datasets, most maintenance will be done by content experts, those who normally maintain classifications, or perhaps translators if translations are needed.
- Once converted into RDF datasets, the conversion into alternative formats (other than RDF serializations) for download is done programmatically.
- The site is maintained in Drupal, a content management system that implements many functionalities to interact with RDF data. For example, the lookup on mappings is implemented directly in Drupal.
- The maintenance of the IT infrastructure requires good sys admin skills together with a solid understanding of the RDF technology stack.