Alter George, Gregory Arofan, Mceachern Steven, Bell Darren S., Burke Derek, Chen Robert, Cardacino Alessio, Chaya Nada, Barraclough David, Brownlee Rowan, Emery Tom, Gerland Patrick, Giudici Cristina, Gozalov Abdulla, Greising Edgardo, Ionescu Sanda, Jääskeläinen Taina, Kanjala Chifundo, Kantorova Vladimira, Larmarange Joseph, Lattes Pablo, Lyle Jared, Magnuson Diana, Meinhart Melissa, Mishra Santosh Kumar, Silva Romesh, Spoorenberg Thomas, Ueffing Philipp and Winkler Jay (2023) FAIR Vocabularies in Population Research: report of the IUSSP-CODATA Working Group on FAIR Vocabularies, Report, IUSSP; CODATA. https://hal.science/hal-04096418.
Abstract: This report describes the role of controlled vocabularies in the documentation and dissemination of demographic data in the light of the FAIR principles that all data should be “Findable, Accessible, Interoperable, and Reusable” by both humans and machines (Wilkinson et al., 2016). Population research is an empirically focused field with a long tradition of widely shared, easily accessible data collections. The FAIR Principles point to ways that this tradition can be enhanced by taking advantage of emerging standards and technologies. Our work builds on the “Ten Simple Rules for making a vocabulary FAIR” (Cox et al., 2021), prepared by a group formed at a workshop convened by CODATA and DDI to describe how a FAIR vocabulary will work with international standards for documenting and sharing social science data. Controlled vocabularies play a central role in data sharing by associating data with concepts and by defining which categories or codes may be applied. FAIR vocabularies specify globally accessible persistent identifiers to distinguish data items that are the same from those that are different. Consider the most basic variable in demographic analysis: age. The Organisation for Economic Co-operation and Development (OECD) has a list of 643 age categories, while the UN Population Division copes with more than 1100 age groups. If the meanings of variables in a dataset are only available through human-readable documentation, like a PDF, harmonizing data from two providers will remain a tedious manual process. However, if the age categories are linked to persistent identifiers in machine-actionable metadata, software can be programmed to harmonize age groupings. If these operations are performed across dozens of variables in hundreds of data sources, enormous amounts of human time will be saved. Construction of the infrastructure for FAIR data has begun.
Demographic concepts are already included in vocabularies developed by other disciplines, like medicine, with definitions that conflict with usage in population research. Therefore, there is a need for a FAIR vocabulary of demographic concepts endorsed by an authoritative institution in the field of population science. IUSSP has a long history of working with the UN and other agencies to define demographic concepts (International Union for the Scientific Study of Population, 1954; Vincent, 1953). Those efforts currently exist in electronic forms (Demopædia and Demovoc) that provide a base for a multilingual FAIR Vocabulary of Demography. We argue that a FAIR Vocabulary of Demography will have important benefits for the population research community represented by IUSSP, and we conclude with recommendations for IUSSP and other important organizations. In addition to summarizing the activities of the Working Group, this report is intended to serve as an introduction to the standards and infrastructure used to share social science data. Most demographers have never heard of URIs, SDMX, or DDI, even though they use services from the UN, ILO, OECD, CESSDA, IPUMS, and other organizations that depend on these standards. Understanding key features of the international data infrastructure will help IUSSP leadership to influence its development.
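The abstract's point about age groups can be illustrated with a minimal sketch, which is not taken from the report. It assumes two hypothetical providers whose local age codes have each been mapped to shared persistent identifiers; the URIs, local codes, and values below are placeholders invented for illustration, not entries from any real vocabulary or dataset.

```python
# Hypothetical base URI for persistent age-group identifiers (placeholder only).
BASE = "https://example.org/vocab/age/"

# Assumed mappings from each provider's local codes to the shared identifiers.
provider_a_codes = {"Y0T4": BASE + "0-4", "Y5T9": BASE + "5-9"}
provider_b_codes = {"0-4": BASE + "0-4", "5-9": BASE + "5-9", "10-14": BASE + "10-14"}


def harmonize(records, code_to_uri):
    """Re-key (local_code, value) records by persistent identifier."""
    out = {}
    for code, value in records:
        uri = code_to_uri[code]
        # Sum values if several local codes resolve to the same identifier.
        out[uri] = out.get(uri, 0) + value
    return out


def merge_sources(a, b):
    """Join two harmonized sources on the identifiers they have in common."""
    shared = sorted(a.keys() & b.keys())
    return {uri: (a[uri], b[uri]) for uri in shared}


# Made-up counts, keyed by each provider's own local codes.
data_a = [("Y0T4", 120), ("Y5T9", 110)]
data_b = [("0-4", 118), ("5-9", 112), ("10-14", 105)]

merged = merge_sources(
    harmonize(data_a, provider_a_codes),
    harmonize(data_b, provider_b_codes),
)
for uri, (value_a, value_b) in merged.items():
    print(uri, value_a, value_b)
```

Once both code lists resolve to the same identifiers, the join needs no manual lookup in human-readable documentation. Age groups that only one provider publishes (here the hypothetical 10-14 group) simply drop out of the merged result.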