Linked open data is sometimes described as 'Web 3': the next evolution of the web, from a web of documents to a web of interlinked datasets.
This workspace records some of the work undertaken during the IKM Emergent programme to identify how linked open data technologies and practices might impact upon development knowledge sharing.
Young Lives Linked Data Demonstrator
Young Lives is an international study of childhood poverty, involving 12,000 children in 4 countries over 15 years. It is led by a team in the Department of International Development at the University of Oxford in association with research and policy partners in the 4 study countries: Ethiopia, India, Peru and Vietnam. One of the goals of the programme is to see widespread use of both its published research and the datasets that have been collected as part of the study.
We undertook two pilots to explore how linked data might play a role in communicating Young Lives research.
Phase 1
We developed a set of scripts to model a subset of the Young Lives survey data as RDF Linked Data (Notes on Modelling RDF data). The modelled data was loaded into an OntoWiki platform and made accessible to browse as linked data. A graphing widget was developed to query the data using SPARQL, and to display comparable datasets from WHO (mocked up in RDF, as raw RDF was not available).
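As a rough illustration of what modelling survey records as RDF involves, the sketch below turns one record into N-Triples using only the Python standard library. It is not the project's actual script: the `example.org` namespace, the `SurveyResponse` class, the column names and the sample values are all hypothetical placeholders.

```python
# Hypothetical namespace and vocabulary, for illustration only.
BASE = "http://example.org/younglives/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Serialise a Python value as an N-Triples literal."""
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    # Plain string: escape backslashes and double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_ntriples(row_id, row):
    """Turn one survey record (a dict) into a list of N-Triples lines."""
    subject = f"<{BASE}response/{row_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}def/SurveyResponse> ."]
    for column, value in row.items():
        lines.append(f"{subject} <{BASE}def/{column}> {literal(value)} .")
    return lines


# Invented sample record, not real study data.
example = {"country": "Peru", "round": 3, "heightCm": 121.5}
for line in row_to_ntriples("r0001", example):
    print(line)
```

Each column becomes a property of the response resource, which is what later allows the data to be queried with SPARQL. A real model would reuse published vocabularies rather than minting a property per column.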
This phase:
- Identified key practical issues in creating and publishing linked data;
- Identified that, whilst linked data should allow connections to be made across datasets:
- (a) Subtle differences in definition across datasets can limit the possibility of automated comparison;
- (b) There are limited datasets available covering key development topics in RDF at present, and this limits the scope for automated comparison;
- Showed that the self-describing nature of linked data can support in-depth annotation of an academic study.
The Phase 1 demonstration is no longer online.
Phase 2
The second phase focussed on:
- Modelling all the questions asked during the study and publishing these as linked data;
- Publishing selected statistics from Round three of the study as linked data and making these accessible through an interactive visualisation;
- Representing the structure of the Young Lives study as linked data (using a SKOS concept scheme);
- Modelling details of publications from the study, integrating data on these from R4D, a third-party source of data on some of the publications.
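To give a sense of what representing a study structure as a SKOS concept scheme looks like, here is a minimal sketch that generates Turtle from a small hierarchy. The `ex:` namespace, the concept identifiers and the labels are assumptions made for illustration and do not reproduce the scheme published on the site.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
# Hypothetical namespace for the study structure.
EX_PREFIX = "@prefix ex: <http://example.org/younglives/structure/> ."

# Hypothetical fragment of a study structure:
# concept id -> (label, parent concept id or None for a top-level concept)
STRUCTURE = {
    "round3": ("Round 3", None),
    "childQuestionnaire": ("Child questionnaire", "round3"),
    "householdQuestionnaire": ("Household questionnaire", "round3"),
}


def to_turtle(scheme_id, scheme_label, structure):
    """Serialise the hierarchy as a SKOS concept scheme in Turtle."""
    lines = [SKOS_PREFIX, EX_PREFIX, ""]
    lines.append(
        f'ex:{scheme_id} a skos:ConceptScheme ; skos:prefLabel "{scheme_label}"@en .'
    )
    for concept_id, (label, parent) in structure.items():
        parts = [
            f"ex:{concept_id} a skos:Concept",
            f'skos:prefLabel "{label}"@en',
            f"skos:inScheme ex:{scheme_id}",
        ]
        if parent is None:
            parts.append(f"skos:topConceptOf ex:{scheme_id}")
        else:
            parts.append(f"skos:broader ex:{parent}")
        lines.append(" ;\n    ".join(parts) + " .")
    return "\n".join(lines)


print(to_turtle("study", "Young Lives study", STRUCTURE))
```

The `skos:broader` links carry the hierarchy, so a client can walk from an individual questionnaire up to the round it belongs to without any study-specific code.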
The resulting site at data.younglives.org.uk has been designed to provide a stable platform for end-users to access and browse - making key concepts and findings from the study accessible to both humans and computers.
Outputs
The site at data.younglives.org.uk includes an interactive graphing tool (accessible from country pages), documentation of linked data features, and a draft topic map.
Two add-ons to the OntoWiki platform have been developed, and are released as open source projects on GitHub:
- The Young Lives Grapher - which takes RDF Data Cubes and displays interactive graphs using the Google Chart API
- The CSV to Data Cube Import Tool - which takes formatted CSV files, and provides an interface to convert these into RDF Data Cubes
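The general idea behind converting a formatted CSV file into an RDF Data Cube can be sketched as follows. This is not the import tool's code: the column layout, the `ex:` namespace, the property names and the sample figures are invented placeholders, and the sketch omits the data structure definition that a complete Data Cube would also declare.

```python
import csv
import io

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/younglives/stats/> .
"""

# Invented placeholder figures, not statistics from the study.
SAMPLE_CSV = """country,indicator,value
Ethiopia,enrolment,0.5
Peru,enrolment,0.75
"""


def csv_to_observations(csv_text, dataset_id="round3"):
    """Emit one qb:Observation per CSV row, attached to a single qb:DataSet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = [PREFIXES, f"ex:{dataset_id} a qb:DataSet .\n"]
    for index, row in enumerate(reader, start=1):
        blocks.append(
            f"ex:obs{index} a qb:Observation ;\n"
            f"    qb:dataSet ex:{dataset_id} ;\n"
            f'    ex:country "{row["country"]}" ;\n'
            f'    ex:indicator "{row["indicator"]}" ;\n'
            f'    ex:value "{float(row["value"])}"^^xsd:double .\n'
        )
    return "\n".join(blocks)


print(csv_to_observations(SAMPLE_CSV))
```

Each row becomes one observation, with the dimension columns (here country and indicator) and the measured value attached as properties. A grapher can then select observations by dimension to build a chart.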
Custom code to generate mappings between SPSS, DDI and RDF models of study structure, and to process Young Lives publication files, is also archived on GitHub.
Technical Issues Paper
A Technical Issues paper collates key lessons from the process of developing a number of linked data pilots. It highlights key considerations for practitioners exploring the development of linked open data projects in the development field. The draft is available in three sections:
- Primer - Introducing Linked Open Data (PDF)
- Mapping Linked and Open Data in Development (PDF)
- Creating and Using Linked Data (PDF)
The paper remains a working draft, though elements of it have been re-used in a number of other focussed publications.
The Social Life of Open Data
The Social Life of Open Data (SLOD) project has explored the ways in which open data re-use relies upon a chain of re-use, with different actors between 'raw datasets' and their re-use playing significant roles in shaping how data is interpreted and used.
Building on a mapping of linked and open data in development, the SLOD pilot has looked at open data re-use from the International Aid Transparency Initiative (IATI), capturing details of diverse data re-use from the project.
In looking at IATI data, a distinction between the 'infrastructures' of open data and the 'eco-system' of re-use was identified, and this formed the basis of the initial project write-up. Time constraints and the technical difficulties of capturing and managing full provenance information mean that a full analysis of the social provenance chains involved in IATI data re-use (and the application of the method to further datasets) has been postponed until later in 2012, to be completed as part of the author's PhD work outside of the IKM Emergent programme.
Outputs
- SLOD Tool - an open source Django application for capturing data using the W3C PROV-DM model.
- Social Life of Data Research Proposal (PDF) - developing methods for exploring the social life of data.
- Social Life of Data - Infrastructure and Ecosystem Paper (PDF) - write-up of initial research looking at social structures around domestic and international open data projects.
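As a loose illustration of the kind of re-use chain that a PROV-style model can record, the sketch below links entities through derivation and attribution relations and walks back from a re-use to its sources. It is not the SLOD Tool's data model: the class, the field names and the example actors are hypothetical, and only two PROV relations (`wasDerivedFrom` and `wasAttributedTo`) are mirrored.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A dataset or data product, loosely following a PROV entity."""
    name: str
    attributed_to: str  # mirrors prov:wasAttributedTo (the responsible agent)
    derived_from: list = field(default_factory=list)  # mirrors prov:wasDerivedFrom


def provenance_chain(entity):
    """Return (entity, agent) pairs from the entity back to its sources, depth-first."""
    chain = [(entity.name, entity.attributed_to)]
    for source in entity.derived_from:
        chain.extend(provenance_chain(source))
    return chain


# Hypothetical chain of re-use, invented for illustration.
raw = Entity("Published activity file", "Donor organisation")
aggregated = Entity("Aggregated dataset", "Data intermediary", [raw])
chart = Entity("Visualisation", "Journalist", [aggregated])

for name, agent in provenance_chain(chart):
    print(f"{name} (attributed to {agent})")
```

Walking the chain makes the intermediate actors visible, which is the point of recording social provenance rather than only the original source.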
Further papers are expected in 2012 as part of Tim Davies's wider Open Data Impacts research work.
Further Resources
See the IKM Emergent Linked Data dGroup for more context and discussion.
Global Hunger Index as Linked Data (outcome of 2010 workshop).
Draft Literature Review and Young Lives Working Notes (deprecated)