Workspaces:1:Linked Open Data

Linked open data is sometimes described as 'Web 3': the next evolution of the web, from a web of documents to a web of interlinked datasets.

This workspace records some of the work undertaken during the IKM Emergent programme to identify how linked open data technologies and practices might impact upon development knowledge sharing.

Young Lives Linked Data Demonstrator (data.younglives.org.uk)

Young Lives is an international study of childhood poverty, involving 12,000 children in 4 countries over 15 years. It is led by a team in the Department of International Development at the University of Oxford in association with research and policy partners in the 4 study countries: Ethiopia, India, Peru and Vietnam. One of the goals of the programme is to see widespread use of both its published research and the datasets that have been collected as part of the study.

We undertook two pilots to explore how linked data might play a role in communicating Young Lives research.

Phase 1

We developed a set of scripts to model a subset of the Young Lives survey data as RDF linked data (see Notes on Modelling RDF data). The modelled data was loaded into an OntoWiki platform and made accessible to browse as linked data. A graphing widget was developed to query the data using SPARQL and to display comparable datasets from WHO (mocked up in RDF, as raw RDF was not available).
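The modelling step described above can be illustrated with a minimal sketch that turns survey rows into N-Triples. It is not the project's actual script: the `http://example.org/younglives/` namespace, the field names and the child identifiers are all assumptions made up for illustration.

```python
# Hypothetical namespace; the real Young Lives URIs are not given on this page.
BASE = "http://example.org/younglives/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value):
    """Serialise a Python value as an N-Triples literal."""
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row, id_field="child_id"):
    """Turn one survey row (a dict) into a list of N-Triples lines."""
    subject = f"<{BASE}child/{row[id_field]}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}def/Child> ."]
    for field, value in row.items():
        if field == id_field or value is None:
            continue  # skip the identifier itself and unanswered questions
        lines.append(f"{subject} <{BASE}def/{field}> {literal(value)} .")
    return lines


# Invented sample rows, not real survey records.
rows = [
    {"child_id": "ET0001", "country": "Ethiopia", "age_months": 96},
    {"child_id": "PE0002", "country": "Peru", "age_months": 95},
]
triples = [line for row in rows for line in row_to_triples(row)]
print("\n".join(triples))
```

Each row becomes one typed resource plus one triple per answered question, which is the shape a triple store such as OntoWiki can load and expose to SPARQL queries.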

This phase:

  • Identified key practical issues in creating and publishing linked data;
  • Identified that, whilst linked data should allow connections to be made across datasets:
    • (a) Subtle differences in definition across datasets can limit the possibility of automated comparison;
    • (b) There are limited datasets available covering key development topics in RDF at present, and this limits the scope for automated comparison;
  • Showed that the self-describing nature of linked data can support in-depth annotation of an academic study.
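Point (a) above can be made concrete with a small sketch: two indicators may share a label yet differ in definition, so an automated comparison should check the definitions first. The `Indicator` fields and all of the values below are hypothetical and are not taken from the Young Lives or WHO datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """An indicator plus the definitional details that affect comparability."""
    label: str
    age_range_months: tuple  # (min, max) ages covered, in months
    reference_standard: str  # which reference standard the figures rely on


def comparability_issues(a, b):
    """Return a list of definitional differences between two indicators."""
    issues = []
    if a.age_range_months != b.age_range_months:
        issues.append(
            f"age range differs: {a.age_range_months} vs {b.age_range_months}"
        )
    if a.reference_standard != b.reference_standard:
        issues.append(
            f"reference standard differs: "
            f"{a.reference_standard} vs {b.reference_standard}"
        )
    return issues


# Invented definitions: same label, different underlying definitions.
study = Indicator("Stunting prevalence", (84, 102), "standard-A")
external = Indicator("Stunting prevalence", (0, 59), "standard-B")

for issue in comparability_issues(study, external):
    print(issue)
```

An empty list means the two definitions match on the checked fields; anything else is a reason to involve a human before plotting the two series together.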

The Phase 1 demonstration is no longer online.

Phase 2

Young Lives Grapher

The second phase focussed on:

  • Modelling all the questions asked during the study and publishing these as linked data;
  • Publishing selected statistics from Round three of the study as linked data and making these accessible through an interactive visualisation;
  • Representing the structure of the Young Lives study as linked data (using a SKOS concept scheme);
  • Modelling details of publications from the study, integrating data on these from R4D, a third-party source of data on some of the publications.
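As an illustration of how a study structure might be expressed as a SKOS concept scheme, the sketch below generates Turtle from a nested outline. The `ex:` namespace, the scheme name and the questionnaire names are assumptions for the example; only the SKOS terms themselves (`skos:ConceptScheme`, `skos:Concept`, `skos:broader`, `skos:inScheme`, `skos:topConceptOf`, `skos:prefLabel`) come from the SKOS vocabulary.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/younglives/structure/> .\n"
)

# Hypothetical study outline: {slug: (label, children)}.
STRUCTURE = {
    "round-3": ("Round 3", {
        "child-questionnaire": ("Child questionnaire", {}),
        "household-questionnaire": ("Household questionnaire", {}),
    }),
}


def concept_blocks(tree, parent=None):
    """Yield one Turtle block per concept, depth first."""
    for slug, (label, children) in tree.items():
        lines = [
            f"ex:{slug} a skos:Concept ;",
            f'    skos:prefLabel "{label}"@en ;',
        ]
        if parent is None:
            lines.append("    skos:topConceptOf ex:scheme ;")
        else:
            lines.append(f"    skos:broader ex:{parent} ;")
        lines.append("    skos:inScheme ex:scheme .")
        yield "\n".join(lines)
        yield from concept_blocks(children, parent=slug)


def to_turtle(tree):
    """Serialise the whole outline as a Turtle document."""
    scheme = (
        "ex:scheme a skos:ConceptScheme ;\n"
        '    skos:prefLabel "Study structure"@en .'
    )
    return "\n\n".join([PREFIXES.rstrip(), scheme, *concept_blocks(tree)]) + "\n"


turtle = to_turtle(STRUCTURE)
print(turtle)
```

Top-level entries are linked to the scheme with `skos:topConceptOf`, and nested entries point to their parent with `skos:broader`, so the hierarchy of rounds and questionnaires can be browsed or queried.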

The resulting site at data.younglives.org.uk has been designed to provide a stable platform for end-users to access and browse - making key concepts and findings from the study accessible to both humans and computers.

Outputs

The site at data.younglives.org.uk includes an interactive graphing tool (accessible from country pages), documentation of linked data features, and a draft topic map.

Two add-ons to the OntoWiki platform have been developed, and are released as open source projects on GitHub:

  • The Young Lives Grapher - which takes RDF Data Cubes and displays interactive graphs using the Google Chart API
  • The CSV to Data Cube Import Tool - which takes formatted CSV files, and provides an interface to convert these into RDF Data Cubes
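The general idea of converting a formatted CSV file into RDF Data Cube observations can be sketched as follows. This is not the code of the import tool itself: the column layout, the `ex:` dimension properties and the figures are assumptions, and a complete cube would also need a `qb:DataStructureDefinition`, which is omitted here for brevity.

```python
import csv
import io

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/younglives/stats/> .
"""

# Made-up figures in an assumed column layout; not real Young Lives results.
SAMPLE_CSV = """country,cohort,value
Ethiopia,younger,12.5
Peru,younger,8.0
"""


def csv_to_observations(text, dataset="ex:dataset1"):
    """Convert CSV text into a Turtle string of qb:Observation resources."""
    blocks = [PREFIXES.rstrip(), f"{dataset} a qb:DataSet ."]
    for index, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        country = row["country"]
        cohort = row["cohort"]
        value = float(row["value"])  # fails early on non-numeric cells
        blocks.append(
            f"ex:obs{index} a qb:Observation ;\n"
            f"    qb:dataSet {dataset} ;\n"
            f'    ex:country "{country}" ;\n'
            f'    ex:cohort "{cohort}" ;\n'
            f'    sdmx-measure:obsValue "{value}"^^xsd:double .'
        )
    return "\n\n".join(blocks) + "\n"


cube = csv_to_observations(SAMPLE_CSV)
print(cube)
```

Each CSV row becomes one observation attached to the dataset, with the dimension columns as properties and the numeric column as the measure, which is the form a graphing tool can then read back out.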

Custom code to generate mappings between SPSS, DDI and RDF models of study structure, and to process Young Lives publication files, is also archived on GitHub.

Technical Issues Paper

Elements of the Linked Open Data Puzzle

A Technical Issues paper collates key learning from the process of developing a number of linked data pilots. It highlights key considerations for practitioners exploring the development of linked open data projects in the development field. The draft is available in three sections.

The paper remains a working draft, though elements of it have been re-used in a number of other focussed publications.

The Social Life of Open Data

IATI EcoSystem

The Social Life of Open Data (SLOD) project has explored the ways in which open data re-use relies upon a chain of re-use, with the different actors between 'raw datasets' and their re-use playing significant roles in shaping how data is interpreted and used.

Building on a mapping of linked and open data in development, the SLOD pilot has looked at open data re-use from the International Aid Transparency Initiative (IATI), capturing details of diverse data re-use from the project.

In looking at IATI data, a distinction between the 'infrastructures' of open data and the 'eco-system' of re-use was identified, and this formed the basis of the initial project write-up. Time constraints and the technical difficulties of capturing and managing full provenance information mean that a full analysis of the social provenance chains involved in IATI data re-use (and the application of the method to further datasets) has been postponed until later in 2012, to be completed as part of the author's PhD work outside of the IKM Emergent programme.

Outputs

Further papers are due in 2012 as part of Tim Davies's wider Open Data Impacts research work.

Further Resources

See the IKM Emergent Linked Data dGroup for more context and discussion.

Global Hunger Index as Linked Data (outcome of 2010 workshop).

Draft Literature Review and Young Lives Working Notes (deprecated)

Open IGF