Vines rationale

The “Vines” activity, part of the IKM project, has centred on the issue of “Internet bias”. The Vines activities aimed to develop “proof-of-concept” demonstrators that make content originating from the “South” more visible. The Internet gives every person and every organisation an opportunity to be a sender of information, but it obviously does not give each sender an equal opportunity to reach an audience for that information. In the next section we argue why we expect a bias on the Internet that makes information from the South less visible and harder to find.

Visibility

Over the years different mechanisms have come into use for information exchange, and in most cases the newer mechanisms have only partially replaced the older ones.

  • Since the early nineties of the last century e-mail has been the first Internet mechanism that most users were exposed to. E-mail was, and still is, a way to spread information amongst individuals and groups. There is an idea, originally set out by Frigyes Karinthy, that everyone is on average approximately six steps away, by way of introduction, from any other person on Earth (“six degrees of separation”). E-mail users form a smaller and more closely connected group, so potentially things could spread very quickly through e-mail. In reality, even in much smaller and biased groups, things appear to spread much more slowly, as was shown by Liben-Nowell and Kleinberg (1).
  • In the late nineties the World Wide Web became an effective broadcasting medium for organizations and individuals, initially in North America and Western Europe, with other regions of the world catching up later. With the rapid growth of interlinked sites the total supply of information became unmanageable. The first-generation search engines did not offer much help in selecting the most relevant search results. Subject guides were developed for specific groups of users, such as Eldis and the Development Gateway for the development sector. These useful initiatives rely on an editorial process and are therefore limited in their scope.
  • In the early years of this century the first-generation search engines were replaced by services (in the first place Google) that use algorithms to present the most relevant references at the top. This algorithm is often referred to as PageRank, after Larry Page (one of the founders of Google), although other search engines like Bing and Yahoo use similar methods. These algorithms rank a page higher if more pages link to it, and an incoming link weighs more if it comes from a page to which many other pages link. In general this causes a bias against newcomers; the same effect is seen as a (temporary) setback when, for example, a domain name is changed. It is likely that the adverse effect will be stronger for developing countries. Vaughan and Thelwall (2) have given evidence that the coverage of developing countries was already lagging behind when these services were introduced, because the crawlers that discover the pages indexed by search engines did not find their pages due to a lack of incoming links.
  • Since the last years of the first decade of this century social media, where users can share and recommend information within their social networks, have become the mechanism for information exchange that currently attracts most attention. The Internet has been described as a series of “echo chambers” for different communities that listen to different selections of news. Empirical data on the use of these media give different pictures for different media:
    • Exchange on the picture-sharing site ‘Flickr’ generally does not go beyond two degrees of separation (3).
    • On Twitter there is a limited number of influential contributors, many of which are news agencies (4).
    • On Facebook users are more likely to redistribute information from contacts that are close ties than from weak ties. In total, however, the amount of information from “weak ties” is larger than from “strong ties”, as there are more of the former (5). For Farhad Manjoo, author of the book ‘True Enough’ where the concept of ‘echo chambers’ was introduced, this research was the reason to conclude that the “echo chambers” do not exist.
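
The link-based ranking described above for the second generation of search engines can be illustrated with a minimal sketch. It is a simplified, textbook-style version of the PageRank idea, not the method actually used by any search engine. The page names, the damping factor of 0.85 and the number of iterations are assumptions chosen for illustration. Three established pages link to each other, while a newcomer links out but receives no incoming links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "south_new" is a latecomer with no incoming links.
LINKS = {
    "north_a": ["north_b", "north_c"],
    "north_b": ["north_a", "north_c"],
    "north_c": ["north_a", "north_b"],
    "south_new": ["north_a"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this toy graph the newcomer keeps only the baseline share and ends up at the bottom of the ranking, however relevant its content may be.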

We have not found empirical evidence in the literature that an Internet bias against information from the South exists, and we are not in a position to do such research ourselves. However, we have argued above why we are convinced that such a bias exists:

  • Senders of information from the South are relative latecomers
  • Search engines are less likely to find the material through their crawlers, and will rank it lower due to the low number of incoming links
  • Peer-to-peer exchange of information through social media and e-mail is relatively limited and likely to favour the major senders of information

The Vines activity has chosen to develop a proof-of-concept prototype of services that make information from the South more visible.

Terminology used

Even if the material is visible, users will not see it if they do not use the right terminology to find it. Until recently the dominant view in discussions amongst information professionals was that standardizing the terminology used to index information items would soon become irrelevant. Most of these systems (like classification codes and thesauri) stem from systems where indexers would assign descriptors ‘manually’ for library catalogues and Abstracting & Indexing databases (e.g. Sociological Abstracts, Biosis, Agris). This is no longer considered feasible for all the information that is available to Internet users in any subject domain. Manual indexing was expected to be replaced by full-text searching.

Recently there have been developments that have drawn more attention to standardized terminology systems for information services:


  • Many information services now offer “faceting” of search results, i.e. a breakdown by, for example, author, publication date or subject. These value-added services became feasible when the open source Solr library became available. A breakdown according to subject is only possible if the subject terminology is standardized.
  • The next generation of the World Wide Web is expected to make the exchange of information possible on a deeper level than web pages or documents. The information held in texts or databases can be expressed as triples (“subject – predicate – object”, put more simply “thing – property – value”). Until recently the technology to exchange “linked open data” on the “semantic web” was very experimental, but now these features are being built into mainstream software products. Technology to extract triples from texts is now in use in larger services like the Thomson Reuters news services. Meaningful exchange of triples is only possible if the same terminology is used or if different sets of terminology can be mapped to each other.
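
The dependence of subject faceting on standardized terminology, noted in the first point above, can be shown with a small sketch. It uses plain Python rather than the Solr API, and the records, field names and the mapping to preferred terms are all hypothetical. The same four records are broken down by subject twice: once on the terms as entered, and once after mapping each variant to a preferred term.

```python
from collections import Counter

# Hypothetical catalogue records with inconsistently entered subject terms.
RECORDS = [
    {"author": "A", "year": 2010, "subject": "Food security"},
    {"author": "B", "year": 2011, "subject": "food-security"},
    {"author": "A", "year": 2011, "subject": "Food Security"},
    {"author": "C", "year": 2011, "subject": "Water management"},
]

# Hypothetical controlled vocabulary: lower-cased variant -> preferred term.
PREFERRED_TERM = {
    "food security": "Food security",
    "food-security": "Food security",
    "water management": "Water management",
}


def to_preferred(term):
    """Return the preferred term for a variant, or the term itself if unknown."""
    return PREFERRED_TERM.get(term.lower(), term)


def facet_counts(records, field, normalize=None):
    """Count how many records carry each value of `field`."""
    values = (record[field] for record in records)
    if normalize is not None:
        values = (normalize(value) for value in values)
    return Counter(values)


raw_facet = facet_counts(RECORDS, "subject")
standard_facet = facet_counts(RECORDS, "subject", to_preferred)

print("As entered:  ", dict(raw_facet))
print("Standardized:", dict(standard_facet))
```

Without the mapping, the three records on food security are scattered over three facet values of one record each; with it, they appear as a single facet value.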

The Vines activity has attempted to develop a workbench to visualize and work with the terminologies used to index material from the South and material from the North about the South. It has explored the use of Linked Open Data technology to compare “native” sets of terminology with the terminology used by the Thomson Reuters automatic indexing system.
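
As an illustration of what such a comparison involves, the following sketch represents indexing statements and term mappings as “subject – predicate – object” triples and checks which native terms can be translated into a second terminology. It is not the Vines workbench itself. All identifiers, the predicate names `hasSubject` and `mapsTo`, and the example terms are hypothetical, and plain tuples stand in for a real Linked Open Data toolkit.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical documents indexed with a "native" terminology.
NATIVE_TRIPLES = [
    Triple("doc:1", "hasSubject", "native:smallholder-farming"),
    Triple("doc:2", "hasSubject", "native:pastoralism"),
    Triple("doc:3", "hasSubject", "native:smallholder-farming"),
]

# Hypothetical mappings from native terms to terms of an automatic indexer.
MAPPING_TRIPLES = [
    Triple("native:smallholder-farming", "mapsTo", "auto:Agriculture"),
]


def map_terms(triples, mappings):
    """Translate triple objects via the mappings; collect terms without one."""
    lookup = {m.subject: m.obj for m in mappings if m.predicate == "mapsTo"}
    mapped, unmapped = [], set()
    for triple in triples:
        if triple.obj in lookup:
            mapped.append(triple._replace(obj=lookup[triple.obj]))
        else:
            unmapped.add(triple.obj)
    return mapped, unmapped


mapped, unmapped = map_terms(NATIVE_TRIPLES, MAPPING_TRIPLES)
print("Exchangeable triples:", mapped)
print("Native terms without a mapping:", sorted(unmapped))
```

Triples whose term has a mapping can be exchanged with a service that uses the other terminology; the unmapped terms show where the two terminologies do not yet connect.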


References

(1) Liben-Nowell, D., & Kleinberg, J. (2008). Tracing information flow on a global scale using Internet chain-letter data. PNAS (Proceedings of the National Academy of Sciences), 105(12). http://www.pnas.org/content/105/12/4633.long
(2) Vaughan, L., & Thelwall, M. (2004). Search engine coverage bias: evidence and possible causes. Information Processing & Management, 40(4), 693–707. Retrieved from http://www.sciencedirect.com/science/article/pii/S0306457303000633
(3) Cha, M., Mislove, A., & Gummadi, K. P. (2009). A Measurement-driven Analysis of Information Propagation in the Flickr Social Network. Social Networks, 721–730. http://www2009.eprints.org/73/1/p721.pdf
(4) Lee, C., Kwak, H., Park, H., & Moon, S. (2010). Finding influentials based on the temporal order of information adoption in Twitter. Proceedings of the 19th international conference on World wide web - WWW ’10, 1137. New York, New York, USA: ACM Press. doi:10.1145/1772690.1772842
(5) Bakshy, E., Rosenn, I., Marlow, C., & Adamic, L. (2012). The Role of Social Networks in Information Diffusion. arXiv preprint arXiv:1201.4145. Retrieved from http://arxiv.org/abs/1201.4145

Collections