The Index: An overview
(Please note: the full FAQ with references and sources is available as a PDF file.)
The Freedom Index (FI) is a proposed international standard for the classification of human rights (and human rights related) information. The idea behind this ambitious project is to ensure that any information in the public domain can be easily and precisely found. Presently, only a tiny sliver of information can be discovered.
In its initial stage of development the index will centre on human rights information, though – in theory – the system can be expanded to incorporate all streams of human knowledge.
The numeric system (known as a “taxonomy”) assigns unique subject reference codes to audio and visual material, reports, judicial decisions, websites, legislation, articles, blogs, forums, research material etc. It is best imagined as a library index system, though instead of identifying individual works, the new index is centred on microscopically identifying subject matter. The system intends to substantially increase the visibility, longevity and effectiveness of human rights information published in all languages, both online and offline. It will create an unprecedented linkage of information. Importantly, the system will help nurture a level playing field across all languages so information can be more easily found outside the searcher’s own language and geographic location.
We started working on the FI because an increasing amount of human rights information is becoming inaccessible and even invisible. Almost everybody uses search engines to find information, but the reality is that online data are often impossible to discover via conventional search techniques, particularly across language and discipline boundaries. This means discovery of crucial information is often random and accidental. Put simply, there is now so much volume and diversity of online information that the vast majority is lost in the white noise of search results. And while there are many dedicated silos and search portals for specific information, these are often difficult to identify – and they rarely if ever connect to each other. In many respects, the old vision of an open Internet is being replaced by a battle of walled gardens (Facebook being perhaps the most notable example). Although search technologies are constantly changing, presently only one page in 500 is accessible to search engines.1 New search techniques are constantly in development and libraries are working, for example, on advanced metadata concepts. The FI will complement such efforts rather than seeking to replace them.
Search companies are struggling to deal with this challenge. It’s certainly true that if a searcher knows exactly what to look for (say, a specific book or a court decision) there is a reasonable chance that search engines will identify that content. However, if you don’t know the precise details of the material, you are unlikely ever to discover it. Additionally, the prospect of finding precisely relevant material outside your own language has now – in many instances – approached zero. The above problems are likely to become more onerous both for individual users and for human rights organisations. A recent study2 by the World Economic Forum (WEF) on the future of the Internet predicts a fifty-fold increase in online data by 2020. The report observes: “Global standards in medicine, for example, allow for communication between doctors who cannot speak each other’s language. Now we need to establish a similar harmony for data.” It is, of course, true that the rapid acceleration of more humanized methods of mobile and voice search may lead to more accurate discovery of some information,3 but this trend will almost certainly exclude any data outside the searcher’s own language.
The Index system
The FI architecture works alongside search and other systems with the aim of bridging and improving them. Our framework is extremely simple and robust, meaning that it can be used with many existing systems – even in the offline world. This means that archivists dealing with countless paper items can easily index the material and create a public notification of its existence.
The FI is similar in many respects to the major library systems, particularly the Dewey Decimal Classification (DDC). Its purpose is to identify infinitesimally small subsets of data from the mass of available information, allowing searchers to discover all precisely relevant data in whatever language or region they choose. When people or organisations publish any data online, they have the option of generating a 12-digit code which is then attached to the information. This code – together with the related data – is centrally indexed by us and is also crawled by the search engines. The code number is microscopically focused on the precise content (or multiple points of focus) being indexed. As described later, this number (or, more likely, a cluster of numbers) may also be automatically generated. Crowd interaction may also accelerate and refine this process.
We are also working on the technology to automate code generation for all online data.
The Index logic
Like the major library systems, FI can be split between content fields and “cutter” fields that more precisely identify the data in question. Seven of the twelve fields in the code relate to the subject matter of the work being indexed. This hierarchical structure provides around seven million subdivisions of human rights. However, this focus is further refined through three geographic and language fields that identify every country, region and every official language. A “media” field fine-tunes the code even further by specifying whether the data is a web document, hard copy publication, audio/visual content or another format.
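As a rough illustration of this structure, the following Python sketch splits a 12-digit code into its fields. The field order used here (control digit, seven subject digits, media digit, then three geographic and language digits) is an assumption inferred from the worked example in this overview, where altering the last three digits widens the language and region scope. It is not a published specification, and the names FICode and parse_fi_code are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FICode:
    control: str  # status of the data (first field, per the overview)
    subject: str  # seven hierarchical subject digits
    media: str    # type of material (position is an assumption)
    locale: str   # three geographic/language digits (position is an assumption)


def parse_fi_code(code: str) -> FICode:
    """Split a 12-digit index code into fields, assuming one digit per field."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    return FICode(
        control=code[0],
        subject=code[1:8],
        media=code[8],
        locale=code[9:],
    )


# The hypothetical example code used in this overview.
parsed = parse_fi_code("141433211311")
```

Keeping each field as a string rather than an integer preserves leading zeros, which matters if a zero carries meaning in a field.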
One of the extraordinary features of FI is the “Control Field”, which is the first field of the index code. This permits publishers to specify the status of data. Much of the material will be public, but there is also capacity to code data as “secret”, “draft”, “in development” or “restricted”, meaning that the system will present options to establish a range of access permissions. This means, for example, that researchers or campaign organisers can promote the fact of their work online without the need to reveal the full content. Only a title, abstract and contact information are revealed. The same applies to extremely sensitive data. Authorised publishers are able to shift the index status so that restricted documents can be made public simply by changing the control field number from “2” to “1”.
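The status change described above amounts to rewriting the first digit of the code. A minimal Python sketch follows, assuming the control field is a single leading digit. Only the values "1" (public) and "2" (restricted) appear in this overview; the digits for the other statuses are not given and are left out. The function names are hypothetical.

```python
# Only these two control values are stated in the overview.
CONTROL_PUBLIC = "1"
CONTROL_RESTRICTED = "2"


def set_control_field(code: str, status: str) -> str:
    """Return a copy of `code` with its first digit replaced by `status`."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    if len(status) != 1 or not (status.isascii() and status.isdigit()):
        raise ValueError("status must be a single digit")
    return status + code[1:]


def make_public(code: str) -> str:
    """Shift a document's status to public, leaving all other fields intact."""
    return set_control_field(code, CONTROL_PUBLIC)


released = make_public("241433211311")
```

In a real deployment this change would presumably be gated by a check that the caller is an authorised publisher; that step is outside the scope of this sketch.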
The system is extremely fine grained. For example, entering the code 141433211311 (as a hypothetical example) will reveal all publicly available United Kingdom online judicial decisions in the English language relating to appeals against prosecutions arising from alleged interference with communications surveillance for national security purposes. This presently could result in fewer than twelve search results, all precisely relevant. Searching different permutations of the last three fields will yield a narrower or broader selection of results (changing 141433211311 to 141433211000 will produce the same subject focus, but will reveal published material in all languages and regions).
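The broadening behaviour in this example can be sketched as a simple matching rule. The Python below assumes that a "0" in any of the last three digits of a query means "any value", and that the first nine digits must match exactly. Both rules are inferred from the example above rather than taken from a specification, and the catalogue entries other than 141433211311 are made up for illustration.

```python
def matches(query: str, item: str) -> bool:
    """Return True if `item` satisfies `query`.

    Assumption: '0' in the last three digits of the query acts as a wildcard.
    """
    if len(query) != 12 or len(item) != 12:
        raise ValueError("codes must be 12 digits long")
    # Control, subject and the ninth digit are compared exactly.
    if query[:9] != item[:9]:
        return False
    # Each trailing digit matches if the query digit is 0 or equal to the item digit.
    return all(q == "0" or q == i for q, i in zip(query[9:], item[9:]))


# Hypothetical catalogue: same subject in two locales, plus a different subject.
catalogue = ["141433211311", "141433211422", "141433311311"]

narrow = [c for c in catalogue if matches("141433211311", c)]
broad = [c for c in catalogue if matches("141433211000", c)]
```

Here the narrow query returns only the exact code, while the broad query also returns the entry that shares the subject but differs in its last three digits.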
The FI is urgently needed. The human rights sector needs to follow the example of medicine and engineering by developing its own narrative, language and taxonomy. Layering smart technologies over an inadequate foundation will simply result in a corrupted outcome. This doesn’t mean we won’t integrate automation – indeed it’s likely that much of the indexing can be automated. However, we will not rely solely on “black box” technology, but rather on a combination of automation and an open source/crowd source approach to the indexing of information.
Embedded quality standards
The code emerged from an open process, and within it are a number of important safeguards, such as design standards requiring the index to be simple, usable, flexible, transparent, collaborative, non-proprietary, language-neutral, scalable and offline-friendly. In January 2017 the Freedom Index was established in the Netherlands as an independent non-profit foundation (a “Stichting”).
The FI will interface with advanced technologies that can help accelerate the cataloguing of billions of items of data. In the end, however, there should ideally be human input, particularly for material such as YouTube videos and media broadcasts. Additionally, the system needs to be trained. We need a technology-friendly roadmap through human rights so everyone can benefit from the masses of crucial data that are produced each day.
1 New Scientist https://www.newscientist.com/article/mg20827872-000-engines-of-the-future-into-the-deep-web/ https://searchenginewatch.com/sew/opinion/2411478/longer-search-queries-are-becoming-the-norm-what-it-means-for-seo
2 World Economic Forum, “Mapping the Future”, 2015 http://reports.weforum.org/outlook-global-agenda-2015/future-agenda/mapping-the-future-the-future-of-the-internet/