What can the Open Data movement learn from Open Access?
How Open Data relates to Open Access
by Wander A.M. Engbers and Wendy Carrara, Capgemini Consulting.
Last May the PASTEUR4OA project organised a conference to discuss developments and next steps to strengthen the Open Access movement in the European Union. Open Access (OA) refers to the sharing of academic outputs such as articles without restrictions on access or use. During the conference, called “Green Light for Open Access: Aligning Europe’s OA Policies”, stakeholders from across the Open Access domain came together in Amsterdam to exchange experiences, best practices and success stories. It also coincided with recent developments that introduced Open Data and Open Access as part of the Horizon 2020 research funding. Together, these formed a good incentive for the European Data Portal to explore Open Access and find out how it relates to Open Data.
The PASTEUR4OA (Open Access Policy Alignment Strategies for European Union Research) project supported the development and implementation of Open Access policies in Europe, stimulating the sharing of academic work under Open Data licences. In May 2016, this European Commission project came to an end and was incorporated into the OpenAIRE project. To see how far the project had brought the movement and what still lies ahead, stakeholders from the Open Access movement, such as universities and publishers, came together in Amsterdam.
The conference was structured into five themes, including outcomes, technical details and next steps, and aimed at shedding light on and exchanging best practices for the implementation of Open Access policies throughout the European Union. According to keynote speaker Ron Dekkers, this implementation requires a ‘change of the system’. While the word ‘Open’ is the obvious commonality between Open Data and Open Access, the commonalities in concept and, most importantly, context are less clear. However, both concepts are based on the same idea: publicly funded data or knowledge should be shared to maximise its benefits. This requires major paradigm shifts both for the providers of Open Data, mostly governments, and for the publishers of academic works, which are produced by academics but distributed by publishers.
Before exploring synergies between Open Data and Open Access, it is important to define the context and idea of Open Access. The term refers to distributing academic works under an Open Access licence, thereby offering free access and re-use for any purpose. At the moment, big publishers such as Elsevier and Taylor & Francis, among many others, distribute academic works for a fee. These companies provide editorial work and direct the peer-review process to assure the quality of the articles drafted by academics. They subsequently sell these articles back to the academic community through subscriptions or individual downloads. This model is based on the pre-digital world, in which the printing and distribution costs of physical journals justified the prominent commercial position of the publishers.
However, due to ongoing digitalisation, the (free) possibilities to share information or articles grew and the commercial distribution of publicly funded research came under scrutiny. While there is little discussion about the importance of publishing houses as a means of assuring quality, academics started developing a new model to distribute scientific works: Open Access. In this model, research outputs such as articles are distributed without restrictions on access to or use of the works by anyone. This can be done by offering free access to all articles in a repository (‘Green’ Open Access) or by making them available through a publisher (‘Gold’ Open Access). In the latter case, the author or institution providing the article pays the publisher an ‘Article Processing Charge’ for its quality assurance and services.
Open Access & Open Data
While Open Access concerns the publication of articles, reports and other forms of processed or analysed information, Open Data concerns the sharing of unprocessed or ‘raw’ information. This can be any type of information that is collected, produced or paid for by public bodies or other organisations and made freely available for re-use for any purpose. As with Open Access, access to and (re-)use of the information are specified by an accompanying licence.
The conference mentioned previously took place to discuss the alignment of Europe’s Open Access initiatives and how to move towards a more tangible realisation of this new form of publication. During the event it became apparent that, while the concept of sharing under a licence is similar, the way the transformation takes place in Open Access differs from the Open Data developments. This is primarily due to an apparent difference in context, which in Open Access is more clearly confined to the academic world, leading to a limited number of stakeholders. Furthermore, while in Open Data the publisher provides data without much interaction with the user, in Open Access both roles are filled by academia, which is both the supplier and the biggest consumer of the product, with both sides possibly involved in the same academic debate. While academia might appear homogeneous from the outside, within this group there are diverse interests depending on organisational position. The organisations that finance the research are interested in encouraging the use of (free) Open Access articles, whereas for the producing researchers the financial element is of lower importance than the ‘prestige’ of publishing in a renowned journal.
While the researcher is primarily focused on getting their work published and cited in the most prominent medium, the body financing the research is primarily focused on keeping its costs low so it can finance more projects. To complicate the matter, there is a private party involved whose business model is built around the central position of its paid services: the publishers. Their vested business interests affect the transformation. A player that is not private but has the same ‘profit model’ can be found in the Open Data field: business registers or cadastral data providers, for instance, tend to sell their data too. For them, sharing their data also means working with a different business model, so a new charging model has to be developed. Although this debate is held in both movements, the main difference is that in Open Access the ‘charging’ question concerns a private party, whereas in the Open Government Data field the question concerns a (semi-)public body. In the latter case, the collection and curation of data has already been paid for with taxpayers’ money, and the data is then sold to any potential user, business or citizen. This creates a difference in the relations between the stakeholders but can also provide inspiration for the two movements to learn from each other.
As Beate Eellend of the Swedish National Library said, “we need all stakeholders aboard to reach the set Open Access goals.” This holds for both movements.
These two contexts, the charging model and the relation between user and provider, initially seemed to create distance between the movements until the topic of research data came under discussion. Research data is the raw material on which the analysis in an academic publication is based. To truly maximise the potential of sharing research, it should not stop with sharing the outcome of an analysis: the underlying data should also be shared. The Horizon 2020 programme has already embraced this policy in its updated work programme. In which form should the research data be shared? In the form of Open Data. In this respect, many similarities are found in both concept and context. Furthermore, the publication of research data extends the lifetime of the article and of the work invested in it. While the article can be bound by time, the data can be re-used over and over again, referring back to the original work.
It is often said that the value of Open Data lies in its use, which means an application for a given dataset has to be found before someone sees the value of sharing the data. For Open Access, the value of the product is already irrefutably proven by the nature of what is shared, namely academic research. Behind this academic work lies a series of steps, from collecting the data to doing the analysis, that creates the value. While in Open Access these extra steps are done by the researcher, in Open Data it is the users who are expected to take them and thus create the value. This creates a superficial difference in how the value of what is shared is experienced, simply because it is shared at a different moment. The Data Value Chain can provide an explanatory framework for this difference, as it provides a chronological frame for the production of value with data.
Figure 1: Data Value Chain
While the product shared under an Open Access licence is found on the right side of the chain, the data shared as Open Data is on the left side. Essentially, the two movements occupy different places in the Data Value Chain. With the Data Value Chain as an overarching paradigm, each movement can learn much from the other. On the technical side, the Open Data movement is strongly organised at the pan-European level, for example with the European Data Portal. On the other hand, Open Data could learn from how Open Access brings together the different stakeholders and interests to reach a common point. In the Open Access context, all stakeholders know the value; in Open Data, such a shared sense of value is still waiting to be created.
The European Data Portal harvests the metadata of Public Sector Information available on public data portals across European countries. Information regarding the provision of data and the benefits of re-using data is also included. Going beyond the harvesting of metadata, the strategic objective of the European Data Portal is to improve accessibility and increase the value of Open Data. The European Data Portal addresses the whole data value chain: from data publishing to data re-use.