Big Data & Issues & Opportunities: Open Data

In this tenth article in our series on "Big Data & Issues & Opportunities" (see our previous article here), we address various legal issues and opportunities that one may encounter when using open data for big data technologies. As in our previous articles, illustrations from the transport sector will be provided where relevant.

The 'big data' required to feed big data analytics tools can emanate from a variety of sources. One such source is the public sector, which has been opening up certain of its datasets to the public.[1] Such public disclosure and use of these datasets is however subject to rules at both EU and national level, which will be discussed in this article.

What is open data?

In the context of the Digital Single Market strategy of the European Commission, the concept of open data refers to "the idea that certain data should be freely available for use and re-use".[2] "Open data" moreover increasingly refers to so-called public sector information ("PSI"), i.e. material produced, collected, paid for and/or held by public sector bodies at national, regional and local level, such as ministries, agencies, municipalities, but also by organisations that are mainly funded by or under the control of a public authority.[3]

The EU institutions have taken both legislative and non-legislative measures to encourage the uptake of open data. On the non-legislative front, the European Commission has been very active in the field of open data, providing for soft measures facilitating access to data. Its involvement has included engaging with Member States through the Public Sector Information expert group (PSI Group), funding the Legal Aspects of Public Sector Information (LAPSI) network and developing the EU Open Data Portal, which provides access to data from the EU institutions and bodies for re-use[4], to name a few.

The PSI Directive

On the legislative front, the EU adopted its first set of rules on the re-use of public sector information (the "PSI Directive") as early as 2003.[5] The aim of that Directive was not so much to make public data more accessible and encourage its re-use, but to ensure that when public sector bodies decided to make data available, they did so in a fair and non-discriminatory manner.[6] Consequently, while public authorities had to comply with these requirements when they decided to make data available, making data available was not in itself mandatory. Moreover, the initial version of the PSI Directive was primarily aimed at paper documents, even though electronic data already fell within its scope of application.

In 2013, the PSI Directive was given a thorough makeover in order to keep pace with technological developments, which had led to the rise of the data economy, and to unlock the potential of big data held and accumulated by government authorities.[7] In a significant departure from the first PSI Directive, an actual obligation was introduced for public sector bodies to make public sector information available.[8] This effectively eliminated the possibility for public bodies to avoid application of the Directive by deciding not to make information available in the first place. Still, the amended Directive includes a number of exceptions to the principle of mandatory data provision.[9] Other provisions introduced by the Directive stipulate, among other things, that the information can be made available "as is" or subject to conditions, which can be imposed by way of a licence. Member States are moreover encouraged to develop standard licences that should be made available in digital format.[10]

Opportunities in the use of open data

Public sector information is a resource with great potential for a number of beneficiaries, ranging from other public sector bodies, to private businesses including start-ups, SMEs and multinationals, to academia and citizens themselves.[11] Start-ups and SMEs typically do not have the same amount or type of resources as larger companies, and as a result may encounter difficulties when trying to gain access to certain data or may even fail to obtain access altogether. This competitive disadvantage can constitute a barrier for start-ups and SMEs to enter certain markets. The PSI Directive attempts to remove this disadvantage with respect to public sector information, in particular through the non-discrimination principle. This principle ensures that start-ups and SMEs are able to use PSI for commercial purposes under the same conditions as would be imposed on any other company for a similar purpose.[12]

In the transport sector, open government data covers a wide variety of data categories. Departure and arrival times, public transport timetables, fares, and safety-related or other types of disruptions are only a few of the types of information that are typically held by public sector entities. As this data is opened up to the public in an open, standardised, machine-readable format, SMEs and start-ups may be enabled to enter markets they would have been prevented from entering if they were required to gather the relevant data in other ways. Similarly, the proliferation of tools to analyse this information, including tools for big data analytics, can pave the way for those companies to explore new business opportunities.

Illustration in the transport sector: In maritime industries, a huge amount of data is created and collected through AIS. 'AIS' stands for Automatic Identification System and was created as a navigation and anti-collision tool. Hoping to foster innovation in the industry, the Danish Maritime Authority decided in 2016 to make historical AIS data available through an open data platform, in addition to the live AIS data feed that it was already offering.[13] While AIS was originally designed to improve maritime safety conditions, many other uses can be envisaged. One application that could result from the accessibility of AIS data is being considered in the port of Rotterdam, where AIS data is used to analyse current and historical vessel dwell times. The dwell time of a ship in a port is the time during which it is docked. Long, avoidable dwell times are a big waste of time and resources for operators. The analysis of AIS data aims to forecast dwell times, which individual shippers would then be able to use to support transport decisions.
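By way of a purely illustrative sketch, the dwell-time calculation described above could look as follows. The record format used here (a vessel identifier, a timestamp and a "moored" flag) is a simplified, hypothetical stand-in and not the actual AIS message structure or the format of any open data platform; the names `PositionReport` and `dwell_times` are likewise assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class PositionReport(NamedTuple):
    """Hypothetical, simplified stand-in for an AIS position report."""
    vessel_id: str
    timestamp: datetime
    moored: bool  # assumption: derived from the reported navigational status


def dwell_times(reports: Iterable[PositionReport]) -> dict[str, timedelta]:
    """Sum, per vessel, the time between consecutive reports while moored."""
    totals: dict[str, timedelta] = {}
    last_seen: dict[str, PositionReport] = {}
    for report in sorted(reports, key=lambda r: r.timestamp):
        previous = last_seen.get(report.vessel_id)
        # Count the interval only if the vessel was moored at its start.
        if previous is not None and previous.moored:
            elapsed = report.timestamp - previous.timestamp
            totals[report.vessel_id] = totals.get(report.vessel_id, timedelta()) + elapsed
        last_seen[report.vessel_id] = report
    return totals


# Invented sample data for a single port call.
sample = [
    PositionReport("VESSEL-A", datetime(2018, 10, 1, 8, 0), False),
    PositionReport("VESSEL-A", datetime(2018, 10, 1, 9, 0), True),
    PositionReport("VESSEL-A", datetime(2018, 10, 1, 15, 0), True),
    PositionReport("VESSEL-A", datetime(2018, 10, 1, 17, 0), False),
]
result = dwell_times(sample)
print(result["VESSEL-A"])  # 8:00:00
```

Historical dwell times computed in this way could then serve as input for a forecasting model; that step is left out here, as the source does not describe the method used.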

Challenges in the use of open data

Today, an EU-based company that wishes to rely on public sector information for big data applications may still encounter several challenges, three of which we will touch upon in this section: (i) licensing; (ii) the interplay between the legal regimes on open and personal data; and (iii) the interplay between the PSI Directive and the Database Directive.[14]

The PSI Directive allows public sector bodies to make the re-use of data subject to conditions, notably through the use of licences.[15] While Member States are required to have in place standard licences for the use of public sector information, public sector bodies are merely "encouraged" and thus not obliged to use them.[16] Despite guidelines on recommended standard licences being adopted by the Commission in 2014[17], little uniformity is seen, as EU Member States have embraced very different licensing practices.[18] As a consequence, any company that wishes to re-use public sector information from different Member States with the aim of developing a product is obliged to take into account as many (and perhaps even more) licences as the number of Member States in which it operates.

On the interplay between open and personal data, it should be noted that, in theory, the relationship between the PSI Directive and the General Data Protection Regulation ("GDPR") raises few questions. The former clearly states that it is without prejudice to the rules on personal data protection (at the time still contained in Directive 95/46/EC) and that documents may be excluded from the Directive's scope on account of data protection rules.[19] In the same vein, the GDPR clarifies that the PSI Directive in no way affects "the level of protection of natural persons with regard to the processing of personal data" and does not alter the rights and obligations set out in the GDPR. It does however allow the principle of access to public sector information to be taken into account when applying the GDPR.[20] While the abovementioned rules should not be understood as meaning that PSI containing personal data cannot in any case be disclosed, they nevertheless create a tension which typically leads to PSI remaining locked.

Still, what the above really implies is that a careful assessment should be made to determine the circumstances under which the personal data part of PSI could lawfully be disclosed. That assessment involves, among other things, determining whether the relevant public sector dataset contains personal data and, if that is the case, ensuring that following disclosure, the dataset is processed in accordance with data protection laws.[21] This gives rise to a number of additional challenges, stemming among other things from the broad definition of "personal data". Another example is the fact that making available PSI for re-use for all commercial and non-commercial purposes risks being at odds with the principle of purpose limitation enshrined in the GDPR. The same holds true for the principle of data minimisation. A potential means to avoid grave violations of the GDPR would be to conclude agreements with third parties to make arrangements for bilateral data sharing involving exclusivity, but these are in principle prohibited by the PSI Directive, as such a practice would not create a level playing field.[22] It is thus clear that data protection legislation presents a unique challenge to the opening up of public sector information, either because it risks preventing a large part of PSI datasets from being disclosed altogether or because it creates compliance issues when public sector bodies do decide to disclose PSI containing personal data.

Uncertainty also exists about the precise relationship between the PSI Directive and the Database Directive. The PSI Directive states that it is without prejudice to that Directive and excludes from its scope all documents "for which third parties hold intellectual property rights".[23] It appears that this has been frequently relied upon by public bodies to exclude applicability of the PSI Directive to their information.[24] A concern exists among stakeholders that in this way, public bodies are able to circumvent the rules of the PSI Directive even where the data is perhaps not actually covered by any intellectual property right.[25]

Proposal for a revised PSI Directive

On 25 April 2018, the European Commission presented a proposal for revision of the PSI Directive (the “Recast Proposal”). Political agreement on the text was reached on 22 January 2019 by the negotiators of the European Parliament, the Council of the EU and the European Commission.[26] The most fundamental change with respect to the existing version of the PSI Directive relates to the Recast Proposal's material scope of application, which is extended to data held by public undertakings. The Recast Proposal clarifies that an undertaking is considered 'public' if public sector bodies may exercise "a dominant influence by virtue of their ownership of it, their financial participation therein, or the rules which govern it", regardless of whether that is a direct or an indirect influence. The only relevant criterion is therefore whether public sector bodies are able to exercise control over an undertaking.

While not all public undertakings are covered by the Recast Proposal, it does extend (among others) to (i) those active in the areas defined in Directive 2014/25/EU, which includes transport services and ports and airports; (ii) those acting as public service operators pursuant to Regulation 1370/2007/EC, which covers public passenger transport services by rail and by road; (iii) those acting as air carriers fulfilling public service obligations pursuant to Regulation 1008/2008/EC; and (iv) those acting as EU ship owners fulfilling public service obligations pursuant to Regulation 3577/92/EEC (the Maritime Cabotage Regulation).[27] The Recast Proposal is thus to a large degree targeted at public undertakings in the transport sector at large.

The Recast Proposal does limit its scope of application by excluding information held by public undertakings that is produced outside the scope of the provision of services in the general interest as defined by law or other binding rules in the Member State concerned.[28] It will thus be important to consider whether or not a public undertaking has produced the requested information in the context of the provision of services of general interest. The scope has been limited further in the text on which political agreement was found, and now also excludes data that are related to activities for which public undertakings are directly exposed to competition and are therefore exempt from procurement rules.[29]

However, even where the revised Directive would be applicable and except where otherwise required under applicable law, the public undertaking in question could still decide whether or not to disclose the information, as no mandatory information sharing obligation has been introduced (thus far). In this sense, the obligations imposed on public undertakings would be similar to those imposed on public entities under the regime of the initial version of the PSI Directive. The regime is optional, but as soon as a public undertaking decides to make information available, it will have to respect the rules laid down in the Directive. One can wonder what the consequences will be of introducing such a regime, which is admittedly optional but has been paired with strict modalities. Public undertakings may have concerns about the compliance burden that these strict modalities would entail and therefore choose not to disclose any data as a result. This has been mitigated to some extent, as certain (mainly procedural) requirements on the processing of re-use requests were not made applicable to public undertakings.[30]

Another novelty in the Recast Proposal is the introduction of the category of so-called “high-value datasets”. These are datasets associated with important socio-economic benefits, the re-use of which should in principle be free of charge. The Annex of the Recast Proposal includes “mobility” as one of the thematic categories for high-value datasets. The datasets themselves are however not defined in the Recast Proposal itself, but would be adopted by the European Commission through a combination of Delegated Acts and Implementing Acts.[31] Public undertakings fear that such future Delegated Acts could force them to make high-value datasets available for free and would thereby significantly affect their competitive position on the market, as they could be put in an inferior position compared to private undertakings operating on the same markets, upon which no such obligations would be imposed. This could hinder ongoing innovation in public service undertakings by increasing the risk of investing in own datasets and collaborating with start-ups and thus taking away the incentive for public undertakings to carry out such activities.[32] This fear has been mitigated to some extent in the text of 22 January 2019, which expressly excludes the requirement to make such high-value datasets available for free in case this would lead to a distortion of competition in the relevant market.[33]

The Recast Proposal further introduces various smaller changes. It contains provisions aimed at facilitating the re-use of dynamic data (e.g. real-time traffic information), such as the obligation to proactively make such data available via a suitable Application Programming Interface (API).[34] The text also clarifies that costs related to data anonymisation[35] may be included in the fees charged to re-users.[36]

Illustration in the transport sector: In 2015, the German railway and infrastructure operator Deutsche Bahn, a public undertaking, organised its second Hackathon. Deutsche Bahn has an open data portal, and organised the contest under the motto “we provide the data, you innovate with it”. In 24 hours, the winning team managed to achieve very promising results through the evaluation of large amounts of data from infrastructure-related delays. More specifically, they enabled Deutsche Bahn to identify improvement potential for infrastructure by assessing whether problems are more often caused by concrete or by wooden sleepers and by indicating places with increased track position errors. Although Deutsche Bahn, as a public undertaking, was not (yet) under any obligation to make its data available, this is a clear example of the value that can be created by doing so.[37]

Limits to the desirability of opening up PSI: the case of essential services and critical infrastructure

The evolution of the PSI Directive since 2003 shows a continuous broadening of its scope. That trend is continued with the Recast Proposal which aims to include public undertakings. Taking into account the potential benefits of opening up data, it seems that this broadened scope can only be applauded. There can however be some limits to the desirability of making available public sector data, which we will illustrate here through the example of essential services and critical infrastructure. 

As explained in our fourth article of this series, the NIS Directive requires Member States to identify so-called operators of essential services. Essential services are those that a Member State deems essential for the “maintenance of critical societal and economic activities”.[38] Such operators must be identified, among others, for all major modes of transportation, notably air, rail, water, and road. Importantly, the NIS Directive makes no distinction between public and private entities and thus impacts both public and private operators in the transport sector.

Furthermore, Directive 2008/114/EC[39] (hereafter the “Critical Infrastructure Directive”) is concerned with the identification and designation of European critical infrastructures. These are assets, systems or parts thereof located in Member States that are essential for the maintenance of vital societal functions, health, safety, security, economic or social well-being of people, and the disruption or destruction of which would significantly impact the Member State concerned.[40] Similarly to the NIS Directive, security requirements are introduced for such European critical infrastructures. Member States must among others ensure that operators and/or owners of such infrastructures develop security plans to ensure the infrastructure’s protection.

Many operators in the transport sector either provide essential services within the meaning of the NIS Directive or operate a critical infrastructure within the meaning of the Critical Infrastructure Directive. In the transport sector, many essential services operators are public undertakings. The essential services covered by the NIS Directive are moreover likely to constitute services provided in the general interest. This would mean that, under the Recast Proposal, the PSI regime would cover those services offered by essential services providers.

There is however an inherent tension between the Recast Proposal's aim to make public data more accessible and to encourage the re-use of this information, and the aim of the NIS Directive to ensure security and continuity of those services that are essential for the maintenance of critical societal and economic activities. A certain amount of data gathered and generated through the provision of essential services will necessarily be of a sensitive nature. Making this sensitive data accessible to the public would inherently entail risks for the security and continuity of the service. The same reasoning applies to operators of critical infrastructures under the Critical Infrastructure Directive. This clearly shows that, while open data policies are for the most part beneficial to society, these policies should not be pursued thoughtlessly and certain sensitivities should be taken into account in current and subsequent revisions of the PSI Directive.

Conclusion

The Open Data movement and governments around the world, including the EU, are committed to making data, and more particularly 'government data' or public sector information, publicly available and usable. The EU institutions have taken both legislative and non-legislative measures to encourage the uptake of open data, most notably through the PSI Directive which attempts to remove barriers to the re-use of public sector information throughout the EU. Still, open data regimes also encounter a number of challenges – on a technical, economic and legal level – that cannot be ignored. The proposal for a recast of the PSI Directive aims to address some of these concerns. It introduces one major change by expanding the Directive’s scope to include public undertakings. While information sharing has not been made mandatory for public undertakings (yet), the new regime still constitutes a significant development for the transport sector, where services are often provided by public undertakings.

Our next article will address data sharing obligations in the context of big data, with illustrations drawn from the transport sector.

 

This series of articles has been made possible by the LeMO Project (www.lemo-h2020.eu), of which Bird & Bird LLP is a partner. The LeMO project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 770038.

The information given in this document concerning technical, legal or professional subject matter is for guidance only and does not constitute legal or professional advice.

The content of this article reflects only the authors’ views. The European Commission and Innovation and Networks Executive Agency (INEA) are not responsible for any use that may be made of the information it contains.


[1] This is for instance the case where a national ministry for transport makes available a dataset containing public transport data, following which that dataset can be used by private companies to develop commercial products and services.

[2] European Commission, 'Open Data' (European Commission, 8 June 2018)  accessed 18 October 2018

[3] European Commission, 'European Legislation on Reuse of Public Sector Information' (European Commission, 25 April 2018) accessed 18 October 2018

[4] Accessible online here

[5] Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information, OJ L 345, 90

[6] PSI Directive, Recital 8

[7] Directive 2013/37/EU of the European Parliament and of the Council of 26 June 2013 amending Directive 2003/98/EC on the re-use of public sector information, OJ L 175, 1

[8] Consolidated PSI Directive, art 3 (1)

[9] Public sector information that contains personal data or is covered by intellectual property rights for instance must not be made available. Exceptions also apply for certain institutions (e.g. museums, libraries, and archives) and for situations where the authority has to generate revenue to cover a substantial part of the costs relating to its public duties. (Consolidated PSI Directive, art 2)

[10] Consolidated PSI Directive, art 8

[11] Barbara Ubaldi, 'Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives' (OECD Working Papers on Public Governance, No. 22, OECD Publishing 2013) 11  accessed 18 October 2018

[12] Stefaan Verhulst and Robyn Caplan, 'Open Data: A Twenty-first-century Asset for Small and Medium-sized Enterprises' (The Governance Lab 2015) 11 accessed 18 October 2018

[13] MI News Network, 'Danish Maritime Authority Makes Historical AIS Data Available To Everybody' (Marine Insight, 28 December 2016) accessed 18 October 2018

[14] Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, OJ L 77, 20

[15] The only limitation in this respect is the fact that conditions may not "unnecessarily restrict possibilities for re-use and shall not be used to restrict competition". (Consolidated PSI Directive, art 8(1))

[16] Ibid art 8(2)

[17] The Commission published Guidelines in July 2014 to help the Member States implement the revised rules and to indicate best practices regarding recommended standard licences, datasets, and charging for the re-use of public sector documents. See Commission Notice Guidelines on recommended standard licences, datasets and charging for the reuse of documents [2014] OJ C 240/1

[18] In some Member States, notably Poland, public authorities do not promote any model licence agreements. In others, like France and the United Kingdom, standard licences are in force. In other Member States such as Belgium, a lack of unity even exists within the different levels of government.

[19] Consolidated PSI Directive, arts 1(2)(cc) and 1(4)

[20] GDPR, Recital 154

[21] Consolidated PSI Directive, art 1(2)(cc)

[22] Consolidated PSI Directive, art 11(1)

[23] Consolidated PSI Directive, art 1(2)(b)

[24] European Commission, 'Consultation on PSI Directive Review, Synopsis Report' (European Commission 2018) 3 accessed 18 October 2018

[25] Ibid 6-7

[26] Proposal for a revision of the Public Sector Information (PSI) Directive, accessed 14 February 2019

[27] Recast Proposal, art 1(1)(b)

[28] Recast Proposal, art 2(1)(a)

[29] The text refers to the exemption from procurement rules in accordance with Article 34 of Directive 2014/25/EU. See here, accessed 19 February 2019

[30] Recast Proposal, art 4

[31] Recast Proposal, art 13 and https://data.consilium.europa.eu/doc/document/ST-5635-2019-INIT/en/pdf, p. 55, accessed 19 February 2019

[32] Valeria Ronzitti, 'European Commission Proposal for a Review of the PSI Directive Risks Hindering Innovation and Investments in Public Services' (CEEP, 26 April 2018) accessed 18 October 2018

[33] https://data.consilium.europa.eu/doc/document/ST-5635-2019-INIT/en/pdf, p. 56, accessed 19 February 2019

[34] Recast Proposal, art 5(4)

[35] For more information on data anonymisation, we refer to our third article in this series.

[36] Recast Proposal, art 6(1)

[37] Philipp Drieger, 'All aboard with Infrastructure 4.0 — Splunk wins Deutsche Bahn Internet of Things Hackathon' (Splunk) accessed 18 October 2018

[38] NIS Directive, Recital 20

[39] Council Directive 2008/114/EC on the identification and designation of European critical infrastructures and the assessment of the need to improve their protection [2008] OJ L 345/75

[40] Critical Infrastructure Directive, art 2(a)
