Open Science Infrastructure

Open Science Infrastructure (or ''open scholarly infrastructure'') is information infrastructure that supports the open sharing of scientific productions such as publications, datasets, metadata or code. The November 2021 UNESCO recommendation on Open Science describes it as "shared research infrastructures that are needed to support open science and serve the needs of different communities".

Open science infrastructures are a form of scientific infrastructure (also called ''cyberinfrastructure'', ''e-Science'' or ''e-infrastructure'') that support the production of open knowledge. Beyond the management of common resources, they are frequently structured as community-led initiatives with a set of collective norms and governance regulations, which also makes them a form of knowledge commons. The definition of open science infrastructures usually excludes privately owned scientific infrastructures run by leading commercial publishers. Conversely, it may include actors not always characterized as scientific infrastructures that play a critical role in the ecosystem of open science, such as publishing platforms in open access (''open scholarly communication service'').

Computing infrastructures and online services have played a key role in the production and diffusion of scientific knowledge since the 1960s. While these early scientific infrastructures were initially envisioned as community initiatives, they could not be openly used due to the lack of interconnectivity and the cost of network connection. The creation of the World Wide Web made it possible to share data and publications on a large scale. The sustainability of online research projects and services became a critical policy issue and entailed the development of major infrastructure in the 2000s.

The concept of open science infrastructure emerged after 2015 following a scientific policy debate over the expansion of commercial and privately owned infrastructures in numerous research activities and the publication of the ''Principles for Open Scholarly Infrastructures''. Since the 2010s, large ecosystems of interconnected scientific infrastructures have emerged in Europe, South and North America through the development of new open science projects and the conversion of legacy infrastructures to open science principles.


Definitions and terminology

''Open science infrastructure'' is a form of knowledge infrastructure that makes it possible to create, publish and maintain open scientific outputs such as publications, data or software. A UNESCO recommendation about open science approved in November 2021 defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities". A SPARC report on European open science infrastructure includes the following activities within the range of open science infrastructures: "We define Open Access & Open Science Infrastructure as sets of services, protocols, standards and software contributing to the research lifecycle – from collaboration and experimentation through data collection and storage, data organization, data analysis and computation, authorship, submission, review and annotation, copyediting, publishing, archiving, citation, discovery and more".


Infrastructure

The use of the term "infrastructure" is an explicit reference to physical infrastructures and networks such as power grids, road networks or telecommunications that made it possible to run complex economic and social systems after the industrial revolution: "The term infrastructure has been used since the 1920s to refer collectively to the roads, power grids, telephone systems, bridges, rail lines, and similar public works that are required for an industrial economy to function (...) If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy". The concept of infrastructure was notably extended in 1996 to forms of computer-mediated knowledge production by Susan Leigh Star and Karen Ruhleder, through an empirical observation of an early form of open science infrastructure, the Worm Community System. This definition has remained influential through the next two decades in science and technology studies and has affected the policy debate over the building of scientific infrastructure since the early 2000s.

Open science infrastructures have specific properties that distinguish them from other forms of open science projects or initiatives:
* Open science infrastructures are not simply a technical product but embed a set of tools, institutions and social norms. Consequently, infrastructures are not always visible, as they can be largely hidden under the routine of normal activities. The resilience and tacitness of infrastructures make it especially difficult to identify the real contributions and "labour cost" of open science work, as it remains "invisible in the university system". This also makes it difficult to allocate funding effectively, as critical infrastructure may remain undetected by funding bodies.
* Open science infrastructures are durable and resilient. They are expected to run on a long-term basis, and multiple research programs rely on them. To some extent, infrastructures are successful when they are forgotten and become an integral part of routine research activities: "Infrastructure at its best is invisible. We tend to only notice it when it fails."
* Open science infrastructures can be shared and used by different actors and communities. They must be sufficiently consistent to remain coordinated and yet have to welcome a diverse array of local uses: "an infrastructure occurs when the tension between local and global is resolved". Predefined agreement among all stakeholders on the scope and the governance of the infrastructure is a critical step.


Openness and the commons

Open science infrastructures are open, which differentiates them from other scientific and knowledge infrastructures and, more specifically, from subscription-based commercial infrastructures. Openness is both a core value and a directing principle that affects the aims, the governance and the management of the infrastructure. Open science infrastructures face issues similar to those met by other open institutions such as open data repositories or large scale collaborative projects such as Wikipedia: "When we study contemporary knowledge infrastructures we find values of openness often embedded there, but translating the values of openness into the design of infrastructures and the practices of infrastructuring is a complex and contingent process".

The conceptual definition of open science infrastructures has been largely influenced by the analysis of Elinor Ostrom on the commons and more specifically on the knowledge commons. In accordance with Ostrom, Cameron Neylon underlines that open infrastructures are not only characterized by the management of a pool of common resources but also by the elaboration of common governance and norms. The economic theory of the commons makes it possible to expand beyond the limited scope of scholarly associations toward large scale community-led initiatives: "Ostrom's work (…) provides a template (…) to make the transition from a local ''club'' to a community-wide infrastructure." Open science infrastructures tend to favor a not-for-profit, publicly funded model with strong involvement from scientific communities, which dissociates them from privately owned closed infrastructures: "open infrastructures are often scholar-led and run by non-profit organisations, making them mission-driven instead of profit-driven." This status aims to ensure the autonomy of the infrastructures and prevent their incorporation into commercial infrastructure. It has wide-ranging implications for the way the organization is managed: "the differences between commercial services and non-profit services permeated almost every aspect of their responses to their environment".

Open science infrastructures are not only a more specific subset of scientific infrastructures and cyberinfrastructures but may also include actors that would not fall into this definition. "Open access publication platforms" such as SciELO, OpenEdition or the Open Library of Humanities are considered an integral part of open science infrastructures in the UNESCO definition and in several literature reviews and policy reports, whereas they were usually considered as separate entities in the policy debate on cyberinfrastructure and e-infrastructures. In the 2010 report of the European Commission on e-infrastructure, scientific publishing platforms are "not e-Infrastructures but closely related to it".

Open science infrastructures may also incorporate additional values and ethical principles. Samuel Moore has theorized a form of ''care-full scholarly commons'' that does not exist yet but would incorporate latent forms of open science infrastructure and communities: "In addition to sharing resources with other projects, commoning also requires commoners to adopt an outwardly-focused, generous attitude to other commons projects, redirecting their labour away from proprietary." In 2018, Okune et al. introduced a similar concept of "inclusive knowledge infrastructures" that "deliberately allow for multiple forms of participation amongst a diverse set of actors (…) and seek to redress power relations within a given context."


Principles for open science infrastructures

In 2015, the ''Principles for Open Scholarly Infrastructures'' laid out an influential prescriptive definition of open science infrastructures. Subsequent definitions and terminologies of open science infrastructures have been largely elaborated on this basis. The text has also influenced the definition of open science infrastructure retained by UNESCO in November 2021. The ''Principles'' attempt to hybridize the framework of infrastructure studies with the analysis of the commons initiated by Elinor Ostrom. The principles develop a series of recommendations in three areas critical to the success of open infrastructures:
* Governance: the governance of the infrastructure should be open and accountable to the scientific communities it aims to serve. Specific measures should ensure that the management of the organization is transparent and diverse.
* Sustainability: the core activities of the organization should be covered by recurring funds. Short-term subventions should be limited to short-term projects. While the organization could charge for services, this should not extend to the data, which should remain "a community property".
* Insurance: the technical infrastructure and the output of the organization are open. This ensures that the infrastructure can be recreated if necessary (in the jargon of open source, it becomes "forkable").

The text ends by mentioning several potential consequences of the principles. The authors advocate for a responsible centralization that embodies a different model than the large commercial web platforms like Google and Facebook while still maintaining the important benefits of centralized infrastructures: "we will be able to build accountable and trusted organisations that manage this centralization responsibly". Existing examples of large open infrastructure include ORCID, the Wikimedia Foundation or CERN.

A more critical reception has focused on the underlying political philosophy of the ''Principles''. While the scientific community is a key part of the governance of open science infrastructure, Samuel Moore underlines that it is never precisely defined, which raises potential issues of under-representation of minority groups.


History


Early developments (1950–1990)

Scientific projects have been among the earliest use cases for digital infrastructure. The theorization of scientific knowledge infrastructure even predates the development of computing technologies. The knowledge networks envisioned by Paul Otlet or Vannevar Bush already incorporated numerous features of online scientific infrastructures.

After the Second World War, the United States faced a "periodical crisis": existing journals could not keep up with the rapidly increasing scientific output. The issue became politically relevant after the successful launch of Sputnik: "The Sputnik crisis turned the librarians’ problem of bibliographic control into a national information crisis." The emerging computing technologies were immediately considered as a potential solution to make a larger amount of scientific output readable and searchable. Access to foreign-language publications was also a key issue that was expected to be solved by machine translation: in the 1950s, a significant share of scientific publications was not available in English, especially those coming from the Soviet bloc.

Influential members of the National Science Foundation like Joshua Lederberg advocated for the creation of a "centralized information system", SCITEL, that would at first coexist with printed journals and gradually replace them altogether on account of its efficiency. In the plan laid out by Lederberg to Eugene Garfield in November 1961, the deposit would index as many as 1,000,000 scientific articles per year. Beyond full-text searching, the infrastructure would also ensure the indexing of citations and other metadata, as well as the automated translation of foreign-language articles.

Although it anticipated key features of online scientific platforms, the SCITEL plan was technically unrealistic at the time. The first working prototype of an online retrieval system, developed in 1963 by Doug Engelbart and Charles Bourne at the Stanford Research Institute, was heavily constrained by memory issues: no more than 10,000 words of a few documents could be indexed. Instead of a general purpose publishing platform, the early scientific computing infrastructures focused on specific research areas, such as
MEDLINE for medicine, NASA/RECON for space engineering or OCLC WorldCat for library search: "most of the earliest online retrieval system provided access to a bibliographic database and the rest used a file containing another sort of information – encyclopedia articles, inventory data, or chemical compounds." This early development of scientific computing affected a large variety of disciplines and communities, including the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection". Yet these infrastructures were mostly invisible to researchers, as most of the research was done by professional librarians. Not only were the search operating systems complicated to use, but searches also had to be performed very efficiently given the prohibitive cost of long-distance telecommunication. To become technically feasible, scientific infrastructures could not be open and became fundamentally hidden from their end users.

The development of digital infrastructure for scientific publication was largely undertaken by private companies. In 1963, Eugene Garfield created the Institute for Scientific Information, which aimed to transform the projects initially envisioned with Lederberg into a profitable business. The Science Citation Index relied on computational processing of citation data. It had a massive and lasting influence on the structuring of global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journals". Garfield also successfully launched ''Current Contents'', a periodic compilation of scientific abstracts that acted as a simplified commercial version of the central deposit envisioned within SCITEL. Rather than being replaced by a centralized information system, leading scientific publishers were able to develop their own information infrastructure, which ultimately reinforced their business position. By the end of the 1960s, the Dutch publisher Elsevier and the German publisher Springer had started to computerize their internal data, as well as the management of journal reviews.

Until the advent of the web, the landscape of scientific infrastructures remained fragmented. Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols". CERN, the birthplace of the World Wide Web, had its own version of the Internet, CERN-Net, and also supported its own protocol for e-mail exchange. The European Space Agency used its own iteration (ESRO/RECON) of the RECON system also used by NASA engineers. These isolated scientific infrastructures could hardly be connected before the advent of the web. Communication between scientific infrastructures was not only challenging across space, but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".


The Web Revolution (1990–1995)

The World Wide Web was originally framed as an open scientific infrastructure. The project was inspired by ENQUIRE, information management software commissioned from Tim Berners-Lee by CERN for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlinked with various relations such as made, include, describes and so forth". While it "facilitated some random linkage between information", ENQUIRE was not able to "facilitate the collaboration that was desired for in the international high-energy physics research community". Like any significant scientific computing infrastructure before the 1990s, the development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other".

Sharing of data and data documentation was a major focus in the initial communication of the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".

The web rapidly superseded pre-existing online infrastructures, even when they included more advanced computing features. From 1991 to 1994, users of the Worm Community System, a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the ''Worm Community System'' could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."

The Web and similar protocols developed at the time have had a similar impact on scientific publications. Early forms of open access publishing were not developed by large scale institutional infrastructures but through small initiatives. Universal access, regardless of the operating system, made it possible to maintain and share community-driven electronic journals years before commercial online scientific publishing became viable. The first open-access repositories were individual or community initiatives as well. In August 1991, Paul Ginsparg created the first version of the arXiv project at the Los Alamos National Laboratory in response to the recurring storage problems of academic mailboxes caused by the increasing sharing of scientific articles.


Building scientific infrastructures for the web (1995–2015)

The development of the World Wide Web rendered numerous pre-existing scientific infrastructures obsolete. It also lifted many restrictions and obstacles to online contribution and network management, which made it possible to attempt more ambitious projects. By the end of the 1990s, the creation of public scientific computing infrastructure had become a major policy issue. The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific time period, critical databases, online tools or publishing platforms could hardly be maintained, and project managers faced a ''valley of death'' "between grant funding and ongoing operational funding".
Several competing terms appeared to fill this need. In the United States, the term ''cyber-infrastructure'' was used in a scientific context by a US National Science Foundation (NSF) blue-ribbon committee in 2003: "The newer term cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy." ''E-infrastructure'' and ''e-science'' were used with a similar meaning in the United Kingdom and other European countries. Thanks to "sizable investments", major national and international infrastructures were launched between the initial policy discussions of the early 2000s and the economic crisis of 2007–2008, such as the Open Science Grid, BioGRID, JISC and Project Bamboo.
Specialized free software for scientific publishing, such as Open Journal Systems, became available after 2000. This development led to a significant expansion of non-commercial open access journals by facilitating the creation and administration of journal websites and the digital conversion of existing journals. Among the non-commercial journals registered in the Directory of Open Access Journals, the number of journals created each year rose from 100 at the end of the 1990s to 800 around 2010, and has not changed significantly since then.
By 2010, infrastructures were "no longer in infancy" and yet "they are also not yet fully mature". While the development of the web solved a large range of technical issues regarding network management, building scientific infrastructure remained challenging. Governance, communication among all involved stakeholders, and strategic divergences were major factors of success or failure. One of the first major infrastructures for the humanities and the social sciences, Project Bamboo, was ultimately unable to achieve its ambitious aims: "From the early planning workshops to the Mellon Foundation's rejection of the project's final proposal attempt, Bamboo was dogged by its reluctance and/or inability to concretely define itself". This lack of clarity was further aggravated by recurring communication missteps between the project initiators and the community it aimed to serve: "The community had spoken and made it clear that continuing to emphasize Service-oriented architecture would alienate the very members of the community Bamboo was intended to benefit most: the scholars themselves". Budget cuts following the economic crisis of 2007–2008 underlined the fragility of ambitious infrastructure plans that relied on significant recurring funds.
Leading commercial publishers were initially outpaced by the unexpected rise of the Web for academic publication: the executive board of Elsevier "had failed to grasp the significance of electronic publishing altogether, and therefore the deadly danger that it posed – the danger, namely, that scientists would be able to manage without the journal". The persistence of high subscription revenues and the consolidation of the sector made it possible to fund the conversion of pre-existing online services to the web as well as the digitization of past collections. By the 2010s, leading publishers were "moving from a content-provision to a data analytics business" and had developed or acquired key new infrastructures for the management of scientific and pedagogic activities: "Elsevier has acquired and launched products that extend its influence and its ownership of the infrastructure to all stages of the academic knowledge production process". As it has expanded beyond publishing, the ''vertical integration'' of privately owned infrastructures has become deeply embedded in daily research activities.


Toward open science infrastructures (2015-…)

The consolidation and expansion of commercial scientific infrastructure prompted renewed calls to secure "community-controlled infrastructure". The acquisition of the open repositories Digital Commons and SSRN by Elsevier highlighted the lack of reliability of critical scientific infrastructure for open science. The SPARC report on European infrastructures underlines that "a number of important infrastructures at risk and as a consequence, the products and services that comprise open infrastructure are increasingly being tempted by buyout offers from large commercial enterprises. This threat affects both not-for-profit open infrastructure as well as closed, and is evidenced by the buyout in recent years of commonly relied on tools and platforms such as SSRN, bepress, Mendeley, and Github." In contrast with the consolidation of privately owned infrastructure, the open science movement "has tended to overlook the importance of social structures and systemic constraints in the design of new forms of knowledge infrastructures". It remained mostly focused on the content of scientific research, with little integration of technical tools and few large community initiatives: "Common pool of resources is not governed or managed by the current scholarly commons initiative. There is no dedicated hard infrastructure and though there may be a nascent community, there is no formal membership." More precise concepts were needed to embed ethical principles of openness, community service and autonomous governance in the building of infrastructure and to ensure the transformation of small localized scholarly networks into large, "community-wide" structures.
In 2013, Cameron Neylon underlined that the lack of common infrastructure was one of the main weaknesses of the open science ecosystem: "in a world where it can be cheaper to re-do an analysis than to store the data, we need to consider seriously the social, physical, and material infrastructure that might support the sharing of the material outputs of research". Two years later, Neylon, Geoffrey Bilder and Jennifer Lin defined a series of ''Principles for Open Scholarly Infrastructure'' that reacted primarily to the discrepancy between the increasing openness of scientific publications or datasets and the closed nature of the infrastructures that control their circulation. Since 2015 these principles have become the most influential definition of Open Science Infrastructures. They have been endorsed by leading infrastructures such as Crossref, OpenCitations or Data Dryad and have become a common basis for the institutional evaluation of existing open infrastructures. The main focus of the ''Principles'' is to build "trustworthy institutions" with significant commitments in terms of governance, financial sustainability and technical efficiency, so that they can be durably relied on by scientific communities.
By 2021, public services and infrastructures for research had largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer." According to the 2021 Roadmap of the European Strategy Forum on Research Infrastructures (ESFRI), major legacy infrastructures in Europe have embraced open science principles: "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm." Examples of extensive data sharing programs include the European Social Survey (in social science), ECRIN ERIC (for clinical data) and the Cherenkov Telescope Array (in astronomy).
In agreement with the original intent of the ''Principles'', open science infrastructures are "seen as an antidote to the increased market concentration observed in the scholarly communication space." In November 2021, the UNESCO Recommendation on Open Science acknowledged open science infrastructures as one of the four pillars of open science, along with open scientific knowledge, open engagement of societal actors and open dialogue with other knowledge systems, and called for sustained investment and funding: "open science infrastructures are often the result of community-building efforts, which are crucial for their longterm sustainability and therefore should be not-for-profit and guarantee permanent and unrestricted access to all public to the largest extent possible." The development of open scientific infrastructure has become a debated topic regarding the future of online scientific research. In January 2021, a collective of researchers called for a ''Plan I'' or ''Plan Infrastructure'' in reaction to perceived shortcomings of ''Plan S'', the international open science initiative of cOAlition S. In contrast with the focus of Plan S on scientific publication, Plan I aims to integrate all research outputs on large interoperable infrastructures: "research and scholarship are crucially dependent on an information infrastructure that treats all scholarly output, text, data and code, equally and that is based on open standards and open markets."


Organization of open infrastructures

Most of the landscape reports on open infrastructure have been undertaken in Europe and, to a lesser extent, in Latin America. For Europe, the main sources include the SPARC report from 2020, the OPERAS report on social science and humanities infrastructure, and the 2019 report by Katherine Skinner (which also extends to a few North American infrastructures). International studies include the European Commission's 2010 report on ''The Role of E-Infrastructure'', which mostly received input from Europe, South America and North America. These reports underline that important open science infrastructures may already exist and yet remain invisible to funders and scientific policies: "alternative practices and projects exist inside and outside Europe, but these projects are almost invisible to the eyes of the public authorities".


Type and roles

Open access repositories are the most frequent form of Open Science Infrastructure, with 5,791 repositories in existence in December 2021 according to OpenDOAR. Yet there is a significant diversification of the roles and activities of open science infrastructures, at least among the largest ones. In the survey of European infrastructures conducted by SPARC Europe, 95% of the respondents mention that they provide services in at least three of the six stages of research production (Creation, Evaluation, Publishing, Hosting, Discovering and Archiving). Aggregation, hosting and indexing are especially central activities, common to most Open Science Infrastructures regardless of their focus. Specialization does happen at a higher level. A network analysis identifies "two main clusters of activities":
* Publishing-focused infrastructures, which are associated with "publishing and hosting traditional text formats". Among them, "paper submission (41 out of 70) and review (30) were the most commonly reported activities".
* Creation-focused infrastructures, which deal primarily with "processing and storing research outputs, particularly data". These actors provide specific services in the field of "data gathering (47 out of 71), and data analysis (40)". Besides, "computation and machine learning (18) and Experimentation (15) were roughly half as common".


Standards and technologies

Standardization is a major function of open science infrastructures, as they aim to ensure that the content they share and support is distributed consistently and is easy to reuse. Maintaining open standards is one of the main challenges identified by leading European open infrastructures, as it implies choosing among competing standards in some cases, as well as ensuring that the standards are correctly updated and accessible through APIs or other endpoints. Two thirds of the respondents have undertaken an evaluation of their technological environment during the past year, to ensure that key components have not become obsolete. As a consequence of these sustained efforts, most open infrastructures comply with the newly established standards of open science, such as FAIR data or Plan S.
Open science infrastructures preferably integrate standards from other open science infrastructures. Among European infrastructures: "The most commonly cited systems – and thus essential infrastructure for many – are ORCID, Crossref, DOAJ, BASE, OpenAIRE, Altmetric, and Datacite, most of which are not-for-profit". Google Scholar is the first commercial service mentioned, while Scopus, the leading proprietary academic search engine developed by Elsevier, is one of the least cited of the leading services. Open science infrastructures are thus part of an emerging "truly interoperable Open Science commons" that holds the promise of "researcher-centric, low-cost, innovative, and interoperable tools for research, superior to the present, largely closed system."
Infrastructures are frequently dependent on choices made by external stakeholders, especially scientific publishers: they "do not themselves decide on the openness of content since they are dependent on the policies of content providers". This affects not only the content but also the "user data policies that are set by publishers which limits what can be made available".
Open Science Infrastructures have strong ties with the open source movement. 82% of the European infrastructures surveyed by SPARC claim to have partially built open source software and 53% have their entire technological infrastructure in open source.


Governance

Governance has been self-identified as a potential weakness by the European infrastructures surveyed by SPARC. Less than half of the respondents consider that they are at a "mature" stage in this regard, and "good governance" is cited as the main challenge. Interaction between the communities they aim to support and the other stakeholders and funders is especially complicated: "One specific challenge identified was the tension between serving the needs of the community of users versus prioritising the needs of clients that provide financial support to the OSI". The tension between centralization and diversity largely characterizes Open Science Infrastructures. While historically defined as a "centralized [O]pen Access project", Redalyc aims to become a "community-based sustainable infrastructure in Latin America" (Becerril). The leading European open infrastructures have reported "challenges around ensuring sufficient (and sufficiently diverse) representation" as well as difficulties in securing the involvement of some professional communities such as researchers and librarians.


Audience

Open Science Infrastructures "target and serve a wide range of stakeholders". Researchers remain the primary target, but libraries, teachers and learners are among the expected audience of more than half of the infrastructures surveyed by SPARC Europe. A majority of European infrastructures "operate at a global scale", with English being the primary language of 82% of the respondents. These infrastructures are also frequently multilingual and integrate a specific national focus: they "provide access to a range of language content of local and international significance". Open Science Infrastructures benefit diverse disciplines and scientific communities. In 2020, 72% of the European infrastructures surveyed by SPARC Europe claimed to support all disciplines. The social sciences and the humanities are the most frequently mentioned disciplines, which is partly attributed to the fact that the survey was "distributed widely by the OPERAS network". In 2010, the infrastructures supporting the social sciences and the humanities were much less prevalent and most of the use cases came from "biosciences, High Energy Physics and other fields of physics, earth and environmental sciences, computer science, astronomy and astrophysics".


Economics

Many Open Science Infrastructures run "at a relatively low cost", as small infrastructures are an important part of the open science ecosystem. In 2020, 21 out of 53 surveyed European infrastructures "report spending less than €50,000". Consequently, more than 75% of surveyed European infrastructures are run by small teams of 5 FTEs or fewer. The size of an infrastructure and the extent of its funding are not always proportional to the critical service it offers: "some of the most heavily used services make ends meet with a tiny core team of two to five people." Volunteer contributions are significant as well, which is both "a strength and weakness to an OSI's sustainability". The landscape of open science infrastructures is therefore rather close to the ideal of a "decentralised network of small projects" envisioned by theorists of the scholarly commons. A very large majority of open science infrastructures are non-commercial, and collaborations with or financial support from the private sector remain very limited. Overall, European infrastructures were financially sustainable in 2020, which contrasts with the situation ten years prior. In 2010, European infrastructures had much less visibility: they usually lacked "a long-term perspective" and struggled "with securing the funding for more than 5 years". In 2020, European infrastructures frequently rely on grants from national funds and from the European Commission. Without these grants, most of these actors "could only remain viable for less than a year". Yet one quarter of surveyed European infrastructures were not supported by any grants or subsidies and relied on either alternative sources of income or voluntary contributions. As they can be "difficult to define adequately", open science infrastructures can be overlooked by funding bodies, which "contributes to the challenge of securing funding".

