Open-source artificial intelligence is an AI system that is freely available to use, study, modify, and share. These attributes extend to each of the system's components, including datasets, code, and model parameters, promoting a collaborative and transparent approach to AI development.

Free and open-source software (FOSS) licenses, such as the Apache License, the MIT License, and the GNU General Public License, outline the terms under which open-source artificial intelligence can be accessed, modified, and redistributed. The open-source model provides widespread access to new AI technologies, allowing individuals and organizations of all sizes to participate in AI research and development. This approach supports collaboration and allows for shared advancements within the field of artificial intelligence.

In contrast, closed-source artificial intelligence is proprietary, restricting access to the source code and internal components. Only the owning company or organization can modify or distribute a closed-source artificial intelligence system, prioritizing control and protection of intellectual property over external contributions and transparency. Companies often develop closed products in an attempt to keep a competitive advantage in the marketplace. However, some experts suggest that open-source AI tools may have a development advantage over closed-source products and have the potential to overtake them in the marketplace.

Popular open-source artificial intelligence project categories include large language models, machine translation tools, and chatbots. For software developers to produce open-source AI resources, they must trust the various other open-source software components they use in their development. Open-source AI software has been speculated to carry increased risk compared to closed-source AI, as bad actors may remove the safety protocols of public models as they wish. Conversely, closed-source AI has also been speculated to carry increased risk compared to open-source AI, due to issues of dependence, privacy, opaque algorithms, corporate control, and limited availability, while potentially slowing beneficial innovation.

There is also debate about the openness of AI systems, as openness is differentiated. An article in ''Nature'' suggests that some systems presented as open, such as Meta's Llama 3, "offer little more than an API or the ability to download a model subject to distinctly non-open use restrictions". Such software has been criticized as "openwashing" systems that are better understood as closed. There are some works and frameworks that assess the openness of AI systems, as well as a new definition by the Open Source Initiative of what constitutes open-source AI.

History

The history of open-source artificial intelligence (AI) is intertwined with both the development of AI technologies and the growth of the open-source software movement. Open-source AI has evolved significantly over the past few decades, with contributions from various academic institutions, research labs, tech companies, and independent developers. This section explores the major milestones in the development of open-source AI, from its early days to its current state.

Early development of AI and open-source software

The concept of AI dates back to the mid-20th century, when computer scientists like Alan Turing and John McCarthy laid the groundwork for modern AI theories and algorithms. Early AI research focused on developing symbolic reasoning systems and rule-based expert systems. During this period, the idea of open-source software was beginning to take shape, with pioneers like Richard Stallman advocating for free software as a means to promote collaboration and innovation in programming. The Free Software Foundation, founded in 1985 by Stallman, was one of the first major organizations to promote the idea of software that could be freely used, modified, and distributed. The ideas from this movement eventually influenced the development of open-source AI, as more developers began to see the potential benefits of open collaboration in software creation, including AI models and algorithms.

Emergence of open-source AI (1990s–2000s)

In the 1990s, open-source software began to gain more traction as the internet facilitated collaboration across geographical boundaries. The rise of machine learning and statistical methods also led to the development of more practical AI tools. However, it wasn't until the early 2000s that open-source AI began to take off, with the release of foundational libraries and frameworks that were available for anyone to use and contribute to. One of the early open-source AI frameworks was scikit-learn, released in 2007. Scikit-learn became one of the most widely used libraries for machine learning due to its ease of use and robust functionality, providing implementations of common algorithms like regression, classification, and clustering. Around the same time, other open-source machine learning libraries such as OpenCV (2000), Torch (2002), and Theano (2007) were developed by tech companies and research labs, further cementing the growth of open-source AI.

Rise of open-source AI models and frameworks (2010s)

The 2010s marked a significant shift in the development of AI, driven by the advent of deep learning and neural networks. Open-source deep learning frameworks such as TensorFlow (developed by Google Brain) and PyTorch (developed by Facebook's AI Research Lab) revolutionized the AI landscape by making complex deep learning models more accessible. These frameworks allowed researchers and developers to build and train sophisticated neural networks for tasks like image recognition, natural language processing (NLP), and autonomous driving. During this time, AI models like Google's BERT (2018) for natural language processing and OpenAI's GPT series (2018–present) for text generation also became widely available in open-source form. These models demonstrated the potential for AI to revolutionize industries by improving understanding and generation of human language, sparking further interest in open-source AI development.

Key milestones in open-source AI (2020s–Present)

Companies and models

The 2020s saw the continued growth and maturation of open-source AI. Companies and research organizations began to release large-scale pre-trained models to the public, which led to a boom in both commercial and academic applications of AI. Notably, Hugging Face, a company focused on NLP, became a hub for the development and distribution of state-of-the-art AI models, including open-source versions of transformers like GPT-2 and BERT.

With the announcement of GPT-2, OpenAI originally planned to keep the source code of its models private, citing concerns about malicious applications. After OpenAI faced public backlash, however, it released the source code for GPT-2 to GitHub three months after its release. OpenAI has not publicly released the source code or pretrained weights for the GPT-3 or GPT-4 models, though their functionalities can be integrated by developers through the OpenAI API.

The rise of large language models (LLMs) and generative AI, such as OpenAI's GPT-3 (2020), further propelled the demand for open-source AI frameworks. These models have been used in a variety of applications, including chatbots, content creation, and code generation, demonstrating the broad capabilities of AI systems.

The LF AI & Data Foundation, a project under the Linux Foundation, has significantly influenced the open-source AI landscape by fostering collaboration and innovation, and supporting open-source projects. By providing a neutral platform, LF AI & Data unites developers, researchers, and organizations to build cutting-edge AI and data solutions, addressing critical technical challenges and promoting ethical AI development. As of October 2024, the foundation comprised 77 member companies from North America, Europe, and Asia, and hosted 67 open-source software (OSS) projects contributed by a diverse array of organizations, including technology giants such as Nvidia, Amazon, Intel, and Microsoft. Other large conglomerates like Alibaba, TikTok, AT&T, and IBM have also contributed. Research organizations such as NYU, University of Michigan AI labs, Columbia University, and Penn State are also associate members of the LF AI & Data Foundation.

In September 2022, the PyTorch Foundation was established to oversee the widely used PyTorch deep learning framework, which was donated by Meta. The foundation's mission is to drive the adoption of AI tools by fostering and sustaining an ecosystem of open-source, vendor-neutral projects integrated with PyTorch, and to democratize access to state-of-the-art tools, libraries, and other components, making these innovations accessible to everyone. The PyTorch Foundation also separates business and technical governance: the PyTorch project maintains its technical governance structure, while the foundation handles funding, hosting expenses, events, and management of assets such as the project's website, GitHub repository, and social media accounts, ensuring open community governance. Upon its inception, the foundation formed a governing board comprising representatives from its initial members: AMD, Amazon Web Services, Google Cloud, IBM, Intel, Meta, Microsoft, and NVIDIA.

In 2024, Meta released a collection of large AI models, including Llama 3.1 405B, comparable to the most advanced closed-source models. The company claimed its approach to AI would be open-source, differing from other major tech companies. The Open Source Initiative and others stated that Llama is not open-source despite Meta describing it as open-source, due to Llama's software license prohibiting it from being used for some purposes.

The DeepSeek R1 reasoning model was released as an open-source project on January 20, 2025.

Ethics

In parallel with the development of AI models, there has been growing interest in ensuring ethical standards in open-source AI development. This includes addressing concerns such as bias, privacy, and the potential for misuse of AI systems. As a result, frameworks for responsible AI development and the creation of guidelines for documenting ethical considerations, such as the Model Card concept introduced by Google, have gained popularity, though studies show the continued need for their adoption to avoid unintended negative outcomes.

Applications

Machine learning

Open-source artificial intelligence has brought widespread accessibility to machine learning (ML) tools, enabling developers to implement and experiment with ML models across various industries. Scikit-learn, TensorFlow, and PyTorch are three of the most widely used open-source ML libraries, each contributing unique capabilities to the field. Scikit-learn is known for its robust toolkit, offering accessible functions for classification, regression, clustering, and dimensionality reduction. This library simplifies the ML pipeline from data preprocessing to model evaluation, making it suitable for users with varying levels of expertise. TensorFlow, initially developed by Google, supports large-scale ML models, especially in production environments requiring scalability, such as healthcare, finance, and retail. PyTorch, favored for its flexibility and ease of use, has been particularly popular in research and academia, supporting everything from basic ML models to advanced deep learning applications, and it is now widely used in industry as well.

Natural Language Processing

Large language models

Open-source AI has played a crucial role in the development and adoption of large language models (LLMs), transforming text generation and comprehension capabilities. While proprietary models like OpenAI's GPT series have redefined what is possible in applications such as interactive dialogue systems and automated content creation, fully open-source models have also made significant strides. Google's BERT, for instance, is an open-source model widely used for tasks like entity recognition and language translation, establishing itself as a versatile tool in NLP. These open-source LLMs have democratized access to advanced language technologies, enabling developers to create applications such as personalized assistants, legal document analysis, and educational tools without relying on proprietary systems.

Machine Translation

Open-source machine translation models have paved the way for multilingual support in applications across industries. Hugging Face's MarianMT is a prominent example: it supports a wide range of language pairs and has become a valuable tool for translation and global communication. Another notable project, OpenNMT, offers a comprehensive toolkit for building high-quality, customized translation models, which are used in both academic research and industry. Alongside these open-source models, open-source datasets such as the WMT (Workshop on Machine Translation) datasets, the Europarl Corpus, and OPUS have played a critical role in advancing machine translation technology. These datasets provide diverse, high-quality parallel text corpora that enable developers to train and fine-tune models for specific languages and domains.

Text-to-image models

Computer vision models

Open-source AI has led to considerable advances in the field of computer vision, with libraries such as OpenCV (Open Source Computer Vision Library) playing a pivotal role in the democratization of powerful image processing and recognition capabilities. OpenCV provides a comprehensive set of functions that can support real-time computer vision applications, such as image recognition, motion tracking, and facial detection. Originally developed by Intel, OpenCV has become one of the most popular libraries for computer vision due to its versatility and extensive community support. The library includes a range of pre-trained models and utilities for handling common tasks, making OpenCV a valuable resource for both beginners and experts in the field. Beyond OpenCV, other open-source computer vision models like YOLO (You Only Look Once) and Detectron2 offer specialized frameworks for object detection, classification, and segmentation, contributing to advancements in applications like security, autonomous vehicles, and medical imaging.

Unlike previous generations of computer vision models, which process image data through convolutional layers, newer generations of computer vision models, referred to as vision transformers (ViT), rely on attention mechanisms similar to those found in the area of natural language processing. ViT models break down an image into smaller patches and apply self-attention to identify which areas of the image are most relevant, effectively capturing long-range dependencies within the data. This shift from convolutional operations to attention mechanisms enables ViT models to achieve state-of-the-art accuracy in image classification and other tasks, pushing the boundaries of computer vision applications.
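The patch-and-attend idea can be illustrated with a minimal sketch in plain Python. It is not taken from any ViT implementation: the function names `split_into_patches` and `self_attention` are hypothetical, the image is a toy 4x4 grid, and the attention step uses the raw patches as queries, keys, and values. A real vision transformer would additionally apply learned linear projections, position embeddings, and multiple attention heads, all of which are omitted here.

```python
import math

def split_into_patches(image, patch):
    """Cut a square 2D image (list of rows) into flattened patch x patch tiles."""
    size = len(image)
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Simplifying assumption: no learned weight matrices are applied.
    Returns the attention weights and the re-mixed token vectors.
    """
    dim = len(tokens[0])
    weights = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights.append(softmax(scores))
    outputs = [[sum(w * value[j] for w, value in zip(row, tokens))
                for j in range(dim)]
               for row in weights]
    return weights, outputs

# A toy 4x4 "image"; 2x2 patches give four tokens of length 4.
image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
weights, outputs = self_attention(patches)
```

Each row of `weights` describes how strongly one patch attends to every patch in the image, including distant ones, which is the sense in which attention captures long-range dependencies without stacking convolutional layers.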

Robotics

Open-source artificial intelligence has made a notable impact in robotics by providing a flexible, scalable development environment for both academia and industry. The Robot Operating System (ROS) stands out as a leading open-source framework, offering tools, libraries, and standards essential for building robotics applications. ROS simplifies the development process, allowing developers to work across different hardware platforms and robotic architectures. Furthermore, Gazebo, an open-source robotic simulation software often paired with ROS, enables developers to test and refine their robotic systems in a virtual environment before real-world deployment.

Healthcare

In the healthcare industry, open-source AI has revolutionized diagnostics, patient care, and personalized treatment options. Open-source libraries like TensorFlow and PyTorch have been applied extensively in medical imaging for tasks such as tumor detection, improving the speed and accuracy of diagnostic processes. Additionally, OpenChem, an open-source library specifically geared toward chemistry and biology applications, enables the development of predictive models for drug discovery, helping researchers identify potential compounds for treatment. NLP models, adapted for analyzing electronic health records (EHRs), have also become instrumental in healthcare. By summarizing patient data, detecting patterns, and flagging potential issues, open-source AI has enhanced clinical decision-making and improved patient outcomes, demonstrating the transformative power of AI in medicine.

Military

Open-source AI has become a critical component in military applications, highlighting both its potential and its risks. Meta's Llama models, which have been described as open-source by Meta, were adopted by U.S. defense contractors like Lockheed Martin and Oracle after unauthorized adaptations by Chinese researchers affiliated with the People's Liberation Army (PLA) came to light. The Open Source Initiative and others have contested Meta's use of the term ''open-source'' to describe Llama, due to Llama's license containing an acceptable use policy that prohibits use cases including non-U.S. military use. Chinese researchers used an earlier version of Llama to develop tools like ChatBIT, optimized for military intelligence and decision-making, prompting Meta to expand its partnerships with U.S. contractors to ensure the technology could be used strategically for national security. These applications now include logistics, maintenance, and cybersecurity enhancements.

Benefits

The open-source movement has influenced the development of artificial intelligence, enabling the widespread adoption and collaboration that are key to its rapid evolution. By making AI tools freely available, open-source platforms empower individuals, research institutions, and companies to contribute, adapt, and innovate on top of existing technologies.

Democratizing access

Open-source AI democratizes access to cutting-edge tools, lowering entry barriers for individuals and smaller organizations that may lack resources. By making these technologies freely available, open-source AI allows developers to innovate and create AI solutions that might have been otherwise inaccessible due to financial constraints, enabling independent developers and researchers, smaller organizations, and startups to utilize advanced AI models without the financial burden of proprietary software licenses. This affordability encourages innovation in niche or specialized applications, as developers can modify existing models to meet unique needs.

Collaboration and faster advancements

By sharing code, data, and research findings, open-source AI enables collective problem-solving and innovation. Large-scale collaborations, such as those seen in the development of frameworks like TensorFlow and PyTorch, have accelerated advancements in machine learning (ML) and deep learning. The open-source nature of these platforms also facilitates rapid iteration and improvement, as contributors from across the globe can propose modifications and enhancements to existing tools. Beyond improvements within ML and deep learning themselves, this pooling of knowledge and expertise can also speed up the development of products built on AI.

Equitable development

The openness of the development process encourages diverse contributions, making it possible for underrepresented groups to shape the future of AI. This inclusivity not only fosters a more equitable development environment but also helps to address biases that might otherwise be overlooked by larger, profit-driven corporations. With contributions from a broad spectrum of perspectives, open-source AI has the potential to create fairer, more accountable, and more impactful technologies that better serve global communities.

Transparency and obscurity

One key benefit of open-source AI is the increased transparency it offers compared to closed-source alternatives. With open-source models, the underlying algorithms and code are accessible for inspection, which promotes accountability and helps developers understand how a model reaches its conclusions. Additionally, open-weight models, such as Llama and Stable Diffusion, allow developers to directly access model parameters, which may help them reduce bias and increase fairness in their applications. This transparency can help create systems with human-readable outputs, or "explainable AI", which is an increasingly important concern, especially in high-stakes applications such as healthcare, criminal justice, and finance, where the consequences of decisions made by AI systems can be significant (though it may also pose certain risks, as mentioned in the ''Concerns'' section).

Privacy and independence

A ''

'' editorial suggests medical care could become dependent on AI models that could be taken down at any time, are difficult to evaluate, and may threaten patient privacy. Its authors propose that health-care institutions, academic researchers, clinicians, patients, and technology companies worldwide should collaborate to build open-source models for health care whose underlying code and base models are easily accessible and can be fine-tuned freely with their own data sets.

Concerns

Alongside its benefits, open-source AI carries important ethical and social implications, as well as quality and security concerns.

Quality and security

Open-source development of AI has been criticized by researchers for raising quality and security concerns beyond general concerns regarding AI safety. Current open-source models underperform closed-source models on most tasks, but open-source models are improving faster to close the gap. Open-source development of models has been deemed to have theoretical risks. Once a model is public, it cannot be rolled back or updated if serious security issues are detected. For example, open-source AI may allow bioterrorism groups like Aum Shinrikyo to strip fine-tuning and other safeguards from AI models and use them to help develop more devastating terrorist schemes. However, the main barrier to developing real-world terrorist schemes lies in stringent restrictions on necessary materials and equipment. Furthermore, the rapid pace of AI advancement makes it less appealing to use older models, which are more vulnerable to attacks but also less capable. In July 2024, the United States released a presidential report saying it did not find sufficient evidence to restrict revealing model weights.

Equity, social, and ethical implications

There have been numerous cases of artificial intelligence leading to unintentionally biased products. Some notable examples include AI software predicting higher risk of future crime and recidivism for African-Americans when compared to white individuals, voice recognition models performing worse for non-native speakers, and facial-recognition models performing worse for women and darker-skinned individuals. Researchers have also criticized open-source artificial intelligence for existing security and ethical concerns. An analysis of over 100,000 open-source models on

and

GitHub using code vulnerability scanners like Bandit, FlawFinder, and Semgrep found that over 30% of models have high-severity vulnerabilities. Furthermore, closed models typically have fewer safety risks than open-source models. The freedom to augment open-source models has led to developers releasing models without ethical guidelines, such as GPT4-Chan. With AI systems increasingly deployed in critical areas of society such as law enforcement and healthcare, there is a growing focus on preventing biased and unethical outcomes through guidelines, development frameworks, and regulations. Open-source AI has the potential to either exacerbate or mitigate problems of bias, fairness, and equity, depending on its use.

Improving AI models

While AI suffers from a lack of centralized guidelines for ethical development, frameworks for addressing the concerns regarding AI systems are emerging. These frameworks, often products of independent studies and interdisciplinary collaborations, are frequently adapted and shared across platforms like GitHub and Hugging Face to encourage community-driven enhancements.

Common development malpractices

Data quality

Numerous systemic problems may contribute to inequitable and biased AI outcomes, including biased data, flaws in model creation, and failure to recognize or plan for the possibility of these outcomes. As highlighted in research, poor data quality (such as the underrepresentation of specific demographic groups in datasets) and biases introduced during data curation lead to skewed model outputs. A study of open-source AI projects revealed a failure to scrutinize data quality, with fewer than 28% of projects including data quality concerns in their documentation. The study also raised a broader concern that developers do not place enough emphasis on the ethical implications of their models; even when developers do take these implications into consideration, they overemphasize certain metrics (behavior of models) and overlook others (data quality and risk-mitigation steps). These issues are compounded by AI documentation practices, which often lack actionable guidance and only briefly outline ethical risks without providing concrete solutions.
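
As an illustration of the kind of data quality check described above, the following minimal sketch measures how well each demographic group is represented in a dataset and flags groups below a chosen share. The function names, the `group` field, the toy records, and the 10% threshold are all hypothetical choices made for this example; they are not drawn from the cited study or from any particular library.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share of the records, keyed by group label."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(shares, min_share):
    """List groups whose share falls below min_share, sorted by label."""
    return sorted(g for g, share in shares.items() if share < min_share)


# Toy dataset (hypothetical): 70 records from group A, 25 from B, 5 from C.
records = (
    [{"group": "A"}] * 70
    + [{"group": "B"}] * 25
    + [{"group": "C"}] * 5
)

shares = group_shares(records, "group")
flagged = underrepresented(shares, min_share=0.10)  # threshold is an assumption

print(shares)
print(flagged)
```

A check like this only reveals imbalance in the recorded labels; it does not by itself show whether the labels are accurate or whether a given threshold is appropriate for a particular application.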

Transparency and "black boxes"

Another key flaw notable in many of the systems shown to have biased outcomes is their lack of transparency. Many open-source AI models operate as "black boxes", where their decision-making process is not easily understood, even by their creators. This lack of interpretability can hinder accountability, making it difficult to identify why a model made a particular decision or to ensure it operates fairly across diverse groups. Furthermore, when AI models are closed-source (proprietary), this can facilitate biased systems slipping through the cracks, as was the case for numerous widely adopted facial recognition systems. These hidden biases can persist when those proprietary systems fail to publicize anything about the decision process which could help reveal those biases, such as confidence intervals for decisions made by AI. Especially for systems like those used in healthcare, being able to see and understand systems' reasoning or getting "an accurate explanation" of how an answer was obtained is "crucial for ensuring trust and transparency".

Frameworks for improvement

Efforts to counteract these challenges have resulted in the creation of structured documentation frameworks that guide the ethical development and deployment of AI:
* Model Cards: Introduced in a Google research paper, these documents provide transparency about an AI model's intended use, limitations, and performance metrics across different demographics. They serve as a standardized tool to highlight ethical considerations and facilitate informed usage. Though the framework is still relatively new, Google believes it will play a crucial role in helping increase AI transparency.
* Measurement Modeling: This method combines qualitative and quantitative methods through a social sciences lens, providing a framework that helps developers check if an AI system is accurately measuring what it claims to measure. The framework focuses on two key concepts, examining test-retest reliability ("construct reliability") and whether a model measures what it aims to model ("construct validity"). Through these concepts, this model can help developers break down abstract ideas which can't be directly measured (like socioeconomic status) into specific, measurable components while checking for errors or mismatches that could lead to bias. By making these assumptions clear, this framework helps create AI systems that are more fair and reliable.
* Datasheets for Datasets: This framework emphasizes documenting the motivation, composition, collection process, and recommended use cases of datasets. By detailing the dataset's lifecycle, datasheets enable users to assess its appropriateness and limitations.
* Opening up ChatGPT: tracking openness of instruction-tuned LLMs: A community-driven public resource that evaluates openness of text generation models.
* Model Openness Framework: This emerging approach includes principles for transparent AI development, focusing on the accessibility of both models and datasets to enable auditing and accountability.
* European Open Source AI Index: This index collects information on model openness, licensing, and EU regulation of generative AI systems and providers. It is a non-profit public resource hosted at Radboud University Nijmegen, the Netherlands.

As AI use grows, increasing AI transparency and reducing model biases have become increasingly emphasized concerns. These frameworks can help empower developers and stakeholders to identify and mitigate bias, fostering fairness and inclusivity in AI systems. Using these frameworks can help the open-source community create tools that are not only innovative but also equitable and ethical.
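
To make the idea of a model card more concrete, the sketch below records a model's intended use and limitations alongside accuracy computed separately for each demographic group, the kind of disaggregated metric the Model Cards framework calls for. The `ModelCard` class, its field names, and the toy labels are hypothetical and chosen only for illustration; they do not reproduce the schema of the original paper or of any existing toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Hypothetical, minimal model card; field names are illustrative only."""
    name: str
    intended_use: str
    limitations: list
    accuracy_by_group: dict = field(default_factory=dict)


def accuracy_by_group(labels, predictions, groups):
    """Compute accuracy separately for each group label."""
    correct, total = {}, {}
    for y, p, g in zip(labels, predictions, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (1 if y == p else 0)
    return {g: correct[g] / total[g] for g in total}


# Toy evaluation data (hypothetical): true labels, predictions, group labels.
labels      = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 0, 0, 0, 1, 0]
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]

card = ModelCard(
    name="example-classifier",
    intended_use="Illustration only",
    limitations=["Evaluated on eight toy examples"],
    accuracy_by_group=accuracy_by_group(labels, predictions, groups),
)

# Gap between the best- and worst-served groups.
gap = max(card.accuracy_by_group.values()) - min(card.accuracy_by_group.values())
print(card.accuracy_by_group, gap)
```

Reporting the per-group figures rather than a single overall accuracy is what lets a reader of the card see that one group is served noticeably worse than another.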


External links

Is keeping AI closed source safer and better for society than open sourcing AI? (interactive argument map, Kialo)

(Ocean of AI) AI Community
