Artificial intelligence in Wikimedia projects

Artificial intelligence is used in Wikipedia and other Wikimedia projects for the purpose of developing those projects. Human and bot interaction in Wikimedia projects is routine and iterative.


Using artificial intelligence for Wikimedia projects

Various projects seek to improve Wikipedia and other Wikimedia projects by using artificial intelligence tools.


ORES

The Objective Revision Evaluation Service (ORES) project is an artificial intelligence service for grading the quality of Wikipedia edits. The Wikimedia Foundation presented the ORES project in November 2015.


Wiki bots


Detox

Detox was a Google project, run in collaboration with the Wikimedia Foundation, to research methods for addressing unkind comments posted by users in Wikimedia community discussions. As one part of the Detox project, the Wikimedia Foundation and Jigsaw, Google's technology incubator, collaborated to use artificial intelligence for basic research and to develop technical solutions to the problem. In October 2016 the organizations published "Ex Machina: Personal Attacks Seen at Scale", describing their findings. Various popular media outlets reported on the paper's publication and described the social context of the research.


Bias reduction

In August 2018, a company called Primer reported attempting to use artificial intelligence to create Wikipedia articles about women as a way to address gender bias on Wikipedia.


Generative models


Text

In 2022, the public release of ChatGPT inspired more experimentation with using AI to write Wikipedia articles. It also sparked a debate about whether, and to what extent, large language models (LLMs) are suitable for this purpose, given their tendency to generate plausible-sounding misinformation, including fake references; to produce prose that is not encyclopedic in tone; and to reproduce biases. Since 2023, work has been done to draft Wikipedia policy on ChatGPT and similar LLMs, at times recommending that users unfamiliar with LLMs avoid using them because of these risks, and noting the potential for libel or copyright infringement. Some relevant policies are linked at WikiProject AI Cleanup/Policies.


Other media

A WikiProject called WikiProject AI Cleanup exists for finding and removing AI-generated text and images.


Using Wikimedia projects for artificial intelligence

Content in Wikimedia projects is useful as a dataset for advancing artificial intelligence research and applications. For instance, the development of Google's Perspective API, which identifies toxic comments in online forums, used a dataset of hundreds of thousands of Wikipedia talk page comments with human-labelled toxicity levels. Subsets of the Wikipedia corpus are considered the largest well-curated data sets available for AI training. A 2012 paper reported that more than 1,000 academic articles, including those using artificial intelligence, examine Wikipedia, reuse information from Wikipedia, use technical extensions linked to Wikipedia, or research communication about Wikipedia. A 2017 paper described Wikipedia as the mother lode of human-generated text available for machine learning. A 2016 research project called "One Hundred Year Study on Artificial Intelligence" named Wikipedia as a key early project for understanding the interplay between artificial intelligence applications and human engagement.

There is concern about the lack of attribution to Wikipedia articles in large language models such as ChatGPT. Wikipedia's license lets anyone reuse its text, including in modified form, but only on the condition that credit is given; this implies that AI models using its content in answers without identifying the source may violate its terms of use.


See also

* ORES MediaWiki page
* Wikipedia:Artificial intelligence
* Open-source artificial intelligence




External links

* meta:Artificial intelligence
* wikitech:Machine Learning/LiftWing