Toloka

Toloka, based in Amsterdam, is a crowdsourcing and generative AI services provider. The company supports the development of artificial intelligence from training to evaluation and provides generative artificial intelligence and large language model-related services.


History

Toloka was founded in 2014 by Olga Megorskaya, a member of the board of directors of Yandex, as a crowdsourcing and microtasking platform. It was founded primarily for data markup to improve machine learning and search algorithms. As generative AI evolved, the platform adapted to provide expert data labeling to generative AI app developers. In 2024, the company's Russian operations were sold to Russian investors.


Services


Generative AI

In the generative AI domain, Toloka provides services such as model fine-tuning, reinforcement learning from human feedback (RLHF), evaluation, and ad hoc datasets, all of which require large volumes of annotation by highly skilled experts.


Machine learning

On Toloka, trainers are tasked with identifying the presence or absence of objects in content, as specified by algorithms. They also assess chatbot responses within given dialogues for relevance and engagement. Additionally, translation verification tasks involve evaluating the accuracy of translations from multiple annotators. For the fine-tuning of large language models (LLMs), experts write context-based prompts, which can be single-turn or multi-turn and serve various domains and purposes.


Natural language processing

In the natural language processing (NLP) domain, Toloka facilitates optical character recognition and classification, sentiment analysis, named-entity recognition, and search relevance evaluation. It also provides transcription and classification of audio data.


Annotators

Toloka mainly works with domain experts, such as physicists, scientists, lawyers, and software engineers, to develop specialized data for models targeting niche tasks. Toloka also works with freelancers, referred to as "Tolokers," who annotate and create data for diverse applications. They perform tasks such as labeling personally identifiable information for AI projects, translating content, summarizing information, and transcribing audio to text. Upon completing each task, the performer receives a reward based on the volume of images, videos, or unstructured text processed.


Research

In May 2019, Toloka's research team began publishing datasets for non-commercial and academic use to support the scientific community and attract researchers to Toloka. These datasets target research areas such as linguistics, computer vision, testing of result aggregation models, and chatbot training. Toloka research has been presented at a range of conferences, including the Conference on Neural Information Processing Systems (NeurIPS), the International Conference on Machine Learning (ICML), and the International Conference on Very Large Data Bases (VLDB). In February 2024, Toloka conducted a tutorial at the AAAI Conference on Artificial Intelligence on aligning large language models to low-resource languages. The company participated in BigCode, a joint scientific initiative led by Hugging Face and ServiceNow, where it served as the primary data partner.


Controversies


Enabling arrests of protesters via facial recognition software (March 2024)

In March 2024, Toloka's Russian division was criticized for helping develop the facial recognition software used by Russia to track and arrest protesters after the death of Alexei Navalny. The company's Russian operations were sold in July 2024.


External links

* Official website: https://toloka.ai/