Seq2seq

Seq2seq
Seq2seq is a family of machine learning approaches used for natural language processing. Applications include language translation, image captioning, conversational models, speech recognition, and text summarization. Seq2seq uses sequence transformation: it turns one sequence into another sequence.

History
Seq2seq is an approach to machine translation (or, more generally, sequence transduction) with roots in information theory, where communication is understood as an encode-transmit-decode process, and machine translation can be studied as a special case of communication. This viewpoint was elaborated, for example, in the noisy channel model of machine translation. In practice, seq2seq maps an input sequence into a real-valued vector by using a neural network (the ''encoder''), and then maps it back to an output sequence using another neural network (the ''decoder''). The idea of encoder-decoder sequence transduction had been developed in the early 20 ...
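A minimal sketch of the encode-then-decode idea in NumPy: the encoder folds the input sequence into one real-valued vector, and the decoder unrolls from that vector to emit an output sequence. All dimensions and weights here are hypothetical toy values; a trained system would learn these parameters and typically use LSTM or GRU cells rather than a vanilla RNN.

# Sketch only: random weights stand in for learned parameters.
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb, d_hid = 10, 8, 16               # assumed toy sizes

E = rng.normal(size=(vocab, d_emb))           # shared embedding table
W_xh = rng.normal(size=(d_emb, d_hid)) * 0.1  # input-to-hidden
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1  # hidden-to-hidden
W_hy = rng.normal(size=(d_hid, vocab)) * 0.1  # hidden-to-vocab logits

def encode(tokens):
    """Fold the input sequence into one real-valued vector."""
    h = np.zeros(d_hid)
    for t in tokens:
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h

def decode(h, steps):
    """Unroll another RNN from the encoder state, one token per step."""
    out, tok = [], 0                  # assume token 0 marks start-of-sequence
    for _ in range(steps):
        h = np.tanh(E[tok] @ W_xh + h @ W_hh)
        tok = int(np.argmax(h @ W_hy))  # greedy choice; beam search is common
        out.append(tok)
    return out

print(decode(encode([3, 1, 4, 1, 5]), steps=4))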



Recurrent Neural Network
Recurrent neural networks (RNNs) are a class of artificial neural networks designed for processing sequential data, such as text, speech, and time series, where the order of elements is important. Unlike feedforward neural networks, which process inputs independently, RNNs utilize recurrent connections, where the output of a neuron at one time step is fed back as input to the network at the next time step. This enables RNNs to capture temporal dependencies and patterns within sequences. The fundamental building block of RNNs is the ''recurrent unit'', which maintains a ''hidden state''—a form of memory that is updated at each time step based on the current input and the previous hidden state. This feedback mechanism allows the network to learn from past inputs and incorporate that knowledge into its current processing. RNNs have been successfully applied to tasks such as unsegmented, connected handwriting recognition, speech recognition, natural language processing, and neural ...
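The hidden-state update described above can be written in a few lines. This is a sketch of one plain recurrent unit, assuming toy dimensions and random (untrained) weights; the point is only that the state computed at each step feeds back into the next.

import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid = 4, 6                          # assumed toy sizes
W_x = rng.normal(size=(d_in, d_hid)) * 0.1  # input-to-hidden weights
W_h = rng.normal(size=(d_hid, d_hid)) * 0.1 # recurrent (feedback) weights
b = np.zeros(d_hid)

def run_rnn(inputs):
    h = np.zeros(d_hid)                     # initial hidden state ("memory")
    for x in inputs:                        # order matters: earlier inputs
        h = np.tanh(x @ W_x + h @ W_h + b)  # feed back into later steps
    return h

sequence = rng.normal(size=(5, d_in))       # a toy sequence of 5 vectors
print(run_rnn(sequence))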



Transformer (deep Learning Architecture)
The transformer is a deep learning architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. Transformers have the advantage of having no recurrent units and therefore require less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets. The modern version of the transformer was proposed in the 2017 paper "Attention Is All You Need" by researchers at Google. Transformers were first developed as an improvement ov ...
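A sketch of scaled dot-product attention, the core of the multi-head mechanism described above, under toy assumptions: random query/key/value matrices stand in for the learned per-head projections, and masking and multiple heads are omitted.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each token's output is a weighted mix of all value vectors, with
    weights from query-key dot products: important tokens are amplified,
    less relevant ones diminished."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise token affinities
    return softmax(scores) @ V

rng = np.random.default_rng(2)
n_tokens, d_k = 5, 8                    # assumed context size and head width
Q = rng.normal(size=(n_tokens, d_k))
K = rng.normal(size=(n_tokens, d_k))
V = rng.normal(size=(n_tokens, d_k))
print(attention(Q, K, V).shape)         # (5, 8): one contextualized vector per token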




Natural Language Processing
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Major tasks in natural language processing are speech recognition, text classification, natural-language understanding, and natural-language generation.

History
Natural language processing has its roots in the 1950s. As early as 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language ...


Google Neural Machine Translation
Google Neural Machine Translation (GNMT) was a neural machine translation (NMT) system developed by Google and introduced in November 2016 that used an artificial neural network to increase fluency and accuracy in Google Translate. The neural network consisted of two main blocks, an encoder and a decoder, both of LSTM architecture with 8 1024-wide layers each and a simple 1-layer 1024-wide feedforward attention mechanism connecting them. The total number of parameters has been variously described as over 160 million, 210 million, 278 million or 380 million. It used a WordPiece tokenizer and a beam search decoding strategy, and it ran on Tensor Processing Units. By 2020, the system had been replaced by another deep learning system based on a Transformer encoder and an RNN decoder. GNMT improved the quality of translation by applying an example-based machine translation (EBMT) method in which the system learns from millions of examples of language translation. GNMT's proposed archi ...
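A hedged sketch of the kind of single-layer feedforward ("additive") attention that connected GNMT's encoder and decoder: each encoder state is scored against the current decoder state by a small feedforward net, and the softmax-weighted mix becomes the context for the next decoder step. Sizes are toy stand-ins for the 1024-wide layers described above, and the weights are random rather than trained.

import numpy as np

rng = np.random.default_rng(3)
d = 16                                    # GNMT used 1024-wide layers
W_enc = rng.normal(size=(d, d)) * 0.1     # hypothetical scoring weights
W_dec = rng.normal(size=(d, d)) * 0.1
v = rng.normal(size=d) * 0.1

def attend(decoder_state, encoder_states):
    """Score encoder states with a 1-layer feedforward net, then return
    the softmax-weighted context vector."""
    scores = np.tanh(encoder_states @ W_enc + decoder_state @ W_dec) @ v
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ encoder_states

enc = rng.normal(size=(7, d))   # 7 encoder timesteps
dec = rng.normal(size=d)        # current decoder hidden state
print(attend(dec, enc).shape)   # (16,): context fed to the next decoder step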



Long Short-term Memory
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at mitigating the vanishing gradient problem commonly encountered by traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods. It aims to provide a short-term memory for RNNs that can last thousands of timesteps (thus "''long'' short-term memory"). The name draws an analogy with long-term memory and short-term memory and their relationship, studied by cognitive psychologists since the early 20th century. An LSTM unit is typically composed of a cell and three gates: an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the gates regulate the flow of information into and out of the cell. Forget gates decide what information to discard from the previous state by mapping the previous state and the current input to a value between 0 and 1. A (rounded) ...
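One LSTM step, sketched in NumPy following the gate description above: the forget gate maps the previous hidden state and current input to values in (0, 1) that decide how much of the old cell state to keep, while the input and output gates admit new information and expose the result. Toy sizes and random weights; bias terms are omitted for brevity.

import numpy as np

rng = np.random.default_rng(4)
d_in, d_hid = 4, 8                              # assumed toy sizes
def init(shape): return rng.normal(size=shape) * 0.1
W_f, W_i, W_o, W_c = (init((d_in + d_hid, d_hid)) for _ in range(4))

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(z @ W_f)          # forget gate: what to discard from c_prev
    i = sigmoid(z @ W_i)          # input gate: what new information to admit
    o = sigmoid(z @ W_o)          # output gate: what to expose as h
    c = f * c_prev + i * np.tanh(z @ W_c)  # cell state carries long-range memory
    h = o * np.tanh(c)
    return h, c

h = c = np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):   # run over a toy sequence
    h, c = lstm_step(x, h, c)
print(h)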


Tomáš Mikolov
Tomáš Mikolov is a Czech computer scientist working in the field of machine learning. In March 2020, Mikolov became a senior research scientist at the Czech Institute of Informatics, Robotics and Cybernetics.

Career
Mikolov obtained his PhD in Computer Science from Brno University of Technology for his work on recurrent neural network-based language models. He is the lead author of the 2013 paper that introduced the Word2vec technique in natural language processing and is an author on the FastText architecture. Mikolov came up with the idea to generate text from neural language models in 2007, and his RNNLM toolkit was the first to demonstrate the capability to train language models on large corpora, resulting in large improvements over the state of the art. Prior to joining Facebook in 2014, Mikolov worked as a visiting researcher at Johns Hopkins University, Université de Montréal, Microsoft and Google. He left Facebook sometime in 2019/2020 to join the Czech Institut ...




Google Brain
Google Brain was a deep learning artificial intelligence research team that served as the sole AI branch of Google before being incorporated under the newer umbrella of Google AI, a research division at Google dedicated to artificial intelligence. Formed in 2011, it combined open-ended machine learning research with information systems and large-scale computing resources. It created tools such as TensorFlow, which allow neural networks to be used by the public, and multiple internal AI research projects, and aimed to create research opportunities in machine learning and natural language processing. It was merged into former Google sister company DeepMind to form Google DeepMind in April 2023.

History
The Google Brain project began in 2011 as a part-time research collaboration between Google fellow Jeff Dean and Google researcher Greg Corrado. Google Brain started as a Google X project and became so successful that it was graduated back to Google: As ...


Ilya Sutskever
Ilya Sutskever (born 8 December 1986) is an Israeli-Canadian computer scientist who specializes in machine learning. He has made several major contributions to the field of deep learning. With Alex Krizhevsky and Geoffrey Hinton, he co-invented AlexNet, a convolutional neural network. Sutskever co-founded OpenAI and served as its chief scientist. In 2023, he was one of the members of OpenAI's board that ousted Sam Altman from his position as the organization's CEO; Altman was reinstated a week later, and Sutskever stepped down from the board. In June 2024, Sutskever co-founded the company Safe Superintelligence alongside Daniel Gross and Daniel Levy.

Early life and education
Sutskever was born into a Jewish family in Nizhny Novgorod, Russia (then Gorky, Soviet Union). At the age of 5, he made aliyah with his family and lived in Jerusalem until he was 16, when his family moved to Canada. Sutskever attended the Open University of Israel from 2000 to 2002. After moving ...



Word2vec
Word2vec is a technique in natural language processing (NLP) for obtaining vector representations of words. These vectors capture information about the meaning of a word based on the surrounding words. The word2vec algorithm estimates these representations by modeling text in a large corpus. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. Word2vec was developed by Tomáš Mikolov, Kai Chen, Greg Corrado, Ilya Sutskever and Jeff Dean at Google, and published in 2013. Word2vec represents a word as a high-dimensional vector of numbers that captures relationships between words. In particular, words which appear in similar contexts are mapped to vectors which are nearby as measured by cosine similarity. This indicates the level of semantic similarity between the words: for example, the vectors for ''walk'' and ''ran'' are nearby, as are those for "but" and "however", and "Berlin" and "Germany".

Approach
Word2vec is a ...
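Cosine similarity, the comparison used above, measures the angle between two word vectors. The sketch below uses tiny hand-made vectors as illustrative stand-ins for real trained embeddings (typically 100-300 dimensions); the specific numbers are hypothetical.

import numpy as np

def cosine(u, v):
    """1.0 means same direction (similar contexts); near 0 means unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vectors = {                       # hypothetical 4-d "embeddings"
    "walk":   np.array([0.9, 0.1, 0.0, 0.2]),
    "ran":    np.array([0.8, 0.2, 0.1, 0.3]),
    "berlin": np.array([0.0, 0.9, 0.8, 0.1]),
}
print(cosine(vectors["walk"], vectors["ran"]))      # high: related meanings
print(cosine(vectors["walk"], vectors["berlin"]))   # lower: unrelated words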



Facebook
Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, and Chris Hughes, its name derives from the face book directories often given to American university students. Membership was initially limited to Harvard students, gradually expanding to other North American universities. Since 2006, Facebook has allowed everyone to register from 13 years old, except in a handful of nations where the age requirement is 14 years. Facebook has claimed almost 3.07 billion monthly active users worldwide and has ranked as the third-most-visited website in the world, with 23% of its traffic coming from the United States. It was the most downloaded mobile app of the 2010s. Facebook can be accessed from devices with Internet connectivit ...