Ternary Large Language Model
A 1.58-bit large language model (1.58-bit LLM, also ternary LLM) is a version of a transformer large language model whose weights take only three values: -1, 0, and +1. This restriction theoretically allows the model to replace costly multiplications with additions and reduces the memory needed to store the weights. Since the end-task performance and perplexity of 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to those of their "full precision" (16-bit FP16 or BF16) counterparts, this design allows reaching the same artificial intelligence goals with much lower hardware requirements, latency, and training effort. The name comes from the fact that a single trit (the ternary equivalent of a bit, able to take three values) carries log₂ 3 ≈ 1.58 bits of information. The 1.58-bit models are also called 1-bit LLMs (although true 1-bit models also exist).
BitNet
In 2024, Ma et al., ...
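As a rough illustration of why ternary weights remove multiplications, the following sketch quantizes a floating-point weight matrix to {-1, 0, +1} with a simple absolute-mean scaling rule (a commonly described ternary quantization scheme; not necessarily the exact recipe of any particular model) and then computes a matrix-vector product using only additions and subtractions of the activations, plus one final rescaling. The function names and shapes are illustrative.

import numpy as np

def ternary_quantize(w):
    """Quantize a weight matrix to {-1, 0, +1} with a per-matrix scale.

    Illustrative absolute-mean rule: scale by the mean magnitude, then
    round and clip each entry to the nearest value in {-1, 0, +1}.
    """
    gamma = np.mean(np.abs(w)) + 1e-8                    # per-matrix scale
    w_ternary = np.clip(np.round(w / gamma), -1, 1).astype(np.int8)
    return w_ternary, gamma

def ternary_matvec(w_ternary, gamma, x):
    """Approximate y = W @ x using only additions and subtractions.

    Each output element accumulates +x[j] where the weight is +1 and
    -x[j] where it is -1; zero weights are skipped entirely.
    """
    y = np.zeros(w_ternary.shape[0], dtype=x.dtype)
    for i in range(w_ternary.shape[0]):
        row = w_ternary[i]
        y[i] = x[row == 1].sum() - x[row == -1].sum()
    return gamma * y

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
x = rng.normal(size=8).astype(np.float32)

w_t, gamma = ternary_quantize(w)
print(w_t)                             # entries are only -1, 0, +1
print(ternary_matvec(w_t, gamma, x))   # approximates w @ x without multiplies

Each ternary weight also needs at most 2 bits of storage (about 1.58 bits at the information-theoretic limit), compared with 16 bits per FP16/BF16 weight.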



Transformer (deep learning architecture)
The transformer is a deep learning architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup in a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. Transformers have the advantage of having no recurrent units, and therefore require less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets. The modern version of the transformer was proposed in the 2017 paper "Attention Is All You Need" by researchers at Google. Transformers were first developed as an improvement ov ...
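The core operations described above, the token-to-vector embedding lookup followed by scaled dot-product attention, can be sketched in a few lines of NumPy. This is a simplified single-head version with illustrative shapes and randomly initialized matrices, not a full multi-head transformer layer.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, embedding, w_q, w_k, w_v):
    """Single-head self-attention over a sequence of token ids.

    tokens      : (seq_len,) integer token ids
    embedding   : (vocab_size, d_model) word-embedding table
    w_q/w_k/w_v : (d_model, d_head) projection matrices
    """
    x = embedding[tokens]                     # embedding lookup: (seq_len, d_model)
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot-product similarities
    weights = softmax(scores, axis=-1)        # each token attends to all (unmasked) tokens
    return weights @ v                        # weighted sum of value vectors

rng = np.random.default_rng(0)
vocab, d_model, d_head, seq_len = 100, 16, 8, 5
embedding = rng.normal(size=(vocab, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
tokens = rng.integers(0, vocab, size=seq_len)
print(self_attention(tokens, embedding, w_q, w_k, w_v).shape)  # (5, 8)

Because every token's attention weights over the other tokens are computed with matrix products rather than a recurrence, all positions can be processed in parallel, which is the training-time advantage over RNNs mentioned above.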


