Llama.cpp

	Llama.cpp llama.cpp is an open source software library that performs inference on various large language models such as Llama. It is co-developed alongside the GGML project, a general-purpose tensor library. Command-line tools are included with the library, alongside a server with a simple web interface. Background Towards the end of September 2022, Georgi Gerganov started work on the GGML library, a C library implementing tensor algebra. Gerganov developed the library with the intention of strict memory management and multi-threading. The creation of GGML was inspired by Fabrice Bellard's work on LibNC. Before llama.cpp, Gerganov worked on a similar library called whisper.cpp which implemented Whisper, a speech to text model by OpenAI. Development llama.cpp began development in March 2023 by Georgi Gerganov as an implementation of the Llama inference code in pure C/C++ with no dependencies. This improved performance on computers without GPU or other dedicated hardware, which was a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Llama (language Model) Llama (Large Language Model Meta AI, formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 3.3, released in December 2024. Llama models are trained at different parameter sizes, ranging between 1B and 405B. Originally, Llama was only available as a foundation model. Starting with Llama 2, Meta AI started releasing instruction fine-tuned versions alongside foundation models. Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis. Unauthorized copies of the first model were shared via BitTorrent. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use. Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, and a standalone webs ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	C (programming Language) C (''pronounced like the letter c'') is a General-purpose language, general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, protocol stacks, though decreasingly for application software. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems. A successor to the programming language B (programming language), B, C was originally developed at Bell Labs by Ritchie between 1972 and 1973 to construct utilities running on Unix. It was applied to re-implementing the kernel of the Unix operating system. During the 1980s, C gradually gained popularity. It has become one of the measuring programming language popularity, most widely used programming languages, with C compilers avail ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Justine Tunney Justine Alexandra Roberts Tunney (born 1984) is a software developer and a former activist for Occupy Wall Street. Biography Tunney started publishing software in 1998. She built software for other hackers and fiddled with AOL. In 1999, at the age of 14, Tunney used the nickname "Oogle". Around that time, Christopher Neuman registered the domain name oogle.com. In 2012, Google tried to obtain the domain in a UDRP case but did not meet all ICANN requirements for it. Neuman stated that he registered it because he: "intended to collaborate with Tunney". ''Note: Tunney was known as Justin at the time she developed Rampage Toolz.'' In July 2011, Tunney registered the @occupywallst Twitter handle and occupywallst.org domain, which became the main online hub for the Occupy movement. In 2012, Tunney started working for Google as a software engineer. In March 2014, Tunney petitioned the US government on '' We the People'' to hold a referendum asking for support to retire all go ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers. JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. JSON filenames use the extension .json. Any valid JSON file is a valid JavaScript (.js) file, even though it makes no changes to a web page on its own. Douglas Crockford originally specified the JSON format in the early 2000s. He and Chip Morningstar sent the first JSON message in April 2001. Naming and pronunciation The 2017 international standard (ECMA-404 and ISO/IEC 21778:2017) specifies "Pronounced , as in ' Jaso ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Apple Silicon Apple silicon is a series of system on a chip (SoC) and system in a package (SiP) processors designed by Apple Inc., mainly using the ARM architecture. It is the basis of most new Mac computers as well as iPhone, iPad, iPod Touch, Apple TV, and Apple Watch, and of products such as AirPods, HomePod, HomePod Mini, and AirTag. Apple announced its plan to switch Mac computers from Intel processors to Apple silicon at WWDC 2020 on June 22, 2020. The first Macs built with the Apple M1 processor were unveiled on November 10, 2020. In 2022, the newest Mac models were built with Apple silicon; only older models of the Mac Mini and the Mac Pro still use Intel Core and Xeon processors respectively. Apple fully controls the integration of Apple silicon chips with the company's hardware and software products. Johny Srouji is in charge of Apple's silicon design. Manufacturing of the chips is outsourced to semiconductor contract manufacturers such as Samsung and TSMC. A s ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	X86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mode. With 64-bit mode and the new paging mode, it supports vastly larger amounts of virtual memory and physical memory than was possible on its 32-bit predecessors, allowing programs to store larger amounts of data in memory. x86-64 also expands general-purpose registers to 64-bit, and expands the number of them from 8 (some of which had limited or fixed functionality, e.g. for stack management) to 16 (fully general), and provides numerous other enhancements. Floating-point arithmetic is supported via mandatory SSE2-like instructions, and x87/ MMX style registers are generally not used (but still available even in 64-bit mode); instead, a set of 16 vector registers, 128 bits each, is used. (Each register can store one or two double- ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 (Knights Landing) and Skylake-X CPUs; this includes the Core-X series (excluding the Core i5-7640X and Core i7-7740X), as well as the new Xeon Scalable Processor Family and Xeon D-2100 Embedded Series. AVX-512 consists of multiple extensions that may be implemented independently. This policy is a departure from the historical requirement of implementing the entire instruction block. Only the core extension AVX-512F (AVX-512 Foundation) is required by all AVX-512 implementations. Besides widening most 256-bit instructions, the extensions introduce various new operations, such as new data conversions, scatter operations, and permutations. The number of AVX registers is increased from 16 to 32, and eight new "mask registers" are added, which allow for variable selection and ble ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	AVX2 Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme. AVX2 (also known as Haswell New Instructions) expands most integer commands to 256 bits and introduces new instructions. They were first supported by Intel with the Haswell processor, which shipped in 2013. AVX-512 expands AVX to 512-bit support using a new EVEX prefix encoding proposed by Intel in July 2013 and first supported by Intel with the Knights Landing co-processor, which shipped in 2016. In conventional processors, AVX-512 was introduced with Skylake server and HEDT processors in 2017. Advanced Vector Extensions AVX uses sixteen YMM registers to perform a sin ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Quantization (signal Processing) Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms. The difference between an input value and its quantized value (such as round-off error) is referred to as quantization error. A device or algorithmic function that performs quantization is called a quantizer. An analog-to-digital converter is an example of a quantizer. Example For example, rounding a real number x to the nearest integer value forms a very basic type of quantizer – a ''uniform'' one. A typical (''mid-tread'') uni ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	SYCL SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, announced in March 2014. Origin of the name SYCL (pronounced ‘sickle’) is a name and not an acronym. In particular, SYCL developers made clear that the name contains no reference to OpenCL. Purpose SYCL is a royalty-free, cross-platform abstraction layer that builds on the underlying concepts, portability and efficiency inspired by OpenCL that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++. SYCL enables single-source development where C++ template functions can contain both host and device code to construct complex algorithms that use hardware accelerators, and then re-use them throughout their source code on different types of data. While the SYCL standard started a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Vulkan Vulkan is a low- overhead, cross-platform API, open standard for 3D graphics and computing. Vulkan targets high-performance real-time 3D graphics applications, such as video games and interactive media. Vulkan is intended to offer higher performance and more efficient CPU and GPU usage compared to older OpenGL and Direct3D 11 APIs. It provides a considerably lower-level API for the application than the older APIs, making Vulkan comparable to Apple's Metal API and Microsoft's Direct3D 12. In addition to its lower CPU usage, Vulkan is designed to allow developers to better distribute work among multiple CPU cores. Vulkan was first announced by the non-profit Khronos Group at GDC 2015. The Vulkan API was initially referred to as the "next generation OpenGL initiative", or "OpenGL next" by Khronos, but use of those names was discontinued when Vulkan was announced. Vulkan is derived from and built upon components of AMD's Mantle API, which was donated by AMD to Khronos with the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]