HOME

TheInfoList



OR:

The Hamming weight of a
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
is the number of symbols that are different from the zero-symbol of the
alphabet An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
used. It is thus equivalent to the
Hamming distance In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number ...
from the all-zero string of the same length. For the most typical case, a given set of
bit The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...
s, this is the number of bits set to 1, or the
digit sum In mathematics, the digit sum of a natural number in a given radix, number base is the sum of all its numerical digit, digits. For example, the digit sum of the decimal number 9045 would be 9 + 0 + 4 + 5 = 18. Definition Let n be a natural number. ...
of the
binary representation A binary number is a number expressed in the base-2 numeral system or binary numeral system, a method for representing numbers that uses only two symbols for the natural numbers: typically "0" (zero) and "1" (one). A ''binary number'' may also ...
of a given number and the ''ℓ''₁ norm of a bit vector. In this binary case, it is also called the population count, popcount, sideways sum, or bit summation.


History and usage

The Hamming weight is named after the American mathematician
Richard Hamming Richard Wesley Hamming (February 11, 1915 – January 7, 1998) was an American mathematician whose work had many implications for computer engineering and telecommunications. His contributions include the Hamming code (which makes use of a Ha ...
, although he did not originate the notion. The Hamming weight of binary numbers was already used in 1899 by James W. L. Glaisher to give a formula for the number of odd binomial coefficients in a single row of
Pascal's triangle In mathematics, Pascal's triangle is an infinite triangular array of the binomial coefficients which play a crucial role in probability theory, combinatorics, and algebra. In much of the Western world, it is named after the French mathematician Bla ...
. Irving S. Reed introduced a concept, equivalent to Hamming weight in the binary case, in 1954. Hamming weight is used in several disciplines including
information theory Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...
,
coding theory Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and computer data storage, data sto ...
, and
cryptography Cryptography, or cryptology (from "hidden, secret"; and ''graphein'', "to write", or ''-logy, -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of Adversary (cryptography), ...
. Examples of applications of the Hamming weight include: * In modular exponentiation by squaring, the number of modular multiplications required for an exponent ''e'' is log2 ''e'' + weight(''e''). This is the reason that the public key value ''e'' used in RSA is typically chosen to be a number of low Hamming weight. * The Hamming weight determines path lengths between nodes in Chord distributed hash tables. * IrisCode lookups in biometric databases are typically implemented by calculating the
Hamming distance In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number ...
to each stored record. * In
computer chess Computer chess includes both hardware (dedicated computers) and software capable of playing chess. Computer chess provides opportunities for players to practice even in the absence of human opponents, and also provides opportunities for analysi ...
programs using a
bitboard A bitboard is a specialized bit array data structure commonly used in computer systems that play board games, where each bit corresponds to a game board space or piece. This allows parallel bitwise operations to set or query the game state, or d ...
representation, the Hamming weight of a bitboard gives the number of pieces of a given type remaining in the game, or the number of squares of the board controlled by one player's pieces, and is therefore an important contributing term to the value of a position. * Hamming weight can be used to efficiently compute
find first set In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned Word (computer architecture), machine word, designates the index or position of the least significant bit set to one in the word c ...
using the identity ffs(x) = pop(x ^ (x - 1)). This is useful on platforms such as SPARC that have hardware Hamming weight instructions but no hardware find first set instruction. * The Hamming weight operation can be interpreted as a conversion from the
unary numeral system The unary numeral system is the simplest numeral system to represent natural numbers: to represent a number ''N'', a symbol representing 1 is repeated ''N'' times. In the unary system, the number 0 (zero) is represented by the empty string, tha ...
to
binary number A binary number is a number expressed in the Radix, base-2 numeral system or binary numeral system, a method for representing numbers that uses only two symbols for the natural numbers: typically "0" (zero) and "1" (one). A ''binary number'' may ...
s.


Efficient implementation

The population count of a bitstring is often needed in cryptography and other applications. The
Hamming distance In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number ...
of two words ''A'' and ''B'' can be calculated as the Hamming weight of ''A'' xor ''B''. The problem of how to implement it efficiently has been widely studied. A single operation for the calculation, or parallel operations on bit vectors are available on some processors. For processors lacking those features, the best solutions known are based on adding counts in a tree pattern. For example, to count the number of 1 bits in the 16-bit binary number a = 0110 1100 1011 1010, these operations can be done: Here, the operations are as in
C programming language C (''pronounced'' '' – like the letter c'') is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of ...
, so means to shift X right by Y bits, X & Y means the bitwise AND of X and Y, and + is ordinary addition. The best algorithms known for this problem are based on the concept illustrated above and are given here: //types and constants used in the functions below //uint64_t is an unsigned 64-bit integer variable type (defined in C99 version of C language) const uint64_t m1 = 0x5555555555555555; //binary: 0101... const uint64_t m2 = 0x3333333333333333; //binary: 00110011.. const uint64_t m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones ... const uint64_t m8 = 0x00ff00ff00ff00ff; //binary: 8 zeros, 8 ones ... const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ... const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3... //This is a naive implementation, shown for comparison, //and to help in understanding the better functions. //This algorithm uses 24 arithmetic operations (shift, add, and). int popcount64a(uint64_t x) //This uses fewer arithmetic operations than any other known //implementation on machines with slow multiplication. //This algorithm uses 17 arithmetic operations. int popcount64b(uint64_t x) //This uses fewer arithmetic operations than any other known //implementation on machines with fast multiplication. //This algorithm uses 12 arithmetic operations, one of which is a multiply. int popcount64c(uint64_t x) The above implementations have the best worst-case behavior of any known algorithm. However, when a value is expected to have few nonzero bits, it may instead be more efficient to use algorithms that count these bits one at a time. As Wegner described in 1960, the
bitwise AND In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operat ...
of ''x'' with ''x'' − 1 differs from ''x'' only in zeroing out the least significant nonzero bit: subtracting 1 changes the rightmost string of 0s to 1s, and changes the rightmost 1 to a 0. If ''x'' originally had ''n'' bits that were 1, then after only ''n'' iterations of this operation, ''x'' will be reduced to zero. The following implementation is based on this principle. //This is better when most bits in x are 0 //This algorithm works the same for all data sizes. //This algorithm uses 3 arithmetic operations and 1 comparison/branch per "1" bit in x. int popcount64d(uint64_t x) If greater memory usage is allowed, we can calculate the Hamming weight faster than the above methods. With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64 bit integer. If we can store a lookup table of the hamming function of every 16 bit integer, we can do the following to compute the Hamming weight of every 32 bit integer. static uint8_t wordbits 5536= ; //This algorithm uses 3 arithmetic operations and 2 memory reads. int popcount32e(uint32_t x) //Optionally, the wordbits[] table could be filled using this function int popcount32e_init(void) A recursive algorithm is given in Donovan & Kernighan /* The weight of i can differ from the weight of i / 2 only in the least significant bit of i */ int popcount32e_init (void) Muła et al. have shown that a vectorized version of popcount64b can run faster than dedicated instructions (e.g., popcnt on x64 processors).


Minimum weight

In error-correcting coding, the minimum Hamming weight, commonly referred to as the minimum weight ''w''min of a code is the weight of the lowest-weight non-zero code word. The weight ''w'' of a code word is the number of 1s in the word. For example, the word 11001010 has a weight of 4. In a
linear block code In coding theory, block codes are a large and important family of error-correcting codes that encode data in blocks. There is a vast number of examples for block codes, many of which have a wide range of practical applications. The abstract defi ...
the minimum weight is also the
minimum Hamming distance In information theory, the Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' requi ...
(''d''min) and defines the error correction capability of the code. If ''w''min = ''n'', then ''d''min = ''n'' and the code will correct up to ''d''min/2 errors.


Language support

Some C compilers provide intrinsic functions that provide bit counting facilities. For example, GCC (since version 3.4 in April 2004) includes a builtin function __builtin_popcount that will use a processor instruction if available or an efficient library implementation otherwise. LLVM-GCC has included this function since version 1.5 in June 2005. In the
C++ Standard Library The C standard library, sometimes referred to as libc, is the standard library for the C programming language, as specified in the ISO C standard.ISO/ IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the origina ...
, the bit-array data structure bitset has a count() method that counts the number of bits that are set. In
C++20 C20 or C-20 may refer to: Science and technology * Carbon-20 (C-20 or 20C), an isotope of carbon * C20, the smallest possible fullerene (a carbon molecule) * C20 (engineering), a mix of concrete that has a compressive strength of 20 newtons per squ ...
, a new header was added, containing functions std::popcount and std::has_single_bit, taking arguments of unsigned integer types. In Java, the growable bit-array data structure has a method that counts the number of bits that are set. In addition, there are and functions to count bits in primitive 32-bit and 64-bit integers, respectively. Also, the arbitrary-precision integer class also has a method that counts bits. In Python, the int type has a bit_count() method to count the number of bits set. This functionality was introduced in Python 3.10, released in October 2021. In
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ''ANSI INCITS 226-1994 (S2018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperli ...
, the function logcount, given a non-negative integer, returns the number of 1 bits. (For negative integers it returns the number of 0 bits in 2's complement notation.) In either case the integer can be a BIGNUM. Starting in GHC 7.4, the
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
base package has a popCount function available on all types that are instances of the Bits class (available from the Data.Bits module).
MySQL MySQL () is an Open-source software, open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A rel ...
version of
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
language provides BIT_COUNT() as a standard function. Fortran 2008 has the standard, intrinsic, elemental function popcnt returning the number of nonzero bits within an integer (or integer array). Some programmable scientific pocket calculators feature special commands to calculate the number of set bits, e.g. #B on the HP-16C and WP 43S, #BITS or BITSUM on HP-16C emulators, and nBITS on the WP 34S.
FreePascal Free Pascal Compiler (FPC) is a compiler for the closely related programming-language dialects Pascal and Object Pascal. It is free software released under the GNU General Public License, witexception clausesthat allow static linking against it ...
implements popcnt since version 3.0.


Processor support

* The IBM STRETCH computer in the 1960s calculated the number of set bits as well as the number of leading zeros as a by-product of all logical operations. *
Cray Cray Inc., a subsidiary of Hewlett Packard Enterprise, is an American supercomputer manufacturer headquartered in Seattle, Washington. It also manufactures systems for data storage and analytics. Several Cray supercomputer systems are listed ...
supercomputers early on featured a population count
machine instruction In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers, machine code is the binaryOn nonbi ...
, rumoured to have been specifically requested by the U.S. government
National Security Agency The National Security Agency (NSA) is an intelligence agency of the United States Department of Defense, under the authority of the director of national intelligence (DNI). The NSA is responsible for global monitoring, collection, and proces ...
for
cryptanalysis Cryptanalysis (from the Greek ''kryptós'', "hidden", and ''analýein'', "to analyze") refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic se ...
applications. *
Control Data Corporation Control Data Corporation (CDC) was a mainframe and supercomputer company that in the 1960s was one of the nine major U.S. computer companies, which group included IBM, the Burroughs Corporation, and the Digital Equipment Corporation (DEC), the N ...
's (CDC)
6000 6000 may refer to: * 6000 (number) and the 6000s * The last year of the 6th millennium, a century leap year starting on Saturday * The Hebrew Year 6000, in the Gregorian 3rd millennium (from 29 September 2239 to 16 September 2240) * 6000 List, t ...
and Cyber 70/170 series machines included a population count instruction; in
COMPASS A compass is a device that shows the cardinal directions used for navigation and geographic orientation. It commonly consists of a magnetized needle or other element, such as a compass card or compass rose, which can pivot to align itself with No ...
, this instruction was coded as CXi. * The 64-bit SPARC version 9 architecture defines a POPC instruction, but most implementations do not implement it, requiring it be emulated by the operating system. *
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...
's model computer MMIX that is going to replace MIX in his book
The Art of Computer Programming ''The Art of Computer Programming'' (''TAOCP'') is a comprehensive multi-volume monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. it consists of published volumes 1, 2, 3, 4A, and 4 ...
has an SADD instruction since 1999. SADD a,b,c counts all bits that are 1 in b and 0 in c and writes the result to a. *
Compaq Compaq Computer Corporation was an American information technology, information technology company founded in 1982 that developed, sold, and supported computers and related products and services. Compaq produced some of the first IBM PC compati ...
's Alpha 21264A, released in 1999, was the first Alpha series CPU design that had the count extension (CIX). *
Analog Devices Analog Devices, Inc. (ADI), also known simply as Analog, is an American multinational corporation, multinational semiconductor company specializing in data conversion, signal processing, and power management technology, headquartered in Wilming ...
'
Blackfin Blackfin is a family of 16-/32-bit microprocessors developed, manufactured and marketed by Analog Devices. The processors have built-in, fixed-point digital signal processor (DSP) functionality performed by 16-bit multiply–accumulates (MA ...
processors feature the ONES instruction to perform a 32-bit population count. *
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that de ...
's
Barcelona Barcelona ( ; ; ) is a city on the northeastern coast of Spain. It is the capital and largest city of the autonomous community of Catalonia, as well as the second-most populous municipality of Spain. With a population of 1.6 million within c ...
architecture introduced the advanced bit manipulation (ABM) ISA introducing the POPCNT instruction as part of the
SSE4a SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper;
extensions in 2007. *
Intel Core Intel Core is a line of multi-core (with the exception of Core Solo and Core 2 Solo) central processing units (CPUs) for midrange, embedded, workstation, high-end and enthusiast computer markets marketed by Intel Corporation. These processors ...
processors introduced a POPCNT instruction with the SSE4.2
instruction set In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
extension, first available in a Nehalem-based Core i7 processor, released in November 2008. * The
ARM architecture ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer, RISC instruction set architectures (ISAs) for central processing unit, com ...
introduced the VCNT instruction as part of the Advanced SIMD (
NEON Neon is a chemical element; it has symbol Ne and atomic number 10. It is the second noble gas in the periodic table. Neon is a colorless, odorless, inert monatomic gas under standard conditions, with approximately two-thirds the density of ...
) extensions. * The
RISC-V RISC-V (pronounced "risk-five") is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. The project commenced in 2010 at the University of California, Berkeley. It transfer ...
architecture introduced the CPOP instruction as part of the Bit Manipulation (B) extension.


See also

*
Two's complement Two's complement is the most common method of representing signed (positive, negative, and zero) integers on computers, and more generally, fixed point binary values. Two's complement uses the binary digit with the ''greatest'' value as the ''s ...
* Fan out


References


Further reading

*
Item 169
Population count assembly code for the PDP/6-10.)


External links


Aggregate Magic Algorithms
Optimized population count and other algorithms explained with sample code.

Several algorithms with code for counting bits set.
Necessary and Sufficient
- by Damien Wintour - Has code in C# for various Hamming Weight implementations.
Best algorithm to count the number of set bits in a 32-bit integer?
- Stackoverflow {{DEFAULTSORT:Hamming Weight Coding theory Articles with example C code