MAC Address Anonymization
   HOME

TheInfoList



OR:

MAC address anonymization performs a
one-way function In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, s ...
on a
MAC address A MAC address (short for medium access control address or media access control address) is a unique identifier assigned to a network interface controller (NIC) for use as a network address in communications within a network segment. This use i ...
so that the result may be used in
tracking system A tracking system or locating system is used for Surveillance, tracking persons or objects that do not stay in a fixed location, and supplying a time-ordered sequence of positions (track). Applications A myriad of tracking systems exist. ...
s for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like
Google Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
and CrowdVisionwhich track users' movements via their computer hardwareto simultaneously protect the identities of the people they are tracking while tracking the hardware itself.


Flawed approaches


Simple hashing

An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66, the
MD5 The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as Request for Comments, RFC 1321. MD5 ...
hash algorithm produces eb341820cd3a3485461a61b1e97d31b1 (32 hexadecimal digits). An address only one character different (11:22:33:44:55:67) produces 391907146439938c9821856fa181052e, an entirely different hash due to the
avalanche effect In cryptography, the avalanche effect is the desirable property of cryptographic algorithms, typically block ciphers and cryptographic hash functions, wherein if an input is changed slightly (for example, flipping a single bit), the output changes ...
. The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using
rainbow table A rainbow table is a precomputed table for caching the outputs of a cryptographic hash function, usually for cracking password hashes. Passwords are typically stored not in plain text form, but as hash values. If such a database of hashed passw ...
compression, the index can be made small enough to be portable. Building the index is an
embarrassingly parallel In parallel computing, an embarrassingly parallel workload or problem (also called embarrassingly parallelizable, perfectly parallel, delightfully parallel or pleasingly parallel) is one where little or no effort is needed to split the problem into ...
problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily. For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form. In 2018, academics found that with modern computing equipment with the ability to calculate 6 billion MD5 hashes and 844 million SHA-256 hashes per second the authors are able to recover 100% of 1 million hashes in: * 4 minutes 1 second for
MD5 The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as Request for Comments, RFC 1321. MD5 ...
hashes, and; * 13 minutes 22 seconds for
SHA-256 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compressi ...
respectively.


Truncation

Another approach that has been tested is truncating the MAC Address by removing the
organizationally unique identifier An organizationally unique identifier (OUI) is a 24-bit number that uniquely identifies a vendor, manufacturer, or other organization. OUIs are purchased from the Institute of Electrical and Electronics Engineers (IEEE) Registration Authority ...
(the first 24 bits of the 48 bit MAC Address). However, as only 0.1% of the total organizationally unique identifier space has been allocated and not all manufacturers fully utilise their allocated MAC Address space, this fails to offer any meaningful privacy benefit. Furthermore, manufacturers will frequently assign contiguous address blocks to specific devices allows for fine-grained mapping of the devices in useallowing the device type to be identified with only a small part of the MAC Address.


Ali and Dyo approach

Due to the pitfalls of existing approaches, more robust anonymization approaches have been developed by academics. In particular, Junade Ali and Vladimir Dyo developed an approach which works by: # Using computationally expensive hash functions like
Bcrypt bcrypt is a password-hashing function designed by Niels Provos and David Mazières. It is based on the Blowfish (cipher), Blowfish cipher and presented at USENIX in 1999. Besides incorporating a salt (cryptography), salt to protect against rain ...
to prevent background knowledge attacks # Truncating the resulting hash to achieve
K-anonymity ''k''-anonymity is a property possessed by certain anonymized data. The term ''k''-anonymity was first introduced by Pierangela Samarati and Latanya Sweeney in a paper published in 1998, although the concept dates to a 1986 paper by Tore Dalen ...
The degree to which a resulting hash is truncated is a balancing act between the privacy offered and the desired collision rate (the probability that one anonymised MAC Address will overlap with another). Previous work has suggested that it is therefore difficult to control the anonymity set size when using approximations of the
birthday paradox In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share the same birthday. The birthday paradox is the counterintuitive fact that only 23 people are needed for that ...
. Instead, Ali and Dyo use the overall rate of collision in the dataset and provide that the probability of there being a collision ''p'' can be calculated by p = 1-(1-1/n)^ where there are ''m'' MAC Addresses and ''n'' possible hash digests. Therefore "for digests of 24 bits it is possible to store up to 168,617 MAC addresses with the rate of collisions less than 1%".


References

{{Reflist Internet geolocation GSM standard Mobile technology Wireless locating Crime prevention Criminal investigation Espionage techniques Mobile telecommunication services Privacy