HOME

TheInfoList



OR:

MAC address anonymization performs a
one-way function In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, sp ...
on a
MAC address A media access control address (MAC address) is a unique identifier assigned to a network interface controller (NIC) for use as a network address in communications within a network segment. This use is common in most IEEE 802 networking tec ...
so that the result may be used in
tracking system A tracking system, also known as a locating system, is used for the observing of persons or objects on the move and supplying a timely ordered sequence of location data for further processing. It is important to be aware of human tracking, fu ...
s for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like
Google Google LLC () is an American Multinational corporation, multinational technology company focusing on Search Engine, search engine technology, online advertising, cloud computing, software, computer software, quantum computing, e-commerce, ar ...
,
Apple An apple is an edible fruit produced by an apple tree (''Malus domestica''). Apple trees are cultivated worldwide and are the most widely grown species in the genus '' Malus''. The tree originated in Central Asia, where its wild ances ...
and CrowdVision - which track users movements via computer hardware to simultaneously preserve the identities of the people they are tracking, as well as the hardware itself.


Flawed approaches


Simple hashing

An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1 (32 hexadecimal digits). An address only one character different (11:22:33:44:55:67) produces 391907146439938c9821856fa181052e, an entirely different hash due to the avalanche effect. The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using
rainbow table A rainbow table is an efficient way to store data that has been computed in advance to facilitate cracking passwords. To protect stored passwords from compromise in case of a data breach, organizations avoid storing them directly, instead transfo ...
compression, the index can be made small enough to be portable. Building the index is an
embarrassingly parallel In parallel computing, an embarrassingly parallel workload or problem (also called embarrassingly parallelizable, perfectly parallel, delightfully parallel or pleasingly parallel) is one where little or no effort is needed to separate the problem ...
problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily. For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form. In 2018, academics found that with modern computing equipment with the ability to calculate 6 Giga MD5 hashes and 844 Mega SHA-256 hashes per second the authors are able to recover 100% of 1 million hashes in: * 4 minutes 1 second for MD5 hashes, and; * 13 minutes 22 seconds for
SHA-256 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compressi ...
respectively.


Truncation

Another approach that has been tested is truncating the MAC Address by removing the Organizationally unique identifier (the first 24 bits of the 48 bit MAC Address). However, as only 0.1% of the total Organizationally unique identifier space has been allocated and not all manufacturers fully utilise their allocated MAC Address space, this fails to offer any meaningful privacy benefit. Furthermore, manufacturers will frequently assign contiguous address blocks to specific devices allows for fine-grained mapping of the devices in use - allowing the device type to be identified with only a small part of the MAC Address.


Ali and Dyo approach

Due to the pitfalls of existing approaches, more robust anonymization approaches have been developed by academics. In particular,
Junade Ali Junade Ali is a British computer scientist known for research in cybersecurity.CEng registration number ''673221''. https://www.engc.org.uk/regcheck Ali studied for a Master of Science degree aged 17 and was awarded Chartered Engineer status b ...
and Vladimir Dyo developed an approach which works by: # Using computationally expensive hash functions like
Bcrypt bcrypt is a password-hashing function designed by Niels Provos and David Mazières, based on the Blowfish cipher and presented at USENIX in 1999. Besides incorporating a salt to protect against rainbow table attacks, bcrypt is an adaptive ...
to prevent background knowledge attacks # Truncating the resulting hash to achieve
K-anonymity ''k''-anonymity is a property possessed by certain anonymized data. The concept of ''k''-anonymity was first introduced by Latanya Sweeney and Pierangela Samarati in a paper published in 1998 as an attempt to solve the problem: "Given person-spe ...
The degree to which a resulting hash is truncated is a balancing act between the privacy offered and the desired collision rate (the probability that one anonymised MAC Address will overlap with another). Previous work has suggested that it is therefore difficult to control the anonymity set size when using approximations of the
Birthday Paradox In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 5 ...
. Instead, Ali and Dyo use the overall rate of collision in the dataset and provide that the probability of there being a collision ''p'' can be calculated by p = 1-(1-1/n)^ where there are ''m'' MAC Addresses and ''n'' possible hash digests. Therefore "for digests of 24 bits it is possible to store up to 168,617 MAC addresses with the rate of collisions less than 1%".


References

{{Reflist Internet geolocation GSM standard Mobile technology Wireless locating Crime prevention Criminal investigation Espionage techniques Mobile telecommunication services Privacy