Streaming Algorithms

	Streaming Algorithms In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically one-pass algorithm, just one. These algorithms are designed to operate with limited memory, generally L (complexity), logarithmic in the size of the stream and/or in the maximum value in the stream, and may also have limited processing time per item. As a result of these constraints, streaming algorithms often produce approximate answers based on a summary or "sketch" of the data stream. History Though streaming algorithms had already been studied by Munro and Paterson as early as 1978, as well as Philippe Flajolet and G. Nigel Martin in 1982/83, the field of streaming algorithms was first formalized and popularized in a 1996 paper by Noga Alon, Yossi Matias, and Mario Szegedy. For this paper, the authors later won the Gödel Prize in 2005 "for their foundational contribution to streaming ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Computer Science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, applied disciplines (including the design and implementation of Computer architecture, hardware and Software engineering, software). Algorithms and data structures are central to computer science. The theory of computation concerns abstract models of computation and general classes of computational problem, problems that can be solved using them. The fields of cryptography and computer security involve studying the means for secure communication and preventing security vulnerabilities. Computer graphics (computer science), Computer graphics and computational geometry address the generation of images. Programming language theory considers different ways to describe computational processes, and database theory concerns the management of re ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Elephant Flow In computer networking, an elephant flow is an extremely large (in total bytes) continuous flow set up by a TCP (or other protocol) flow measured over a network link. Elephant flows, though not numerous, can occupy a disproportionate share of the total bandwidth over a period of time. It is not clear who coined ''elephant flow'' but the term began occurring in published Internet network research in 2001 when the observations were made that a small number of flows carry the majority of Internet traffic and the remainder consists of a large number of flows that carry very little Internet traffic ( mice flows). For example, researchers Mori et al. studied the traffic flows on several Japanese universities and research networks. At the WIDE network they found elephant flows were only 4.7% of all flows but occupied 41.3% of all data transmitted during the time period. The actual impact of elephant flows on Internet traffic is still an area of research and debate. Some research shows ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Daniel Kane (mathematician) Daniel Mertz Kane (born 1986) is an American mathematician. He is a full professor with a joint position in the Mathematics Department and the Computer Science and Engineering Department at the University of California, San Diego.. Early life and education Kane was born in Madison, Wisconsin, to Janet E. Mertz and Jonathan M. Kane, professors of oncology and of mathematics and computer science, respectively... The article is primarily about a study jointly authored by Kane's parents, but also mentions Kane's IMO results. He attended Wingra School, a small alternative K-8 school in Madison that focuses on self-guided education. By 3rd grade, he had mastered K through 9th-grade mathematics. Starting at age 13, he took honors math courses at the University of Wisconsin–Madison and did research under the mentorship of Ken Ono while dual enrolled at Madison West High School. He earned gold medals in the 2002 and 2003 International Mathematical Olympiads. Prior to his 17th birth ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Moving Average In statistics, a moving average (rolling average or running average or moving mean or rolling mean) is a calculation to analyze data points by creating a series of averages of different selections of the full data set. Variations include: #Simple moving average, simple, #Cumulative moving average, cumulative, or #Weighted moving average, weighted forms. Mathematically, a moving average is a type of convolution. Thus in signal processing it is viewed as a low-pass filter, low-pass finite impulse response filter. Because the boxcar function outlines its filter coefficients, it is called a boxcar filter. It is sometimes followed by Downsampling (signal processing), downsampling. Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by "shifting forward"; that is, excluding the first number of the series and including the next value in ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Misra–Gries Summary In the field of streaming algorithms, Misra–Gries summaries are used to solve the Streaming algorithm#Frequent elements, frequent elements problem in the Streaming algorithm#Data stream model, data stream model. That is, given a long stream of input that can only be examined once (and in some arbitrary order), the Misra-Gries algorithm can be used to compute which (if any) value makes up a majority of the stream, or more generally, the set of items that constitute some fixed fraction of the stream. The term "summary" is due to Graham Cormode. The algorithm was presented by Misra and Gries alongside a different algorithm for finding frequent elements, the Misra–Gries heavy hitters algorithm. The Misra–Gries summary As for all algorithms in the data stream model, the input is a finite sequence of integers from a finite domain. The algorithm outputs an associative array which has values from the stream as keys, and estimates of their frequency as the corresponding values. I ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Misra–Gries Heavy Hitters Algorithm Misra and Gries defined the ''heavy-hitters problem'' (though they did not introduce the term ''heavy-hitters'') and described the first algorithm for it in the paper ''Finding repeated elements''. Their algorithm extends the Boyer-Moore majority finding algorithm in a significant way. One version of the heavy-hitters problem is as follows: Given is a bag of elements and an integer . Find the values that occur more than times in . The Misra-Gries algorithm solves the problem by making two passes over the values in , while storing at most values from and their number of occurrences during the course of the algorithm. Misra-Gries is one of the earliest streaming algorithms, and it is described below in those terms in section #Summaries. Misra–Gries algorithm A bag is like a set in which the same value may occur multiple times. Assume that a bag is available as an array of elements. In the abstract description of the algorithm, we treat and its segments also as bags. H ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Bloom Filter In computing, a Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with the counting Bloom filter variant); the more items added, the larger the probability of false positives. Bloom proposed the technique for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error- ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Lossy Count Algorithm The lossy count algorithm is an algorithm to identify elements in a data stream whose frequency exceeds a user-given threshold. The algorithm works by dividing the data stream into buckets for frequent items, but fill as many buckets as possible in main memory one time. The frequency computed by this algorithm is not always accurate, but has an error threshold that can be specified by the user. The run time and space required by the algorithm is inversely proportional to the specified error threshold; hence the larger the error, the smaller the footprint. The algorithm was created by computer scientists Rajeev Motwani and Gurmeet Singh Manku. It finds applications in computations where data takes the form of a continuous data stream instead of a finite data set, such as network traffic measurements, web server logs, and clickstream A click path or clickstream is the sequence of hyperlinks one or more website visitors follows on a given site, presented in the order viewed. A visito ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Boyer–Moore Majority Vote Algorithm The Boyer–Moore majority vote algorithm is an algorithm for finding the majority of a sequence of elements using linear time and a constant number of words of memory. It is named after Robert S. Boyer and J Strother Moore, who published it in 1981,. Originally published as a technical report in 1981. and is a prototypical example of a streaming algorithm. In its simplest form, the algorithm finds a majority element, if there is one: that is, an element that occurs repeatedly for more than half of the elements of the input. A version of the algorithm that makes a second pass through the data can be used to verify that the element found in the first pass really is a majority. If a second pass is not performed and there is no majority, the algorithm will not detect that no majority exists. In the case that no strict majority exists, the returned element can be arbitrary; it is not guaranteed to be the element that occurs most often (the mode of the sequence). It is not possible fo ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Set (mathematics) In mathematics, a set is a collection of different things; the things are '' elements'' or ''members'' of the set and are typically mathematical objects: numbers, symbols, points in space, lines, other geometric shapes, variables, or other sets. A set may be finite or infinite. There is a unique set with no elements, called the empty set; a set with a single element is a singleton. Sets are ubiquitous in modern mathematics. Indeed, set theory, more specifically Zermelo–Fraenkel set theory, has been the standard way to provide rigorous foundations for all branches of mathematics since the first half of the 20th century. Context Before the end of the 19th century, sets were not studied specifically, and were not clearly distinguished from sequences. Most mathematicians considered infinity as potentialmeaning that it is the result of an endless processand were reluctant to consider infinite sets, that is sets whose number of members is not a natural number. Specific ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Hash Function A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a hash function are called ''hash values'', ''hash codes'', (''hash/message'') ''digests'', or simply ''hashes''. The values are usually used to index a fixed-size table called a ''hash table''. Use of a hash function to index a hash table is called ''hashing'' or ''scatter-storage addressing''. Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a computationally- and storage-space-efficient form of data access that avoids the non-constant access time of ordered and unordered lists and s ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]