Information Harvesting (IH) was an early

data mining Data mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and ...

product from the 1990s. It was invented by Ralphe Wiggins and produced by the Ryan Corp, later Information Harvesting Inc., of Cambridge, Massachusetts. Wiggins had a background in

genetic algorithms In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to g ...

and

fuzzy logic Fuzzy logic is a form of many-valued logic in which the truth value of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely ...

. IH sought to infer rules from sets of data. It did this first by classifying various input variables into one of a number of bins, thereby putting some structure on the continuous variables in the input. IH then proceeds to generate rules, trading off generalization against memorization, that will infer the value of the prediction variable, possibly creating many levels of rules in the process. It included strategies for checking if

overfitting In mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably". An overfi ...

took place and, if so, correcting for it. Because of its strategies for correcting for overfitting by considering more data, and refining the rules based on that data, IH might also be considered to be a form of

machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...

. The advantage of IH, as compared with other data mining products of its time and even later, was that it provided a mechanism for finding multiple rules that would classify the data and determining, according to set criteria, the best rules to use.

References

Data mining and machine learning software {{Comp-sci-stub