
Data farming is the process of using designed computational experiments to “grow” data, which can then be analyzed using statistical and visualization techniques to obtain insight into complex systems. These methods can be applied to any computational model. Data farming differs from data mining, as the following metaphors indicate:
Miners seek valuable nuggets of ore buried in the earth, but have no control over what is out there or how hard it is to extract the nuggets from their surroundings. ... Similarly, data miners seek to uncover valuable nuggets of information buried within massive amounts of data. Data-mining techniques use statistical and graphical measures to try to identify interesting correlations or clusters in the data set. Farmers cultivate the land to maximize their yield. They manipulate the environment to their advantage using irrigation, pest control, crop rotation, fertilizer, and more. Small-scale designed experiments let them determine whether these treatments are effective. Similarly, data farmers manipulate simulation models to their advantage, using large-scale designed experimentation to grow data from their models in a manner that easily lets them extract useful information. ...the results can reveal root cause-and-effect relationships between the model input factors and the model responses, in addition to rich graphical and statistical views of these relationships.
A NATO modeling and simulation task group has documented the data farming process in the Final Report of MSG-088. Here, data farming uses collaborative processes in combining rapid scenario prototyping, simulation modeling, design of experiments, high performance computing, and analysis and visualization in an iterative loop-of-loops.
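The basic cycle described above (design an experiment, run a simulation model at each design point, then analyze the grown data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_simulation` is a hypothetical stochastic model, the factor names and levels are invented, and the analysis is reduced to a mean per design point.

```python
import itertools
import random
import statistics

def toy_simulation(n_sensors, speed, rng):
    """Hypothetical stochastic model: returns a noisy 'score' for one run."""
    signal = 2.0 * n_sensors + 0.5 * speed - 0.1 * n_sensors * speed
    return signal + rng.gauss(0.0, 1.0)

def farm(design, replications, seed=0):
    """Run every design point `replications` times and collect the outputs."""
    rng = random.Random(seed)
    results = {}
    for point in design:
        results[point] = [toy_simulation(*point, rng) for _ in range(replications)]
    return results

# Designed experiment: a small full factorial (gridded) design over two factors.
sensor_levels = [1, 2, 3]
speed_levels = [5, 10]
design = list(itertools.product(sensor_levels, speed_levels))

# "Grow" the data, then summarize the response at each design point.
results = farm(design, replications=200)
summary = {point: statistics.mean(runs) for point, runs in results.items()}
for point, mean_score in sorted(summary.items()):
    print(point, round(mean_score, 2))
```

Because the analyst controls the inputs, differences between the per-point means can be traced back to the factor settings, which is the sense in which farmed data supports cause-and-effect conclusions. A real study would replace the toy model with an actual simulation and the means with fuller statistical and graphical analysis.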


History

The science of design of experiments (DOE) has been around for over a century, pioneered by R.A. Fisher for agricultural studies. Many of the classic experiment designs can be used in simulation studies. However, computational experiments have far fewer restrictions than real-world experiments in terms of cost, number of factors, time required, ability to replicate, ability to automate, and so on. Consequently, a framework specifically oriented toward large-scale simulation experiments is warranted.
People have been conducting computational experiments for as long as computers have been around. The term “data farming” is more recent, coined in 1998 in conjunction with the Marine Corps' Project Albert, in which small agent-based distillation models (a type of stochastic simulation) were created to capture specific military challenges. These models were run thousands or millions of times at the Maui High Performance Computing Center and other facilities. Project Albert analysts would work with military subject matter experts to refine the models and interpret the results.
Initially, the use of brute-force full factorial (gridded) designs meant that the simulations needed to run very quickly and the studies required high-performance computing. Even so, only a small number of factors (at a limited number of levels) could be investigated, due to the curse of dimensionality. The SEED Center for Data Farming at the Naval Postgraduate School also worked closely with Project Albert in model generation, output analysis, and the creation of new experimental designs to better leverage the computing capabilities at Maui and other facilities. There have since been breakthroughs in designs developed specifically for data farming.
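The limitation of gridded designs can be made concrete with a little arithmetic: a full factorial with a fixed number of levels per factor grows exponentially in the number of factors. The sketch below counts grid points and, for contrast, builds a plain random Latin hypercube whose size is chosen independently of the number of factors. The Latin hypercube is swapped in here as a generic space-filling design and is not claimed to be one of the designs developed for data farming; the function names, the choice of 5 levels, and the 33-point design size are all assumptions.

```python
import random

def full_factorial_size(n_factors, n_levels):
    """Number of design points in a gridded design with n_levels per factor."""
    return n_levels ** n_factors

def latin_hypercube(n_points, n_factors, seed=0):
    """Random Latin hypercube on [0, 1)^n_factors.

    Each factor's range is cut into n_points equal bins, and every bin is
    sampled exactly once per factor.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        bins = list(range(n_points))
        rng.shuffle(bins)
        columns.append([(b + rng.random()) / n_points for b in bins])
    # Transpose the per-factor columns into a list of design points (rows).
    return [tuple(col[i] for col in columns) for i in range(n_points)]

# Grid size explodes as factors are added (5 levels per factor).
for k in (2, 5, 10, 20):
    print(k, "factors at 5 levels:", full_factorial_size(k, 5), "grid points")

# A space-filling design keeps the run count fixed regardless of dimension.
design = latin_hypercube(n_points=33, n_factors=20)
print(len(design), "Latin hypercube points, each setting all 20 factors")
```

The trade-off is that a small space-filling design samples each factor densely but cannot cover every combination of levels, so the analysis relies on fitted models rather than direct comparison of grid cells.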


Workshops

A series of international data farming workshops has been held since 1998 by the SEED Center for Data Farming. The first International Data Farming Workshop was followed by 16 more, with participants from countries including Canada, Singapore, Mexico, Turkey, and the United States (Horne & Schwierz, 2008). The International Data Farming Workshops operate through collaboration between various teams of experts. The workshop held in 2008 saw over 100 teams participating. Each team of data farmers is assigned a specific area of study, such as robotics, homeland security, or disaster relief. Each group experiments with and applies different data farming tools, such as the Pythagoras ABM, the Logistics Battle Command model, and the agent-based sensor effector model (ABSEM).


References

Horne, G., & Schwierz, K. (2008). Data farming around the world overview. Paper presented at the 2008 Winter Simulation Conference, 1442-1447. doi:10.1109/WSC.2008.4736222


External links


* SEED Center for Data Farming website, with links to numerous papers, applications, designs, and software.
* An article on the 27th Data Farming Workshop in Finland in Defense Media Network from January 2014
* An article on data farming in Defense News from January 2013
* An article summarizing data farming in the June 2005 issue of SIGNAL
* MITRE Corporation research paper on data farming