In research communities (for example, earth sciences, astronomy, business, and government), subsetting is the process of retrieving just the parts (a subset) of large files that are of interest for a specific purpose. This usually occurs in a client-server setting, where the extraction of the parts of interest occurs on the server before the data is sent to the client over a network. The main purpose of subsetting is to save bandwidth on the network and storage space on the client computer.
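As a rough illustration of why server-side extraction saves bandwidth, the following Python sketch compares the size of a full payload with that of a subset. The dataset, the `serve_full` and `serve_subset` functions, and the use of JSON as a wire format are all hypothetical stand-ins chosen for this example, not part of any particular subsetting service.

```python
import json

# Hypothetical dataset held on the server: one reading per day for a year.
DATASET = [{"day": d, "value": d * 0.5} for d in range(1, 366)]

def serve_full():
    """Serialize the whole dataset, as if the client downloaded the full file."""
    return json.dumps(DATASET).encode("utf-8")

def serve_subset(first_day, last_day):
    """Extract only the requested days on the server, then serialize them."""
    subset = [r for r in DATASET if first_day <= r["day"] <= last_day]
    return json.dumps(subset).encode("utf-8")

full_payload = serve_full()
subset_payload = serve_subset(1, 31)

# Only the subset payload would travel over the network to the client.
print(len(full_payload), len(subset_payload))
```

Because the filtering happens before serialization, the client receives and stores only the 31 requested records rather than all 365.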
Common reasons to subset data are to:
* restrict or divide the time range
* select cross sections of data
* select particular kinds of time series
* exclude particular observations
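The operations listed above can be sketched in Python with plain list comprehensions. The observation records, the station and variable names, and the `-999.0` missing-value flag are assumptions made up for this example.

```python
from datetime import date

# Hypothetical observations: (date, station, variable, value).
observations = [
    (date(2020, 1, 1), "A", "temperature", 3.1),
    (date(2020, 1, 1), "B", "temperature", 4.0),
    (date(2020, 1, 2), "A", "temperature", -999.0),  # assumed missing-value flag
    (date(2020, 1, 2), "A", "pressure", 1012.0),
    (date(2020, 2, 1), "A", "temperature", 5.2),
    (date(2020, 2, 1), "B", "temperature", 6.3),
]

# Restrict the time range to January 2020.
january = [o for o in observations
           if date(2020, 1, 1) <= o[0] <= date(2020, 1, 31)]

# Select a cross section: all records for a single day.
cross_section = [o for o in observations if o[0] == date(2020, 2, 1)]

# Select a particular time series: temperature at station A.
series_a = [(o[0], o[3]) for o in observations
            if o[1] == "A" and o[2] == "temperature"]

# Exclude particular observations: drop values carrying the missing flag.
clean = [o for o in observations if o[3] != -999.0]
```

Each comprehension keeps the original records untouched and builds a new, smaller list, so several subsets can be derived from the same source data.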
Subsetting within programs
Subsetting can also be performed within statistical software, which can speed up the process. However, the many different types of subsetting can make this challenging.
Types of objects that can be subset include:
* Atomic Vectors
* Lists
* Matrices and Arrays
* Data Frames
* S3 Objects
* S4 Objects
For example, in the R programming language, different code is used for each of these types of subsetting.