Feature (pattern recognition)

In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a data set. Choosing informative, discriminating, and independent features is crucial to produce effective algorithms for pattern recognition, classification, and regression tasks. Features are usually numeric, but other types such as strings and graphs are used in syntactic pattern recognition, after some pre-processing step such as one-hot encoding. The concept of "features" is related to that of explanatory variables used in statistical techniques such as linear regression.


Feature types

In feature engineering, two types of features are commonly used: numerical and categorical. Numerical features are continuous values that can be measured on a scale; examples include age, height, weight, and income. Numerical features can be used in machine learning algorithms directly. Categorical features are discrete values that can be grouped into categories; examples include gender, color, and zip code. Categorical features typically need to be converted to numerical features before they can be used in machine learning algorithms, using techniques such as one-hot encoding, label encoding, and ordinal encoding. The choice also depends on the algorithm: some algorithms, such as decision trees, can handle both numerical and categorical features, while others, such as linear regression, require numerical features.
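As an illustration, the following minimal sketch shows one-hot encoding of a categorical feature alongside a numerical one using pandas; the column names and values are hypothetical.

    # One-hot encoding of a categorical feature alongside a numerical one.
    # The column names and values here are hypothetical.
    import pandas as pd

    data = pd.DataFrame({
        "age":   [25, 32, 47],           # numerical feature, usable directly
        "color": ["red", "blue", "red"]  # categorical feature, needs encoding
    })

    # Each category becomes its own 0/1 indicator column (color_blue, color_red).
    encoded = pd.get_dummies(data, columns=["color"])
    print(encoded)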


Classification

A numeric feature can be conveniently described by a feature vector. One way to achieve binary classification is using a linear predictor function (related to the perceptron) with a feature vector as input. The method consists of calculating the scalar product between the feature vector and a vector of weights, qualifying those observations whose result exceeds a threshold. Algorithms for classification from a feature vector include nearest neighbor classification, neural networks, and statistical techniques such as Bayesian approaches.
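The following minimal sketch (NumPy, with hypothetical feature values, weights, and threshold) illustrates this scheme: compute the scalar product of the feature vector with the weight vector and compare the resulting score to a threshold.

    import numpy as np

    # Linear predictor used as a binary classifier: score = w . x,
    # and the observation qualifies if the score exceeds a threshold.
    # Feature values, weights, and threshold are hypothetical.
    x = np.array([0.5, 1.2, -0.3])   # feature vector
    w = np.array([2.0, -1.0, 0.5])   # weight vector
    threshold = 0.0

    score = np.dot(w, x)             # scalar product
    label = 1 if score > threshold else 0
    print(score, label)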


Examples

In character recognition, features may include histograms counting the number of black pixels along horizontal and vertical directions, the number of internal holes, stroke detection, and many others. In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches, and many others. In spam detection algorithms, features may include the presence or absence of certain email headers, the email structure, the language, the frequency of specific terms, and the grammatical correctness of the text. In computer vision, there are a large number of possible features, such as edges and objects.
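As a small illustration of such features, the sketch below (NumPy, with a hypothetical binary glyph image) computes the black-pixel histograms along horizontal and vertical directions mentioned above for character recognition.

    import numpy as np

    # Histogram features for character recognition: counts of black (1)
    # pixels per row and per column of a hypothetical binary glyph image.
    image = np.array([
        [0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [1, 0, 0, 1],
    ])

    row_histogram = image.sum(axis=1)     # black pixels along each horizontal line
    column_histogram = image.sum(axis=0)  # black pixels along each vertical line
    feature_vector = np.concatenate([row_histogram, column_histogram])
    print(feature_vector)                 # [2 2 4 2 3 2 2 3]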


Feature vectors

In pattern recognition and machine learning, a feature vector is an n-dimensional vector of numerical features that represents some object. Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis. When representing images, the feature values might correspond to the pixels of an image, while when representing texts the features might be the frequencies of occurrence of textual terms. Feature vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear regression. Feature vectors are often combined with weights using a dot product in order to construct a linear predictor function that is used to determine a score for making a prediction. The vector space associated with these vectors is often called the feature space. In order to reduce the dimensionality of the feature space, a number of dimensionality reduction techniques can be employed.

Higher-level features can be obtained from already available features and added to the feature vector; for example, for the study of diseases the feature 'Age' is useful and is defined as Age = 'Year of death' minus 'Year of birth'. This process is referred to as feature construction (Liu & Motoda, 1998; Piramuthu & Sikora, 2009).
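As a small illustration, the sketch below (plain Python, with hypothetical field names and values) constructs the higher-level 'Age' feature from two existing features and appends it to a feature vector.

    # Feature construction: derive a higher-level feature ('Age') from
    # existing features and append it to the feature vector.
    # Field names and values are hypothetical.
    record = {"year_of_birth": 1921, "year_of_death": 1988, "weight_kg": 72.0}

    age = record["year_of_death"] - record["year_of_birth"]   # constructed feature
    feature_vector = [record["weight_kg"], age]               # extended feature vector
    print(feature_vector)   # [72.0, 67]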
Feature construction is the application of a set of constructive operators to a set of existing features, resulting in the construction of new features. Examples of such constructive operators include checking for equality conditions, arithmetic operators, and array operators, as well as other more sophisticated operators, for example count(S,C) (Bloedorn & Michalski, 1998), which counts the number of features in the feature vector S satisfying some condition C, or, for example, distances to other recognition classes generalized by some accepting device. Feature construction has long been considered a powerful tool for increasing both accuracy and understanding of structure, particularly in high-dimensional problems (Breiman et al., 1984). Applications include studies of disease and emotion recognition from speech (Sidorova & Badia, 2009).
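A constructive operator such as count(S,C) can be sketched in a few lines of Python; the feature vector and condition below are hypothetical.

    # count(S, C): the number of features in the feature vector S that
    # satisfy the condition C, usable as a new constructed feature.
    def count(S, C):
        return sum(1 for feature in S if C(feature))

    S = [0.0, 3.5, -1.2, 7.0]                # hypothetical feature vector
    new_feature = count(S, lambda x: x > 0)  # condition C: feature is positive
    print(new_feature)                       # 2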


Selection and extraction

The initial set of raw features can be redundant and large enough that estimation and optimization are made difficult or ineffective. Therefore, a preliminary step in many applications of machine learning and pattern recognition consists of selecting a subset of features, or constructing a new and reduced set of features, to facilitate learning and to improve generalization and interpretability. Extracting or selecting features is a combination of art and science; developing systems to do so is known as feature engineering. It requires experimenting with multiple possibilities and combining automated techniques with the intuition and knowledge of the domain expert. Automating this process is feature learning, where a machine not only uses features for learning, but learns the features itself.
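As a rough illustration of the selection step, the sketch below (NumPy, with hypothetical data and threshold) keeps only features whose variance exceeds a threshold, one of the simplest filter-style selection criteria.

    import numpy as np

    # Filter-style feature selection: drop features whose variance across
    # the data set falls below a threshold. Data and threshold are hypothetical.
    X = np.array([
        [1.0, 0.0, 10.0],
        [1.1, 0.0, 20.0],
        [0.9, 0.0, 30.0],
    ])   # rows = observations, columns = features

    variances = X.var(axis=0)
    keep = variances > 1e-3      # boolean mask over the features
    X_selected = X[:, keep]      # the constant middle feature is dropped
    print(X_selected.shape)      # (3, 2)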


See also

* Covariate
* Dimensionality reduction
* Feature engineering
* Hashing trick
* Statistical classification
* Explainable artificial intelligence


References

* Bloedorn, E., Michalski, R. (1998). Data-driven constructive induction: a methodology and its applications. IEEE Intelligent Systems, Special Issue on Feature Transformation and Subset Selection, pp. 30-37, March/April 1998.
* Breiman, L., Friedman, J., Olshen, R., Stone, C. (1984). Classification and Regression Trees. Wadsworth.
* Liu, H., Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Norwell, MA, USA.
* Piramuthu, S., Sikora, R. T. (2009). Iterative feature construction for improving inductive learning algorithms. Expert Systems with Applications, Vol. 36, Iss. 2, pp. 3401-3406.
* Sidorova, J., Badia, T. (2009). Syntactic learning for ESEDA.1, tool for enhanced speech emotion detection and analysis. Internet Technology and Secured Transactions Conference 2009 (ICITST-2009), London, November 9–12. IEEE.