Granular Computing

Granular computing is an emerging computing paradigm of information processing that concerns the processing of complex information entities called "information granules", which arise in the process of data abstraction and derivation of knowledge from information or data. Generally speaking, information granules are collections of entities that usually originate at the numeric level and are arranged together due to their similarity, functional or physical adjacency, indistinguishability, coherency, or the like. At present, granular computing is more a ''theoretical perspective'' than a coherent set of methods or principles. As a theoretical perspective, it encourages an approach to data that recognizes and exploits the knowledge present in data at various levels of resolution or scales. In this sense, it encompasses all methods which provide flexibility and adaptability in the resolution at which knowledge or information is extracted and represented.


Types of granulation

As mentioned above, ''granular computing'' is not an algorithm or process; there is no particular method that is called "granular computing". It is rather an approach to looking at data that recognizes how different and interesting regularities in the data can appear at different levels of granularity, much as different features become salient in satellite images of greater or lesser resolution. On a low-resolution satellite image, for example, one might notice interesting cloud patterns representing cyclones or other large-scale weather phenomena, while in a higher-resolution image, one misses these large-scale atmospheric phenomena but instead notices smaller-scale phenomena, such as the interesting pattern that is the streets of Manhattan. The same is generally true of all data: at different resolutions or granularities, different features and relationships emerge. The aim of granular computing is to try to take advantage of this fact in designing more effective machine-learning and reasoning systems. There are several types of granularity that are often encountered in data mining and machine learning, and we review them below.


Value granulation (discretization/quantization)

One type of granulation is the quantization of variables. It is very common that in data mining or machine-learning applications the resolution of variables needs to be ''decreased'' in order to extract meaningful regularities. An example of this would be a variable such as "outside temperature", which in a given application might be recorded to several decimal places of precision (depending on the sensing apparatus). However, for purposes of extracting relationships between "outside temperature" and, say, "number of health-club applications", it will generally be advantageous to quantize "outside temperature" into a smaller number of intervals.
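
As a minimal sketch of this kind of value granulation (the bin edges, interval labels, and synthetic temperatures below are illustrative assumptions, not prescribed by any particular method):

```python
# Value granulation: quantize a continuous "outside temperature" variable
# into a small number of labelled intervals before looking for regularities.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temperature = pd.Series(rng.uniform(-10.0, 35.0, size=1000))  # degrees, high precision

# Coarsen to four intervals; the grain size is chosen from domain knowledge.
bins = [-40.0, 0.0, 15.0, 25.0, 60.0]
labels = ["freezing", "cool", "warm", "hot"]
granulated = pd.cut(temperature, bins=bins, labels=labels)

print(granulated.value_counts())
```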


Motivations

There are several interrelated reasons for granulating variables in this fashion:

* Based on prior domain knowledge, there is no expectation that minute variations in temperature could have an influence on behaviors driving the number of health-club applications. For this reason, any "regularity" which our learning algorithms might detect at this level of resolution would have to be ''spurious'', an artifact of overfitting. By coarsening the temperature variable into intervals whose differences we ''do'' anticipate (based on prior domain knowledge) might influence the number of health-club applications, we eliminate the possibility of detecting these spurious patterns. Thus, in this case, reducing resolution is a method of controlling overfitting.
* By reducing the number of intervals in the temperature variable (i.e., increasing its ''grain size''), we increase the amount of sample data indexed by each interval designation. Thus, by coarsening the variable, we increase sample sizes and achieve better statistical estimation. In this sense, increasing granularity provides an antidote to the so-called ''curse of dimensionality'', which relates to the exponential decrease in statistical power with increase in the number of dimensions or variable cardinality.
* Independent of prior domain knowledge, it is often the case that meaningful regularities (i.e., regularities which can be detected by a given learning methodology, representational language, etc.) may exist at one level of resolution and not at another. For example, a simple learner or pattern-recognition system may seek to extract regularities satisfying a conditional-probability threshold such as p(Y=y_j \mid X=x_i) \ge \alpha. In the special case where \alpha = 1, this recognition system is essentially detecting ''logical implication'' of the form X=x_i \rightarrow Y=y_j or, in words, "if X=x_i, then Y=y_j". The system's ability to recognize such implications (or, in general, conditional probabilities exceeding the threshold) is partially contingent on the resolution with which the system analyzes the variables.

As an example of this last point, consider a feature space in which each of two variables may be regarded at two different resolutions. Variable X may be regarded at a high (quaternary) resolution wherein it takes on the four values \{x_1, x_2, x_3, x_4\} or at a lower (binary) resolution wherein it takes on the two values \{X_1, X_2\}. Similarly, variable Y may be regarded at a high (quaternary) resolution or at a lower (binary) resolution, where it takes on the values \{y_1, y_2, y_3, y_4\} or \{Y_1, Y_2\}, respectively. At the high resolution, there are no detectable implications of the form X=x_i \rightarrow Y=y_j, since every x_i is associated with more than one y_j, and thus, for all x_i, p(Y=y_j \mid X=x_i) < 1. However, at the low (binary) variable resolution, two bilateral implications become detectable: X=X_1 \leftrightarrow Y=Y_1 and X=X_2 \leftrightarrow Y=Y_2, since every X_1 occurs ''iff'' Y_1 and every X_2 occurs ''iff'' Y_2. Thus, a pattern-recognition system scanning for implications of this kind would find them at the binary variable resolution, but would fail to find them at the higher, quaternary variable resolution.


Issues and methods

It is not feasible to exhaustively test all possible discretization resolutions on all variables in order to see which combination of resolutions yields interesting or significant results. Instead, the feature space must be preprocessed (often by an entropy analysis of some kind) so that some guidance can be given as to how the discretization process should proceed. Moreover, one cannot generally achieve good results by naively analyzing and discretizing each variable independently, since this may obliterate the very interactions that we had hoped to discover. A number of papers address the problem of variable discretization in general, and of multiple-variable discretization in particular.
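
As an illustration of the kind of entropy analysis that can provide such guidance, the following sketch chooses a single cut point for one variable by minimizing the weighted class entropy of the two resulting intervals. This is a deliberately simplified, single-variable procedure, and the data are illustrative assumptions.

```python
# Entropy-guided choice of a single cut point for discretizing a
# continuous variable with respect to a class label (simplified sketch).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return (weighted entropy, cut point) minimizing entropy of the two bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    best = (float("inf"), None)
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue
        w = k / len(xs)
        score = w * entropy(ys[:k]) + (1 - w) * entropy(ys[k:])
        best = min(best, (score, (xs[k - 1] + xs[k]) / 2))
    return best

# Illustrative data: temperature vs. whether a health-club application was filed.
temps = [3.1, 5.0, 8.2, 12.4, 18.9, 21.5, 24.0, 27.3, 29.8, 31.0]
applied = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(best_cut(temps, applied))   # cut lands between 12.4 and 18.9
```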


Variable granulation (clustering/aggregation/transformation)

Variable granulation is a term that could describe a variety of techniques, most of which are aimed at reducing dimensionality, redundancy, and storage requirements. We briefly describe some of the ideas here, and present pointers to the literature.


Variable transformation

A number of classical methods, such as principal component analysis, multidimensional scaling, factor analysis, and structural equation modeling, and their relatives, fall under the genus of "variable transformation." Also in this category are more modern areas of study such as dimensionality reduction, projection pursuit, and independent component analysis. The common goal of these methods is to find a representation of the data in terms of new variables, which are a linear or nonlinear transformation of the original variables, and in which important statistical relationships emerge. The resulting variable sets are almost always smaller than the original variable set, and hence these methods can be loosely said to impose a granulation on the feature space. These dimensionality-reduction methods are all reviewed in standard texts on the subject.
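
As a minimal sketch of variable transformation, the following reduces three correlated variables to two principal components computed via the singular value decomposition; the synthetic data and the choice of two components are illustrative assumptions.

```python
# Variable transformation via PCA: replace correlated original variables
# with a smaller set of uncorrelated components (a coarser "granulation"
# of the feature space).
import numpy as np

rng = np.random.default_rng(1)
n = 500
latent = rng.normal(size=(n, 1))
# Three observed variables that are noisy functions of one latent factor.
X = np.hstack([2.0 * latent + 0.1 * rng.normal(size=(n, 1)),
               -1.0 * latent + 0.1 * rng.normal(size=(n, 1)),
               0.5 * latent + 0.1 * rng.normal(size=(n, 1))])

Xc = X - X.mean(axis=0)                        # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                # variance explained per component
scores = Xc @ Vt[:2].T                         # data expressed in 2 components

print("variance explained by each component:", np.round(explained, 3))
```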


Variable aggregation

A different class of variable granulation methods derives more from data clustering methodologies than from the linear systems theory informing the above methods. It was noted fairly early that one may consider "clustering" related variables in just the same way that one considers clustering related data. In data clustering, one identifies a group of similar entities (using a "measure of similarity" suitable to the domain), and then in some sense ''replaces'' those entities with a prototype of some kind. The prototype may be the simple average of the data in the identified cluster, or some other representative measure. But the key idea is that in subsequent operations, we may be able to use the single prototype for the data cluster (along with perhaps a statistical model describing how exemplars are derived from the prototype) to ''stand in'' for the much larger set of exemplars. These prototypes are generally such as to capture most of the information of interest concerning the entities.

Similarly, it is reasonable to ask whether a large set of variables might be aggregated into a smaller set of ''prototype'' variables that capture the most salient relationships between the variables. Although variable clustering methods based on linear correlation have been proposed, more powerful methods of variable clustering are based on the mutual information between variables. Watanabe has shown that for any set of variables one can construct a ''polytomic'' (i.e., n-ary) tree representing a series of variable agglomerations in which the ultimate "total" correlation among the complete variable set is the sum of the "partial" correlations exhibited by each agglomerating subset. Watanabe suggests that an observer might seek to thus partition a system in such a way as to minimize the interdependence between the parts, "... as if they were looking for a natural division or a hidden crack."

One practical approach to building such a tree is to successively choose for agglomeration the two variables (either atomic variables or previously agglomerated variables) which have the highest pairwise mutual information. The product of each agglomeration is a new (constructed) variable that reflects the local joint distribution of the two agglomerating variables, and thus possesses an entropy equal to their joint entropy. From a procedural standpoint, this agglomeration step involves replacing two columns in the attribute-value table—representing the two agglomerating variables—with a single column that has a unique value for every unique combination of values in the replaced columns. No information is lost by such an operation; however, if one is exploring the data for inter-variable relationships, it would generally ''not'' be desirable to merge redundant variables in this way, since in such a context it is likely to be precisely the redundancy or ''dependency'' between variables that is of interest; and once redundant variables are merged, their relationship to one another can no longer be studied.
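
A minimal sketch of this agglomeration procedure, assuming small discrete variables held as Python lists; the data and the single merge step shown are illustrative.

```python
# Greedy variable agglomeration by pairwise mutual information.
# Each merge replaces two columns with one column whose values are the
# unique combinations of the originals (so no information is lost).
import math
from collections import Counter
from itertools import combinations

def mutual_information(a, b):
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def agglomerate_once(columns):
    """Merge the pair of columns with the highest mutual information."""
    (i, j), _ = max(((pair, mutual_information(columns[pair[0]], columns[pair[1]]))
                     for pair in combinations(columns, 2)), key=lambda t: t[1])
    columns[f"({i},{j})"] = [f"{u}|{v}" for u, v in zip(columns[i], columns[j])]
    del columns[i], columns[j]
    return (i, j)

# Illustrative attribute-value table: A and B are strongly dependent.
columns = {
    "A": [0, 0, 1, 1, 0, 1, 0, 1],
    "B": [0, 0, 1, 1, 0, 1, 1, 1],
    "C": [1, 0, 1, 0, 0, 1, 0, 1],
}
print("merged:", agglomerate_once(columns))     # ('A', 'B')
print("remaining variables:", list(columns))
```

Repeating the step on the remaining (atomic or constructed) variables builds the agglomeration tree; only one merge is shown here.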


System granulation (aggregation)

In database systems, aggregations (see e.g. OLAP aggregation and business intelligence systems) result in transforming original data tables (often called information systems) into tables with different semantics of rows and columns, wherein the rows correspond to groups (granules) of original tuples and the columns express aggregated information about the original values within each of the groups. Such aggregations are usually based on SQL and its extensions. The resulting granules usually correspond to the groups of original tuples with the same values (or ranges) over some pre-selected original columns. There are also other approaches wherein the groups are defined based on, e.g., physical adjacency of rows. For example, Infobright implemented a database engine wherein data was partitioned into ''rough rows'', each consisting of 64K physically consecutive (or almost consecutive) rows. Rough rows were automatically labeled with compact information about their values on the data columns, often involving multi-column and multi-table relationships. This resulted in a higher layer of granulated information, where objects corresponded to rough rows and attributes to various aspects of rough information. Database operations could be efficiently supported within such a new framework, with access to the original data pieces still available.
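
The following pandas sketch mirrors this kind of aggregation: rows are grouped into granules by the values of selected columns, and each granule is described by aggregate statistics over the remaining columns. The table and column names are illustrative assumptions; a real system would typically express this directly in SQL.

```python
# System granulation: aggregate an "information system" (a data table)
# into granules of rows sharing values on selected columns, summarizing
# the remaining columns within each granule (as SQL GROUP BY would).
import pandas as pd

sales = pd.DataFrame({
    "region":  ["north", "north", "south", "south", "south", "east"],
    "product": ["a", "b", "a", "a", "b", "a"],
    "units":   [10, 4, 7, 3, 9, 5],
    "revenue": [100.0, 60.0, 70.0, 30.0, 120.0, 50.0],
})

granules = (sales
            .groupby(["region", "product"], as_index=False)
            .agg(rows=("units", "size"),
                 total_units=("units", "sum"),
                 mean_revenue=("revenue", "mean")))
print(granules)
```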


Concept granulation (component analysis)

The origins of the ''granular computing'' ideology are to be found in the rough sets and fuzzy sets literatures. One of the key insights of rough set research—although by no means unique to it—is that, in general, the selection of different sets of features or variables will yield different ''concept'' granulations. Here, as in elementary rough set theory, by "concept" we mean a set of entities that are ''indistinguishable'' or ''indiscernible'' to the observer (i.e., a simple concept), or a set of entities that is composed from such simple concepts (i.e., a complex concept). In other words, by projecting a data set (value-attribute system) onto different sets of variables, we recognize alternative sets of equivalence-class "concepts" in the data, and these different sets of concepts will in general be conducive to the extraction of different relationships and regularities.


Equivalence class granulation

We illustrate with an example. Consider the attribute-value system below:

:  Object   P_1   P_2   P_3   P_4   P_5
:  O_1       1     2     0     1     1
:  O_2       1     2     0     1     1
:  O_3       2     0     0     1     0
:  O_4       0     0     1     2     1
:  O_5       2     1     0     2     1
:  O_6       0     0     1     2     2
:  O_7       2     0     0     1     0
:  O_8       0     1     2     2     1
:  O_9       2     1     0     2     2
:  O_10      2     0     0     1     0

When the full set of attributes P = \{P_1, P_2, P_3, P_4, P_5\} is considered, we see that we have the following seven equivalence classes or primitive (simple) concepts:

: \begin{matrix} \{O_1, O_2\} \\ \{O_3, O_7, O_{10}\} \\ \{O_4\} \\ \{O_5\} \\ \{O_6\} \\ \{O_8\} \\ \{O_9\} \end{matrix}

Thus, the two objects within the first equivalence class, \{O_1, O_2\}, cannot be distinguished from one another based on the available attributes, and the three objects within the second equivalence class, \{O_3, O_7, O_{10}\}, cannot be distinguished from one another based on the available attributes. The remaining five objects are each discernible from all other objects. Now, let us imagine a projection of the attribute-value system onto attribute P_1 alone, which would represent, for example, the view from an observer who is only capable of detecting this single attribute. Then we obtain the following much coarser equivalence-class structure:

: \begin{matrix} \{O_1, O_2\} \\ \{O_3, O_5, O_7, O_9, O_{10}\} \\ \{O_4, O_6, O_8\} \end{matrix}

This is in a certain regard the same structure as before, but at a lower degree of resolution (larger grain size). Just as in the case of value granulation (discretization/quantization), it is possible that relationships (dependencies) may emerge at one level of granularity that are not present at another. As an example of this, we can consider the effect of concept granulation on the measure known as ''attribute dependency'' (a simpler relative of the mutual information).

To establish this notion of dependency (see also rough sets), let Q = \{Q_1, Q_2, \ldots, Q_N\} represent a particular concept granulation, where each Q_i is an equivalence class from the concept structure induced by attribute set Q. For example, if the attribute set Q consists of attribute P_1 alone, as above, then the concept structure Q will be composed of

:\begin{align} Q_1 &= \{O_1, O_2\}, \\ Q_2 &= \{O_3, O_5, O_7, O_9, O_{10}\}, \\ Q_3 &= \{O_4, O_6, O_8\}. \end{align}

The dependency of attribute set Q on another attribute set P, \gamma_{P}(Q), is given by

: \gamma_{P}(Q) = \frac{\left| \sum_{i=1}^{N} \underline{P} Q_i \right|}{\left| \mathbb{U} \right|} \leq 1

That is, for each equivalence class Q_i in Q, we add up the size of its "lower approximation" (see rough sets) by the attributes in P, i.e., \underline{P} Q_i. More simply, this approximation is the number of objects which on attribute set P can be positively identified as belonging to target set Q_i. Added across all equivalence classes in Q, the numerator above represents the total number of objects which—based on attribute set P—can be positively categorized according to the classification induced by attributes Q. The dependency ratio therefore expresses the proportion (within the entire universe) of such classifiable objects, in a sense capturing the "synchronization" of the two concept structures Q and P. The dependency \gamma_{P}(Q) "can be interpreted as a proportion of such objects in the information system for which it suffices to know the values of attributes in P to determine the values of attributes in Q" (Ziarko & Shan 1995).

Having now gotten the definitions out of the way, we can make the simple observation that the choice of concept granularity (i.e., choice of attributes) will influence the detected dependencies among attributes. Consider again the attribute-value table above, and the dependency of attribute set Q = \{P_4, P_5\} on attribute set P = \{P_1, P_2, P_3\}. That is, we wish to know what proportion of objects can be correctly classified into classes of Q based on knowledge of P. The equivalence classes of Q and of P are shown below.

:\begin{align} Q &: \{O_1, O_2\},\ \{O_3, O_7, O_{10}\},\ \{O_4, O_5, O_8\},\ \{O_6, O_9\} \\ P &: \{O_1, O_2\},\ \{O_3, O_7, O_{10}\},\ \{O_4, O_6\},\ \{O_5, O_9\},\ \{O_8\} \end{align}

The objects that can be ''definitively'' categorized according to concept structure Q based on P are those in the set \{O_1, O_2, O_3, O_7, O_8, O_{10}\}, and since there are six of these, the dependency of Q on P is \gamma_{P}(Q) = 6/10. This might be considered an interesting dependency in its own right, but perhaps in a particular data mining application only stronger dependencies are desired.

We might then consider the dependency of the smaller attribute set Q = \{P_4\} on the attribute set P = \{P_1, P_2, P_3\}. The move from Q = \{P_4, P_5\} to Q = \{P_4\} induces a coarsening of the class structure Q, as will be seen shortly. We wish again to know what proportion of objects can be correctly classified into the (now larger) classes of Q based on knowledge of P. The equivalence classes of the new Q and of P are shown below.

:\begin{align} Q &: \{O_1, O_2, O_3, O_7, O_{10}\},\ \{O_4, O_5, O_6, O_8, O_9\} \\ P &: \{O_1, O_2\},\ \{O_3, O_7, O_{10}\},\ \{O_4, O_6\},\ \{O_5, O_9\},\ \{O_8\} \end{align}

Clearly, Q has a coarser granularity than it did earlier. The objects that can now be ''definitively'' categorized according to the concept structure Q based on P constitute the complete universe \{O_1, O_2, \ldots, O_{10}\}, and thus the dependency of Q on P is \gamma_{P}(Q) = 1. That is, knowledge of membership according to category set P is adequate to determine category membership in Q with complete certainty; in this case we might say that P \rightarrow Q. Thus, by coarsening the concept structure, we were able to find a stronger (deterministic) dependency. However, we also note that the classes induced in Q by the reduction in resolution necessary to obtain this deterministic dependency are now themselves large and few in number; as a result, the dependency we found, while strong, may be less valuable to us than the weaker dependency found earlier under the higher-resolution view of Q.

In general, it is not possible to test all sets of attributes to see which induced concept structures yield the strongest dependencies, and this search must therefore be guided with some intelligence. Papers which discuss this issue, and others relating to intelligent use of granulation, are those by Y.Y. Yao and Lotfi Zadeh listed in the references below.
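
The calculation above can be reproduced mechanically. The following Python sketch derives equivalence classes and the dependency \gamma_P(Q) directly from the definitions, using the sample table given earlier; it is a sketch written from the definitions, not a reference rough-set implementation.

```python
# Attribute dependency in the rough-set sense: gamma_P(Q) is the fraction
# of objects that can be definitively classified into the Q-induced
# equivalence classes using only the attributes in P.
from collections import defaultdict

# Sample attribute-value system from the example above (rows O1..O10, columns P1..P5).
table = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}

def equivalence_classes(attrs):
    """Group objects that are indiscernible on the given attribute indices."""
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[tuple(values[i] for i in attrs)].add(obj)
    return list(classes.values())

def dependency(p_attrs, q_attrs):
    """gamma_P(Q): size of the P-positive region of Q, divided by |U|."""
    p_classes = equivalence_classes(p_attrs)
    positive = set()
    for q_class in equivalence_classes(q_attrs):
        for p_class in p_classes:
            if p_class <= q_class:          # p_class lies in the lower approximation
                positive |= p_class
    return len(positive) / len(table)

print(dependency(p_attrs=(0, 1, 2), q_attrs=(3, 4)))  # 0.6  (i.e., 6/10)
print(dependency(p_attrs=(0, 1, 2), q_attrs=(3,)))    # 1.0
```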


Component granulation

Another perspective on concept granulation may be obtained from work on parametric models of categories. In mixture model learning, for example, a set of data is explained as a mixture of distinct Gaussian (or other) distributions. Thus, a large amount of data is "replaced" by a small number of distributions. The choice of the number of these distributions, and their size, can again be viewed as a problem of ''concept granulation''. In general, a better fit to the data is obtained by a larger number of distributions or parameters, but in order to extract meaningful patterns, it is necessary to constrain the number of distributions, thus deliberately ''coarsening'' the concept resolution. Finding the "right" concept resolution is a tricky problem for which many methods have been proposed (e.g., AIC, BIC, MDL, etc.), and these are frequently considered under the rubric of "model regularization".
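
As a sketch of this kind of model selection (assuming scikit-learn is available; the data and the candidate range of component counts are illustrative assumptions), one can fit mixtures of increasing size and let BIC choose the granularity:

```python
# Concept granulation via mixture modelling: fit Gaussian mixtures with
# different numbers of components and pick the granularity by BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Data drawn from three well-separated 1-D clusters (illustrative).
data = np.concatenate([rng.normal(-5.0, 1.0, 200),
                       rng.normal(0.0, 1.0, 200),
                       rng.normal(6.0, 1.0, 200)]).reshape(-1, 1)

bics = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
    bics[k] = gmm.bic(data)

best_k = min(bics, key=bics.get)
print("BIC by component count:", {k: round(v, 1) for k, v in bics.items()})
print("selected number of components:", best_k)   # expected: 3
```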


Different interpretations of granular computing

Granular computing can be conceived as a framework of theories, methodologies, techniques, and tools that make use of information granules in the process of problem solving. In this sense, granular computing is used as an umbrella term to cover topics that have been studied in various fields in isolation. By examining all of these existing studies in light of the unified framework of granular computing and extracting their commonalities, it may be possible to develop a general theory for problem solving. In a more philosophical sense, granular computing can describe a way of thinking that relies on the human ability to perceive the real world under various levels of granularity (i.e., abstraction) in order to abstract and consider only those things that serve a specific interest and to switch among different granularities. By focusing on different levels of granularity, one can obtain different levels of knowledge, as well as a greater understanding of the inherent knowledge structure. Granular computing is thus essential in human problem solving and hence has a very significant impact on the design and implementation of intelligent systems.


See also

* Rough sets
* Discretization
* Type-2 fuzzy sets and systems


References

* Bargiela, A. and Pedrycz, W. (2003) ''Granular Computing: An Introduction'', Kluwer Academic Publishers.
* Yao, Y.Y. (2004) "A Partition Model of Granular Computing", ''Lecture Notes in Computer Science''.
* Zadeh, L.A. (1997) "Toward a Theory of Fuzzy Information Granulation and its Centrality in Human Reasoning and Fuzzy Logic", ''Fuzzy Sets and Systems'', 90:111-127.