Data Cloud

A tag cloud (also known as a word cloud or weighted list in visual design) is a visual representation of text data which is often used to depict keyword metadata on websites, or to visualize free-form text. Tags are usually single words, and the importance of each tag is shown with font size or color. When used as website navigation aids, the terms are hyperlinked to items associated with the tag.


History

In the language of visual design, a tag cloud (or word cloud) is one kind of "weighted list", as commonly used on geographic maps to represent the relative size of cities in terms of relative typeface size. An early printed example of a weighted list of English keywords was the "subconscious files" in Douglas Coupland's ''Microserfs'' (1995). A German appearance occurred in 1992. The specific visual form and common use of the term "tag cloud" rose to prominence in the first decade of the 21st century as a widespread feature of early Web 2.0 websites and blogs, used primarily to visualize the frequency distribution of keyword metadata that describe website content, and as a navigation aid. The first tag clouds on a high-profile website were on the photo sharing site Flickr, created by Flickr co-founder and interaction designer Stewart Butterfield in 2004. That implementation was based on Jim Flanagan's Search Referral Zeitgeist, a visualization of Web site referrers. Tag clouds were also popularized around the same time by Del.icio.us and Technorati, among others. Oversaturation of the tag cloud method and ambivalence about its utility as a web-navigation tool led to a decline of usage among these early adopters. Flickr gave a five-word acceptance speech for the 2006 "Best Practices" Webby Award, which simply stated "sorry about the tag clouds." A second generation of software development discovered a wider diversity of uses for tag clouds as a basic visualization method for text data. Several extensions of tag clouds have been proposed in this context.


Types

There are three main types of tag cloud applications in social software, distinguished by their meaning rather than their appearance. In the first type, tag size reflects how often a tag has been applied to each individual item, whereas the second type consists of global tag clouds in which frequencies are aggregated over all items and users. In the third type, the cloud contains categories, with size indicating the number of subcategories.


Frequency

In the first type, size represents the number of times that tag has been applied to a single item. This is useful as a means of displaying metadata about an item that has been democratically "voted" on and where precise results are not desired. In the second, more commonly used type, size represents the number of items to which a tag has been applied, as a presentation of each tag's popularity.
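The two frequency-based readings differ only in what is counted. As a minimal sketch, assuming tag applications arrive as hypothetical `(item, tag)` pairs, the first type counts applications of each tag per item, while the second counts the distinct items each tag has been applied to. The function names and sample data are illustrative only:

```python
from collections import Counter, defaultdict

# Hypothetical input: each pair records one application of a tag to an item.
applications = [
    ("photo1", "sunset"), ("photo1", "sunset"), ("photo1", "beach"),
    ("photo2", "sunset"), ("photo3", "beach"), ("photo3", "city"),
]

def per_item_tag_counts(apps):
    """First type: how many times each tag was applied to each individual item."""
    counts = defaultdict(Counter)
    for item, tag in apps:
        counts[item][tag] += 1
    return counts

def global_tag_popularity(apps):
    """Second type: number of distinct items each tag has been applied to."""
    items_per_tag = defaultdict(set)
    for item, tag in apps:
        items_per_tag[tag].add(item)
    return {tag: len(items) for tag, items in items_per_tag.items()}

print(per_item_tag_counts(applications)["photo1"])
print(global_tag_popularity(applications))
```

With this sample data, "sunset" was applied twice to photo1 (first type) and appears on two distinct items overall (second type).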


Significance

Instead of frequency, the size can be used to represent the significance of words and word co-occurrences compared to a background corpus (for example, compared to all the text in Wikipedia). This approach cannot be used on its own: it relies on comparing the document frequencies to expected distributions.
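The text does not fix a particular significance measure. As one simple, assumed choice, the sketch below scores each word by the log ratio of its smoothed relative frequency in the document to its smoothed relative frequency in a background corpus, so positive scores mark words that are over-represented in the document. The function name and the add-alpha smoothing are illustrative assumptions:

```python
import math
from collections import Counter

def significance_scores(doc_tokens, background_counts, alpha=1.0):
    """Log ratio of a word's relative frequency in the document to its relative
    frequency in the background corpus. Add-alpha smoothing keeps words that are
    unseen in the background from causing a division by zero."""
    doc_counts = Counter(doc_tokens)
    doc_total = sum(doc_counts.values())
    bg_total = sum(background_counts.values())
    v = len(set(doc_counts) | set(background_counts))  # shared vocabulary size
    scores = {}
    for word, count in doc_counts.items():
        p_doc = (count + alpha) / (doc_total + alpha * v)
        p_bg = (background_counts.get(word, 0) + alpha) / (bg_total + alpha * v)
        scores[word] = math.log(p_doc / p_bg)
    return scores

background = Counter({"the": 1000, "cloud": 5, "tag": 3})
print(significance_scores(["the", "tag", "tag", "cloud", "the"], background))
```

Here "tag" receives a positive score, while the frequent function word "the" receives a negative one even though both occur twice in the document.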


Categorization

In the third type, tags are used as a categorization method for content items. Tags are represented in a cloud where larger tags indicate a greater number of content items in that category. Some approaches construct tag clusters instead of tag clouds, e.g., by applying tag co-occurrences in documents. More generally, the same visual technique can be used to display non-tag data, as in a word cloud or a data cloud. The term keyword cloud is sometimes used as a search engine marketing (SEM) term that refers to a group of keywords that are relevant to a specific website. Tag clouds have also gained popularity because of their role in search engine optimization of Web pages, as well as in supporting the user in navigating the content of an information system efficiently. As a navigational tool, tag clouds make the resources of a website more interconnected when crawled by a search engine spider, which may improve the site's search engine rank. From a user interface perspective, they are often used to summarize search results, helping users find content in a particular information system more quickly.
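For the categorization type, the cloud needs the number of content items per category, and tag clusters additionally need co-occurrence counts. A minimal sketch, assuming each content item is represented by a hypothetical list of its tags; the clustering step that would consume the pair counts is not shown:

```python
from collections import Counter
from itertools import combinations

def category_sizes(documents):
    """Number of content items in each category (a tag counts once per item)."""
    return Counter(tag for tags in documents for tag in set(tags))

def tag_cooccurrences(documents):
    """Count how many items each unordered pair of tags appears in together."""
    pairs = Counter()
    for tags in documents:
        # Deduplicate and sort so each pair is counted once per item and
        # ("a", "b") and ("b", "a") share the same key.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

docs = [["python", "web"], ["python", "data"], ["web", "python", "css"]]
print(category_sizes(docs))
print(tag_cooccurrences(docs).most_common(1))
```

Pairs with high counts, such as ("python", "web") here, are natural candidates to place in the same cluster.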


Visual appearance

Tag clouds are typically represented using inline HTML elements. The tags can appear in alphabetical order, in random order, sorted by weight, and so on. Sometimes further visual properties are manipulated in addition to font size, such as font color, intensity, or weight (Lohmann, S., Ziegler, J., Tetzlaff, L.: "Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration", in T. Gross et al. (Eds.): INTERACT 2009, Part I, LNCS 5726, pp. 392–404, 2009).
The most popular layout is a rectangular tag arrangement with alphabetical sorting in a sequential line-by-line layout. The choice of an optimal layout should be driven by the expected user goals. Some prefer to cluster the tags semantically so that similar tags appear near each other (Salonen, J.: "Self-organising map based tag clouds – Creating spatially meaningful representations of tagging data", Proceedings of the 1st OPAALS conference, 26–27 November 2007, Rome, Italy), or use embedding techniques such as t-SNE to position words. Edges can be added to emphasize the co-occurrences of tags and visualize interactions. Heuristics can be used to reduce the size of the tag cloud whether or not the purpose is to cluster the tags. The visual taxonomy of a tag cloud is determined by a number of attributes: tag ordering rule (e.g. alphabetical, by importance, by context, random, ordered for visual quality), shape of the entire cloud (e.g. rectangular, circular, given map borders), shape of tag bounds (rectangle or character body), tag rotation (none, free, limited), and vertical tag alignment (sticking to typographical baselines, free). A tag cloud on the web must address the problems of modeling and controlling aesthetics and of constructing a two-dimensional layout of tags, and it must do all of this quickly on a volatile browser platform. Tag clouds used on the web must be in HTML rather than graphics, to make them robot-readable; they must be constructed on the client side using the fonts available in the browser, and they must fit in a rectangular box.
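To illustrate the inline HTML representation with alphabetical ordering, the following sketch emits one link per tag, with a font size interpolated linearly from the tag's weight. The `/tags/...` URL scheme, the `tag-cloud` class name, and the pixel range are illustrative assumptions, not part of any standard:

```python
import html
from urllib.parse import quote

def render_tag_cloud(weights, min_px=12, max_px=36, url_template="/tags/{}"):
    """Return an HTML fragment with tags sorted alphabetically (case-insensitive),
    each rendered as a link whose font size grows linearly with its weight."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1  # avoid division by zero when all weights are equal
    links = []
    for tag in sorted(weights, key=str.lower):
        size = min_px + (weights[tag] - lo) * (max_px - min_px) / span
        # URL-encode the tag for the path, then HTML-escape the attribute value.
        href = html.escape(url_template.format(quote(tag)), quote=True)
        links.append(f'<a href="{href}" style="font-size:{size:.0f}px">{html.escape(tag)}</a>')
    # Plain text links (not an image) keep the cloud readable by crawlers.
    return '<div class="tag-cloud">' + " ".join(links) + "</div>"

print(render_tag_cloud({"python": 12, "html": 5, "css": 3, "web": 8}))
```

Sorting by weight or shuffling instead of alphabetical order only requires changing the `sorted` call.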


Data clouds

A data cloud or cloud data is a data display which uses font size and/or color to indicate numerical values. It is similar to a tag cloud, but instead of word counts it displays data such as population or stock market prices.
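A data cloud can reuse the same kind of mapping for arbitrary numbers. In this minimal sketch, each label's value is normalized to the range [0, 1] and used both for font size and for a gray level, so larger values render as larger, darker text. The grayscale scheme, the function name, and the placeholder labels are assumptions chosen for illustration:

```python
def data_cloud_styles(values, min_px=10, max_px=40):
    """Map each label's numeric value to (font size in px, hex gray color).
    Larger values get larger and darker text."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1  # all-equal values would otherwise divide by zero
    styles = {}
    for label, value in values.items():
        t = (value - lo) / span                # normalized position in [0, 1]
        size = round(min_px + t * (max_px - min_px))
        shade = round(200 * (1 - t))           # 200 = light gray, 0 = black
        styles[label] = (size, f"#{shade:02x}{shade:02x}{shade:02x}")
    return styles

print(data_cloud_styles({"A": 100, "B": 50, "C": 0}))
```

Using size and color for two different quantities (for example, size for a price and color for its change) is an equally valid design; this sketch encodes one value redundantly.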


Text clouds

A text cloud or word cloud is a visualization of word frequency in a given text as a weighted list. The technique has been widely used to visualize the topical content of political speeches.
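A text cloud starts from plain word counts. The sketch below lowercases the text, keeps alphabetic tokens (which drops numbers and punctuation), removes stop words, and counts what remains. The tiny `STOP_WORDS` set is a placeholder assumption; practical tools use much longer lists:

```python
import re
from collections import Counter

# Illustrative stop-word list only; real implementations use far larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our"}

def word_frequencies(text, top_n=None):
    """Return (word, count) pairs, most frequent first, after lowercasing,
    keeping alphabetic tokens (with optional apostrophe suffix such as "don't"),
    and removing stop words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)

text = "The cloud shows words; the biggest words appear most often. Words, words, words!"
print(word_frequencies(text, 3))
```

The resulting counts can then be fed into any of the size-scaling schemes used for tag clouds.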


Collocate clouds

Extending the principles of a text cloud, a collocate cloud provides a more focused view of a document or corpus. Instead of summarising an entire document, the collocate cloud examines the usage of a particular word. The resulting cloud contains the words which are often used in conjunction with the search word. These collocates are formatted to show frequency (as size) as well as collocational strength (as brightness). This provides interactive ways to browse and explore language.
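The text does not specify how collocational strength is measured. Assuming a fixed window of neighboring tokens and using pointwise mutual information (PMI) as one common association measure, the two quantities a collocate cloud displays, frequency for size and strength for brightness, could be computed as follows. The window size, the simplified PMI estimate, and the function name are illustrative choices:

```python
import math
from collections import Counter

def collocates(tokens, node, window=2):
    """Return {word: (frequency, pmi)} for words within `window` tokens of each
    occurrence of `node`. Frequency would drive font size and PMI brightness."""
    n = len(tokens)
    word_counts = Counter(tokens)
    co_counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                co_counts[tokens[j]] += 1
    result = {}
    for word, co in co_counts.items():
        # Simplified PMI estimate: log(P(node, word) / (P(node) * P(word))),
        # with every probability estimated as a count divided by n.
        pmi = math.log(co * n / (word_counts[node] * word_counts[word]))
        result[word] = (co, pmi)
    return result

tokens = "strong tea strong coffee weak tea strong tea".split()
print(collocates(tokens, "strong", window=1))
```

In this toy example "tea" is both the most frequent and the most strongly associated collocate of "strong", while "weak" never falls inside the window.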


Perception

Tag clouds have been the subject of investigation in several usability studies. The following summary is based on an overview of research results given by Lohmann et al.:
* Tag size: Large tags attract more user attention than small tags (effect influenced by further properties, e.g., number of characters, position, neighboring tags).
* Scanning: Users scan rather than read tag clouds.
* Centering: Tags in the middle of the cloud attract more user attention than tags near the borders (effect influenced by layout).
* Position: The upper left quadrant receives more user attention than the others (Western reading habits).
* Exploration: Tag clouds provide suboptimal support when searching for specific tags (if these do not have a very large font size).

Felix et al. compared human reading performance on traditional tag clouds, which map numeric values to font size, with alternative designs that use, for example, color or additional shapes such as circles and bars. They also compared how different arrangements of the words affect performance:
* Using an additional bar or circle instead of font size increases accuracy when reading the numeric value.
* However, users can find a specific word more quickly when no additional mark is used.
* Performance depends on the task: simple tasks such as finding a word are strongly affected by the design choice, whereas the effect on tasks such as identifying the topic of a tag cloud is much smaller.


Creation

In principle, the font size of a tag in a tag cloud is determined by its incidence. For a word cloud of weblog categories, for example, the frequency corresponds to the number of weblog entries assigned to a category. For small frequencies, one can specify font sizes directly, from one up to the maximum font size. For larger values, the frequencies should be scaled. In a linear normalization, the weight $t_i$ of a descriptor is mapped to a size scale of 1 through $f_{\max}$, where $t_{\min}$ and $t_{\max}$ specify the range of available weights:

$$s_i = \left\lceil \frac{f_{\max}\,(t_i - t_{\min})}{t_{\max} - t_{\min}} \right\rceil \quad \text{for } t_i > t_{\min}; \text{ otherwise } s_i = 1$$

* $s_i$: display font size
* $f_{\max}$: maximum font size
* $t_i$: count
* $t_{\min}$: minimum count
* $t_{\max}$: maximum count

Since the number of indexed items per descriptor is usually distributed according to a power law, a logarithmic representation makes sense for larger ranges of values. Implementations of tag clouds also include text parsing and filtering out unhelpful tags such as common words, numbers, and punctuation. There are also websites that create artificially or randomly weighted tag clouds, for advertising or for humorous results.
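The linear normalization translates directly into code. The logarithmic variant shown here applies the same formula to the natural logarithms of the counts, which is one common way to realize a logarithmic representation for power-law data. The function names are hypothetical, and counts are assumed to be at least 1 so that the logarithm is defined:

```python
import math

def linear_font_size(t_i, t_min, t_max, f_max):
    """s_i = ceil(f_max * (t_i - t_min) / (t_max - t_min)) if t_i > t_min, else 1."""
    if t_i <= t_min:
        return 1
    # Here t_min < t_i <= t_max, so the denominator is positive.
    return math.ceil(f_max * (t_i - t_min) / (t_max - t_min))

def log_font_size(t_i, t_min, t_max, f_max):
    """Apply the linear mapping to log counts, so that mid-range counts in a
    power-law distribution are not all squeezed into the smallest sizes."""
    return linear_font_size(math.log(t_i), math.log(t_min), math.log(t_max), f_max)

counts = {"misc": 1, "travel": 10, "photo": 100}
for tag, count in counts.items():
    print(tag, linear_font_size(count, 1, 100, 10), log_font_size(count, 1, 100, 10))
```

With counts of 1, 10 and 100 and a maximum size of 10, the linear mapping puts "travel" at the smallest size, while the logarithmic mapping places it near the middle of the scale.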


See also

* Concordance
* Folksonomy
* Information visualization
* Keywords
* tf-idf




External links


Understanding Tag Clouds – an information design analysis of tag clouds

– software development guide from O'Reilly's ONLamp