A bibliogram is a graphical representation of the frequency of certain target words, usually noun phrases, in a given text. The term was introduced in 2005 by Howard D. White to name the linguistic object studied, but not previously named, in informetrics, scientometrics and bibliometrics. The noun phrases in the ranking may be authors, journals, subject headings, or other indexing terms. The "stretches of text" may be a book, a set of related articles, a subject bibliography, a set of Web pages, and so on. Bibliograms are always generated from writings, usually from scholarly or scientific literature.

Definition

A bibliogram is a verbal construct made when noun phrases from extended stretches of text are ranked high to low by their frequency of co-occurrence with one or more user-supplied seed terms. Each bibliogram has three components:

* A seed term that sets a context.
* Words that co-occur with the seed across some set of records.
* Counts (frequencies) by which co-occurring words can be ordered high to low.

As a family of term-frequency distributions, the bibliogram has frequently been written about under descriptions such as:

* positive skew distribution
* empirical hyperbolic
* scale-free (see also scale-free network)
* power law
* size frequency distribution
* reverse-J

It is sometimes called a "core and scatter" distribution. The "core" consists of relatively few top-ranked terms that account for a disproportionately large share of co-occurrences overall. The "scatter" consists of relatively many lower-ranked terms that account for the remaining share of co-occurrences. Usually the top-ranked terms are not tied in frequency, but identical frequencies and tied ranks become more common as the frequencies get smaller. At the bottom of the distribution, a long tail of terms is tied in rank because each co-occurs with the seed term only once.

In most cases bibliograms can be described by power laws such as Zipf's law and Bradford's law. In this regard, they have long been studied by mathematicians and statisticians in information science. However, these treatments typically ignore the qualitative meanings of the ranked terms themselves, which are often of interest in their own right. For example, the following bibliogram was made with an author's name as seed and shows the descriptors that co-occur with her name in the ERIC database. The descriptors are ranked by how many of her articles they were used to index:

6 Creativity
4 Creativity Tests
3 Divergent Thinking
2 Elementary School Mathematics
2 Instruction
2 Mathematics Education
2 Problem Solving
2 Research
2 Time
1 Acceleration
1 Anxiety
1 Beginning Teachers
1 Behavioral Objectives
1 Child Development
1 Classroom Techniques
1 Cognitive Development
etc.

This author is a researcher in education, and it will be seen that the terms profile her intellectual interests over the years. In general, bibliograms can be used to:

* suggest additional terms for search strategies
* characterize the work of scholars, scientists, or institutions
* show who an author cites over time
* show who cites an author over time
* show the other authors with whom an author is co-cited over time
* show the subjects associated with a journal or an author
* show the authors, organizations, or journals associated with a subject
* show library classification codes associated with subject headings and vice versa
* show the popularity of items in the collections of libraries
* model the structure of literatures with title terms, descriptors, author names, journal names

Bibliograms can be created with the RANK command on Dialog (other vendors have similar commands), ranking options within WorldCat, HistCite, Google Scholar, and inexpensive content analysis software.

White suggests that bibliograms have a parallel construct in what he calls ''associograms''. These are the rank-ordered lists of word association norms studied in psycholinguistics. They are similar to bibliograms in statistical structure but are not generated from writings. Rather, they are generated by presenting panels of people with a stimulus term (which functions like a seed term) and tabulating the words they associate with the seed by frequency of co-occurrence. They are currently of interest to information scientists as a nonstandard way of creating thesauri for document retrieval.
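The three components of a bibliogram (seed, co-occurring terms, counts) can be illustrated with a minimal Python sketch. It is not the procedure used by Dialog or any other tool named above. The function names `bibliogram` and `with_tied_ranks`, the record layout (one list of terms per record), and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def bibliogram(records, seed):
    """Rank terms by the number of records in which they co-occur with the seed."""
    counts = Counter()
    for terms in records:
        unique_terms = set(terms)  # count each term at most once per record
        if seed in unique_terms:
            counts.update(unique_terms - {seed})
    # Order high to low by count; break ties alphabetically for a stable listing.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def with_tied_ranks(ranked):
    """Attach competition ranks (1, 2, 2, 4, ...) so equal counts share a rank."""
    result = []
    previous_count = None
    rank = 0
    for position, (term, count) in enumerate(ranked, start=1):
        if count != previous_count:
            rank = position
            previous_count = count
        result.append((rank, count, term))
    return result


# Hypothetical records: each list holds the indexing terms of one article,
# including the author name that serves as the seed.
records = [
    ["Author A", "Creativity", "Creativity Tests"],
    ["Author A", "Creativity", "Divergent Thinking"],
    ["Author A", "Creativity", "Creativity Tests", "Problem Solving"],
    ["Author B", "Creativity", "Instruction"],
]

ranked = bibliogram(records, "Author A")
for rank, count, term in with_tied_ranks(ranked):
    print(rank, count, term)
```

In this sketch the terms that occur only once with the seed share the lowest rank, which mirrors the tied long tail described above.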

Examples

Other examples of bibliograms are the ordered set of an author's co-authors or the list of authors published in a specific journal together with their number of articles. A popular example is the list of additional titles to consider for purchase that you get when you search for an item on Amazon. These suggested titles are the top terms in the "core" of a bibliogram formed with your search term as seed. The frequencies are counts of the times they have been co-purchased with the seed. Examples of associograms may be found in the Edinburgh Associative Thesaurus.
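The idea that suggested titles form the "core" of a co-purchase bibliogram can be sketched as follows. The sketch assumes the co-purchase counts are already available as a mapping from title to count. The titles, the counts, and the helper name `core` are hypothetical and do not describe how any retailer computes its suggestions.

```python
def core(counts, k):
    """Return the k top-ranked terms and their share of all co-occurrences."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    top = ranked[:k]
    total = sum(counts.values())
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share


# Hypothetical counts of how often each title was co-purchased with the seed.
co_purchases = {
    "Title A": 40,
    "Title B": 25,
    "Title C": 5,
    "Title D": 2,
    "Title E": 1,
    "Title F": 1,
    "Title G": 1,
}

top, share = core(co_purchases, k=2)
print(top)
print(f"core share of co-purchases: {share:.2f}")
```

With these made-up numbers, two of the seven titles account for most of the co-purchases, while the remaining five make up the scatter.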

Other methods

Similar but distinct methods are used in data clustering and data mining. Google Sets also created a list of associated terms for a given set of terms.

References

* Howard D. White (2005): ''On Extending Informetrics: An Opinion Paper''. In: Proceedings of the 10th International Congress of the International Society for Scientometrics and Informetrics. Stockholm, pp. 442-449.
