Gene nomenclature is the scientific naming of genes, the units of heredity in living organisms. It is also closely associated with protein nomenclature, as genes and the proteins they code for usually have similar nomenclature. An international committee published recommendations for genetic symbols and nomenclature in 1957. The need to develop formal guidelines for human gene names and symbols was recognized in the 1960s and full guidelines were issued in 1979 (Edinburgh Human Genome Meeting). Several other genus-specific research communities (e.g., ''Drosophila'' fruit flies, ''Mus'' mice) have adopted nomenclature standards as well, and have published them on the relevant model organism websites and in scientific journals, including the ''Trends in Genetics'' Genetic Nomenclature Guide. Scientists familiar with a particular gene family may work together to revise the nomenclature for the entire set of genes when new information becomes available.
For many genes and their corresponding proteins, an assortment of alternate names is in use across the scientific literature and public biological databases, posing a challenge to effective organization and exchange of biological information. Standardization of nomenclature thus tries to achieve the benefits of vocabulary control and bibliographic control, although adherence is voluntary. The advent of the information age has brought gene ontology, which in some ways is a next step of gene nomenclature, because it aims to unify the representation of gene and gene product attributes across all species.
Relationship with protein nomenclature
Gene nomenclature and protein nomenclature are not separate endeavors; they are aspects of the same whole. Any name or symbol used for a protein can potentially also be used for the gene that encodes it, and vice versa. But because scientific knowledge has been uncovered bit by bit over decades, proteins and their corresponding genes have not always been discovered simultaneously (and have not always been physiologically understood when discovered). This is the largest reason why protein and gene names do not always match, or why scientists tend to favor one symbol or name for the protein and another for the gene. Another reason is that many of the mechanisms of life are the same or very similar across species, genera, orders, and phyla (through homology, analogy, or some of both), so that a given protein may be produced in many kinds of organisms; scientists therefore often use the same symbol and name for a given protein in one species (for example, mice) as in another species (for example, humans). Regarding the first duality (same symbol and name for gene or protein), the context usually makes the sense clear to scientific readers, and the nomenclatural systems also provide for some specificity by using italic for a symbol when the gene is meant and plain (roman) for when the protein is meant. Regarding the second duality (a given protein is endogenous in many kinds of organisms), the nomenclatural systems also provide for at least human-versus-nonhuman specificity by using different capitalization, although scientists often ignore this distinction, given that it is often biologically irrelevant.
Also owing to the nature of how scientific knowledge has unfolded, proteins and their corresponding genes often have several names and symbols that are synonymous. Some of the earlier ones may be deprecated in favor of newer ones, although such deprecation is voluntary. Some older names and symbols live on simply because they have been widely used in the scientific literature (including before the newer ones were coined) and are well established among users. For example, mentions of ''HER2'' and ''ERBB2'' are synonymous.
Lastly, the correlation between genes and proteins is not always one-to-one (in either direction); in some cases it is several-to-one or one-to-several, and the names and symbols may then be gene-specific or protein-specific to some degree, or overlapping in usage:
* Some proteins and protein complexes are built from the products of several genes (each gene contributing a polypeptide subunit), which means that the protein or complex will not have the same name or symbol as any one gene. For example, a particular protein called "example" (symbol "EXAMP") may have 2 chains (subunits), which are encoded by 2 genes named "example alpha chain" and "example beta chain" (symbols ''EXAMPA'' and ''EXAMPB'').
* Some genes encode multiple proteins, because post-translational modification (PTM) and alternative splicing provide several paths for expression. For example, glucagon and similar polypeptides (such as GLP1 and GLP2) all come (via PTM) from proglucagon, which comes from preproglucagon, which is the polypeptide that the ''GCG'' gene encodes. When one speaks of the various polypeptide products, the names and symbols refer to different things (i.e., preproglucagon, proglucagon, glucagon, GLP1, GLP2), but when one speaks of the gene, all of those names and symbols are aliases for the same gene. Another example is that the various μ-opioid receptor proteins (e.g., μ1, μ2, μ3) are all splice variants encoded by one gene, ''OPRM1''; this is how one can speak of MORs (μ-opioid receptors) in the plural (proteins) even though there is only one ''MOR'' gene, which may be called ''OPRM1'', ''MOR1'', or ''MOR''. All of those aliases validly refer to it, although one of them (''OPRM1'') is preferred nomenclature.
Species-specific guidelines
The HUGO Gene Nomenclature Committee is responsible for providing human gene naming guidelines and approving new, unique human gene names and symbols (short identifiers typically created by abbreviating). For some nonhuman species, model organism databases serve as central repositories of guidelines and help resources, including advice from curators and nomenclature committees. In addition to species-specific databases, approved gene names and symbols for many species can be located in the National Center for Biotechnology Information's "Entrez Gene" database.
Bacterial genetic nomenclature
There are generally accepted rules and conventions used for naming genes in bacteria. Standards were proposed in 1966 by Demerec et al.
General rules
Each bacterial gene is denoted by a mnemonic of three lower case letters which indicate the pathway or process in which the gene-product is involved, followed by a capital letter signifying the actual gene. In some cases, the gene letter may be followed by an allele number. All letters and numbers are underlined or italicised. For example, ''leuA'' is one of the genes of the leucine biosynthetic pathway, and ''leuA273'' is a particular allele of this gene.
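The pattern described above (three lowercase letters, one capital letter, optional allele number) can be sketched as a small parser. This is only an illustration: the names `BacterialSymbol` and `parse_bacterial_symbol` are hypothetical, and the input is assumed to be plain text with italics or underlining already removed.

```python
import re
from typing import NamedTuple, Optional

# Three lowercase letters (pathway mnemonic), one capital letter (the gene),
# and an optional allele number, e.g. "leuA" or "leuA273".
_BACTERIAL_SYMBOL = re.compile(r"([a-z]{3})([A-Z])(\d+)?")


class BacterialSymbol(NamedTuple):
    mnemonic: str
    gene_letter: str
    allele: Optional[int]


def parse_bacterial_symbol(symbol: str) -> Optional[BacterialSymbol]:
    """Split a symbol into mnemonic, gene letter, and allele number.

    Returns None when the text does not fit the general rule.
    """
    match = _BACTERIAL_SYMBOL.fullmatch(symbol)
    if match is None:
        return None
    mnemonic, gene_letter, allele = match.groups()
    return BacterialSymbol(mnemonic, gene_letter, int(allele) if allele else None)
```

Under these assumptions, `parse_bacterial_symbol("leuA273")` yields the mnemonic `leu`, the gene letter `A`, and the allele number 273, while a bare mnemonic such as `dna` is rejected because it lacks a gene letter.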
Where the protein encoded by the gene is known, it may form part of the basis of the mnemonic:
*''rpoA'' encodes the α-subunit of RNA polymerase
*''rpoB'' encodes the β-subunit of RNA polymerase
*''polA'' encodes DNA polymerase I
*''polC'' encodes DNA polymerase III
*''rpsL'' encodes ribosomal protein S12 of the small subunit
Some gene designations refer to a known general function:
*''dna'' is involved in DNA replication
Predicted genes
In a 1998 analysis of the ''E. coli'' genome, a large number of genes with unknown function were given names beginning with the letter ''y'', followed by sequentially generated letters without a mnemonic meaning (e.g., ''ydiO'' and ''ydbK''). Since being designated, some ''y-genes'' have been confirmed to have a function and have been assigned a synonym (alternative name) in recognition of this. However, as y-genes are not always renamed after being further characterised, this designation is not a reliable indicator of a gene's significance.
Common mnemonics
Biosynthetic genes
Loss of gene activity leads to a nutritional requirement (auxotrophy) not exhibited by the wildtype (prototrophy).
Amino acids:
*''ala'' = alanine
*''arg'' = arginine
*''asn'' = asparagine
Some pathways produce metabolites that are precursors of more than one pathway. Hence, loss of one of these enzymes will lead to a requirement for more than one amino acid. For example:
*''ilv'': isoleucine and valine
Nucleotides:
*''gua'' = guanine
*''pur'' = purines
*''pyr'' = pyrimidine
*''thy'' = thymine
Vitamins:
*''bio'' = biotin
*''nad'' = NAD
*''pan'' = pantothenic acid
Catabolic genes
Loss of gene activity leads to loss of the ability to catabolise (use) the compound.
*''ara'' = arabinose
*''gal'' = galactose
*''lac'' = lactose
*''mal'' = maltose
*''man'' = mannose
*''mel'' = melibiose
*''rha'' = rhamnose
*''xyl'' = xylose
Drug and bacteriophage resistance genes
*''amp'' = ampicillin resistance
*''azi'' = azide resistance
*''bla'' = beta-lactam resistance
*''cat'' = chloramphenicol resistance
*''kan'' = kanamycin resistance
*''rif'' = rifampicin resistance
*''tonA'' = phage T1 resistance
Nonsense suppressor mutations
*''sup'' = suppressor (for instance, ''supF'' suppresses amber mutations)
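As an illustration of how the mnemonic lists above can be used, the following sketch looks up the three-letter prefix of a gene symbol in a small table. The table holds only a subset of the entries listed above, and the names `MNEMONICS` and `describe_mnemonic` are hypothetical.

```python
from typing import Optional, Tuple

# A small subset of the mnemonics listed above, grouped by category.
MNEMONICS = {
    "biosynthetic": {
        "ala": "alanine",
        "arg": "arginine",
        "gua": "guanine",
        "bio": "biotin",
    },
    "catabolic": {
        "ara": "arabinose",
        "gal": "galactose",
        "lac": "lactose",
        "mal": "maltose",
    },
    "resistance": {
        "amp": "ampicillin",
        "kan": "kanamycin",
        "rif": "rifampicin",
    },
}


def describe_mnemonic(symbol: str) -> Optional[Tuple[str, str]]:
    """Return (category, compound) for a symbol's three-letter prefix, if known."""
    prefix = symbol[:3]
    for category, table in MNEMONICS.items():
        if prefix in table:
            return category, table[prefix]
    return None
```

A symbol whose prefix is not in the table, such as the predicted gene ''ydiO'', simply returns `None`; this says nothing about whether the gene has a known function.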
Mutant nomenclature
If the gene in question is the wildtype, a superscript '+' sign is used:
*''leuA+''
If a gene is mutant, it is signified by a superscript '−':
*''leuA−''
By convention, if neither is used, it is considered to be mutant.
There are additional superscripts and subscripts, appended to the gene symbol, which provide more information about the mutation:
*ts = temperature sensitive (''leuAts'')
*cs = cold sensitive (''leuAcs'')
*am = amber mutation (''leuAam'')
*um = umber (opal) mutation (''leuAum'')
*oc = ochre mutation (''leuAoc'')
*R = resistant (RifR)
Other modifiers:
*Δ = deletion (Δ''leuA'')
*- = fusion (''leuA''-''lacZ'')
*: = fusion (''leuA'':''lacZ'')
*:: = insertion (''leuA''::Tn''10'')
*Ω = a genetic construct introduced by a two-point crossover (Ω''leuA'')
*Δ''deleted gene''::''replacing gene'' = deletion with replacement (Δ''leuA''::''nptII''(KanR) indicates that the ''leuA'' gene has been deleted and replaced with the gene for neomycin phosphotransferase, which confers kanamycin resistance, as is often noted parenthetically for drug-resistance markers)
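The modifiers listed above can be recognised mechanically. The sketch below is a rough illustration rather than a full genotype parser: it assumes plain-text input with italics removed and no parenthetical marker notes, and the names `classify_genotype` and `_MODIFIER_PATTERNS` are hypothetical.

```python
import re
from typing import Optional, Tuple

# A gene or element name in plain text, e.g. "leuA", "lacZ", "Tn10".
_NAME = r"[A-Za-z][A-Za-z0-9]*"

# Patterns are tried in order, so the most specific notation comes first.
_MODIFIER_PATTERNS = [
    ("deletion with replacement", re.compile(rf"Δ({_NAME})::({_NAME})")),
    ("insertion", re.compile(rf"({_NAME})::({_NAME})")),
    ("deletion", re.compile(rf"Δ({_NAME})")),
    ("two-point crossover construct", re.compile(rf"Ω({_NAME})")),
    ("fusion", re.compile(rf"({_NAME})[-:]({_NAME})")),
]


def classify_genotype(text: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (modifier label, names involved), or None if nothing matches."""
    for label, pattern in _MODIFIER_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return label, match.groups()
    return None
```

Ordering matters here: the double colon of an insertion is checked before the single colon of a fusion, so `leuA::Tn10` is not misread as a fusion.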
Phenotype nomenclature
When referring to the genotype (the gene) the mnemonic is italicized and not capitalised. When referring to the gene product or phenotype, the mnemonic is first-letter capitalised and not italicized (''e.g.'' DnaA is the protein produced by the ''dnaA'' gene; LeuA− is the phenotype of a ''leuA'' mutant; AmpR is the ampicillin-resistance phenotype of the β-lactamase gene ''bla'').
Bacterial protein name nomenclature
Protein names are generally the same as the gene names, but the protein names are not italicized, and the first letter is upper-case. For example, the β-subunit of RNA polymerase is named RpoB, and this protein is encoded by the ''rpoB'' gene.
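The gene-to-protein naming rule above amounts to changing the case of the first letter. A minimal sketch, with hypothetical function names and plain-text symbols (italics are a typesetting matter and are not modelled here):

```python
def protein_name(gene_symbol: str) -> str:
    """Bacterial protein name for a gene symbol: first letter in uppercase."""
    return gene_symbol[:1].upper() + gene_symbol[1:]


def gene_symbol_for(protein: str) -> str:
    """Reverse direction: first letter in lowercase."""
    return protein[:1].lower() + protein[1:]
```

Only the first character is touched, so the capital gene letter at the end of the symbol is preserved in both directions.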
Vertebrate gene and protein symbol conventions
The research communities of vertebrate model organisms have adopted guidelines whereby genes in these species are given, whenever possible, the same names as their human orthologs. The use of prefixes on gene symbols to indicate species (e.g., "Z" for zebrafish) is discouraged. The recommended formatting of printed gene and protein symbols varies between species.
Symbol and name
Vertebrate genes and proteins have names (typically strings of words) and symbols, which are short identifiers (typically 3 to 8 characters). For example, the gene cytotoxic T-lymphocyte-associated protein 4 has the HGNC symbol ''CTLA4''. These symbols are usually, but not always, coined by contraction or acronymic abbreviation of the name. They are pseudo-acronyms, however, in the sense that they are complete identifiers by themselves (short names, essentially). They are synonymous with (rather than standing for) the gene/protein name (or any of its aliases), regardless of whether the initial letters "match". For example, the symbol for the gene v-akt murine thymoma viral oncogene homolog 1, which is ''AKT1'', cannot be said to be an acronym for the name, and neither can any of its various synonyms, which include ''AKT'', ''PKB'', ''PRKBA'', and ''RAC''. Thus, the relationship of a gene symbol to the gene name is functionally the relationship of a nickname to a formal name (both are complete identifiers); it is not the relationship of an acronym to its expansion. In this sense they are similar to the symbols for units of measurement in the SI system (such as km for the kilometre), in that they can be viewed as true logograms rather than just abbreviations. Sometimes the distinction is academic, but not always. Although it is not wrong to say that "VEGFA" is an acronym standing for "vascular endothelial growth factor A", just as it is not wrong that "km" is an abbreviation for "kilometre", there is more to the formality of symbols than those statements capture.
The root portion of the symbols for a gene family (such as the "SERPIN" root in ''SERPIN1'', ''SERPIN2'', ''SERPIN3'', and so on) is called a root symbol.
Human
The HUGO Gene Nomenclature Committee is responsible for providing human gene naming guidelines and approving new, unique human gene names and symbols (short identifiers typically created by abbreviating). All human gene names and symbols can be searched online at the HGNC website, and the guidelines for their formation are available there. The guidelines for humans fit logically into the larger scope of vertebrates in general, and the HGNC's remit has recently expanded to assigning symbols to all vertebrate species without an existing nomenclature committee, to ensure that vertebrate genes are named in line with their human orthologs/paralogs. Human gene symbols generally are italicised, with all letters in uppercase (e.g., ''SHH'', for sonic hedgehog). Italics are not necessary in gene catalogs. Protein designations are the same as the gene symbol except that they are not italicised. Like the gene symbol, they are in all caps because they are human (human-specific or human homolog). mRNAs and cDNAs use the same formatting conventions as the gene symbol.
For naming families of genes, the HGNC recommends using a "root symbol" as the root for the various gene symbols. For example, for the peroxiredoxin family, ''PRDX'' is the root symbol, and the family members are ''PRDX1'', ''PRDX2'', ''PRDX3'', ''PRDX4'', ''PRDX5'', and ''PRDX6''.
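To illustrate the root-symbol idea, the sketch below groups symbols by treating a trailing run of digits as the member number and the rest as the root. This is a simplifying assumption for illustration only, not an HGNC rule: many symbols end in digits without belonging to a family named this way, and the function name `group_by_root` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_root(symbols: Iterable[str]) -> Dict[str, List[str]]:
    """Group symbols by root, where the root is the symbol minus trailing digits."""
    families: Dict[str, List[str]] = defaultdict(list)
    for symbol in symbols:
        # Lazy prefix followed by a digit run that must reach the end of the symbol.
        match = re.fullmatch(r"(.*?)(\d+)", symbol)
        root = match.group(1) if match else symbol
        families[root].append(symbol)
    return dict(families)
```

Applied to the peroxiredoxin symbols above, every member lands under the root `PRDX`; a symbol without trailing digits, such as `SHH`, forms its own single-member group.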
Mouse and rat
Gene symbols generally are italicised, with only the first letter in uppercase and the remaining letters in lowercase (''Shh''). Italics are not required on web pages. Protein designations are the same as the gene symbol, but are not italicised, and all letters are in uppercase (SHH).
Chicken (''Gallus'' sp.)
Nomenclature generally follows the conventions of human nomenclature. Gene symbols generally are italicised, with all letters in uppercase (e.g., ''NLGN1'', for neuroligin1). Protein designations are the same as the gene symbol, but are not italicised; all letters are in uppercase (NLGN1). mRNAs and cDNAs use the same formatting conventions as the gene symbol.
Anole lizard (''Anolis'' sp.)
Gene symbols are italicised and all letters are in lowercase (''shh''). Protein designations are different from their gene symbol; they are not italicised, and all letters are in uppercase (SHH).
Frog (''Xenopus'' sp.)
Gene symbols are italicised and all letters are in lowercase (''shh''). Protein designations are the same as the gene symbol, but are not italicised; the first letter is in uppercase and the remaining letters are in lowercase (Shh).
Zebrafish
Gene symbols are italicised, with all letters in lowercase (''shh''). Protein designations are the same as the gene symbol, but are not italicised; the first letter is in uppercase and the remaining letters are in lowercase (Shh).
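The species-specific conventions above can be summarised as a lookup table. The sketch below encodes only the case and italics rules stated in this section; the names `CASE_RULES` and `format_symbol` and the species keys are hypothetical, and italics are rendered with the same `''…''` markup used in this article.

```python
# species -> (case of gene symbol, case of protein symbol), as described above
CASE_RULES = {
    "human": ("upper", "upper"),
    "mouse": ("capitalized", "upper"),
    "rat": ("capitalized", "upper"),
    "chicken": ("upper", "upper"),
    "anole": ("lower", "upper"),
    "frog": ("lower", "capitalized"),
    "zebrafish": ("lower", "capitalized"),
}

# str.capitalize uppercases the first character and lowercases the rest.
_APPLY = {"upper": str.upper, "lower": str.lower, "capitalized": str.capitalize}


def format_symbol(symbol: str, species: str, kind: str) -> str:
    """Format a symbol as a gene (italic) or protein (roman) for a species."""
    gene_case, protein_case = CASE_RULES[species]
    if kind == "gene":
        return "''" + _APPLY[gene_case](symbol) + "''"
    if kind == "protein":
        return _APPLY[protein_case](symbol)
    raise ValueError("kind must be 'gene' or 'protein'")
```

For example, `format_symbol("shh", "mouse", "gene")` gives `''Shh''`, whereas the same symbol formatted as a zebrafish protein gives `Shh` without italics.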
Gene and protein symbol and description in copyediting
"Expansion" (glossing)
A nearly universal rule in copyediting of articles for medical journals and other health science publications is that abbreviations and acronyms must be expanded at first use, to provide a glossing type of explanation. Typically no exceptions are permitted except for small lists of especially well known terms (such as ''DNA'' or ''HIV''). Although readers with high subject-matter expertise do not need most of these expansions, those with intermediate or (especially) low expertise are appropriately served by them.
One complication that gene and protein symbols bring to this general rule is that they are not, accurately speaking, abbreviations or acronyms, despite the fact that many were originally coined via abbreviating or acronymic etymology. They are pseudoacronyms (as ''SAT'' and ''KFC'' also are) because they do not "stand for" any expansion. Rather, the relationship of a gene symbol to the gene name is functionally the relationship of a nickname to a formal name (both are complete identifiers); it is not the relationship of an acronym to its expansion. In fact, many official gene symbol–gene name pairs do not even share their initial-letter sequences (although some do). Nevertheless, gene and protein symbols "look just like" abbreviations and acronyms, which presents the problem that "failing" to "expand" them (even though it is not actually a failure and there are no true expansions) creates the appearance of violating the spell-out-all-acronyms rule.
One common way of reconciling these two opposing forces is simply to exempt all gene and protein symbols from the glossing rule. This is certainly fast and easy to do, and in highly specialized journals it is also justified because the entire target readership has high subject matter expertise. (Experts are not confused by the presence of symbols, whether known or novel, and they know where to look them up online for further details if needed.) But for journals with broader and more general target readerships, this action leaves the readers without any explanatory annotation and can leave them wondering what the apparent abbreviation stands for and why it was not explained. Therefore, a good alternative solution is simply to put either the official gene name or a suitable short description (gene alias/other designation) in parentheses after the first use of the official gene/protein symbol. This meets both the formal requirement (the presence of a gloss) and the functional requirement (helping the reader to know what the symbol refers to). The same guideline applies to shorthand names for sequence variations; AMA says, "In general medical publications, textual explanations should accompany the shorthand terms at first mention." Thus "188del11" is glossed as "an 11-bp deletion at nucleotide 188."
This corollary rule (which forms an adjunct to the spell-everything-out rule) often also follows the "abbreviation-leading" style of expansion that has become more prevalent in recent years. Traditionally, the abbreviation always followed the fully expanded form, in parentheses, at first use. This is still the general rule. But for certain classes of abbreviations or acronyms (such as clinical trial acronyms [e.g., ''ECOG''] or standardized polychemotherapy regimens [e.g., ''CHOP'']), this pattern may be reversed, because the short form is more widely used and the expansion is merely parenthetical to the discussion at hand. The same is true of gene/protein symbols.
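The first-mention glossing described above is mechanical enough to sketch in code. The following Python is a minimal illustration, not an established tool: the function names `gloss_deletion` and `gloss_first_mention` are hypothetical, only the simple "position-del-length" shorthand from the AMA example is recognized, and the choice between "a" and "an" uses a rough rule.

```python
import re

# Assumption: only the simple "<position>del<length>" form (e.g. "188del11")
# is handled; real sequence-variant shorthand is much richer than this.
_DELETION = re.compile(r"(\d+)del(\d+)")


def _article(number: str) -> str:
    """Pick "a" or "an" by how the number is spoken (rough rule: 8..., 11, 18)."""
    return "an" if number.startswith("8") or number in {"11", "18"} else "a"


def gloss_deletion(shorthand: str) -> str:
    """Spell out a deletion shorthand such as "188del11" in words."""
    match = _DELETION.fullmatch(shorthand)
    if match is None:
        raise ValueError(f"unsupported shorthand: {shorthand!r}")
    position, length = match.groups()
    return f"{_article(length)} {length}-bp deletion at nucleotide {position}"


def gloss_first_mention(text: str, term: str, gloss: str) -> str:
    """Insert "(gloss)" after the first standalone occurrence of term only."""
    # Lookarounds keep the term from matching inside a longer symbol.
    pattern = re.compile(rf"(?<![\w-]){re.escape(term)}(?![\w-])")
    return pattern.sub(lambda m: f"{m.group(0)} ({gloss})", text, count=1)


sentence = "The 188del11 variant was found in two families; 188del11 was absent in controls."
print(gloss_first_mention(sentence, "188del11", gloss_deletion("188del11")))
```

The same `gloss_first_mention` call works for a gene or protein symbol when the gloss is the official name or a short description, which gives the abbreviation-leading pattern (symbol first, explanation in parentheses).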
Synonyms and previous symbols and names
The HUGO Gene Nomenclature Committee (HGNC) maintains an official symbol and name for each human gene, as well as a list of synonyms and previous symbols and names. For example, for ''AFF1'' (AF4/FMR2 family, member 1), previous symbols and names are ''MLLT2'' ("myeloid/lymphoid or mixed-lineage leukemia (trithorax (Drosophila) homolog); translocated to, 2") and ''PBM1'' ("pre-B-cell monocytic leukemia partner 1"), and synonyms are ''AF-4'' and ''AF4''. Authors of journal articles often use the latest official symbol and name, but just as often they use synonyms and previous symbols and names, which are well established by earlier use in the literature. AMA style is that "authors should use the most up-to-date term" and that "in any discussion of a gene, it is recommended that the approved gene symbol be mentioned at some point, preferably in the title and abstract if relevant." Because copyeditors are not expected or allowed to rewrite the gene and protein nomenclature throughout a manuscript (except by rare express instructions on particular assignments), the middle ground in manuscripts using synonyms or older symbols is that the copyeditor will add a mention of the current official symbol at least as a parenthetical gloss at the first mention of the gene or protein, and query for confirmation.
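The middle-ground practice above amounts to a lookup from any synonym or previous symbol to the current approved symbol. The sketch below is illustrative only: the `GeneRecord` structure and function names are hypothetical, the single record is copied from the ''AFF1'' example rather than from a live HGNC download, and it assumes each alias maps to exactly one gene, which may not hold in practice.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class GeneRecord(NamedTuple):
    symbol: str                        # current approved symbol
    name: str                          # current approved name
    previous_symbols: Tuple[str, ...]  # withdrawn official symbols
    synonyms: Tuple[str, ...]          # unofficial aliases


# One record, copied from the AFF1 example in the text above.
RECORDS = [
    GeneRecord("AFF1", "AF4/FMR2 family, member 1", ("MLLT2", "PBM1"), ("AF-4", "AF4")),
]


def build_alias_index(records: Iterable[GeneRecord]) -> Dict[str, GeneRecord]:
    """Map every symbol, previous symbol, and synonym to its record."""
    index: Dict[str, GeneRecord] = {}
    for record in records:
        for alias in (record.symbol, *record.previous_symbols, *record.synonyms):
            index[alias] = record  # assumption: aliases are unique across genes
    return index


def gloss_with_current_symbol(term: str, index: Dict[str, GeneRecord]) -> Tuple[str, bool]:
    """Return the first-mention text and whether an author query is needed."""
    record = index.get(term)
    if record is None or term == record.symbol:
        return term, False  # unknown or already current: leave as written
    return f"{term} ({record.symbol})", True


INDEX = build_alias_index(RECORDS)
for term in ("MLLT2", "AF-4", "AFF1"):
    print(gloss_with_current_symbol(term, INDEX))
```

The boolean flag reflects the editorial step described above: the author's term is left in place, the approved symbol is added as a parenthetical gloss, and the change is queried rather than applied silently.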
Styling
Some basic conventions, such as (1) that animal/human homolog (ortholog) pairs differ in letter case (title case and all caps, respectively) and (2) that the symbol is italicized when referring to the gene but nonitalic when referring to the protein, are often not followed by contributors to medical journals. Many journals have the copyeditors restyle the casing and formatting to the extent feasible, although in complex genetics discussions only subject-matter experts (SMEs) can effortlessly parse them all. One example that illustrates the potential for ambiguity among non-SMEs is that some official gene names have the word "protein" within them, so the phrases "brain protein I3 (''BRI3'')" (referring to the gene) and "brain protein I3 (BRI3)" (referring to the protein) are both valid. The ''AMA Manual'' gives another example: both "the TH gene" and "the ''TH'' gene" can validly be parsed as correct ("the gene for tyrosine hydroxylase"), because the former mentions the alias (description) and the latter mentions the symbol. This seems confusing on the surface, although it is easier to understand when explained as follows: in this gene's case, as in many others, the alias (description) "happens to use the same letter string" that the symbol uses. (The matching of the letters is of course acronymic in origin, and thus the phrase "happens to" implies more coincidence than is actually present; but phrasing it that way helps to make the explanation clearer.)
There is no way for a non-SME to know this is the case for any particular letter string without looking up every gene from the manuscript in a database such as NCBI Gene, reviewing its symbol, name, and alias list, and doing some mental cross-referencing and double-checking (plus it helps to have biochemical knowledge). Most medical journals do not (in some cases cannot) pay for that level of fact-checking as part of their copyediting service level; therefore, it remains the author's responsibility.
However, as pointed out earlier, many authors make little attempt to follow the letter case or italic guidelines; and regarding protein symbols, they often will not use the official symbol at all. For example, although the guidelines would call p53 protein "TP53" in humans or "Trp53" in mice, most authors call it "p53" in both (and even refuse to call it "TP53" if edits or queries try to), not least because of the biologic principle that many proteins are essentially or exactly the same molecules regardless of mammalian species. Regarding the gene, authors are usually willing to call it by its human-specific symbol and capitalization, ''TP53'', and may even do so without being prompted by a query. But the end result of all these factors is that the published literature often does not follow the nomenclature guidelines completely.
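The two basic conventions named at the start of this section (letter case by species, italics for genes only) can be written as a small styling helper. This is a sketch under stated assumptions: the function name `style_symbol` is hypothetical, only "human" and "mouse" are covered, italics are rendered with the `''…''` markup used in this article, and the simplified rules stated above are applied as given, although species-specific guidelines differ in detail.

```python
def style_symbol(symbol: str, species: str, entity: str) -> str:
    """Apply the simplified case and italic conventions to a symbol.

    species: "human" (all caps) or "mouse" (first letter capital, rest lowercase)
    entity:  "gene" (italic) or "protein" (nonitalic)
    """
    if species == "human":
        cased = symbol.upper()
    elif species == "mouse":
        cased = symbol[:1].upper() + symbol[1:].lower()
    else:
        raise ValueError(f"no casing rule sketched for species {species!r}")

    if entity == "gene":
        return f"''{cased}''"  # assumption: wiki-style italic markup
    if entity == "protein":
        return cased
    raise ValueError(f"entity must be 'gene' or 'protein', got {entity!r}")


print(style_symbol("tp53", "human", "gene"))
print(style_symbol("TRP53", "mouse", "protein"))
```

The helper only restyles a symbol that is already the right one for the species. It cannot convert between orthologs (human ''TP53'' versus mouse "Trp53" differ in spelling, not just case), and it cannot tell whether a given letter string in a manuscript is a symbol or an alias, which is the judgment that the text above leaves to the author.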
External links
* International Protein Nomenclature Guidelines
* The Council of Science Editors (CSE) Resources for Genetic and Cytogenetic Nomenclature
* The Protein Naming Utility, a rules database for protein nomenclature
* Coli Genetic Stock Center is responsible for bacterial genetic nomenclature pertaining to ''Escherichia coli''.
* ''Escherichia coli'' genetic nomenclature (rules for gene naming and meaning of other symbols used in molecular biology) on EcoliWiki, the community annotation system of EcoliHub