Protein sequencing is the practical process of determining the amino acid sequence of all or part of a protein or peptide. This may serve to identify the protein or characterize its post-translational modifications. Typically, partial sequencing of a protein provides sufficient information (one or more sequence tags) to identify it with reference to databases of protein sequences derived from the conceptual translation of genes.
The two major direct methods of protein sequencing are mass spectrometry and Edman degradation using a protein sequenator (sequencer). Mass spectrometry methods are now the most widely used for protein sequencing and identification, but Edman degradation remains a valuable tool for characterizing a protein's ''N''-terminus.
Determining amino acid composition

It is often desirable to know the unordered amino acid composition of a protein prior to attempting to find the ordered sequence, as this knowledge can be used to facilitate the discovery of errors in the sequencing process or to distinguish between ambiguous results. Knowledge of the frequency of certain amino acids may also be used to choose which protease to use for digestion of the protein. The misincorporation of low levels of non-standard amino acids (e.g. norleucine) into proteins may also be determined. A generalized method for determining amino acid frequency, often referred to as ''amino acid analysis'', is as follows:
# Hydrolyse a known quantity of protein into its constituent amino acids.
# Separate and quantify the amino acids in some way.
Hydrolysis
Hydrolysis is done by heating a sample of the protein in 6 M hydrochloric acid to 100–110 °C for 24 hours or longer. Proteins with many bulky hydrophobic groups may require longer heating periods. However, these conditions are so vigorous that some amino acids (serine, threonine, tyrosine, tryptophan, glutamine, and cysteine) are degraded. To circumvent this problem, Biochemistry Online suggests heating separate samples for different times, analysing each resulting solution, and extrapolating back to zero hydrolysis time. Rastall suggests a variety of reagents to prevent or reduce degradation, such as thiol reagents or phenol to protect tryptophan and tyrosine from attack by chlorine, and pre-oxidising cysteine. He also suggests measuring the quantity of ammonia evolved to determine the extent of amide hydrolysis.
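The extrapolation back to zero hydrolysis time can be sketched as follows. This is a minimal illustration, assuming that the loss of a labile amino acid is approximately first-order, so that the logarithm of the measured amount falls linearly with heating time. The function name and the sample figures are hypothetical and are not taken from any cited protocol.

```python
import math

def extrapolate_to_zero_time(times_h, amounts):
    """Estimate the amount of an amino acid present at zero hydrolysis time.

    Assumption: first-order loss, ln(amount) = ln(amount0) - k * t.
    A straight line is fitted to ln(amount) versus time by least squares.
    Returns (amount0, k).
    """
    if len(times_h) != len(amounts) or len(times_h) < 2:
        raise ValueError("need at least two (time, amount) pairs")
    logs = [math.log(a) for a in amounts]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope

# Hypothetical serine recoveries (nmol) after 24, 48 and 72 h of hydrolysis.
amount0, rate = extrapolate_to_zero_time([24, 48, 72], [9.0, 8.1, 7.29])
print(f"estimated serine at t = 0: {amount0:.2f} nmol (k = {rate:.4f} per hour)")
```

A linear fit of the raw amounts against time is a common alternative when losses are small; which model fits better depends on the amino acid and the conditions.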
Separation and quantitation
The amino acids can be separated by ion-exchange chromatography then derivatized to facilitate their detection. More commonly, the amino acids are derivatized then resolved by reversed phase HPLC.
An example of the ion-exchange chromatography is given by the NTRC using sulfonated polystyrene as a matrix, adding the amino acids in acid solution and passing a buffer of steadily increasing pH through the column. Amino acids are eluted when the pH reaches their respective isoelectric points. Once the amino acids have been separated, their respective quantities are determined by adding a reagent that will form a coloured derivative. If the amounts of amino acids are in excess of 10 nmol, ninhydrin can be used for this; it gives a yellow colour when reacted with proline, and a vivid purple with other amino acids. The concentration of amino acid is proportional to the absorbance of the resulting solution. With very small quantities, down to 10 pmol, fluorescent derivatives can be formed using reagents such as ortho-phthalaldehyde (OPA) or fluorescamine.
Pre-column derivatization may use the Edman reagent to produce a derivative that is detected by UV light. Greater sensitivity is achieved using a reagent that generates a fluorescent derivative. The derivatized amino acids are subjected to reversed phase chromatography, typically using a C8 or C18 silica column and an optimised elution gradient. The eluting amino acids are detected using a UV or fluorescence detector and the peak areas compared with those for derivatised standards in order to quantify each amino acid in the sample.
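The comparison of peak areas with those of derivatised standards amounts to a simple proportion. The sketch below assumes a linear detector response passing through the origin and a single-point external standard; the function names and all numbers are hypothetical.

```python
def quantify(sample_areas, standard_areas, standard_pmol):
    """Convert peak areas to amounts using an external standard.

    amount = sample_area / standard_area * standard_amount
    Assumption: detector response is linear and passes through the origin.
    """
    amounts = {}
    for residue, area in sample_areas.items():
        amounts[residue] = area / standard_areas[residue] * standard_pmol
    return amounts

def mole_percent(amounts):
    """Express each amino acid as a percentage of the total amount."""
    total = sum(amounts.values())
    return {residue: 100.0 * a / total for residue, a in amounts.items()}

# Hypothetical peak areas (arbitrary units); each standard peak is 100 pmol.
sample = {"Ala": 1500.0, "Gly": 3000.0, "Leu": 500.0}
standard = {"Ala": 1000.0, "Gly": 1500.0, "Leu": 1000.0}
amounts = quantify(sample, standard, standard_pmol=100.0)
composition = mole_percent(amounts)
for residue in sorted(composition):
    print(f"{residue}: {amounts[residue]:.0f} pmol, {composition[residue]:.1f} mol%")
```

In practice a multi-point calibration curve per amino acid would usually replace the single-point standard assumed here.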
''N''-terminal amino acid analysis

Determining which amino acid forms the ''N''-terminus of a peptide chain is useful for two reasons: to aid the ordering of individual peptide fragments' sequences into a whole chain, and because the first round of Edman degradation is often contaminated by impurities and therefore does not give an accurate determination of the ''N''-terminal amino acid. A generalised method for ''N''-terminal amino acid analysis follows:
# React the peptide with a reagent that will selectively label the terminal amino acid.
# Hydrolyse the protein.
# Determine the amino acid by chromatography and comparison with standards.
There are many different reagents which can be used to label terminal amino acids. They all react with amine groups and will therefore also bind to amine groups in the side chains of amino acids such as lysine; for this reason it is necessary to be careful in interpreting chromatograms to ensure that the right spot is chosen. Two of the more common reagents are Sanger's reagent (1-fluoro-2,4-dinitrobenzene) and dansyl derivatives such as dansyl chloride. Phenylisothiocyanate, the reagent for the Edman degradation, can also be used. The same questions apply here as in the determination of amino acid composition, with the exception that no stain is needed, as the reagents produce coloured derivatives and only qualitative analysis is required. The amino acid therefore does not have to be eluted from the chromatography column, just compared with a standard. Another consideration is that, since any amine groups will have reacted with the labelling reagent, ion-exchange chromatography cannot be used, and thin-layer chromatography or high-pressure liquid chromatography should be used instead.
C-terminal amino acid analysis
The number of methods available for C-terminal amino acid analysis is much smaller than the number of available methods of N-terminal analysis. The most common method is to add carboxypeptidases to a solution of the protein, take samples at regular intervals, and determine the terminal amino acid by analysing a plot of amino acid concentrations against time. This method is particularly useful for polypeptides and proteins with blocked N-termini. C-terminal sequencing would greatly help in verifying the primary structures of proteins predicted from DNA sequences and in detecting any post-translational processing of gene products from known codon sequences.
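Reading the terminal residue from a carboxypeptidase time course can be sketched as a ranking by initial release rate: the C-terminal residue is liberated first, so its concentration rises earliest. The example below is a simplified illustration with hypothetical data and a hypothetical function name; it approximates the initial rate from the earliest sample and ignores the differing rates at which carboxypeptidases release different residues.

```python
def rank_by_initial_release(time_course):
    """Order amino acids by how quickly they appear in solution.

    time_course maps an amino acid to a list of (minutes, concentration)
    samples. The initial rate is approximated from the earliest sample,
    assuming that nothing has been released at time zero.
    """
    rates = {}
    for residue, samples in time_course.items():
        minutes, conc = min(samples)  # earliest sample for this residue
        if minutes <= 0:
            raise ValueError("samples must be taken after time zero")
        rates[residue] = conc / minutes
    return sorted(rates, key=rates.get, reverse=True)

# Hypothetical released amounts (mol per mol of protein) over time.
course = {
    "Leu": [(5, 0.40), (10, 0.70), (20, 0.95)],
    "Ser": [(5, 0.10), (10, 0.30), (20, 0.70)],
    "Val": [(5, 0.02), (10, 0.08), (20, 0.30)],
}
order = rank_by_initial_release(course)
print("release order (C-terminus first):", order)
```

With these figures leucine appears first, suggesting a C-terminal sequence ending in Val-Ser-Leu. Real data are harder to interpret when the same residue occurs more than once near the terminus.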
Edman degradation
The Edman degradation is a very important reaction for protein sequencing, because it allows the ordered amino acid composition of a protein to be discovered. Automated Edman sequencers are now in widespread use, and are able to sequence peptides up to approximately 50 amino acids long. A reaction scheme for sequencing a protein by the Edman degradation follows; some of the steps are elaborated on subsequently.
# Break any disulfide bridges in the protein with a reducing agent like 2-mercaptoethanol. A protecting group such as iodoacetic acid may be necessary to prevent the bonds from re-forming.
# Separate and purify the individual chains of the protein complex, if there is more than one.
# Determine the amino acid composition of each chain.
# Determine the terminal amino acids of each chain.
# Break each chain into fragments under 50 amino acids long.
# Separate and purify the fragments.
# Determine the sequence of each fragment.
# Repeat with a different pattern of cleavage.
# Construct the sequence of the overall protein.
Digestion into peptide fragments
Peptides longer than about 50–70 amino acids cannot be sequenced reliably by the Edman degradation. Because of this, long protein chains need to be broken up into small fragments that can then be sequenced individually. Digestion is done either by endopeptidases such as trypsin or pepsin or by chemical reagents such as cyanogen bromide. Different enzymes give different cleavage patterns, and the overlap between fragments can be used to construct an overall sequence.
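The use of overlapping cleavage patterns can be illustrated with a small in-silico digest. This is a simplified sketch: it assumes idealised cleavage on the C-terminal side of lysine or arginine for trypsin and of methionine for cyanogen bromide, and it ignores missed cleavages and sequence-context effects. The sequence and the function name are hypothetical.

```python
def cleave_after(sequence, residues):
    """Split a sequence on the C-terminal side of the given residues."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing piece without a site
    return fragments

protein = "GAMKLSRWMTFK"               # hypothetical sequence
tryptic = cleave_after(protein, "KR")  # idealised trypsin: after Lys or Arg
cnbr = cleave_after(protein, "M")      # idealised cyanogen bromide: after Met
print("trypsin:", tryptic)
print("CNBr:   ", cnbr)
```

Here the cyanogen bromide fragment KLSRWM contains the end of GAMK, all of LSR and the start of WMTFK, which fixes the order of the three tryptic peptides.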
Reaction
The peptide to be sequenced is adsorbed onto a solid surface. One common substrate is glass fibre coated with polybrene, a cationic polymer. The Edman reagent, phenylisothiocyanate (PITC), is added to the adsorbed peptide, together with a mildly basic buffer solution of 12% trimethylamine. The PITC reacts with the amine group of the N-terminal amino acid.
The terminal amino acid can then be selectively detached by the addition of anhydrous acid. The derivative then isomerises to give a substituted phenylthiohydantoin, which can be washed off and identified by chromatography, and the cycle can be repeated. The efficiency of each step is about 98%, which allows about 50 amino acids to be reliably determined.
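The link between per-cycle efficiency and usable read length follows from compounding the yield over successive cycles. The sketch below assumes a constant efficiency per cycle, which is a simplification; the function name is illustrative.

```python
def cumulative_yield(efficiency, cycles):
    """Fraction of peptide chains still in phase after a number of cycles,
    assuming the same efficiency at every cycle."""
    return efficiency ** cycles

for n in (10, 30, 50, 70):
    print(f"after {n} cycles: {cumulative_yield(0.98, n):.1%} of chains in phase")
```

At 98% per cycle, roughly 36% of the chains are still in phase after 50 cycles, and the growing out-of-phase background eventually obscures the signal.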
Protein sequencer
A protein sequenator is a machine that performs Edman degradation in an automated manner. A sample of the protein or peptide is immobilized in the reaction vessel of the protein sequenator and the Edman degradation is performed. Each cycle releases and derivatises one amino acid from the protein or peptide's ''N''-terminus and the released amino-acid derivative is then identified by HPLC. The sequencing process is done repetitively for the whole polypeptide until the entire measurable sequence is established or for a pre-determined number of cycles.
Identification by mass spectrometry
Protein identification is the process of assigning a name to a protein of interest (POI), based on its amino-acid sequence. Typically, only part of the protein’s sequence needs to be determined experimentally in order to identify the protein with reference to databases of protein sequences deduced from the DNA sequences of their genes. Further protein characterization may include confirmation of the actual N- and C-termini of the POI, determination of sequence variants and identification of any post-translational modifications present.
Proteolytic digests
A general scheme for protein identification is as follows.
# The POI is isolated, typically by SDS-PAGE or chromatography.
# The isolated POI may be chemically modified to stabilise cysteine residues (e.g. S-amidomethylation or S-carboxymethylation).
# The POI is digested with a specific protease to generate peptides. Trypsin, which cleaves selectively on the C-terminal side of lysine or arginine residues, is the most commonly used protease. Its advantages include i) the frequency of Lys and Arg residues in proteins, ii) the high specificity of the enzyme, iii) the stability of the enzyme and iv) the suitability of tryptic peptides for mass spectrometry.
# The peptides may be desalted to remove ionizable contaminants and subjected to MALDI-TOF mass spectrometry. Direct measurement of the masses of the peptides may provide sufficient information to identify the protein (see Peptide mass fingerprinting) but further fragmentation of the peptides inside the mass spectrometer is often used to gain information about the peptides’ sequences. Alternatively, peptides may be desalted and separated by reversed phase HPLC and introduced into a mass spectrometer via an ESI source. LC-ESI-MS may provide more information than MALDI-MS for protein identification but uses more instrument time.
# Depending on the type of mass spectrometer, fragmentation of peptide ions may occur via a variety of mechanisms such as collision-induced dissociation (CID) or post-source decay (PSD). In each case, the pattern of fragment ions of a peptide provides information about its sequence.
# Information including the measured mass of the putative peptide ions and those of their fragment ions is then matched against calculated mass values from the conceptual (in-silico) proteolysis and fragmentation of databases of protein sequences. A successful match will be found if its score exceeds a threshold based on the analysis parameters. Even if the actual protein is not represented in the database, error-tolerant matching allows for the putative identification of a protein based on similarity to homologous proteins. A variety of software packages are available to perform this analysis.
# Software packages usually generate a report showing the identity (accession code) of each identified protein and its matching score, and provide a measure of the relative strength of the matching where multiple proteins are identified.
# A diagram of the matched peptides on the sequence of the identified protein is often used to show the sequence coverage (% of the protein detected as peptides). Where the POI is thought to be significantly smaller than the matched protein, the diagram may suggest whether the POI is an N- or C-terminal fragment of the identified protein.
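The matching of measured peptide masses against an in-silico digest, and the resulting sequence coverage, can be sketched as below. This is a toy illustration rather than the scoring used by any real search engine: it assumes idealised tryptic cleavage, neutral monoisotopic peptide masses, a fixed mass tolerance, and that each peptide occurs only once in the protein. The sequence, the observed masses and all function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER

def tryptic_peptides(sequence):
    """Idealised trypsin digest: cleave after every Lys or Arg."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def match_masses(observed, sequence, tol_da=0.01):
    """Return the predicted peptides whose mass matches an observed mass."""
    matched = []
    for peptide in tryptic_peptides(sequence):
        if any(abs(m - peptide_mass(peptide)) <= tol_da for m in observed):
            matched.append(peptide)
    return matched

def coverage_percent(sequence, matched):
    """Sequence coverage, assuming each matched peptide occurs once."""
    return 100.0 * sum(len(p) for p in set(matched)) / len(sequence)

candidate = "GAMKLSRWMTFK"              # hypothetical database entry
observed = [405.205, 374.228, 1000.0]   # hypothetical neutral masses (Da)
matched = match_masses(observed, candidate)
print(matched, f"{coverage_percent(candidate, matched):.1f}% coverage")
```

A real search repeats this comparison for every protein in the database and converts the number and quality of matches into a statistical score.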
De novo sequencing
The pattern of fragmentation of a peptide allows for direct determination of its sequence by ''de novo'' sequencing. This sequence may be used to match databases of protein sequences or to investigate post-translational or chemical modifications. It may provide additional evidence for protein identifications performed as above.
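The core idea of reading a sequence from a fragmentation pattern is that consecutive ions in a fragment series differ by the mass of one residue. The sketch below assumes a complete, noise-free ladder of singly charged N-terminal fragments and monoisotopic residue masses; real spectra contain gaps, several ion series and noise. The helper names and the example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def prefix_ladder(peptide, offset=1.00728):
    """Simulate an ideal ladder of N-terminal fragment masses.

    The offset (here one proton mass) is an assumption of this sketch and
    does not affect the differences between neighbouring ions.
    """
    masses, total = [], offset
    for aa in peptide:
        total += MONO[aa]
        masses.append(total)
    return masses

def read_ladder(ion_masses, tol_da=0.01):
    """Infer residues from differences between consecutive ladder masses."""
    ordered = sorted(ion_masses)
    residues = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        candidates = [aa for aa, m in MONO.items() if abs(delta - m) <= tol_da]
        residues.append("/".join(candidates) if candidates else "?")
    return residues

ladder = prefix_ladder("GASLK")   # hypothetical peptide
calls = read_ladder(ladder)
print(calls)                      # residues 2 onwards; L and I are ambiguous
```

The first residue is not recovered from differences alone, and leucine and isoleucine are reported together because their masses are identical.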
N- and C-termini
The peptides matched during protein identification do not necessarily include the N- or C-termini predicted for the matched protein. This may result from the N- or C-terminal peptides being difficult to identify by MS (e.g. being either too short or too long), being post-translationally modified (e.g. N-terminal acetylation) or genuinely differing from the prediction. Post-translational modifications or truncated termini may be identified by closer examination of the data (i.e. ''de novo'' sequencing). A repeat digest using a protease of different specificity may also be useful.
Post-translational modifications
Whilst detailed comparison of the MS data with predictions based on the known protein sequence may be used to define post-translational modifications, targeted approaches to data acquisition may also be used. For instance, specific enrichment of phosphopeptides may assist in identifying phosphorylation sites in a protein. Alternative methods of peptide fragmentation in the mass spectrometer, such as ETD or ECD, may give complementary sequence information.
Whole-mass determination
The protein’s whole mass is the sum of the masses of its amino-acid residues plus the mass of a water molecule, adjusted for any post-translational modifications. Although proteins ionize less well than the peptides derived from them, a protein in solution can often be analysed by ESI-MS and its mass measured to an accuracy of 1 part in 20,000 or better. This is often sufficient to confirm the termini (that is, that the protein’s measured mass matches that predicted from its sequence) and to infer the presence or absence of many post-translational modifications.
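How a mass accuracy of 1 part in 20,000 supports inferences about modifications can be sketched as a comparison of the measured mass with the predicted mass. The example below is illustrative only: the list of modifications is deliberately short, the mass shifts are approximate average-mass values, and the function names and masses are hypothetical. Different combinations of modifications can give similar shifts, so a match is a suggestion rather than proof.

```python
# Approximate average mass shifts (Da) for two common modifications.
PTM_SHIFTS = {"phosphorylation": 79.98, "acetylation": 42.04}

def mass_tolerance(mass_da, parts=20000):
    """Absolute tolerance for a measurement accurate to 1 part in `parts`."""
    return mass_da / parts

def explain_difference(predicted, observed, shifts=PTM_SHIFTS, parts=20000):
    """Suggest a simple explanation for observed minus predicted mass."""
    tol = mass_tolerance(observed, parts)
    delta = observed - predicted
    if abs(delta) <= tol:
        return "matches prediction"
    for name, shift in shifts.items():
        count = round(delta / shift)
        if count >= 1 and abs(delta - count * shift) <= tol:
            return f"{count} x {name}"
    return "unexplained"

# Hypothetical protein with a predicted mass of 20,000 Da.
for measured in (20000.5, 20080.0, 20160.0, 20031.0):
    print(measured, "->", explain_difference(20000.0, measured))
```

For a 20 kDa protein the tolerance is about 1 Da, which is enough to separate a phosphate group (about 80 Da) from an acetyl group (about 42 Da).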
Limitations
Proteolysis does not always yield a set of readily analyzable peptides covering the entire sequence of the POI. The fragmentation of peptides in the mass spectrometer often does not yield ions corresponding to cleavage at each peptide bond. Thus, the deduced sequence for each peptide is not necessarily complete. The standard methods of fragmentation do not distinguish between leucine and isoleucine residues since they are isomeric and therefore have the same mass.
Because the Edman degradation proceeds from the N-terminus of the protein, it will not work if the N-terminus has been chemically modified (e.g. by acetylation or formation of pyroglutamic acid). Edman degradation is generally not useful to determine the positions of disulfide bridges. It also requires peptide amounts of 1 picomole or above for discernible results, making it less sensitive than mass spectrometry.
Predicting from DNA/RNA sequences
In biology, proteins are produced by translation of messenger RNA (mRNA) with the protein sequence deriving from the sequence of codons in the mRNA. The mRNA is itself formed by the transcription of genes and may be further modified. These processes are sufficiently understood to use computer algorithms to automate predictions of protein sequences from DNA sequences, such as from whole-genome DNA-sequencing projects, and have led to the generation of large databases of protein sequences such as UniProt. Predicted protein sequences are an important resource for protein identification by mass spectrometry.
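The conceptual translation step can be sketched in a few lines. The example assumes the standard genetic code, a coding sequence that starts in frame at its first base, and no introns or RNA editing; gene-prediction pipelines handle far more than this. The function name and the input sequence are hypothetical.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(coding_sequence):
    """Translate a DNA or mRNA coding sequence until the first stop codon."""
    dna = coding_sequence.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":   # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCAAGTAA"))  # hypothetical open reading frame
```

Any trailing bases that do not form a complete codon are ignored by this sketch.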
Historically, short protein sequences (10 to 15 residues) determined by Edman degradation were back-translated into DNA sequences that could be used as probes or primers to isolate molecular clones of the corresponding gene or complementary DNA. The sequence of the cloned DNA was then determined and used to deduce the full amino-acid sequence of the protein.
Bioinformatics tools
Bioinformatics tools exist to assist with interpretation of mass spectra (see de novo peptide sequencing), to compare or analyze protein sequences (see sequence analysis), or search databases using peptide or protein sequences (see BLAST).
Applications to cryptography
The difficulty of protein sequencing was recently proposed as a basis for creating k-time programs, programs that run exactly k times before self-destructing. Such a thing is impossible to build purely in software because all software is inherently clonable an unlimited number of times.
See also
* Proteomics
* DNA sequencing
* Klaus Biemann
* Donald F. Hunt
* Matthias Mann
* John R. Yates