The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.

History

Origins and Early Development (2000–2003)

10yr Post-Release UCSC Domain Traffic Data

The UCSC Genome Browser was developed in 2000 by graduate student Jim Kent and Professor David Haussler at the University of California, Santa Cruz (UCSC), to provide public access to the draft human genome sequence produced by the Human Genome Project. On July 7, 2000, UCSC released the first working draft of the human genome online, accompanied by an initial version of the Genome Browser. This release enabled researchers worldwide to access and explore the genome data interactively. The project received early funding from the Howard Hughes Medical Institute (HHMI) and the National Human Genome Research Institute (NHGRI). In 2002, the team published a detailed description of the Genome Browser in ''Genome Research'', outlining its MySQL-based database and web interface. The browser featured various aligned annotation tracks, including gene predictions, mRNA/EST alignments, and SNP markers, all presented in a scrollable view. Users could also add custom tracks to visualize their own data alongside official annotations. In that same year, the browser expanded to include the mouse genome, facilitating comparative genomics studies. Tools such as BLAT (BLAST-like alignment tool) and LiftOver were introduced to enhance sequence alignment and coordinate conversion between different genome assemblies.

Expansion and Feature Enhancements (2004–2010)

Between 2004 and 2010, the UCSC Genome Browser incorporated numerous additional genomes, including those of rat, chicken, dog, and chimpanzee, among others. The development of chain and net alignment algorithms allowed for whole-genome alignments between species, and the Conservation track visualized evolutionarily conserved elements. To accommodate the influx of data from new genomic technologies, UCSC introduced Genome Graphs in 2007–2008, enabling users to plot genome-wide datasets, such as association study p-values, across entire genomes. The browser also implemented the BigBed and BigWig binary data formats in 2010, facilitating efficient visualization of large-scale sequencing datasets.

Further Integration with Major Genomic Projects (2011–2015)

In 2011, UCSC launched Track Data Hubs, allowing external researchers to integrate their annotation tracks into the Genome Browser via remote URLs. This new feature significantly enhanced how researchers could interact with and visualize large-scale genomic datasets. UCSC has also played a pivotal role in the ENCODE (Encyclopedia of DNA Elements) project since its launch in 2003. The browser hosted a vast array of functional genomics data generated by ENCODE, including ChIP-seq, RNA-seq, and DNase hypersensitivity assays. The browser also integrated data from the 1000 Genomes Project, providing comprehensive access to human genetic variation data. In 2013, UCSC partnered with the GENCODE project to adopt its high-quality gene annotations. In 2015, the GENCODE gene set (GRCh38/hg38 assembly) replaced UCSC's in-house track as the default gene set of the human genome browser.

Recent Developments and Recognition (2016–Present)

Beginning in 2016, the UCSC Genome Browser expanded its capabilities by integrating clinical and variant datasets, including those from ClinVar and various cancer genomics resources. In 2017, UCSC launched the UCSC Cell Browser, a companion platform designed to handle single-cell sequencing datasets and spatial transcriptomics. The browser has also integrated data from the Genotype-Tissue Expression (GTEx) project, providing visualization resources for gene expression across various human tissues. The browser now hosts over 180 genome assemblies from more than 100 species, including the complete telomere-to-telomere human genome assembly (T2T-CHM13) released by the T2T Consortium in 2022. Funding for the UCSC Genome Browser has transitioned to rely exclusively on NIH grants, with continued support from the NHGRI. In 2022, the browser was recognized as one of the inaugural Global Core Biodata Resources, highlighting its critical role in life science research and ensuring prioritized long-term funding. As of 2025, the UCSC Genome Browser continues to serve as an essential, freely accessible tool for researchers worldwide, accommodating daily usage by tens of thousands of users and regularly adding new genomic data and functionality.

Genomes

In the years since its inception, the UCSC Browser has expanded to accommodate genome sequences of all vertebrate species and selected invertebrates for which high-coverage genomic sequence is available, now including 108 species. High coverage is necessary to allow overlap to guide the construction of larger contiguous regions. Genomic sequences with less coverage are included in multiple-alignment tracks on some browsers, but the fragmented nature of these assemblies makes them unsuitable for building full-featured browsers (see below for more on multiple-alignment tracks). The species hosted with full-featured genome browsers are shown in the table. Updates to this list depend on new genome releases from sequencing centers, which explains the two-year gap between the last two genome additions. Apart from these 108 species and their assemblies, the UCSC Genome Browser also offers ''Assembly Hubs'': web-accessible directories of genomic data that can be viewed on the browser and include assemblies that are not hosted natively on it. There, users can load and annotate unique assemblies for which UCSC does not provide an annotation database. A full list of species and their assemblies can be viewed in the GenArk Portal, which includes 2,589 assemblies hosted by both the UCSC Genome Browser database and Assembly Hubs. An example can be seen in the assembly hub. Below is a snippet of what users can find when they use the assembly hub:

Limitations

The UCSC Genome Browser is a useful tool for analyzing genomic sequences and data, but it has limitations. One is its legacy website interface: although the site is used by thousands of students and researchers globally, the interface is dated and can be difficult to navigate. Another is that the browser is primarily a visualization tool for showcasing sequences; to perform some analyses of those sequences, users have to turn to external tools such as MAFFT, COFFEE or MUSCLE.

Browser functionality

The large amount of data about biological systems that is accumulating in the literature makes it necessary to collect and digest information using the tools of bioinformatics. The UCSC Genome Browser presents a diverse collection of annotation datasets (known as "tracks" and presented graphically), including mRNA alignments, mappings of DNA repeat elements, gene predictions, gene-expression data, disease-association data (representing the relationships of genes to diseases), and mappings of commercially available gene chips (e.g., Illumina and Agilent). The basic paradigm of display is to show the genome sequence in the horizontal dimension, with graphical representations of the locations of the mRNAs, gene predictions, and other annotations aligned to it. Blocks of color along the coordinate axis show the locations of the alignments of the various data types. The ability to show this large variety of data types on a single coordinate axis makes the browser a handy tool for the vertical integration of the data. To find a specific gene or genomic region, the user may type in the gene name, a DNA sequence, an accession number for an RNA, the name of a genomic cytological band (e.g., 20p13 for band 13 on the short arm of chr20) or a chromosomal position (chr17:38,450,000-38,531,000 for the region around the gene BRCA1). Presenting the data graphically allows the browser to link each annotation to detailed information about it. The gene details page of the UCSC Genes track provides a large number of links to more specific information about the gene at many other data resources, such as Online Mendelian Inheritance in Man (OMIM) and SwissProt. Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed. By pre-aligning millions of RNA sequences from GenBank to each of the 244 genome assemblies (many of the 108 species have more than one assembly), the browser allows instant access to the alignments of any RNA to any of the hosted species.
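The chromosomal-position syntax described above is regular enough to parse in a few lines. The following is a minimal sketch, not part of any UCSC library: `Region` and `parse_position` are hypothetical names, and the code assumes only the `chrom:start-end` form with optional thousands separators.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# Hypothetical pattern for strings such as "chr17:38,450,000-38,531,000".
_POSITION_RE = re.compile(
    r"^(?P<chrom>[\w.]+):(?P<start>\d[\d,]*)-(?P<end>\d[\d,]*)$"
)


def parse_position(text: str) -> Region:
    """Parse a chrom:start-end string, ignoring commas inside the numbers."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return Region(match["chrom"], start, end)


# The BRCA1 example region quoted in the text.
region = parse_position("chr17:38,450,000-38,531,000")
print(region.chrom, region.start, region.end)
```

Gene names, cytological bands, and accession numbers need a lookup against the browser's own tables, so they are outside the scope of this sketch.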

The juxtaposition of the many types of data allows researchers to display exactly the combination of data that will answer specific questions. A PDF/PostScript output functionality allows export of a camera-ready image for publication in academic journals. One feature that distinguishes the UCSC Browser from other genome browsers is the continuously variable nature of the display. Sequence of any size can be displayed, from a single DNA base up to an entire chromosome (human chr1 = 245 million bases, or 245 Mb) with full annotation tracks. Researchers can display a single gene, a single exon, or an entire chromosome band, showing dozens or hundreds of genes and any combination of the many annotations. A convenient drag-and-zoom feature allows the user to choose any region in the genome image and expand it to occupy the full screen. Researchers may also use the browser to display their own data via the Custom Tracks tool. This feature allows users to upload a file of their own data and view the data in the context of the reference genome assembly. Users may also use the data hosted by UCSC, creating subsets of the data of their choosing with the Table Browser tool (such as only the SNPs that change the amino acid sequence of a protein), and display this specific subset of the data in the browser as a Custom Track (see the UCSC Genome Browser Custom Tracks Help Page). Any browser view created by a user, including those containing Custom Tracks, may be shared with other users.

Custom Tracks support multiple file formats, including BED, WIG, GFF, GTF, PSL, and big* formats such as bigBed and bigWig. Users may input data via direct paste, file upload, or by referencing a URL pointing to the remote data. Tracks are temporary, and those not associated with a saved session are removed after 48 hours. Users can configure tracks with track lines to specify attributes such as name, description, visibility, color, and link targets. Optional browser lines may be included to define initial display coordinates and browser settings. Uploaded tracks can be managed, updated, or deleted through the “Manage Custom Tracks” interface. For larger or more persistent data hosting, users may use Track Data Hubs, which provide a scalable system for remote data integration and advanced configuration.
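To make the track-line and browser-line description concrete, the sketch below assembles the text of a small BED custom track. It is illustrative only: `build_custom_track` is a hypothetical helper, the intervals and names are made up, and only the attributes named above (name, description, visibility, color) are used.

```python
from typing import Iterable, Tuple

BedRow = Tuple[str, int, int, str]  # chrom, start (0-based), end, name


def build_custom_track(
    rows: Iterable[BedRow],
    name: str,
    description: str,
    position: str,
    visibility: int = 2,
    color: str = "0,0,255",
) -> str:
    """Return custom-track text: a browser line, a track line, then BED rows."""
    lines = [
        # Browser line: sets the initial display coordinates.
        f"browser position {position}",
        # Track line: display attributes for the rows that follow.
        f'track name="{name}" description="{description}" '
        f"visibility={visibility} color={color}",
    ]
    for chrom, start, end, label in rows:
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Made-up example intervals, for illustration only.
track_text = build_custom_track(
    rows=[
        ("chr17", 38450000, 38460000, "regionA"),
        ("chr17", 38500000, 38531000, "regionB"),
    ],
    name="myRegions",
    description="Example regions",
    position="chr17:38450000-38531000",
)
print(track_text)
```

The resulting text could be pasted into the Custom Tracks input box or saved to a file for upload.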

Tracks

Below the displayed images of the UCSC Genome browser are eleven categories of additional tracks that can be selected and displayed alongside the original data. Researchers can select tracks which best represent their query to allow for more applicable data to be displayed depending on the type and depth of research being done. These categories are as follows:

Analysis tools

Overview

The UCSC site hosts a set of genome analysis tools. These tools let users create, search, and modify sequences, for example to find similar sequences or patterns. They are generally free to use for academic purposes, nonprofit organizations, and individuals with a personal interest in genomics. Tools developed by UCSC include: Genome Browser, BLAT, In-Silico PCR, Table Browser, LiftOver, REST API, Variant Annotation Integrator, Gene Sorter, Genome Graphs, Data Integrator, UShER, Gene Interactions, VisiGene, DNA Duster, Protein Duster, and Phylogenetic Tree PNG Maker. Source code for BLAT, LiftOver and the Genome Browser is available for download.

Other useful tools that work with UCSC file formats include: BEDOPS, bedtools, bwtool, CrossMap, CruzDb, G-OnRamp, libBigWig, MakeHub, RTrackLayer, trackhub, twobitreader, ucsc-genomes-download, and Wiggle Tools.

BLAT

BLAT is a sequence alignment tool that accepts FASTA-format input and is useful for finding sequences within the massive sequence (human genome = 3.23 billion bases) of any of the featured genomes. Users can paste a sequence into the text box or upload a file containing the sequence. The search can be customized: users may choose the genome and assembly the sequence belongs to, as well as the query type, sort order, and output type. On DNA, BLAT finds sequences of at least 95% base similarity and at least 25 bases in length. It keeps an in-memory index of the genome consisting of all overlapping 11-mers stepping by 5, except where there are repeats. On protein, BLAT finds sequences of at least 80% amino acid similarity and at least 20 amino acids in length, using an in-memory index of all overlapping 4-mers stepping by 5, again except where there are repeats. BLAT was written by Jim Kent; more information about the software can be found on his website.
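The idea of indexing 11-mers at a stride of 5 and then looking up query words can be illustrated with a toy example. This is a simplified sketch of the indexing concept only, not BLAT's implementation: it omits repeat handling, hit clustering, and alignment extension, and the genome string and function names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(genome: str, k: int = 11, step: int = 5) -> Dict[str, List[int]]:
    """Index every k-mer that starts at a multiple of `step` in the genome."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, step):
        index[genome[pos:pos + k]].append(pos)
    return index


def seed_hits(query: str, index: Dict[str, List[int]], k: int = 11) -> List[Tuple[int, int]]:
    """Return (query_offset, genome_pos) pairs for each query k-mer found in the index."""
    hits = []
    # Every query offset is tried, so a match is found whichever
    # stride position the indexed k-mer happened to start on.
    for offset in range(len(query) - k + 1):
        for pos in index.get(query[offset:offset + k], []):
            hits.append((offset, pos))
    return hits


# Toy 45-base "genome" and a 25-base query copied from positions 10..34.
genome = "ACGTTGCAAGGCTTAACCGGATATCGCGTAGCTAGCTTAGGACCA"
query = genome[10:35]

index = build_index(genome)
hits = seed_hits(query, index)
print(hits)
```

Hits that share the same difference between genome position and query offset lie on one diagonal, which is the kind of evidence a real aligner would then extend into a full alignment.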

Genome Graphs

The Genome Graphs tool allows users to view all chromosomes at once and display the results of genome-wide association studies (GWAS). Users can select the clade an organism is in, the genome and assembly, the graph colors, and the significance threshold. Users can also upload their own data, import database assemblies, or configure the graph layout, graph style, and chromosome layout. More detailed instructions for users who want to make full use of these features are available in the Genome Graphs User's Guide.
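Association results are often prepared for plotting by converting p-values to a −log10 scale before a significance threshold is applied. The sketch below shows that preparation step under stated assumptions: the −log10 convention is a common GWAS practice rather than something the text attributes to Genome Graphs, and the marker names, p-values, and threshold are invented.

```python
import math
from typing import Dict, List, Tuple


def significant_markers(
    p_values: Dict[str, float], threshold: float
) -> List[Tuple[str, float]]:
    """Return (marker, -log10 p) pairs at or above `threshold`, strongest first."""
    scored = [
        (marker, -math.log10(p))
        for marker, p in p_values.items()
        if p > 0  # log10 is undefined for zero
    ]
    passing = [(marker, score) for marker, score in scored if score >= threshold]
    return sorted(passing, key=lambda item: item[1], reverse=True)


# Invented markers and p-values, for illustration only.
example = {"markerA": 1e-9, "markerB": 0.03, "markerC": 5e-8}
for marker, score in significant_markers(example, threshold=7.3):
    print(f"{marker}\t{score:.2f}")
```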

Lift Over

The LiftOver tool uses whole-genome alignments to convert sequences from one assembly to another or between species. A user can enter genome coordinates and annotations into the text box or upload a file. The original genome and assembly are selected first, followed by the new genome and assembly to convert to. The conversion can be customized in two categories: regions defined by chrom:start-end (BED 4 to BED 6) and regions with an exon-intron structure (usually transcripts, BED 12). For regions defined by chrom:start-end, the user can allow multiple output regions, set the minimum hit size in the query, and set the minimum chain size in the target. For regions with an exon-intron structure, the user can set the minimum ratio of alignment blocks or exons that must map and choose whether, if an exon is not mapped, the closest mapped base is used instead.
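The core of coordinate conversion can be pictured as mapping a position through a list of ungapped aligned blocks. The sketch below is a toy model under strong assumptions (one chromosome, one strand, made-up blocks); `Block` and `lift_position` are hypothetical names, and the real tool's handling of chains, strands, and thresholds is not reproduced here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    old_start: int  # 0-based start in the old assembly
    new_start: int  # 0-based start in the new assembly
    size: int       # length of the ungapped aligned block


def lift_position(pos: int, blocks: List[Block], use_closest: bool = False) -> Optional[int]:
    """Map `pos` from old to new coordinates; return None if it falls in a gap."""
    for block in blocks:
        if block.old_start <= pos < block.old_start + block.size:
            return block.new_start + (pos - block.old_start)
    if not use_closest or not blocks:
        return None

    # Fallback: use the nearest base that does lie inside an aligned block.
    def nearest(block: Block) -> Tuple[int, int]:
        clamped = min(max(pos, block.old_start), block.old_start + block.size - 1)
        return abs(clamped - pos), block.new_start + (clamped - block.old_start)

    return min(nearest(block) for block in blocks)[1]


# Made-up blocks: old 100-149 maps to new 150-199, old 200-239 maps to new 260-299.
blocks = [Block(100, 150, 50), Block(200, 260, 40)]
print(lift_position(120, blocks))
print(lift_position(180, blocks))
print(lift_position(180, blocks, use_closest=True))
```

The `use_closest` flag mirrors, in spirit, the option of falling back to the closest mapped base when a feature itself does not map.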

Python APIs

The UCSC Genome Browser provides Python-compatible interfaces that allow researchers to programmatically access genomic data and annotations. These APIs support automation, integration into computational workflows, and large-scale analysis tasks, enhancing accessibility beyond the graphical browser.

Overview

The UCSC REST API is the primary method for programmatic interaction. It allows users to send HTTP requests to retrieve genomic sequences, annotation tracks, and gene-related information. While the API itself is language-agnostic, Python developers can easily integrate it using libraries such as requests. Community-developed wrappers and tools further simplify API usage in Python-based bioinformatics environments.

Functionality and Use Cases

Common uses of the UCSC REST API in Python include:
* Sequence Retrieval – Downloading nucleotide sequences from specific genome coordinates
* Gene Annotation Access – Accessing curated data from RefSeq, GENCODE, and other gene tables
* Variation Data Queries – Obtaining information about SNPs, insertions, or structural variants in defined regions
* Track Information Extraction – Listing available tracks and metadata for a given genome build
* Pipeline Integration – Automating queries in larger workflows for comparative or functional genomics

These capabilities make the API useful for custom dashboards, automated annotation pipelines, and downstream analysis in tools like Jupyter Notebooks or Snakemake.
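Several of these use cases come down to building a request URL for the right endpoint. The sketch below uses only the Python standard library. The `list/tracks` and `getData/track` endpoint paths and the `knownGene` track name reflect the author's understanding of the public API and should be checked against the official documentation; `api_url` and `fetch_json` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def api_url(endpoint: str, **params) -> str:
    """Build a request URL for an API endpoint with optional query parameters."""
    query = urlencode(params)
    url = f"{API_BASE}/{endpoint.strip('/')}"
    return f"{url}?{query}" if query else url


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Track Information Extraction: list the tracks available for hg38.
tracks_url = api_url("list/tracks", genome="hg38")

# Gene Annotation Access: annotation items in a region (assumed track name).
region_url = api_url(
    "getData/track",
    genome="hg38",
    track="knownGene",
    chrom="chr17",
    start=38450000,
    end=38531000,
)

print(tracks_url)
print(region_url)
```

Calling `fetch_json(tracks_url)` would perform the request; it is left out of the example so that the code runs without network access.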

Example: Fetching Genomic Sequence with Python

    import requests

    # Define endpoint and parameters
    url = "https://api.genome.ucsc.edu/getData/sequence"
    # The text does not name a chromosome; chr1 is an illustrative assumption.
    params = {"genome": "hg38", "chrom": "chr1", "start": 1000000, "end": 1000100}

    # Make the request and stop early on an HTTP error
    response = requests.get(url, params=params)
    response.raise_for_status()
    data = response.json()

    # Display the DNA sequence
    print("Retrieved sequence:", data["dna"])

This snippet requests the sequence from position 1,000,000 to 1,000,100 on the hg38 human genome assembly and returns the raw DNA bases. It illustrates how researchers can access genome content without downloading entire datasets.

Comparison to Other Access Methods

This flexibility makes the REST API ideal for rapid, scriptable access to UCSC’s genomic resources.

Limitations

While the UCSC REST API is highly accessible, it is limited by:
* Rate limits and request size constraints
* Lack of complex filtering (compared to MySQL or Table Browser advanced queries)
* No built-in authentication for sensitive data (e.g., private tracks)

For large datasets or bulk analysis, users may still prefer downloading entire tracks or working with the UCSC Genome Browser database locally.
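One common way to stay within request size constraints is to split a large region into smaller windows and query them one at a time. The sketch below shows only the splitting step. The window size is a hypothetical value chosen for illustration, since the actual limits are not stated here, and `split_region` is not part of any UCSC tool.

```python
from typing import Iterator, Tuple


def split_region(start: int, end: int, max_size: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive half-open (start, end) windows of at most `max_size` bases."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    pos = start
    while pos < end:
        yield pos, min(pos + max_size, end)
        pos += max_size


# Hypothetical window size of 100 bases over a 250-base region.
windows = list(split_region(1_000_000, 1_000_250, max_size=100))
for window_start, window_end in windows:
    print(window_start, window_end)
```

In a real pipeline each window would become one request, typically with a short pause between requests to respect rate limits.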

Resources and Documentation

UCSC Genome Browser REST API Documentation

Open source / mirrors

The UCSC Browser code base is open-source for non-commercial use, and is mirrored locally by many research groups, allowing private display of data in the context of the public data. The UCSC Browser is mirrored at several locations worldwide, as shown in the table. The Browser code is also used in separate installations by the UCSC Malaria Genome Browser and the Archaea Browser.

References

External links

* On-line Training/Tutorials & User's Guides
* UCSC Genome tutorials (videos on YouTube)
