EB-eye

EBI Search is a scalable text search engine that provides easy and uniform access to the biological data resources and services hosted at the European Bioinformatics Institute (EBI). Its original and primary purpose is to provide search and indexing capabilities for publicly available biological data, thereby enabling research in bioinformatics and the life sciences: by making biological data easily accessible and searchable, it supports both basic research and the broader scientific community. In addition to the EBI Search website, a RESTful API is available for programmatic data queries, which allows its search and retrieval capabilities to be used in workflows and analytical pipelines.


History

The EBI Search project was developed in August 2006 at the European Bioinformatics Institute under the name ''EB-eye'', built on top of the existing Apache Lucene open-source search engine. The project was soon expanded to include more than 62 distinct datasets, covering about 400 million entries, and was renamed ''EBI Search''. In 2017, EBI Search was improved by implementing "search as a service" through a RESTful API that lets other websites integrate its search capabilities into their own platforms, eliminating the need to build separate search systems. The service also gained features such as hierarchical taxonomy navigation and similar-entry suggestions, while scaling to handle over 300 million searches and 1.3 billion records that could be re-indexed in under 24 hours. In 2019, EBI Search was further developed to include a new HTTP caching mechanism that improved response times, unlimited retrieval of cross-references, support for Cross-Origin Resource Sharing (CORS), and integration of new data resources such as Europe PMC, BioSamples, Rfam, and reviewed ChEMBL. During the COVID-19 pandemic, the project was updated to handle increased data needs. At present, the EBI Search engine indexes more than 140 different data resources, making it one of the most comprehensive search tools for biological and biomedical data.


Data resources

EMBL-EBI hosts a vast amount of molecular data and other information, drawn from many data resources, that is indexed by EBI Search. All these resources are freely available and regularly updated through EMBL-EBI's data management pipeline. EBI Search can only search information that has been indexed, so other search engines operating on biological data may yield different results. As a rule of thumb, the EBI Search engine indexes identifiers, names, descriptions, keywords and cross-references. The indexed data includes nucleotide and protein sequences, protein families, structural data, gene expression profiles, protein interactions, biological pathways, and small molecules. Additionally, EBI Search indexes academic literature, patents, and institutional information.
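The point that only indexed content is searchable can be illustrated with a toy inverted index. This sketch is not how EBI Search is implemented (it relies on Apache Lucene); the entry IDs, field names, placeholder sequences and the choice to leave the `sequence` field unindexed are all hypothetical, chosen only to show that a term present solely in an unindexed field cannot be found.

```python
from collections import defaultdict

# Hypothetical entries: IDs, field names and sequences are illustrative only.
ENTRIES = {
    "entry-1": {
        "name": "VAV_HUMAN",
        "description": "Proto-oncogene vav",
        "keywords": ["SH2 domain"],
        "sequence": "EXAMPLESEQUENCEA",  # stored, but not indexed below
    },
    "entry-2": {
        "name": "TPI1",
        "description": "Triosephosphate isomerase",
        "keywords": ["glycolysis"],
        "sequence": "EXAMPLESEQUENCEB",
    },
}

# Only these fields are indexed, echoing the rule of thumb of indexing
# names, descriptions and keywords rather than raw sequences.
INDEXED_FIELDS = ("name", "description", "keywords")


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def build_index(entries, fields):
    """Map every token found in the indexed fields to the IDs containing it."""
    index = defaultdict(set)
    for entry_id, entry in entries.items():
        for field in fields:
            value = entry.get(field, "")
            values = value if isinstance(value, list) else [value]
            for item in values:
                for token in tokenize(item):
                    index[token].add(entry_id)
    return index


def search(index, query):
    """Return the IDs of entries containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


index = build_index(ENTRIES, INDEXED_FIELDS)
print(search(index, "VAV_HUMAN"))  # {'entry-1'}
print(search(index, "EXAMPLESEQUENCEA"))  # set(): the sequence field was never indexed
```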


Search interface

Whether users type text into an EBI Search box or pass it as the query parameter of a RESTful API call, their input is converted into a standardized search query format, and it is this converted query that actually retrieves the search results.


Searching using the website

Users can search globally across all data resources indexed by EBI Search by typing query terms into the EBI Search box and pressing the search button (or Enter). The system then displays a summary page listing the data sets and the number of matches found in each of them. Any meaningful term can be entered, for example accession numbers or identifiers (such as ''VAV_HUMAN''), gene symbols (for instance ''tpi1''), species names or keywords.


Search results

The EBI Search website presents results in a three-column layout designed for efficient data exploration. The left column displays a summary of hits per category/domain with customizable facets for filtering results. The central column lists the primary search results with direct URLs to original data entries. The right column shows related data and alternative views. For gene and protein queries, specialized "Gene & protein summaries" appear above the main results, collating data from multiple EMBL-EBI resources according to molecular biology's central dogma.
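The per-domain hit summary and facet filtering in the left column amount to counting and filtering hits by field value. A minimal sketch on hypothetical hit records; the domain identifiers and the `organism` facet are assumptions for illustration, not EBI Search's actual facet definitions.

```python
from collections import Counter

# Hypothetical search hits; domain and organism values are illustrative only.
HITS = [
    {"id": "hit-1", "domain": "uniprot", "organism": "Homo sapiens"},
    {"id": "hit-2", "domain": "uniprot", "organism": "Mus musculus"},
    {"id": "hit-3", "domain": "pdbe", "organism": "Homo sapiens"},
    {"id": "hit-4", "domain": "europepmc", "organism": None},
]


def facet_counts(hits, facet):
    """Count hits per value of one facet, skipping hits without a value."""
    return Counter(hit[facet] for hit in hits if hit.get(facet) is not None)


def apply_facets(hits, selected):
    """Keep only the hits that match every selected facet value."""
    return [
        hit for hit in hits
        if all(hit.get(facet) == value for facet, value in selected.items())
    ]


print(facet_counts(HITS, "domain"))
human = apply_facets(HITS, {"organism": "Homo sapiens"})
print([hit["id"] for hit in human])  # ['hit-1', 'hit-3']
```

Selecting several facets at once narrows the results further, since a hit must match all of them.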


Features and tools

Users can interact with search results in several ways:

* Data export: results can be downloaded in multiple formats (XML, JSON, TSV, CSV) using the 'Save result' button, with a current limit of 100 entries per download.
* Analysis tools: domain-specific tools (e.g., BLAST for sequence analysis, Clustal Omega for multiple sequence alignment) can be launched directly from selected search results.
* RSS alerts: users can create RSS feeds to monitor updates to their search queries, which is particularly useful for tracking new publications, protein entries, or structural data.
* Cross-references: results include links to related entries across different EMBL-EBI databases, facilitating comprehensive data exploration.
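Because a single download is currently limited to 100 entries, exporting a larger result set means splitting it into consecutive windows. The sketch below only computes those (offset, size) windows; the `batches` helper is hypothetical, and it assumes that whatever client fetches each window accepts a zero-based offset and a page size.

```python
def batches(total_hits, limit=100):
    """Yield (offset, size) windows covering total_hits results, limit at a time."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for offset in range(0, total_hits, limit):
        # The last window may be smaller than the limit.
        yield offset, min(limit, total_hits - offset)


print(list(batches(250)))  # [(0, 100), (100, 100), (200, 50)]
```

A 250-hit result set therefore needs three downloads, the last one holding the remaining 50 entries.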


Result relevance

Search results are ordered primarily by Apache Lucene's scoring system, in which closer matches receive higher relevance scores. Users can influence the ranking by appending the caret symbol (^) and a boost factor to a term; for example, "prostate^4 AND cancer" gives greater weight to entries matching "prostate". Although EBI Search can be configured to boost specific domains or fields, runtime boosting is recommended for the most precise control over result ordering.
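The boost syntax can also be generated programmatically, for instance when a pipeline assembles queries from user choices. This sketch reproduces the example above; the helper names are hypothetical, and the optional `field` argument uses Lucene's generic `field:term` syntax, so any field name passed to it (such as `description` below) is an assumption that must match a field actually defined for the target EBI Search domain.

```python
def lucene_term(term, boost=None, field=None):
    """Format one Lucene query clause, optionally field-scoped and boosted.

    Multi-word terms are quoted as phrases. Lucene requires boosts to be
    positive; boost=None leaves the term unweighted.
    """
    text = f'"{term}"' if " " in term else term
    if field is not None:
        text = f"{field}:{text}"
    if boost is not None:
        if boost <= 0:
            raise ValueError("Lucene boost factors must be positive")
        text = f"{text}^{boost}"
    return text


def all_of(*clauses):
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)


print(all_of(lucene_term("prostate", boost=4), lucene_term("cancer")))
# prostate^4 AND cancer
print(lucene_term("kinase", boost=0.5, field="description"))
# description:kinase^0.5
```

Boosts below 1 are also valid in Lucene and lower a term's weight relative to the others.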


Searching using the API

EBI Search provides RESTful web services that allow programmatic access to biological data from the EBI Search data resources. This service is particularly useful for researchers and developers who wish to include EBI Search results in their code pipelines or to use them with a custom-built interface. Implementation details and a webinar are available on the official EMBL-EBI websites. Users can interact with the API through various endpoints supporting different response formats, including XML, JSON, RSS, and CSV. The service supports faceted searching, cross-reference searching, and auto-completion across multiple databases. The API currently follows Apache Lucene query syntax and returns appropriate HTTP status codes to indicate the success or failure of requests.
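A minimal sketch of how a pipeline might build requests for the search, cross-reference and auto-completion endpoints and surface HTTP errors. The base URL, the endpoint paths (`/{domain}`, `/{domain}/entry/{ids}/xref/{target}`, `/{domain}/autocomplete`) and the parameter names (`query`, `size`, `start`, `fields`, `format`, `term`) are assumed to follow the EBI Search RESTful API documentation and should be checked against its current version; the domain names and accession used below are only examples.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL; verify against the EBI Search RESTful API documentation.
BASE_URL = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def search_url(domain, query, fields=(), size=10, start=0, fmt="json"):
    """Build a search URL; the query string uses Lucene syntax."""
    params = {"query": query, "size": size, "start": start, "format": fmt}
    if fields:
        params["fields"] = ",".join(fields)
    return f"{BASE_URL}/{quote(domain)}?{urlencode(params)}"


def xref_url(domain, entry_ids, target_domain, fmt="json"):
    """Build a URL for cross-references from entries to a target domain."""
    ids = ",".join(quote(entry_id, safe="") for entry_id in entry_ids)
    params = urlencode({"format": fmt})
    return f"{BASE_URL}/{quote(domain)}/entry/{ids}/xref/{quote(target_domain)}?{params}"


def autocomplete_url(domain, term, fmt="json"):
    """Build a URL for auto-completion suggestions."""
    params = urlencode({"term": term, "format": fmt})
    return f"{BASE_URL}/{quote(domain)}/autocomplete?{params}"


def fetch_json(url, timeout=30):
    """Fetch a JSON response, reporting the HTTP status code on HTTP errors."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except HTTPError as err:
        raise RuntimeError(f"EBI Search request failed with HTTP {err.code}: {url}") from err
```

For example, `search_url("uniprot", "prostate^4 AND cancer", size=5)` encodes the boosted Lucene query described under result relevance into the `query` parameter; passing that URL to `fetch_json` would perform the network request and raise an error carrying the status code if the service rejects it.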




External links


Official EBI Search course

EBI Search engine documentation

EBI Search RESTful API documentation

EBI Job Dispatcher Documentation

EMBL-EBI Programmatic Access Training

The Apache Lucene project

UniProt web site

GMOD Generic Software Components for Model Organisms Databases