HOME

TheInfoList



OR:

A search engine is an
information retrieval system Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text search, ful ...
designed to help find information stored on a
computer system A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These prog ...
. The search results are usually presented in a list and are commonly called ''hits''. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing
information overload Information overload (also known as infobesity, infoxication, information anxiety, and information explosion) is the difficulty in understanding an issue and effectively making decisions when one has too much information (TMI) about that issue, ...
. The most public, visible form of a search engine is a
Web search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
which searches for information on the
World Wide Web The World Wide Web (WWW), commonly known as the Web, is an information system enabling documents and other web resources to be accessed over the Internet. Documents and downloadable media are made available to the network through web se ...
.


How search engines work

Search engines provide an
interface Interface or interfacing may refer to: Academic journals * ''Interface'' (journal), by the Electrochemical Society * '' Interface, Journal of Applied Linguistics'', now merged with ''ITL International Journal of Applied Linguistics'' * '' Int ...
to a group of items that enables users to specify criteria about an item of interest and have the engine find the matching items. The criteria are referred to as a search query. In the case of text search engines, the search query is typically expressed as a set of words that identify the desired
concept Concepts are defined as abstract ideas. They are understood to be the fundamental building blocks of the concept behind principles, thoughts and beliefs. They play an important role in all aspects of cognition. As such, concepts are studied by s ...
that one or more
document A document is a written, drawn, presented, or memorialized representation of thought, often the manifestation of non-fictional, as well as fictional, content. The word originates from the Latin ''Documentum'', which denotes a "teaching" o ...
s may contain. There are several styles of search query syntax that vary in strictness. It can also switch names within the search engines from previous sites. Whereas some text search engines require users to enter two or three words separated by white space, other search engines may enable users to specify entire documents, pictures, sounds, and various forms of
natural language In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languag ...
. Some search engines apply improvements to search queries to increase the likelihood of providing a quality set of items through a process known as query expansion. Query understanding methods can be used as standardize query language. The list of items that meet the criteria specified by the query is typically sorted, or ranked. Ranking items by relevance (from highest to lowest) reduces the time required to find the desired information.
Probabilistic Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...
search engines rank items based on measures of similarity (between each item and the query, typically on a scale of 1 to 0, 1 being most similar) and sometimes
popularity In sociology, popularity is how much a person, idea, place, item or other concept is either liked or accorded status by other people. Liking can be due to reciprocal liking, interpersonal attraction, and similar factors. Social status can be ...
or authority (see
Bibliometrics Bibliometrics is the use of statistical methods to analyse books, articles and other publications, especially in regard with scientific contents. Bibliometric methods are frequently used in the field of library and information science. Bibliom ...
) or use
relevance feedback Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not th ...
.
Boolean Any kind of logic, function, expression, or theory based on the work of George Boole is considered Boolean. Related to this, "Boolean" may refer to: * Boolean data type, a form of data with only two possible values (usually "true" and "false" ...
search engines typically only return items which match exactly without regard to order, although the term ''boolean search engine'' may simply refer to the use of boolean-style syntax (the use of operators AND, OR, NOT, and XOR) in a probabilistic context. To provide a set of matching items that are sorted according to some criteria quickly, a search engine will typically collect metadata about the group of items under consideration beforehand through a process referred to as indexing. The index typically requires a smaller amount of
computer storage Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a comput ...
, which is why some search engines only store the indexed information and not the full content of each item, and instead provide a method of navigating to the items in the search engine result page. Alternatively, the search engine may store a copy of each item in a cache so that users can see the state of the item at the time it was indexed or for archive purposes or to make repetitive processes work more efficiently and quickly. Other types of search engines do not store an index. Crawler, or spider type search engines (a.k.a. real-time search engines) may collect and assess items at the time of the search query, dynamically considering additional items based on the contents of a starting item (known as a seed, or seed URL in the case of an Internet crawler).
Meta search engine A metasearch engine (or search aggregator) is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. ...
s store neither an index nor a cache and instead simply reuse the index or results of one or more other search engine to provide an aggregated, final set of results. Database size, which had been a significant marketing feature through the early 2000s, was similarly displaced by emphasis on relevancy ranking, the methods by which search engines attempt to sort the best results first. Relevancy ranking first became a major issue circa 1996, when it became apparent that it was impractical to review full lists of results. Consequently,
algorithms In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
for relevancy ranking have continuously improved. Google's
PageRank PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. It is named after both the term "web page" and co-founder Larry Page. PageRank is a way of measuring the importance of website pages. Accordi ...
method for ordering the results has received the most press, but all major search engines continually refine their ranking methodologies with a view toward improving the ordering of results. As of 2006, search engine rankings are more important than ever, so much so that an industry has developed (" search engine optimizers", or "SEO") to help web-developers improve their search ranking, and an entire body of
case law Case law, also used interchangeably with common law, is law that is based on precedents, that is the judicial decisions from previous cases, rather than law based on constitutions, statutes, or regulations. Case law uses the detailed facts of a ...
has developed around matters that affect search engine rankings, such as use of
trademarks A trademark (also written trade mark or trade-mark) is a type of intellectual property consisting of a recognizable sign, design, or expression that identifies products or services from a particular source and distinguishes them from ot ...
in
metatags Meta elements are tags used in HTML and XHTML documents to provide structured metadata about a Web page. They are part of a web page's head section. Multiple Meta elements with different attributes can be used on the same page. Meta elements can ...
. The sale of search rankings by some search engines has also created controversy among librarians and consumer advocates. Search engine experience for users continues to be enhanced. Google's addition of the Google Knowledge Graph has had wider ramifications for the Internet, possibly even limiting certain websites traffic, for example Wikipedia. By pulling information and presenting it on Google's page, some argue that it can negatively affect other sites. However, there have been no major concerns.


Types of search engines

; By source *
Desktop search Desktop search tools search within a user's own computer files as opposed to searching the Internet. These tools are designed to find information on the user's PC, including web browser history, e-mail archives, text documents, sound files, image ...
*
Federated search Federated search retrieves information from a variety of sources via a search application built on top of one or more search engines. A user makes a single query request which is distributed to the search engines, databases or other query engine ...
* Human search engine *
Metasearch engine A metasearch engine (or search aggregator) is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. ...
* Multisearch * Search aggregator *
Web search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
; By content type * Audio search engine * Full text search * Image search * Video search engine ; By interface *
Incremental search In computing, incremental search, hot search, incremental find or real-time suggestions is a user interface interaction method to progressively search for and filter through text. As the user types text, one or more possible matches for the text ...
*
Instant answer An instant answer is an answer supplied by a search engine in response to a search query, without the user having to navigate away from the search results. Instant answers have been used by many search engines, such as Google, DuckDuckGo, Bi ...
* Semantic search * Selection-based search * Voice Search ; By topic *
Bibliographic database A bibliographic database is a database of bibliographic records, an organized digital collection of references to published literature, including journal and newspaper articles, conference proceedings, reports, government and legal publications ...
*
Enterprise search Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience. "Enterprise search" is used to describe the software of search information within an ente ...
*
Medical literature retrieval Medical literature retrieval or medical document retrieval is an activity that uses professional methods for medical research papers retrieval, report and other data to improve medicine research and practice. Medical search engine Professional med ...
* Vertical search


See also

*
Automatic summarization Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commo ...
*
Emanuel Goldberg Emanuel Goldberg ( he, עמנואל גולדברג; yi, עמנואל גאָלדבערג; russian: Эмануэль Гольдберг) (born: 31 August 1881; died: 13 September 1970) was an Israeli physicist and inventor. He was born in Moscow a ...
(inventor of early search engine) *
Index (search engine) Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, an ...
* Inverted index * List of search engines * Search as a service *
Search engine optimization Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic (known as "natural" or " organic" results) rather than di ...
*
Search suggest drop-down list A search suggest drop-down list is a query feature used in computing to show the searcher shortcuts, while the query is typed into a text box. Before the query is complete, a drop-down list with the suggested completions appears to provide opti ...
* Solver (computer science) *
Spamdexing Spamdexing (also known as search engine spam, search engine poisoning, black-hat search engine optimization, search spam or web spam) is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link buildin ...
* SQL * Text mining


References

{{DEFAULTSORT:Search Engine (Computing) Information retrieval systems