SELECT is the most commonly used data manipulation language (DML) command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan", which may vary between executions, database versions and database software. This functionality is called the "query optimizer". The SELECT statement has the following main clauses:
* The SELECT clause is the list of columns or expressions to return.
* AS optionally provides an alias for each column or expression in the SELECT clause. This is the relational algebra rename operation.
* FROM specifies from which table to get the data.
* WHERE specifies which rows to retrieve. This is approximately the relational algebra selection operation.
* GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
* HAVING selects among the groups defined by the GROUP BY clause.
* ORDER BY specifies how to order the returned rows.
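As a rough illustration of the point that a query only names the result while the database chooses how to compute it, the sketch below asks SQLite (through Python's built-in sqlite3 module, used here only for convenience) to describe its plan for the same query before and after an index is added. The Book table, its columns and the index name are assumptions made for this example, and the plan text itself is implementation-specific.

```python
import sqlite3

# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")

QUERY = "SELECT title FROM Book WHERE price > 100.00"

def plan(connection, sql):
    """Return SQLite's human-readable plan steps for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the description of that step.
    return [row[-1] for row in rows]

plan_before = plan(conn, QUERY)

# Same query text, different physical design: the optimizer may pick another plan.
conn.execute("CREATE INDEX idx_book_price ON Book (price)")
plan_after = plan(conn, QUERY)

print(plan_before)
print(plan_after)
```

The query text never changes between the two calls; only the database's choice of access path can.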
Overview
SELECT is the most common operation in SQL, called "the query". SELECT retrieves data from one or more tables, or expressions. Standard SELECT statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax provided in some databases.
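Queries allow the user to describe desired data, leaving the database management system (DBMS) to carry out the planning, optimizing, and physical operations needed to produce that result. A query includes a list of columns to include in the result, normally immediately following the SELECT keyword. An asterisk ("*") can be used to specify that the query should return all columns of the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include: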
* The FROM clause, which indicates the table(s) to retrieve data from. The FROM clause can include optional JOIN subclauses to specify the rules for joining tables.
* The WHERE clause includes a comparison predicate, which restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set where the comparison predicate does not evaluate to True.
* The GROUP BY clause projects rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
* The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate.
* The ORDER BY clause identifies which column to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.
* The DISTINCT keyword eliminates duplicate data.
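The interplay of these clauses can be sketched with a small in-memory SQLite database driven from Python's built-in sqlite3 module. The Book table, its author/title/price columns and the sample rows are assumptions made for this illustration only.

```python
import sqlite3

# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (author TEXT, title TEXT, price REAL)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("Adams", "A1", 20.0), ("Adams", "A2", 40.0), ("Adams", "A3", 5.0),
    ("Baker", "B1", 50.0), ("Baker", "B2", 8.0),
    ("Clark", "C1", 15.0), ("Clark", "C2", 25.0),
])

per_author = conn.execute("""
    SELECT author, COUNT(*) AS n_books, AVG(price) AS avg_price
    FROM Book
    WHERE price > 10.00        -- applied to rows, before grouping
    GROUP BY author            -- one output row per author
    HAVING COUNT(*) >= 2       -- applied to groups, so aggregates are allowed
    ORDER BY avg_price DESC    -- explicit order; otherwise undefined
""").fetchall()

# DISTINCT removes duplicate author values from the result.
authors = conn.execute("SELECT DISTINCT author FROM Book ORDER BY author").fetchall()

print(per_author)  # [('Adams', 2, 30.0), ('Clark', 2, 20.0)]
print(authors)     # [('Adams',), ('Baker',), ('Clark',)]
```

Baker has two books in the table, but only one survives the WHERE filter, so the HAVING predicate removes that group: WHERE runs before GROUP BY, HAVING after it.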
The query SELECT * FROM Book WHERE price > 100.00 ORDER BY title returns a list of expensive books. It retrieves all rows from the ''Book'' table in which the ''price'' column contains a value greater than 100.00. The result is sorted in ascending order by ''title''. The asterisk (*) in the ''select list'' indicates that all columns of the ''Book'' table should be included in the result set.
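The expensive-books query described above can be tried against a small in-memory table. This is a minimal sketch using Python's built-in sqlite3 module; the isbn column and all sample rows are invented for the illustration, while the table name, the price and title columns and the 100.00 threshold come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (isbn TEXT, title TEXT, price REAL)")
# Sample rows are made up for this sketch.
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("111", "SQL Examples and Guide", 145.55),
    ("222", "The Joy of SQL", 163.75),
    ("333", "An Introduction to SQL", 50.00),
    ("444", "Pitfalls of SQL", 35.50),
])

# All columns (*), only rows with price above 100.00, sorted ascending by title.
expensive = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()

for row in expensive:
    print(row)
```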
Subqueries
Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a ''subquery''. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases, the use of subqueries introduces a hierarchy in execution that can be useful or necessary. For example, the aggregation function AVG can receive as input the result of a subquery.
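Two ways of combining AVG with a nested query are sketched below, again with Python's built-in sqlite3 module and an assumed Book table whose rows are invented for the example. The first query compares each row against an average computed by a subquery; the second applies AVG to the rows produced by a subquery. Which of the two the original example used is not recoverable from the text, so both are shown as illustrations only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
# Hypothetical rows; the average price is 40.0.
conn.executemany("INSERT INTO Book VALUES (?, ?)", [
    ("A", 10.0), ("B", 20.0), ("C", 30.0), ("D", 100.0),
])

# Subquery used through a relational operator (<): books cheaper than average.
below_average = conn.execute("""
    SELECT title, price
    FROM Book
    WHERE price < (SELECT AVG(price) FROM Book)
    ORDER BY title
""").fetchall()

# Aggregation function applied to the result of a subquery.
avg_of_cheap = conn.execute("""
    SELECT AVG(price)
    FROM (SELECT price FROM Book WHERE price < 50.00) AS cheap
""").fetchone()[0]

print(below_average)  # [('A', 10.0), ('B', 20.0), ('C', 30.0)]
print(avg_of_cheap)   # 20.0
```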
Derived table
A derived table is a subquery referenced in a FROM clause. Essentially, the derived table is a subquery that can be selected from or joined to, which allows the user to reference the subquery as a table. The derived table is also referred to as an ''inline view'' or a ''select in from list''. For example, an SQL statement can join the initial Books table to a derived table "Sales" that captures associated book sales information, using the ISBN to join to the Books table. As a result, the derived table provides the result set with additional columns (the number of items sold and the company that sold the books).
Examples
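To illustrate the derived-table join between Books and Sales described above, the following sketch runs such a statement in SQLite through Python's built-in sqlite3 module. The Book_Sales table, every column name and all sample rows are assumptions made for this example; only the roles of Books, Sales and the ISBN join come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (isbn TEXT, title TEXT)")
conn.execute("CREATE TABLE Book_Sales (isbn TEXT, company_nm TEXT, items_sold INTEGER)")
# Hypothetical rows.
conn.executemany("INSERT INTO Books VALUES (?, ?)", [
    ("111", "SQL Basics"), ("222", "Advanced SQL"),
])
conn.executemany("INSERT INTO Book_Sales VALUES (?, ?, ?)", [
    ("111", "Acme", 3), ("111", "Acme", 2), ("111", "Globex", 4), ("222", "Acme", 1),
])

# The parenthesised subquery in FROM is the derived table, aliased as Sales.
joined = conn.execute("""
    SELECT b.isbn, b.title, Sales.items_sold, Sales.company_nm
    FROM Books AS b
    JOIN (SELECT isbn, company_nm, SUM(items_sold) AS items_sold
          FROM Book_Sales
          GROUP BY company_nm, isbn) AS Sales
      ON Sales.isbn = b.isbn
    ORDER BY b.isbn, Sales.company_nm
""").fetchall()

for row in joined:
    print(row)
```

The outer query treats Sales exactly like a table: it joins to it and selects its columns, although Sales exists only for the duration of the statement.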
Given a table T, the ''query'' SELECT * FROM T will result in all the elements of all the rows of the table being shown.
With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a ''projection'' in relational algebra, except that in the general case, the result may contain duplicate rows. This is also known as a Vertical Partition in some database terms, restricting query output to view only specified fields or columns.
With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown; in relational algebra terms, this is a ''selection''.
Limiting result rows
Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.
In ISO SQL:2003, result sets may be limited by using
* cursors, or
* by adding a SQL window function to the SELECT-statement
ISO SQL:2008 introduced the FETCH FIRST clause.
According to PostgreSQL v.9 documentation, an SQL window function "performs a calculation across a set of table rows that are somehow related to the current row", in a way similar to aggregate functions.
The name recalls signal processing window functions. A window function call always contains an OVER clause.
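The difference between an aggregate function and a window function can be sketched as follows, using Python's built-in sqlite3 module (window functions need an underlying SQLite of version 3.25 or later). The Employee table and its rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, dept TEXT, salary REAL)")
# Hypothetical rows.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("Ann", "eng", 100.0), ("Bob", "eng", 80.0), ("Cid", "ops", 60.0),
])

# Aggregate with GROUP BY: rows are collapsed to one row per department.
aggregated = conn.execute("""
    SELECT dept, AVG(salary)
    FROM Employee
    GROUP BY dept
    ORDER BY dept
""").fetchall()

# Window function: the OVER clause keeps every row and attaches the
# average of the rows related to it (here, the same department).
windowed = conn.execute("""
    SELECT name, dept, salary, AVG(salary) OVER (PARTITION BY dept) AS dept_avg
    FROM Employee
    ORDER BY name
""").fetchall()

print(aggregated)  # [('eng', 90.0), ('ops', 60.0)]
print(windowed)
```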
ROW_NUMBER() window function
ROW_NUMBER() OVER may be used for a ''simple limit'' on the returned rows, e.g. to return no more than ten rows.
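A minimal sketch of this technique, run in SQLite through Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window function support). The table T, its sort_key and payload columns, the rn alias and the 25 sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (sort_key INTEGER, payload TEXT)")
# 25 hypothetical rows, so that the limit of ten actually cuts something off.
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 26)])

# Number the rows in the inner query, then keep only numbers 1..10 outside.
first_ten = conn.execute("""
    SELECT sort_key, payload
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS rn,
                 sort_key, payload
          FROM T) AS numbered
    WHERE rn <= 10
    ORDER BY rn
""").fetchall()

print(len(first_ten))  # 10
```

The numbering has to happen in a subquery because the window function's result cannot be referenced in the WHERE clause of the same query level.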
RANK() window function
The RANK() OVER window function acts like ROW_NUMBER, but may return more or fewer than ''n'' rows in case of tie conditions, e.g. to return the top-10 youngest persons.
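The effect of ties can be sketched with a smaller limit than ten, so that the sample data stays short. The person table, its columns and its rows are assumptions made for this illustration, which runs in SQLite (3.25 or later) through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, person_name TEXT, age INTEGER)")
# Hypothetical rows; Cai and Dee share the same age.
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    (1, "Ana", 20), (2, "Ben", 21), (3, "Cai", 22), (4, "Dee", 22), (5, "Eli", 30),
])

n = 3  # "top-n youngest"; the text uses 10

youngest = conn.execute("""
    SELECT person_name, age, ranking
    FROM (SELECT RANK() OVER (ORDER BY age ASC) AS ranking,
                 person_id, person_name, age
          FROM person) AS ranked
    WHERE ranking <= ?
    ORDER BY ranking, person_id
""", (n,)).fetchall()

print(youngest)
# [('Ana', 20, 1), ('Ben', 21, 2), ('Cai', 22, 3), ('Dee', 22, 3)]
```

Because Cai and Dee tie for rank 3, the query returns four rows for n = 3, where ROW_NUMBER would have returned exactly three.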
FETCH FIRST clause
Since ISO SQL:2008, result limits can be specified using the FETCH FIRST clause, as in SELECT * FROM T FETCH FIRST 10 ROWS ONLY.
In some implementations, FETCH FIRST is considered part of the ORDER BY clause. The ORDER BY, OFFSET, and FETCH FIRST clauses are all required for this usage.
Non-standard syntax
Some DBMSs offer non-standard syntax either instead of or in addition to SQL standard syntax, so variants of the ''simple limit'' query differ between DBMSes.
Rows Pagination
Rows Pagination is an approach used to limit and display only a part of the total data of a query in the database. Instead of showing hundreds or thousands of rows at the same time, the server is requested only one page (a limited set of rows, for example only 10 rows), and the user starts navigating by requesting the next page, and then the next one, and so on. It is very useful, especially in web systems, where there is no dedicated connection between the client and the server, so the client does not have to wait to read and display all the rows of the server.
Data in Pagination approach
* {rows} = Number of rows in a page
* {page_number} = Number of the current page
* {offset} = Number of the row - 1 where the page starts = ({page_number} - 1) * {rows}
Simplest method (but very inefficient)
# Select all rows from the database
# Read all rows but send to display only when the row_number of the rows read is between {offset} + 1 and {offset} + {rows}
Other simple method (a little more efficient than reading all rows)
# Select all the rows from the beginning of the table to the last row to display ({offset} + {rows})
# Read the {offset} + {rows} rows but send to display only when the row_number of the rows read is greater than {offset}
Method with positioning
# Select only {rows} rows starting from the next row to display ({offset} + 1)
# Read and send to display all the rows read from the database
Method with filter (more sophisticated, but necessary for very big datasets)
# Select only {rows} rows with a filter:
## First Page: select only the first {rows} rows, depending on the type of database
## Next Page: select only the first {rows} rows, depending on the type of database, where the unique key is greater than the unique key value of the last row in the current page
## Previous Page: sort the data in the reverse order, select only the first {rows} rows, where the unique key is less than the unique key value of the first row in the current page, and sort the result in the correct order
# Read and send to display all the rows read from the database
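The method with positioning and the method with filter can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite's non-standard LIMIT ... OFFSET syntax through Python's built-in sqlite3 module, and the Item table, its id key and the helper function names are hypothetical. Other DBMSes would spell the row limit differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item (id INTEGER PRIMARY KEY, name TEXT)")
# 25 hypothetical rows: two full pages of 10 and a last page of 5.
conn.executemany("INSERT INTO Item VALUES (?, ?)",
                 [(i, f"item {i}") for i in range(1, 26)])

def page_by_position(connection, page_number, rows):
    """Method with positioning: skip (page_number - 1) * rows rows, read one page."""
    offset = (page_number - 1) * rows
    sql = "SELECT id, name FROM Item ORDER BY id LIMIT ? OFFSET ?"
    return connection.execute(sql, (rows, offset)).fetchall()

def next_page_by_filter(connection, last_key, rows):
    """Method with filter: read the page that follows the key of the last row shown."""
    if last_key is None:  # first page: no filter needed
        sql, args = "SELECT id, name FROM Item ORDER BY id LIMIT ?", (rows,)
    else:
        sql = "SELECT id, name FROM Item WHERE id > ? ORDER BY id LIMIT ?"
        args = (last_key, rows)
    return connection.execute(sql, args).fetchall()

def previous_page_by_filter(connection, first_key, rows):
    """Method with filter, going back: read in reverse order, then restore display order."""
    sql = "SELECT id, name FROM Item WHERE id < ? ORDER BY id DESC LIMIT ?"
    page = connection.execute(sql, (first_key, rows)).fetchall()
    return page[::-1]

page2_position = page_by_position(conn, page_number=2, rows=10)
page1_filter = next_page_by_filter(conn, last_key=None, rows=10)
page2_filter = next_page_by_filter(conn, last_key=page1_filter[-1][0], rows=10)
back_to_page1 = previous_page_by_filter(conn, first_key=page2_filter[0][0], rows=10)

print([row[0] for row in page2_position])  # ids 11..20
```

Both methods return the same page here. The filter variant lets the database seek directly to the key instead of counting and discarding the skipped rows, which is why it is the one suggested for very big datasets.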
Hierarchical query
Some databases provide specialised syntax for hierarchical data.
Query evaluation ANSI
The processing of a SELECT statement according to ANSI SQL is described in ''Inside Microsoft SQL Server 2005: T-SQL Querying'' by Itzik Ben-Gan, Lubor Kollar, and Dejan Sarka.
Window function support by RDBMS vendors
The implementation of window function features by vendors of relational databases and SQL engines varies widely. Most databases support at least some flavour of window functions. However, a closer look shows that most vendors implement only a subset of the standard. Take the powerful RANGE clause as an example: only Oracle, DB2, Spark/Hive, and Google BigQuery fully implement this feature. More recently, vendors have added new extensions to the standard, e.g. array aggregation functions. These are particularly useful when running SQL against a distributed file system (Hadoop, Spark, Google BigQuery), where data co-locality guarantees are weaker than on a distributed relational database (MPP). Rather than evenly distributing the data across all nodes, SQL engines running queries against a distributed filesystem can achieve data co-locality guarantees by nesting data, thus avoiding potentially expensive joins involving heavy shuffling across the network. User-defined aggregate functions that can be used in window functions are another extremely powerful feature.
Generating data in T-SQL
A method to generate data based on UNION ALL.
References
Sources
* Horizontal & Vertical Partitioning, Microsoft SQL Server 2000 Books Online.