Sort (Unix)

In computing, sort is a standard command-line program of Unix and Unix-like operating systems that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key, and blank space is the default field separator. The command supports a number of command-line options that can vary by implementation. For instance, the "-r" flag reverses the sort order.


History

A command that invokes a general sort facility was first implemented within Multics. Later, it appeared in Version 1 Unix. This version was originally written by Ken Thompson at AT&T Bell Laboratories. By Version 4, Thompson had modified it to use pipes, but sort retained an option to name the output file because it was used to sort a file in place. In Version 5, Thompson invented "-" to represent standard input.

The version of sort bundled in GNU coreutils was written by Mike Haertel and Paul Eggert. This implementation employs the merge sort algorithm.

Similar commands are available on many other operating systems; for example, a sort command is part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2. The command has also been ported to the IBM i operating system.


Syntax

    sort [OPTION]... [FILE]...

With no FILE, or when FILE is -, the command reads from standard input.
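As a minimal sketch of the "-" convention, the following merges lines arriving on standard input with the contents of a named file. The temporary file and the sample words are hypothetical, and LC_ALL=C is set only to make the ordering reproducible across locales.

```shell
# Hypothetical scratch file holding two unsorted lines.
tmp=$(mktemp)
printf '%s\n' delta bravo > "$tmp"

# "-" stands for standard input; its lines are sorted together with the file's lines.
merged=$(printf '%s\n' charlie alpha | LC_ALL=C sort - "$tmp")
printf '%s\n' "$merged"

rm -f "$tmp"
```

The result is a single sorted stream, with no indication of which input each line came from.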


Parameters


Examples


Sort a file in alphabetical order

    $ cat phonebook
    Smith, Brett     555-4321
    Doe, John        555-1234
    Doe, Jane        555-3214
    Avery, Cory      555-4132
    Fogarty, Suzie   555-2314

    $ sort phonebook
    Avery, Cory      555-4132
    Doe, Jane        555-3214
    Doe, John        555-1234
    Fogarty, Suzie   555-2314
    Smith, Brett     555-4321


Sort by number

The -n option makes the program sort according to numerical value. The du command produces output that starts with a number, the file size, so its output can be piped to sort to produce a list of files sorted by (ascending) file size:

    $ du /bin/* | sort -n
    4       /bin/domainname
    24      /bin/ls
    102     /bin/sh
    304     /bin/csh

The find command with the -ls option prints file sizes in the 7th field, so a list of the files sorted by file size is produced by:

    $ find . -name "*.tex" -ls | sort -k 7n
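To see why -n matters, the sketch below sorts the same four sizes with and without it. The numbers are taken from the du example; the variable names are illustrative, and LC_ALL=C pins the default comparison to byte order.

```shell
sizes=$(printf '%s\n' 102 24 4 304)

# Default comparison is character by character, so "102" sorts before "24".
lexical=$(printf '%s\n' "$sizes" | LC_ALL=C sort)

# -n compares the lines by numeric value instead.
numeric=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)

printf 'lexical: %s\n' "$(printf '%s' "$lexical" | tr '\n' ' ')"
printf 'numeric: %s\n' "$(printf '%s' "$numeric" | tr '\n' ' ')"
```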


Columns or fields

Use the -k option to sort on a certain column. For example, use "-k 2" to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data (+2 for the third, etc.). This usage is deprecated.

    $ cat zipcode
    Adam  12345
    Bob   34567
    Joe   56789
    Sam   45678
    Wendy 23456

    $ sort -k 2n zipcode
    Adam  12345
    Wendy 23456
    Bob   34567
    Sam   45678
    Joe   56789


Sort on multiple fields

The -k m,n option lets you sort on a key that is potentially composed of multiple fields (start at column m, end at column n):

    $ cat quota
    fred 2000
    bob 1000
    an 1000
    chad 1000
    don 1500
    eric 500

    $ sort -k2,2n -k1,1 quota
    eric 500
    an 1000
    bob 1000
    chad 1000
    don 1500
    fred 2000

Here the first sort is done using column 2. -k2,2n specifies sorting on the key starting and ending with column 2, and sorting numerically. If -k2 were used instead, the sort key would begin at column 2 and extend to the end of the line, spanning all the fields in between. -k1,1 dictates breaking ties using the value in column 1, sorting alphabetically by default. Note that an, bob, and chad have the same quota and are sorted alphabetically in the final output.
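The difference between a bounded key (-k2,2) and an open-ended key (-k2) can be sketched with two hypothetical three-field lines that tie on the second field. The data and variable names are made up for illustration, and LC_ALL=C keeps the comparison byte-wise.

```shell
# Both lines have "1" in field 2; they differ in fields 1 and 3.
data=$(printf '%s\n' 'a 1 z' 'b 1 y')

# Key is field 2 only: the tie is broken by field 1, so "a" comes first.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2 -k1,1)

# Key runs from field 2 to end of line: "1 y" sorts before "1 z", so "b" comes first.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k2 -k1,1)

printf 'bounded:\n%s\n' "$bounded"
printf 'open-ended:\n%s\n' "$open_ended"
```

In the open-ended case the secondary key -k1,1 is never consulted, because the primary keys already differ.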


Sorting a pipe delimited file

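The -t option sets the field separator, here the pipe character. Each key is given with its own -k option:

    $ sort -t'|' -k2,2 -k1,1 zipcode
    Adam|12345
    Wendy|23456
    Bob|34567
    Sam|45678
    Joe|56789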


Sorting a tab delimited file

Sorting a file with tab-separated values requires a tab character to be specified as the column delimiter. This illustration uses the shell's dollar-quote notation to specify the tab as a C escape sequence.

    $ sort -k2,2 -t $'\t' phonebook
    Doe, John	555-1234
    Fogarty, Suzie	555-2314
    Doe, Jane	555-3214
    Avery, Cory	555-4132
    Smith, Brett	555-4321
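In shells that lack dollar-quote notation, one possible workaround is to capture a tab in a variable with printf and pass that to -t. The sketch below uses a few hypothetical records in the name, tab, phone layout; the variable names are illustrative only.

```shell
# Hypothetical tab-separated records: name<TAB>phone.
tsv=$(printf '%s\t%s\n' 'Smith, Brett' 555-4321 'Doe, John' 555-1234 'Avery, Cory' 555-4132)

# Command substitution strips trailing newlines only, so the tab survives.
tab=$(printf '\t')

# Sort on the second tab-delimited field (the phone number).
by_phone=$(printf '%s\n' "$tsv" | LC_ALL=C sort -t "$tab" -k2,2)
printf '%s\n' "$by_phone"
```

Because the separator is a tab, the comma and space inside each name stay part of field 1.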


Sort in reverse

The -r option reverses the order of the sort:

    $ sort -rk 2n zipcode
    Joe   56789
    Sam   45678
    Bob   34567
    Wendy 23456
    Adam  12345


Sort in random

The GNU implementation has a -R (--random-sort) option based on hashing; this is not a full random shuffle because it will sort identical lines together. A true random sort is provided by the shuf utility.
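The grouping behaviour can be checked with a short sketch. It assumes GNU coreutils is installed, since both sort -R and shuf are GNU extensions; the sample letters and variable names are hypothetical. Output order varies from run to run, so only the number of runs of identical lines is reported.

```shell
# Six lines, three distinct values.
lines=$(printf '%s\n' a b a c b a)

# GNU sort -R: keys appear in random order, but identical lines stay adjacent.
hashed=$(printf '%s\n' "$lines" | sort -R)

# uniq collapses adjacent duplicates, so this counts runs of identical lines.
runs=$(printf '%s\n' "$hashed" | uniq | wc -l)

# GNU shuf: a random permutation in which duplicates may land anywhere.
shuffled=$(printf '%s\n' "$lines" | shuf)

printf 'runs after sort -R: %s\n' "$runs"
```

With sort -R the run count always equals the number of distinct lines, whereas the same count taken on shuf output can be larger.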


Sort by version

The GNU implementation has a -V --version-sort option which is a natural sort of (version) numbers within text. Two text strings that are to be compared are split into blocks of letters and blocks of digits. Blocks of letters are compared alpha-numerically, and blocks of digits are compared numerically (i.e., skipping leading zeros, more digits means larger, otherwise the leftmost digits that differ determine the result). Blocks are compared left-to-right and the first non-equal block in that loop decides which text is larger. This happens to work for IP addresses, Debian package version strings and similar tasks where numbers of variable length are embedded in strings.
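As an illustration of the block-wise comparison, the sketch below sorts three hypothetical IP addresses with and without -V. It assumes GNU sort, since -V is not part of every implementation, and sets LC_ALL=C so the plain sort is byte-wise.

```shell
addrs=$(printf '%s\n' 10.0.0.9 10.0.0.10 10.0.0.2)

# Plain sort compares characters, so "10.0.0.10" sorts before "10.0.0.2".
plain=$(printf '%s\n' "$addrs" | LC_ALL=C sort)

# -V compares each run of digits by numeric value, so 2 < 9 < 10.
by_version=$(printf '%s\n' "$addrs" | LC_ALL=C sort -V)

printf 'plain:\n%s\n' "$plain"
printf 'version:\n%s\n' "$by_version"
```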


See also

* Collation
* List of Unix commands
* uniq
* shuf




External links


* Original Sort manpage: the original BSD Unix program's manpage
* Further details about sort at Softpanorama