In computer science, zipping is a function which maps a tuple of sequences into a sequence of tuples. The name ''zip'' derives from the action of a zipper, in that it interleaves two formerly disjoint sequences. The inverse function is ''unzip''.
Example
Given the three words ''cat'', ''fish'' and ''be'', where |''cat''| is 3, |''fish''| is 4 and |''be''| is 2, let <math>\ell</math> denote the length of the longest word, which is ''fish''; <math>\ell = 4</math>. The zip of ''cat'', ''fish'', ''be'' is then 4 tuples of elements:
: <math>(c,f,b)(a,i,e)(t,s,\#)(\#,h,\#)</math>
where ''#'' is a symbol not in the original alphabet. In Haskell this truncates to the shortest sequence, of length <math>\underline{\ell}</math>, where <math>\underline{\ell} = 2</math>:
zip3 "cat" "fish" "be"
-- [('c','f','b'),('a','i','e')]
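Both behaviours from this example can be reproduced with Python's standard library. The following is a minimal sketch rather than part of the original example: `itertools.zip_longest` pads the shorter words with a fill value (here `#`, assumed not to occur in any word), while the built-in `zip` truncates to the shortest word. The variable names are illustrative.

```python
from itertools import zip_longest

words = ["cat", "fish", "be"]

# Padded zip: shorter words are filled with '#', giving 4 tuples.
padded = list(zip_longest(*words, fillvalue="#"))

# Truncating zip: stops at the shortest word ("be"), giving 2 tuples.
truncated = list(zip(*words))

print(padded)     # [('c', 'f', 'b'), ('a', 'i', 'e'), ('t', 's', '#'), ('#', 'h', '#')]
print(truncated)  # [('c', 'f', 'b'), ('a', 'i', 'e')]
```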
Definition
Let Σ be an alphabet, # a symbol not in Σ.
Let <math>x_1 x_2 \ldots x_{|x|},\ y_1 y_2 \ldots y_{|y|},\ z_1 z_2 \ldots z_{|z|}, \ldots</math> be ''n'' words (i.e. finite sequences) of elements of Σ. Let <math>\ell</math> denote the length of the longest word, i.e. the maximum of |''x''|, |''y''|, |''z''|, ... .
The zip of these words is a finite sequence of ''n''-tuples of elements of <math>\Sigma \cup \{\#\}</math>, i.e. an element of <math>((\Sigma \cup \{\#\})^n)^*</math>:
: <math>(x_1, y_1, \ldots)(x_2, y_2, \ldots)\ldots(x_\ell, y_\ell, \ldots)</math>,
where for any word ''w'' and any index <math>i > |w|</math>, the element <math>w_i</math> is #.
The zip of ''x, y, z, ...'' is denoted zip(''x, y, z, ...'') or ''x'' ⋆ ''y'' ⋆ ''z'' ⋆ ...
The inverse to zip is sometimes denoted unzip.
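A variation of the zip operation is defined by:
: <math>(x_1, y_1, \ldots)(x_2, y_2, \ldots)\ldots(x_{\underline{\ell}}, y_{\underline{\ell}}, \ldots)</math>
where <math>\underline{\ell}</math> is the ''minimum'' length of the input words. It avoids the use of an adjoined element #, but destroys information about elements of the input sequences beyond <math>\underline{\ell}</math>.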
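The two definitions can be written out directly. The sketch below is illustrative only: the function names `zip_padded`, `zip_truncated` and `unzip_padded` are hypothetical, indices are 0-based rather than 1-based, and `unzip_padded` assumes the inputs are strings that never contain the padding symbol.

```python
def zip_padded(*words, pad="#"):
    """Zip words up to the longest length, filling gaps with `pad`."""
    longest = max((len(w) for w in words), default=0)
    return [
        tuple(w[i] if i < len(w) else pad for w in words)
        for i in range(longest)
    ]


def zip_truncated(*words):
    """Zip words up to the shortest length; later elements are dropped."""
    shortest = min((len(w) for w in words), default=0)
    return [tuple(w[i] for w in words) for i in range(shortest)]


def unzip_padded(zipped, n, pad="#"):
    """Recover n string words from a zip by dropping the padding symbol."""
    columns = [[t[k] for t in zipped] for k in range(n)]
    return ["".join(c for c in col if c != pad) for col in columns]


words = ("cat", "fish", "be")
print(unzip_padded(zip_padded(*words), len(words)))     # ['cat', 'fish', 'be']
print(unzip_padded(zip_truncated(*words), len(words)))  # ['ca', 'fi', 'be']
```

The second line of output shows the information loss of the truncating variant: only the first two letters of each word survive the round trip.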
In programming languages
Zip functions are available in many programming languages, usually under the name ''zip''. In Lisp dialects one can simply ''map'' the desired function over the desired lists; ''map'' is variadic in Lisp, so it can take an arbitrary number of lists as arguments. An example from Clojure:
;; `nums' contains an infinite list of numbers (0 1 2 3 ...)
(def nums (range))
(def tens [10 20 30])
(def firstname "Alice")
;; To zip (0 1 2 3 ...) and [10 20 30] into a vector, invoke `map vector' on them; same with list
(map vector nums tens) ; ⇒ ([0 10] [1 20] [2 30])
(map list nums tens) ; ⇒ ((0 10) (1 20) (2 30))
(map str nums tens) ; ⇒ ("010" "120" "230")
;; `map' truncates to the shortest sequence; note missing \c and \e from "Alice"
(map vector nums tens firstname) ; ⇒ ([0 10 \A] [1 20 \l] [2 30 \i])
(map str nums tens firstname) ; ⇒ ("010A" "120l" "230i")
;; To unzip, apply `map vector' or `map list'
(apply map list (map vector nums tens firstname))
;; ⇒ ((0 1 2) (10 20 30) (\A \l \i))
In Common Lisp:
(defparameter nums '(1 2 3))
(defparameter tens '(10 20 30))
(defparameter firstname "Alice")
(mapcar #'list nums tens)
;; ⇒ ((1 10) (2 20) (3 30))
(mapcar #'list nums tens (coerce firstname 'list))
;; ⇒ ((1 10 #\A) (2 20 #\l) (3 30 #\i)) — truncates on shortest list
;; Unzips
(apply #'mapcar #'list (mapcar #'list nums tens (coerce firstname 'list)))
;; ⇒ ((1 2 3) (10 20 30) (#\A #\l #\i))
Languages such as Python provide a zip() function; older versions (Python 2.*) also allowed mapping None over lists to get a similar effect (see map(function, iterable, ...) in the Built-in Functions section of the Python v2.7.2 documentation). zip() in conjunction with the * operator unzips a list:
>>> nums = [1, 2, 3]
>>> tens = [10, 20, 30]
>>> firstname = 'Alice'
>>> zipped = list(zip(nums, tens))  # list() is needed in Python 3, where zip() is lazy
>>> zipped
[(1, 10), (2, 20), (3, 30)]
>>> list(zip(*zipped)) # unzip
[(1, 2, 3), (10, 20, 30)]
>>> zipped2 = list(zip(nums, tens, list(firstname)))
>>> zipped2 # zip, truncates on shortest
[(1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i')]
>>> list(zip(*zipped2)) # unzip
[(1, 2, 3), (10, 20, 30), ('A', 'l', 'i')]
>>> # mapping with `None' doesn't truncate; Python 2 only, removed in Python 3.*
>>> map(None, nums, tens, list(firstname))
[(1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i'), (None, None, 'c'), (None, None, 'e')]
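In Python 3, `zip` returns a lazy iterator rather than a list, and `map(None, ...)` is no longer accepted. The non-truncating behaviour is available through `itertools.zip_longest`. The following sketch reuses the values from the session above; it is one possible Python 3 rendering, not a transcript of the original.

```python
from itertools import zip_longest

nums = [1, 2, 3]
tens = [10, 20, 30]
firstname = "Alice"

# zip() yields tuples lazily; list() materialises them.
zipped = list(zip(nums, tens, firstname))

# Non-truncating zip; missing positions are filled with None by default.
longest = list(zip_longest(nums, tens, firstname))

# Unzipping with the * operator gives one tuple per original sequence.
unzipped = list(zip(*zipped))

print(zipped)    # [(1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i')]
print(longest)   # the three tuples above, then (None, None, 'c'), (None, None, 'e')
print(unzipped)  # [(1, 2, 3), (10, 20, 30), ('A', 'l', 'i')]
```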
Haskell has a method of zipping sequences but requires a specific function for each arity (zip for two sequences, zip3 for three, etc.; see zip :: [a] -> [b] -> [(a, b)] from Prelude, Basic libraries). Similarly, the functions unzip and unzip3 are available for unzipping:
-- nums contains an infinite list of numbers [1, 2, 3, ...]
nums = [1..]
tens = [10, 20, 30]
firstname = "Alice"
zip nums tens
-- ⇒ [(1,10), (2,20), (3,30)] (zip, truncates infinite list)
unzip $ zip nums tens
-- ⇒ ([1,2,3], [10,20,30]) (unzip)
zip3 nums tens firstname
-- ⇒ [(1,10,'A'), (2,20,'l'), (3,30,'i')] (zip, truncates)
unzip3 $ zip3 nums tens firstname
-- ⇒ ([1,2,3], [10,20,30], "Ali") (unzip)
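For comparison, here is a hedged Python sketch of the same session. Python's built-in `zip` accepts any number of iterables, so a single function covers every arity. Like the Haskell functions, it stops at the shortest input, which makes zipping against an infinite iterator safe. Here `itertools.count(1)` stands in for `[1..]`; a fresh counter is created for each call because a Python iterator is consumed as it is read.

```python
from itertools import count

tens = [10, 20, 30]
firstname = "Alice"

# count(1) is an infinite iterator 1, 2, 3, ...; zip stops when `tens` runs out.
pairs = list(zip(count(1), tens))

# The same function handles three inputs; no separate zip3 is needed.
triples = list(zip(count(1), tens, firstname))

# Unzip by unpacking: one tuple per original sequence.
unzipped_nums, unzipped_tens, unzipped_chars = zip(*triples)

print(pairs)           # [(1, 10), (2, 20), (3, 30)]
print(triples)         # [(1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i')]
print(unzipped_chars)  # ('A', 'l', 'i')
```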
Language comparison
List of languages by support of zip:
{| class="wikitable"
|+ Unzip in various languages
|-
! scope="col" | Language
! scope="col" | Unzip
! scope="col" | Unzip 3 tuples
! scope="col" | Unzip ''n'' tuples
! scope="col" | Notes
|-
! scope="row" | Clojure
| {{mono|(apply map vector ''ziplist'')}}
| {{mono|(apply map vector ''ziplist'')}}
| {{mono|(apply map vector ''ziplist'')}}
|
|-
! scope="row" | Common Lisp
| {{mono|(apply #'mapcar #'list ''ziplist'')}}
| {{mono|(apply #'mapcar #'list ''ziplist'')}}
| {{mono|(apply #'mapcar #'list ''ziplist'')}}
|
|-
! scope="row" | F#
|
|
|
|
|-
! scope="row" | Haskell
| {{mono|unzip ''ziplist''}}
| {{mono|unzip3 ''ziplist''}}
| {{mono|unzip''n'' ''ziplist''}}
| {{mono|unzip''n''}} for ''n'' > 3 is available in the module {{mono|Data.List}}
|-
! scope="row" | Python
| {{mono|zip(*''zipvlist'')}}
| {{mono|zip(*''zipvlist'')}}
| {{mono|zip(*''zipvlist'')}}
|
|}
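As a rough illustration of the "Unzip ''n'' tuples" column, the Python idiom `zip(*rows)` works for any tuple width but yields nothing at all for an empty input. The number of output sequences therefore has to be supplied separately in that case. The helper `unzip_n` below is hypothetical and only sketches one way to handle this.

```python
def unzip_n(rows, n):
    """Split a list of n-tuples into n lists (hypothetical helper)."""
    columns = [list(col) for col in zip(*rows)]
    # zip(*[]) yields nothing, so fall back to n empty columns.
    return columns if columns else [[] for _ in range(n)]


print(unzip_n([(1, 10, "A"), (2, 20, "l")], 3))  # [[1, 2], [10, 20], ['A', 'l']]
print(unzip_n([], 3))                            # [[], [], []]
```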
See also
* Map (higher-order function)