Maximum Segment Sum Problem
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, the maximum sum subarray problem, also known as the maximum segment sum problem, is the task of finding a contiguous subarray with the largest sum, within a given one-dimensional
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
A ...nof numbers. It can be solved in O(n) time and O(1) space. Formally, the task is to find indices i and j with 1 \leq i \leq j \leq n , such that the sum : \sum_^j A is as large as possible. (Some formulations of the problem also allow the empty subarray to be considered; by convention, the sum of all values of the empty subarray is zero.) Each number in the input array A could be positive, negative, or zero. For example, for the array of values minus;2, 1, −3, 4, −1, 2, 1, −5, 4 the contiguous subarray with the largest sum is , −1, 2, 1 with sum 6. Some properties of this problem are: # If the array contains all non-negative numbers, then the problem is trivial; a maximum subarray is the entire array. # If the array contains all non-positive numbers, then a solution is any subarray of size 1 containing the maximal value of the array (or the empty subarray, if it is permitted). # Several different sub-arrays may have the same maximum sum. Although this problem can be solved using several different algorithmic techniques, including brute force, divide and conquer, dynamic programming, and reduction to shortest paths, a simple single-pass algorithm known as Kadane's algorithm solves it efficiently.


History

The maximum subarray problem was proposed by
Ulf Grenander Ulf Grenander (23 July 1923 – 12 May 2016) was a Swedish statistician and professor of applied mathematics at Brown University. His early research was in probability theory, stochastic processes, time series analysis, and statistical theory (pa ...
in 1977 as a simplified model for
maximum likelihood estimation In statistics, maximum likelihood estimation (MLE) is a method of estimation theory, estimating the Statistical parameter, parameters of an assumed probability distribution, given some observed data. This is achieved by Mathematical optimization, ...
of patterns in digitized images. Grenander was looking to find a rectangular subarray with maximum sum, in a two-dimensional array of real numbers. A brute-force algorithm for the two-dimensional problem runs in ''O''(''n''6) time; because this was prohibitively slow, Grenander proposed the one-dimensional problem to gain insight into its structure. Grenander derived an algorithm that solves the one-dimensional problem in ''O''(''n''2) time using
prefix sum In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers is a second sequence of numbers , the summation, sums of Prefix (computer science), prefixes (running totals) of the input sequence: : : : ...
, improving the brute force running time of ''O''(''n''3). When Michael Shamos heard about the problem, he overnight devised an ''O''(''n'' log ''n'')
divide-and-conquer algorithm In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved dir ...
for it. Soon after, Shamos described the one-dimensional problem and its history at a
Carnegie Mellon University Carnegie Mellon University (CMU) is a private research university in Pittsburgh, Pennsylvania, United States. The institution was established in 1900 by Andrew Carnegie as the Carnegie Technical Schools. In 1912, it became the Carnegie Institu ...
seminar attended by Jay Kadane, who designed within a minute an ''O''(''n'')-time algorithm, which is as fast as possible.since every algorithm must at least scan the array once which already takes ''O''(''n'') time In 1982, David Gries obtained the same ''O''(''n'')-time algorithm by applying
Dijkstra Dijkstra ( ) is a Dutch language, Dutch family name of West Frisian language, West Frisian origin. It most commonly refers to: * Edsger W. Dijkstra (1930–2002), Dutch computer scientist ** Named after him: Dijkstra's algorithm, Dijkstra Prize, Di ...
's "standard strategy"; in 1989, Richard Bird derived it by purely algebraic manipulation of the brute-force algorithm using the
Bird–Meertens formalism The Bird–Meertens formalism (BMF) is a calculus for deriving programs from program specifications (in a functional programming setting) by a process of equational reasoning. It was devised by Richard Bird and Lambert Meertens as part of their ...
. Grenander's two-dimensional generalization can be solved in O(''n''3) time either by using Kadane's algorithm as a subroutine, or through a divide-and-conquer approach. Slightly faster algorithms based on distance matrix multiplication have been proposed by and by . There is some evidence that no significantly faster algorithm exists; an algorithm that solves the two-dimensional maximum subarray problem in O(''n''3−ε) time, for any ε>0, would imply a similarly fast algorithm for the all-pairs shortest paths problem.


Applications

Maximum subarray problems arise in many fields, such as genomic
sequence analysis In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome ...
and
computer vision Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
. Genomic sequence analysis employs maximum subarray algorithms to identify important biological segments of protein sequences that have unusual properties, by assigning scores to points within the sequence that are positive when a motif to be recognized is present, and negative when it is not, and then seeking the maximum subarray among these scores. These problems include conserved segments, GC-rich regions, tandem repeats, low-complexity filter, DNA binding domains, and regions of high charge. In
computer vision Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
, bitmap images generally consist only of positive values, for which the maximum subarray problem is trivial: the result is always the whole array. However, after subtracting a threshold value (such as the average pixel value) from each pixel, so that above-average pixels will be positive and below-average pixels will be negative, the maximum subarray problem can be applied to the modified image to detect bright areas within it.;


Kadane's algorithm


No empty subarrays admitted

Kadane's algorithm scans the given array A \ldots n/math> from left to right. In the jth step, it computes the subarray with the largest sum ending at j; this sum is maintained in variable current_sum. Moreover, it computes the subarray with the largest sum anywhere in A \ldots j/math>, maintained in variable best_sum, and easily obtained as the maximum of all values of current_sum seen so far, cf. line 7 of the algorithm. As a
loop invariant In computer science, a loop invariant is a property of a program loop that is true before (and after) each iteration. It is a logical assertion, sometimes checked with a code assertion. Knowing its invariant(s) is essential in understanding th ...
, in the jth step, the old value of current_sum holds the maximum over all i \in \ of the sum A \cdots+A -1/math>. Therefore, current_sum+A /math> is the maximum over all i \in \ of the sum A \cdots+A /math>. To extend the latter maximum to cover also the case i=j, it is sufficient to consider also the singleton subarray A \; \ldots \; j/math>. This is done in line 6 by assigning \max(A current_sum+A as the new value of current_sum, which after that holds the maximum over all i \in \ of the sum A \cdots+A /math>. Thus, the problem can be solved with the following code, expressed in
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (prog ...
. def max_subarray(numbers): """Find the largest sum of any contiguous subarray.""" best_sum = float('-inf') current_sum = 0 for x in numbers: current_sum = max(x, current_sum + x) best_sum = max(best_sum, current_sum) return best_sum If the input contains no positive element, the returned value is that of the largest element (i.e., the value closest to 0), or negative infinity if the input was empty. For correctness, an exception should be raised when the input array is empty, since an empty array has no maximum nonempty subarray. If the array is nonempty, its first element could be used in place of negative infinity, if needed to avoid mixing numeric and non-numeric values. The algorithm can be adapted to the case which allows empty subarrays or to keep track of the starting and ending indices of the maximum subarray. This algorithm calculates the maximum subarray ending at each position from the maximum subarray ending at the previous position, so it can be viewed as a case of dynamic programming.


Empty subarrays admitted

Kadane's algorithm, as originally published, is for solving the problem variant which allows empty subarrays. In such variant, the answer is 0 when the input contains no positive elements (including when the input is empty). The variant is obtained with two changes in code: in line 3, best_sum should be initialized to 0 to account for the empty subarray A \ldots -1/math> best_sum = 0; and line 6 in the for loop current_sum should be updated to max(0, current_sum + x). current_sum = max(0, current_sum + x) As a
loop invariant In computer science, a loop invariant is a property of a program loop that is true before (and after) each iteration. It is a logical assertion, sometimes checked with a code assertion. Knowing its invariant(s) is essential in understanding th ...
, in the jth step, the old value of current_sum holds the maximum over all i \in \ of the sum A \cdots+A -1/math>. Therefore, current_sum+A /math> is the maximum over all i \in \ of the sum A \cdots+A /math>. To extend the latter maximum to cover also the case i=j+1, it is sufficient to consider also the empty subarray A +1 \; \ldots \; j/math>. This is done in line 6 by assigning \max(0,current_sum+A as the new value of current_sum, which after that holds the maximum over all i \in \ of the sum A \cdots+A /math>. Machine-verified C /
Frama-C Frama-C is a set of interoperable program analyzers for C programs. The name ''Frama-C'' stands for ''Framework for Modular Analysis of C programs''. Frama-C has been developed by the French Commissariat à l'Énergie Atomique et aux Énergi ...
code of both variants can be found
here Here may refer to: Music * ''Here'' (Adrian Belew album), 1994 * ''Here'' (Alicia Keys album), 2016 * ''Here'' (Cal Tjader album), 1979 * ''Here'' (Edward Sharpe album), 2012 * ''Here'' (Idina Menzel album), 2004 * ''Here'' (Merzbow album), ...
.


Computing the best subarray's position

The algorithm can be modified to keep track of the starting and ending indices of the maximum subarray as well. Because of the way this algorithm uses optimal substructures (the maximum subarray ending at each position is calculated in a simple way from a related but smaller and overlapping subproblem: the maximum subarray ending at the previous position) this algorithm can be viewed as a simple/trivial example of dynamic programming.


Complexity

The runtime complexity of Kadane's algorithm is O(n) and its space complexity is O(1).


Generalizations

Similar problems may be posed for higher-dimensional arrays, but their solutions are more complicated; see, e.g., . showed how to find the ''k'' largest subarray sums in a one-dimensional array, in the optimal time bound O(n + k). The Maximum sum ''k''-disjoint subarrays can also be computed in the optimal time bound O(n + k) .


See also

*
Subset sum problem The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target-sum T, and the question is to decide whether any subset of the integers sum to precisely T''.'' ...


Notes


Notes


References

* * *. * * * * * *. * * *. * *


External links

* * * {{Cite web, url=http://cs.slu.edu/~goldwamh/courses/slu/csci314/2012_Fall/lectures/maxsubarray/, title=Notes on Maximum Subarray Problem, date=2012
www.algorithmist.com

alexeigor.wikidot.com

greatest subsequential sum problem on Rosetta Code

geeksforgeeks page on Kadane's Algorithm
Optimization algorithms and methods Dynamic programming Polynomial-time problems Articles with example Python (programming language) code