Dynamic programming is an efficient technique for solving many combinatorial optimization problems, often in polynomial or pseudo-polynomial time.
Dynamic programming is both a mathematical optimization method and a computer programming method. In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner [1]. There are two key attributes that a problem must have for dynamic programming to be applicable: optimal substructure [2] and overlapping sub-problems.
In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem [3]. Commonly, a binomial coefficient is indexed by a pair of integers $ n ≥ k ≥ 0 $ and is written $ \tbinom {n}{k} $. It is the coefficient of the $ x^k $ term in the polynomial expansion of the binomial power $ (1 + x)^n $, and it is given by the formula:
$$ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \tag{1} $$
# Load the Python libraries
import timeit
import math
import pandas as pd
# Example values
n = 25
k = 15
# Binomial coefficient from the mathematical formula
def bin_coef_1(n, k):
    # Integer division keeps the result exact (no floating-point rounding for large n)
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
start_time = timeit.default_timer()
print(bin_coef_1(n, k))
print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')
3268760 >> elapsed time 0.3633999999999027 ms
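Computing the coefficient through factorials requires very large intermediate numbers. The recursive and table-based solutions below avoid them by relying on Pascal's rule instead:

$$ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}, \qquad \binom{n}{0} = \binom{n}{n} = 1 \tag{2} $$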
# The recursive natural solution
def bin_coef_2(n, k):
    if k == 0 or k == n:
        return 1
    return bin_coef_2(n - 1, k - 1) + bin_coef_2(n - 1, k)
start_time = timeit.default_timer()
print(bin_coef_2(n, k))
print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')
3268760 >> elapsed time 1602.7969 ms
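The naive recursion recomputes the same overlapping sub-problems over and over, which explains the exponential running time. As a minimal sketch (not part of the original code), the very same recursion becomes fast once every pair $ (n, k) $ is cached, for example with functools.lru_cache:

from functools import lru_cache

@lru_cache(maxsize=None)
def bin_coef_memo(n, k):
    # Identical recursion to bin_coef_2, but each (n, k) pair is computed only once
    if k == 0 or k == n:
        return 1
    return bin_coef_memo(n - 1, k - 1) + bin_coef_memo(n - 1, k)

print(bin_coef_memo(n, k))  # 3268760

This top-down memoization also takes $ \Theta(nk) $ time, but keeps all cached pairs in memory, whereas the bottom-up version below stores only one row of the table.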
# Solution with dynamic programming (supported by a table)
def bin_coef_3(n, k):
    # Rolling one-dimensional table: after processing row i, v[j] = C(i, j) for j <= min(i, k)
    v = [1] * (k + 1)
    for i in range(n + 1):
        for j in range(k, 0, -1):
            if j < i:
                v[j] = v[j - 1] + v[j]
    return v[k]
start_time = timeit.default_timer()
print(bin_coef_3(n, k))
print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')
3268760 >> elapsed time 1.0575999999997698 ms
The time complexity is $ \Theta(nk) $ and the space complexity is $ \Theta(k) $.
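The next example (the function WCP below) appears to be the classic world-series problem: two teams play a series that ends as soon as one of them reaches $ n $ wins, and the first team wins each game independently with probability $ p $ (and loses it with probability $ q = 1 - p $). Writing $ P(i, j) $ for the probability that the first team wins the series when it still needs $ i $ wins and its opponent still needs $ j $, the code implements the recurrence:

$$ P(i, j) = p \cdot P(i - 1, j) + q \cdot P(i, j - 1), \qquad P(0, j) = 1, \quad P(i, 0) = 0 $$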
# Example values
n = 10
p = 0.55
q = 1 - p
# The recursive natural solution
def WCP(i, j):
    # Base cases of the recurrence: P(0, j) = 1 and P(i, 0) = 0
    if i == 0:
        return 1
    elif j == 0:
        return 0
    return p * WCP(i - 1, j) + q * WCP(i, j - 1)
start_time = timeit.default_timer()
print(WCP(n, n))
print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')
0.6710359124216079 >> elapsed time 152.84509999999952 ms
# Solution with dynamic programming (supported by a table)
def WCP2(n, p):
    n = n + 1
    q = 1 - p
    prob = [[0] * n for i in range(n)]
    # Fill the table diagonal by diagonal: first the upper-left triangle...
    for s in range(n):
        prob[0][s] = 1
        for k in range(1, s):
            prob[k][s - k] = p * prob[k - 1][s - k] + q * prob[k][s - k - 1]
    # ...then the lower-right triangle
    for s in range(1, n):
        for k in range(0, n - s):
            prob[s + k][n - k - 1] = p * prob[s + k - 1][n - k - 1] + q * prob[s + k][n - k - 2]
    return prob[n - 1][n - 1]
start_time = timeit.default_timer()
print(WCP2(n, p))
print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')
0.6710359124216079 >> elapsed time 3.256300000000323 ms
The time complexity is $ \Theta(n^2) $ and the space complexity is $ \Theta(n^2) $.
The coin-change problem or change-making problem addresses the question of finding the minimum number of coins (of certain denominations) that add up to a given amount of money. It is a special case of the integer knapsack problem, and has applications wider than just currency [4].
This version assumes an unlimited supply of coins of each denomination.
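The table built by calc_coin_change below follows the recurrence (with $ d_i $ the $ i $-th denomination, $ c[i][0] = 0 $, and amounts that cannot be formed marked with $ \infty $):

$$ c[i][j] = \begin{cases} c[i - 1][j] & j < d_i \\ \min\bigl(c[i - 1][j],\; 1 + c[i][j - d_i]\bigr) & \text{otherwise} \end{cases} $$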
def calc_coin_change(N, d):
    n = len(d)
    matrix = [[0] * (N + 1) for i in range(n)]
    for i in range(0, n):
        for j in range(1, N + 1):
            if i == 0 and j < d[i]:
                # Amount j cannot be formed with the smallest coin alone
                matrix[i][j] = math.inf
            elif i == 0:
                matrix[i][j] = 1 + matrix[0][j - d[0]]
            elif j < d[i]:
                # Coin i does not fit: reuse the result without it
                matrix[i][j] = matrix[i - 1][j]
            else:
                # Either skip coin i or use one more coin of this denomination
                matrix[i][j] = min(matrix[i - 1][j], 1 + matrix[i][j - d[i]])
    return matrix
# Example values
N = 8
d = [1, 4, 6]
# Showing results
dp_table = calc_coin_change(N, d)
pd.DataFrame(dp_table, index=d)
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 4 | 0 | 1 | 2 | 3 | 1 | 2 | 3 | 4 | 2 |
| 6 | 0 | 1 | 2 | 3 | 1 | 2 | 1 | 2 | 2 |
The time complexity is $ \Theta(nN) $ and the space complexity is $ \Theta(n(N + 1)) $.
Greedy traceback to recover the list of coins from the table
def get_coins_list(c, d, N, verbose=False):
    coins_list = []
    i = len(d) - 1
    j = N
    while i > -1 and j > -1:
        if verbose:
            print(i, j)
        if i - 1 >= 0 and c[i][j] == c[i - 1][j]:
            # The optimum does not use coin i
            i = i - 1
        elif j - d[i] >= 0 and c[i][j] == 1 + c[i][j - d[i]]:
            # The optimum uses one more coin of denomination d[i]
            coins_list.append(d[i])
            j = j - d[i]
        else:
            break
    return coins_list
# List of coins for each scenario
for j in range(0, N + 1):
    print(j, '->', get_coins_list(dp_table, d, j))
0 -> []
1 -> [1]
2 -> [1, 1]
3 -> [1, 1, 1]
4 -> [4]
5 -> [4, 1]
6 -> [6]
7 -> [6, 1]
8 -> [4, 4]
The time complexity is $ \Theta(n + c[n, N]) $ and the space complexity is $ \Theta(n(N + 1)) $.
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit W and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items [5].
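The implementation below is the 0/1 variant, in which each item can be taken at most once. Writing $ w_i $ and $ v_i $ for the weight and value of item $ i $, the table follows the recurrence:

$$ V[i][j] = \begin{cases} V[i - 1][j] & j < w_i \\ \max\bigl(V[i - 1][j],\; V[i - 1][j - w_i] + v_i\bigr) & \text{otherwise} \end{cases} $$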
def calc_best_knapsack(w, v, W):
    n = len(v)
    matrix = [[0] * (W + 1) for i in range(n)]
    for i in range(0, n):
        for j in range(1, W + 1):
            if i == 0 and j < w[i]:
                matrix[i][j] = -math.inf
            elif i == 0:
                matrix[i][j] = v[i]
            elif j < w[i]:
                # Item i does not fit in capacity j
                matrix[i][j] = matrix[i - 1][j]
            else:
                # Either skip item i or take it (each item can be taken at most once)
                matrix[i][j] = max(matrix[i - 1][j], matrix[i - 1][j - w[i]] + v[i])
    return matrix
# Example values
w = [1, 2, 5, 6, 7]
v = [1, 6, 18, 22, 28]
max_weight = 11
# Run algorithm
dp_table = calc_best_knapsack(w, v, max_weight)
df_index = ["w:" + str(w[i]) + ", v:" + str(v[i]) for i in range(len(v))]
pd.DataFrame(dp_table, index=df_index)
|           | 0 | 1 | 2 | 3 | 4 | 5  | 6  | 7  | 8  | 9  | 10 | 11 |
|-----------|---|---|---|---|---|----|----|----|----|----|----|----|
| w:1, v:1  | 0 | 1 | 1 | 1 | 1 | 1  | 1  | 1  | 1  | 1  | 1  | 1  |
| w:2, v:6  | 0 | 1 | 6 | 7 | 7 | 7  | 7  | 7  | 7  | 7  | 7  | 7  |
| w:5, v:18 | 0 | 1 | 6 | 7 | 7 | 18 | 19 | 24 | 25 | 25 | 25 | 25 |
| w:6, v:22 | 0 | 1 | 6 | 7 | 7 | 18 | 22 | 24 | 28 | 29 | 29 | 40 |
| w:7, v:28 | 0 | 1 | 6 | 7 | 7 | 18 | 22 | 28 | 29 | 34 | 35 | 40 |
The time complexity is $ \Theta(nW) $ and the space complexity is $ \Theta(n(W + 1)) $.
Greedy traceback to recover the list of items from the table
def get_items_list(values, v, w, W, verbose=False):
    item_list = []
    i = len(w) - 1
    j = W
    while i > -1 and j > -1:
        if verbose:
            print(i, j)
        if i - 1 >= 0 and values[i][j] == values[i - 1][j]:
            # The optimum does not include item i
            i = i - 1
        elif i - 1 >= 0 and j - w[i] >= 0 and values[i][j] == values[i - 1][j - w[i]] + v[i]:
            # The optimum includes item i
            item = {"w": w[i], "v": v[i]}
            item_list.append(item)
            j = j - w[i]
            i = i - 1
        elif i == 0 and values[i][j] == v[i]:
            item = {"w": w[i], "v": v[i]}
            item_list.append(item)
            break
        else:
            break
    return item_list
# List of items for each knapsack capacity
for j in range(0, max_weight + 1):
    print(j, '->', get_items_list(dp_table, v, w, j))
0 -> []
1 -> [{'w': 1, 'v': 1}]
2 -> [{'w': 2, 'v': 6}]
3 -> [{'w': 2, 'v': 6}, {'w': 1, 'v': 1}]
4 -> [{'w': 2, 'v': 6}, {'w': 1, 'v': 1}]
5 -> [{'w': 5, 'v': 18}]
6 -> [{'w': 6, 'v': 22}]
7 -> [{'w': 7, 'v': 28}]
8 -> [{'w': 7, 'v': 28}, {'w': 1, 'v': 1}]
9 -> [{'w': 7, 'v': 28}, {'w': 2, 'v': 6}]
10 -> [{'w': 7, 'v': 28}, {'w': 2, 'v': 6}, {'w': 1, 'v': 1}]
11 -> [{'w': 6, 'v': 22}, {'w': 5, 'v': 18}]
The time complexity is $ \Theta(n) $ (every step of the traceback discards one item) and the space complexity is $ \Theta(n(W + 1)) $.
The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring problem: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences [6].
The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files.
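Writing $ L[i][j] $ for the length of the LCS of the prefixes $ a_1 \dots a_i $ and $ b_1 \dots b_j $, the table built by calc_lcs below follows the recurrence:

$$ L[i][j] = \begin{cases} 0 & i = 0 \text{ or } j = 0 \\ L[i - 1][j - 1] + 1 & a_i = b_j \\ \max\bigl(L[i - 1][j],\; L[i][j - 1]\bigr) & \text{otherwise} \end{cases} $$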
def calc_lcs(a, b):
    n = len(a)
    m = len(b)
    matrix = [[0] * (m + 1) for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # A matching character extends the LCS of the two prefixes
                matrix[i][j] = 1 + matrix[i - 1][j - 1]
            else:
                matrix[i][j] = max(matrix[i - 1][j], matrix[i][j - 1])
    return matrix
# Example values
a = ['X', 'M', 'J', 'Y', 'A', 'U', 'Z']
b = ['M', 'Z', 'J', 'A', 'W', 'X', 'U']
# Run algorithm
dp_table = calc_lcs(a, b)
pd.DataFrame(dp_table, index=['-'] + a, columns=['-'] + b)
|   | - | M | Z | J | A | W | X | U |
|---|---|---|---|---|---|---|---|---|
| - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| X | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| M | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| J | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| Y | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| A | 0 | 1 | 1 | 2 | 3 | 3 | 3 | 3 |
| U | 0 | 1 | 1 | 2 | 3 | 3 | 3 | 4 |
| Z | 0 | 1 | 2 | 2 | 3 | 3 | 3 | 4 |
The time complexity is $ \Theta(nm) $ and the space complexity is $ \Theta((n + 1)(m + 1)) $.
Greedy traceback to recover the subsequence from the table
def get_lcs(matrix, a, b, verbose=False):
    lc_seq = []
    i = len(a)
    j = len(b)
    while i > -1 and j > -1:
        if verbose:
            print(i, j)
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            lc_seq.append(a[i - 1])
            i = i - 1
            j = j - 1
        elif j > 0 and (i == 0 or matrix[i][j - 1] >= matrix[i - 1][j]):
            j = j - 1
        elif i > 0 and (j == 0 or matrix[i][j - 1] < matrix[i - 1][j]):
            i = i - 1
        else:
            break
    return list(reversed(lc_seq))
# This function gets the longest common subsequence
get_lcs(dp_table, a, b)
['M', 'J', 'A', 'U']
The time complexity is $ \Theta(n + m) $ and the space complexity is $ \Theta((n + 1)(m + 1)) $.
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences [7]. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns.
Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.
The Needleman–Wunsch algorithm was one of the first applications of dynamic programming to compare biological sequences. It essentially divides a large problem (e.g. the full sequence) into a series of smaller problems, and uses the solutions to the smaller problems to find an optimal solution to the larger problem [8].
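With the scoring function $ s $ defined below (match $ +1 $, mismatch $ -1 $, gap $ -1 $), and following the code's convention of indexing rows by $ b $ and columns by $ a $, the score table satisfies the Needleman–Wunsch recurrence:

$$ F[i][j] = \max\bigl(F[i - 1][j - 1] + s(a_j, b_i),\; F[i - 1][j] + s(-, b_i),\; F[i][j - 1] + s(a_j, -)\bigr) $$

with the first row and column holding accumulated gap penalties ($ F[i][0] = -i $ and $ F[0][j] = -j $ for the scores used here).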
# Scoring function: score of aligning characters x and y
def s(x, y):
    if x == '-' or y == '-':
        # Gap penalty
        return -1
    elif x == y:
        # Match score
        return 1
    # Mismatch penalty
    return -1
# Needleman–Wunsch algorithm to calculate the sequence alignment
def calc_seq_align(a, b):
    m = len(a)
    n = len(b)
    matrix = [[0] * (m + 1) for i in range(n + 1)]
    # First column and first row: accumulated gap penalties
    for i in range(n + 1):
        matrix[i][0] = i * s('-', b[i - 1])
    for j in range(m + 1):
        matrix[0][j] = j * s(a[j - 1], '-')
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            matrix[i][j] = max(matrix[i - 1][j - 1] + s(a[j - 1], b[i - 1]),
                               matrix[i - 1][j] + s('-', b[i - 1]),
                               matrix[i][j - 1] + s(a[j - 1], '-'))
    return matrix
# Example values
a = list('GCATGCUA')
b = list('GATTACA')
# Run algorithm
dp_table = calc_seq_align(a, b)
pd.DataFrame(dp_table, index=['-'] + b, columns=['-'] + a)
|   | -  | G  | C  | A  | T  | G  | C  | U  | A  |
|---|----|----|----|----|----|----|----|----|----|
| - | 0  | -1 | -2 | -3 | -4 | -5 | -6 | -7 | -8 |
| G | -1 | 1  | 0  | -1 | -2 | -3 | -4 | -5 | -6 |
| A | -2 | 0  | 0  | 1  | 0  | -1 | -2 | -3 | -4 |
| T | -3 | -1 | -1 | 0  | 2  | 1  | 0  | -1 | -2 |
| T | -4 | -2 | -2 | -1 | 1  | 1  | 0  | -1 | -2 |
| A | -5 | -3 | -3 | -1 | 0  | 0  | 0  | -1 | 0  |
| C | -6 | -4 | -2 | -2 | -1 | -1 | 1  | 0  | -1 |
| A | -7 | -5 | -3 | -1 | -2 | -2 | 0  | 0  | 1  |
The time complexity is $ \Theta(nm) $ and the space complexity is $ \Theta((n + 1)(m + 1)) $.
Greedy traceback to recover the alignment from the table
def get_seq_align(matrix, a, b, verbose=False):
    alignmentA = ""
    alignmentB = ""
    j = len(a)
    i = len(b)
    while i > -1 and j > -1:
        if verbose:
            print(i, j)
        if i > 0 and j > 0 and matrix[i][j] == matrix[i - 1][j - 1] + s(a[j - 1], b[i - 1]):
            # The two characters are aligned (match or mismatch)
            alignmentA = a[j - 1] + alignmentA
            alignmentB = b[i - 1] + alignmentB
            i = i - 1
            j = j - 1
        elif i > 0 and matrix[i][j] == matrix[i - 1][j] + s('-', b[i - 1]):
            # Gap in sequence a
            alignmentA = "-" + alignmentA
            alignmentB = b[i - 1] + alignmentB
            i = i - 1
        elif j > 0 and matrix[i][j] == matrix[i][j - 1] + s(a[j - 1], '-'):
            # Gap in sequence b
            alignmentA = a[j - 1] + alignmentA
            alignmentB = "-" + alignmentB
            j = j - 1
        else:
            break
    return (alignmentA, alignmentB)
# This function gets the Sequence Alignment
get_seq_align(dp_table, a, b)
('GCA-TGCUA', 'G-ATTAC-A')
The time complexity is $ \Theta(n + m) $ and the space complexity is $ \Theta((n + 1)(m + 1)) $.
The all-pairs shortest path problem is the determination of the shortest graph distances between every pair of vertices in a given graph. The problem can be solved using n applications of Dijkstra's algorithm or all at once using the Floyd-Warshall algorithm [9].
An example of both algorithms can be found in the Graphs section; the second one is solved with dynamic programming.
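As a minimal sketch of the dynamic-programming formulation (assuming the graph is given as an adjacency matrix of edge weights, with math.inf for missing edges; this is not the code from the Graphs section):

def floyd_warshall(dist):
    # dist[i][j]: direct edge weight from i to j (math.inf if absent), dist[i][i] = 0
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Shortest path from i to j allowed to pass through vertices 0..k
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

The time complexity is $ \Theta(n^3) $ and the space complexity is $ \Theta(n^2) $.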
[1] Wikipedia - Dynamic Programming.
[2] Wikipedia - Principle of Optimality.
[3] Wikipedia - Binomial coefficient.
[4] Wikipedia - Change-making problem.
[5] Wikipedia - Knapsack problem.
[6] Wikipedia - Longest common subsequence problem.
[7] Wikipedia - Sequence alignment problem.
[8] Wikipedia - Needleman-Wunsch algorithm.
[9] Wikipedia - All-Pairs Shortest Path.