Module 0085: Search algorithms and their complexities

Tak Auyeung, Ph.D.

November 28, 2018

Contents

1 About this module
2 Linear search
 2.1 Applications
 2.2 Algorithm
 2.3 Complexity
3 Binary search
 3.1 Applications
 3.2 Algorithm
 3.3 Complexity

1 About this module

2 Linear search

2.1 Applications

Although linear search is far inferior to binary search in terms of efficiency, there are times when only linear search is applicable.

Searching a linked list, which lacks random access, can only be done using linear search. Likewise, searching an unsorted array can only be done using linear search.

2.2 Algorithm

The algorithm of linear search is in listing 1.


Listing 1: linear search
 1  define sub linsearch
 2    by value x : sequence
 3    by value v : valueType
 4    return type boolean
 5    local i : iterator
 6    i ← x.begin()
 7    while (i < x.end()) ∧ (*i ≠ v) do
 8      i++
 9    end while
10    if (i == x.end()) then
11      return false
12    else
13      return true
14    end if
15  end define sub

A brief explanation of this pseudocode: the iterator i starts at the beginning of the sequence and advances one item at a time until it either reaches the end or points to an item equal to v. The search succeeds exactly when the loop stops before reaching the end.
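As a concrete illustration, here is a minimal Python sketch of the same algorithm (not part of the original listing; the name linsearch mirrors the pseudocode, with an index playing the role of the iterator):

```python
def linsearch(x, v):
    """Return True if value v occurs in sequence x, scanning left to right."""
    i = 0                              # corresponds to i <- x.begin()
    while i < len(x) and x[i] != v:    # the conjunction from line 7
        i += 1                         # i++
    return i != len(x)                 # reaching the end means v was not found
```

For example, linsearch([3, 1, 4], 4) returns True, while linsearch([3, 1, 4], 9) returns False.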

2.3 Complexity

Let us perform the worst case analysis. In the worst case, the value v does not match any item in the sequence. Line 6 and lines 10 to 14 execute only once, and none of these lines involve repetitive actions. We’ll lump the execution time of all these lines as t1.

How many times do we execute line 7? Assuming v is not found in the sequence, the right condition of the conjunction is always true, which means we can only count on the left condition becoming false to exit the loop.

The iterator i begins at the beginning of the sequence, and it is incremented only once per iteration of the loop. It therefore requires |x| increments to reach the end of the sequence (|x| denotes the number of values in the sequence). Consequently, line 7 executes |x| + 1 times, while line 8 executes |x| times. Assuming line 7 requires t1 for each execution... rather, assuming line 7 requires t2 for each execution and line 8 requires t3 for each execution, the total amount of time required is f(|x|) = t1 + (|x| + 1)t2 + |x|t3. Letting n = |x|, k1 = t1 + t2, and k2 = t2 + t3, we get f(n) = k1 + n·k2.
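These counts can be checked empirically. The sketch below (hypothetical instrumentation, not in the original listing) tallies how many times the loop condition (line 7) and the increment (line 8) execute:

```python
def linsearch_counted(x, v):
    """Linear search that also reports how often lines 7 and 8 execute."""
    cond_checks = 0   # executions of line 7 (the loop condition)
    increments = 0    # executions of line 8 (i++)
    i = 0
    while True:
        cond_checks += 1
        if not (i < len(x) and x[i] != v):
            break
        i += 1
        increments += 1
    return i != len(x), cond_checks, increments

# Worst case: v is absent, so line 7 runs |x| + 1 times and line 8 runs |x| times.
found, checks, incs = linsearch_counted([2, 4, 6, 8], 5)   # (False, 5, 4)
```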

The upper bound of f(n) can be many functions, such as O(n²). However, the tight bound of f(n) can only be Θ(n). Let us revisit the definitions of asymptotic limits in the context of f(n)/n.

liminf_{i→∞} f(i)/i = lim_{i→∞} (k1 + i·k2)/i      (1)
                    = lim_{i→∞} (k2 + k1/i)        (2)
                    = k2 + k1 · lim_{i→∞} (1/i)    (3)
                    = k2 > 0                       (4)

Here, we can use the usual limit in place of the limit inferior because the sequence is monotonic. The interpretation is that f(n) ∈ Ω(n).

Next, applying the same techniques, we can show that limsup_{i→∞} f(i)/i = k2 < ∞. This means that f(n) ∈ O(n).

Since f(n) ∈ Ω(n) and f(n) ∈ O(n), f(n) ∈ Θ(n).

3 Binary search

3.1 Applications

Binary search is useful only when values are stored in a randomly accessible way and are sorted. This limits the use of binary search to arrays in memory, or randomly accessible files of equal-size records (or files with an index).

3.2 Algorithm

Listing 2 is the pseudocode of binary search as a recursive algorithm.


Listing 2: binary search
 1  define sub binsearch
 2    by value a : randomaccess
 3    by value b : integer
 4    by value e : integer
 5    by value v : valueType
 6    local m : integer
 7    return type : boolean
 8    if (b > e) then
 9      return false
10    else
11      m ← ⌊(b+e)/2⌋
12      if (a[m] < v) then
13        return binsearch(a, m+1, e, v)
14      else if (a[m] > v) then
15        return binsearch(a, b, m−1, v)
16      else
17        return true
18      end if
19    end if
20  end define sub
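A runnable Python version of the recursive pseudocode (a sketch, with parameter names following listing 2; b and e are inclusive bounds):

```python
def binsearch(a, b, e, v):
    """Recursive binary search for v in sorted list a over the inclusive range [b, e]."""
    if b > e:
        return False                      # empty range: v cannot be present
    m = (b + e) // 2                      # middle index
    if a[m] < v:
        return binsearch(a, m + 1, e, v)  # v, if present, lies to the right of m
    elif a[m] > v:
        return binsearch(a, b, m - 1, v)  # v, if present, lies to the left of m
    else:
        return True                       # a[m] == v
```

For example, binsearch([1, 3, 5, 7, 9], 0, 4, 5) returns True, while searching for 4 in the same range returns False.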

3.3 Complexity

Let us now analyze the complexity of binary search. Again, we are only interested in the worst case complexity, which means the value v cannot be found in the random access ADT a.

Looking through the recursive algorithm, only a few lines do not take constant time to execute. Lines 13 and 15 invoke the subroutine again. This means that the execution time of an invocation depends solely on the execution time of the recursive calls.

Furthermore, the only changes of parameters are to parameters b and e. Let us define x = e − b + 1, which is the number of elements to be considered by a particular invocation.

How does the x of one invocation relate to the x′ of the recursive invocation? Let us consider the possible cases for x′, the number of elements to consider in the recursive call. With m = ⌊(b+e)/2⌋, the right recursive call (line 13) considers e − m = ⌊x/2⌋ elements, while the left recursive call (line 15) considers m − b = ⌊(x−1)/2⌋ elements.

The worst case is therefore x′ = ⌊x/2⌋.
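As a quick sanity check (not in the original module), the following Python snippet verifies over many ranges that each recursive call receives at most ⌊x/2⌋ elements, where x = e − b + 1:

```python
# Verify that each recursive call of binsearch gets at most floor(x/2) elements,
# where x = e - b + 1 is the size of the current inclusive range [b, e].
def max_child_size(b, e):
    m = (b + e) // 2
    right = e - m        # size of the range passed to binsearch(a, m+1, e, v)
    left = m - b         # size of the range passed to binsearch(a, b, m-1, v)
    return max(left, right)

assert all(max_child_size(b, e) <= (e - b + 1) // 2
           for b in range(20) for e in range(b, 40))
```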

Let us define f(x) to be the amount of time required to perform binary search on x items. This definition can be recursive. The base case is f(0) = t1, in which t1 is an upper bound on the total execution time of the algorithm, not counting the recursive invocations. The recursive step is f(x) = t1 + f(⌊x/2⌋), because only one of the two recursive calls is chosen, and the worst case is to pass ⌊x/2⌋ elements to the next invocation.

Let us “guess” that f(x) = t1(2 + log2(x)), provided that x is a power of 2. We can prove our guess as follows.

The basis: f(0) = t1 by definition. However, 0 is not a power of 2. By the recursive definition of f, f(1) = t1 + f(⌊1/2⌋) = t1 + f(0) = 2t1. Our guess also suggests that f(1) = t1(2 + log2(1)) = 2t1. This is the base case of the inductive proof.

The induction step: assuming f(x) = t1(2 + log2(x)) for x = 2^k, prove that f(2x) = t1(2 + log2(2x)). This can be done rather easily:

f(2x) = t1 + f(x)               (5)
      = t1 + t1(2 + log2(x))    (6)
      = t1(1 + 2 + log2(x))     (7)
      = t1(2 + log2(2x))        (8)

Consequently, given that x = 2^k, binary search requires t1(2 + log2(x)) time. This means that f(n) ∈ Θ(ln(n)), because t1(2 + log2(x))/ln(x) = t1/ln(2) + 2t1/ln(x) is a monotonic function that approaches t1/ln(2) as x → ∞. Because 0 < t1/ln(2) < ∞, ln(n) is a tight bound for f(n).
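The closed form can also be checked numerically against the recurrence for powers of 2 (a small sketch; the unit cost t1 = 1 is an arbitrary choice):

```python
import math

T1 = 1.0   # arbitrary unit cost for the non-recursive work (t1 in the text)

def f(x):
    """Worst-case time of binary search, defined by the recurrence."""
    if x == 0:
        return T1                 # base case: f(0) = t1
    return T1 + f(x // 2)         # recursive step: f(x) = t1 + f(floor(x/2))

# For x a power of 2, the recurrence matches the closed form t1 * (2 + log2(x)).
for k in range(11):
    x = 2 ** k
    assert f(x) == T1 * (2 + math.log2(x))
```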