Module 0060: Searching in a sorted array

Tak Auyeung, Ph.D.

October 17, 2019

1 About this module
2 Sorted arrays
3 Linear search
4 Binary search
4.1 Concept
4.2 Algorithm
4.3 Now, that’s efficient!

1 About this module

Prerequisites: 0054
Objectives: This module improves the linear search algorithm for sorted arrays. It also introduces binary search, a very efficient search algorithm for sorted arrays.

2 Sorted arrays

When nothing is known about the order of the elements, the linear search algorithm in module 0054 is essentially the only kind of search algorithm that works. In the worst case, all elements in the array must be examined to confirm that a value is not in the array.

With sorted arrays, however, we can make some improvements.

Assume that array $a$ is sorted, and assume $n = | a |$ is the number of elements in the array. Furthermore, let us assume that the array is sorted in a non-decreasing order. This implies that we can have duplicate values in the array. The mathematical way to say an array is sorted in a non-decreasing order is as follows:

$a[i] \leq a[i + 1]$ for $i$ from 0 to $n - 2$. We can also spell this out as $a[0] \leq a[1] \leq a[2] \leq \dots \leq a[n - 1]$.
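The definition above can be checked mechanically. The following is a minimal sketch in Python; the function name `is_non_decreasing` is an assumption made for illustration and does not come from the module.

```python
def is_non_decreasing(a):
    """Return True if a[i] <= a[i + 1] for every i from 0 to len(a) - 2."""
    # range(len(a) - 1) yields 0 .. n - 2, matching the definition in the text.
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))


if __name__ == "__main__":
    print(is_non_decreasing([1, 2, 2, 5]))  # duplicates are allowed
    print(is_non_decreasing([3, 1, 2]))
```

Note that an empty array and an array with one element both count as sorted, because there is no pair of neighbors that can violate the condition.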

3 Linear search

We can improve our linear search algorithm to work more efficiently on sorted arrays. The original algorithm is shown in listing 1.

Listing 1: Linear search algorithm that works for sorted and unsorted arrays.

    $i \leftarrow 0$
    while $(i < |a|) \land (a[i] \neq v)$ do
        $i \leftarrow i + 1$
    end while
    if $i = |a|$ then
        // conclude no element in $a$ has a value of $v$
    else
        // conclude $a[i]$ is the first element with a value of $v$
    end if
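As a rough illustration, listing 1 can be transcribed into Python as follows. This is only a sketch: the name `linear_search` and the choice to return `None` when the value is absent are assumptions, not part of the module.

```python
def linear_search(a, v):
    """Return the index of the first element equal to v, or None if absent."""
    i = 0
    # Python's "and" short-circuits, so a[i] is never read when i == len(a).
    while i < len(a) and a[i] != v:
        i += 1
    if i == len(a):
        return None  # no element in a has a value of v
    return i         # a[i] is the first element with a value of v


if __name__ == "__main__":
    print(linear_search([7, 3, 9, 3], 3))
    print(linear_search([7, 3, 9, 3], 4))
```

The order of the two tests in the loop condition matters: checking $i < |a|$ first is what keeps the index in range.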

Note that if we find that $v < a[i]$, then we already know that $v < a[i] \leq a[i + 1] \leq a[i + 2] \leq \dots \leq a[n - 1]$ because the array is sorted. This also means that $v \neq a[i + 1]$, $v \neq a[i + 2]$, all the way up to $v \neq a[n - 1]$.

This property of a sorted array means that we can exit the loop as soon as we discover that $v < a[i]$. This is great news! On the other hand, we need to modify the conditional statement that follows the loop, because now we can exit with $i < |a|$ and still conclude that value $v$ is not anywhere in the array. The condition that confirms that value $v$ is not in the array should be $(i = |a|) \lor (a[i] \neq v)$.

As a result, the algorithm is modified as shown in listing 2.

Listing 2: Improved linear search algorithm that works only for sorted arrays.

    $i \leftarrow 0$
    while $(i < |a|) \land (a[i] < v)$ do
        $i \leftarrow i + 1$
    end while
    if $(i = |a|) \lor (a[i] \neq v)$ then
        // conclude no element in $a$ has a value of $v$
    else
        // conclude $a[i]$ is the first element in $a$ that has a value of $v$
    end if
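A possible Python rendering of listing 2 is sketched below. The name `sorted_linear_search` and the `None` return value are assumptions made for illustration; the function is only meaningful when `a` is sorted in non-decreasing order.

```python
def sorted_linear_search(a, v):
    """Early-exit linear search; a must be sorted in non-decreasing order."""
    i = 0
    # Stop as soon as a[i] >= v: every later element is at least as large.
    while i < len(a) and a[i] < v:
        i += 1
    # Leaving the loop with i < len(a) no longer proves that v was found.
    if i == len(a) or a[i] != v:
        return None
    return i


if __name__ == "__main__":
    data = [1, 3, 3, 8]
    print(sorted_linear_search(data, 3))
    print(sorted_linear_search(data, 5))
```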

With this modification, the number of elements to examine varies when the value $v$ is not in the array. Nonetheless, in the worst case, when $v > a[n - 1]$, we still need to examine all $n$ items in the array.
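To make the variation concrete, the sketch below counts how many elements the early-exit loop reads before it stops. The helper `count_examined` is hypothetical and exists only for this illustration.

```python
def count_examined(a, v):
    """Count the elements read by the early-exit loop on a sorted array a."""
    examined = 0
    for x in a:
        examined += 1
        if x >= v:  # the loop condition a[i] < v fails here, so we stop
            break
    return examined


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    for v in (5, 25, 50):
        print(v, count_examined(data, v))
```

With the sample array, a value smaller than every element stops after a single element, a value that falls in the middle stops partway, and a value larger than $a[n - 1]$ forces all $n$ elements to be read.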

4 Binary search

4.1 Concept

Binary search works only if an array is sorted. The basic idea is to compare the value we are looking for with an element in the middle of the array. Depending on the outcome, we can either throw away one half of the candidates or confirm that the value does exist.

Let us assume that we compare $v$ with $a[m]$, that the first candidate has an index of $b$, and that the last candidate has an index of $e$. Then, there are three possible outcomes:

$v = a[m]$: the search is over! We just confirmed that value $v$ can be found in the array.
$v < a[m]$: we can throw away $a[m]$, $a[m + 1]$, ..., $a[e]$.
$v > a[m]$: we can throw away $a[b]$, $a[b + 1]$, ..., $a[m]$.

If we pick $m$ to be halfway between $b$ and $e$, then we can throw away at least one half of the candidates after one comparison.
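A single halving step can be sketched as follows. The function `halve_candidates` and its tuple return value are assumptions chosen for illustration; it takes the current candidate range $b$ to $e$ and returns either the position where $v$ was found or the narrowed range.

```python
def halve_candidates(a, v, b, e):
    """Perform one comparison of v against the middle candidate a[m].

    Returns ("found", m, m) or ("continue", new_b, new_e).
    Assumes a is sorted and b <= e.
    """
    m = (b + e) // 2
    if v == a[m]:
        return ("found", m, m)
    if v < a[m]:
        return ("continue", b, m - 1)  # discard a[m] .. a[e]
    return ("continue", m + 1, e)      # discard a[b] .. a[m]


if __name__ == "__main__":
    data = [2, 4, 6, 8, 10, 12, 14]
    print(halve_candidates(data, 12, 0, 6))
    print(halve_candidates(data, 8, 0, 6))
```

In the first call the middle element is $a[3] = 8$, so the four candidates $a[0]$ to $a[3]$ are discarded and only three remain.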

4.2 Algorithm

The binary search algorithm is described in listing 3. This algorithm assumes that $a$ is a sorted array with at least one item, and $v$ is the value that we are searching for.

Listing 3: The binary search algorithm

 1  $b \leftarrow 0$
 2  $e \leftarrow |a| - 1$
 3  repeat
 4      $m \leftarrow \lfloor \frac{b + e}{2} \rfloor$
 5      if $a[m] < v$ then
 6          $b \leftarrow m + 1$
 7      else
 8          if $v < a[m]$ then
 9              $e \leftarrow m - 1$
10          end if
11      end if
12  until $(b > e) \lor (a[m] = v)$
13  if $b > e$ then
14      // conclude $v$ cannot be found in $a$
15  else
16      // conclude $v$ is found in $a$
17  end if
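Listing 3 can be transcribed into Python roughly as shown below. This is a sketch under stated assumptions: the name `binary_search`, the `None` return value, and the explicit check for an empty array are additions for illustration. Python has no repeat/until loop, so the sketch uses `while True` with a `break` at the bottom.

```python
def binary_search(a, v):
    """Return an index m with a[m] == v, or None if v is not in sorted a."""
    if not a:
        # Listing 3 assumes at least one item; this guard is an added assumption.
        raise ValueError("a must contain at least one element")
    b = 0
    e = len(a) - 1
    while True:
        m = (b + e) // 2       # floor of (b + e) / 2
        if a[m] < v:
            b = m + 1          # discard a[b] .. a[m]
        elif v < a[m]:
            e = m - 1          # discard a[m] .. a[e]
        if b > e or a[m] == v:
            break
    if b > e:
        return None            # v cannot be found in a
    return m                   # v is found at a[m]


if __name__ == "__main__":
    data = [1, 3, 5, 7, 9]
    print(binary_search(data, 7))
    print(binary_search(data, 4))
```

When the array contains duplicates of $v$, this version returns the index of some matching element, which is not necessarily the first one.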

Let us dissect this algorithm.

Lines 1 and 2 initialize variables $b$ and $e$, respectively. $b$ is the index of the first element that can still contain value $v$. It is initialized to 0 because we need to consider all elements initially. For the same reason, $e$ is initialized to $|a| - 1$ because that is the index of the last element in the entire array.
Everything between line 3 and line 12 is the logic of binary search.
Line 4 computes the value of $m$ so it is halfway between $b$ and $e$. The symbol $\lfloor x \rfloor$ is called the “floor” of $x$. The floor of $x$ is the largest integer that is less than or equal to $x$. In this context, we are simply truncating any fractional value.
Line 5 checks to see if we can remove the first half of the candidates. If $a[m] < v$, then we know that $a[b] \leq a[b + 1] \leq a[b + 2] \leq \dots \leq a[m] < v$, and so we can eliminate all those elements as candidates.
Line 6: Once we know that we can eliminate the first half of the candidates, we can do so by changing $b$. Note that we do not change $b$ to $m$ because $a[m] \neq v$. Instead, we move $b$ to $m + 1$. This subtle point makes all the difference in terms of terminating the loop.
Lines 8 and 9 are the counterparts of lines 5 and 6. Instead of getting rid of the first half, these lines check to see if we can eliminate the second half.
Line 12 specifies when we can get out of the loop. There are two reasons. First, when $b > e$, we have no candidates left. Note that when $b = e$, it means that there is one more element to be considered. Second, when $a[m] = v$, there is nothing to check anymore, because we have just found an element of value $v$.
Line 13 checks to see if value $v$ is found in the array or not. If $b > e$, it means that we exited the loop because we ran out of candidates. Therefore, we can then confirm that value $v$ does not exist in $a$.
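To watch $b$, $e$, and $m$ change from one pass of the loop to the next, the following sketch records them on every iteration. The function `trace_binary_search` is a hypothetical helper written for this illustration, and it assumes a sorted array with at least one element.

```python
def trace_binary_search(a, v):
    """Return the (b, e, m) triple seen at each pass of the binary search loop."""
    b, e = 0, len(a) - 1
    steps = []
    while True:
        m = (b + e) // 2
        steps.append((b, e, m))
        if a[m] < v:
            b = m + 1
        elif v < a[m]:
            e = m - 1
        if b > e or a[m] == v:
            return steps


if __name__ == "__main__":
    for step in trace_binary_search([1, 3, 5, 7, 9], 4):
        print(step)
```

Searching for 4 in the sample array gives the triples (0, 4, 2), (0, 1, 0), and (1, 1, 1). The last pass runs with $b = e$, which is the case where exactly one candidate remains; only after that pass does $b$ exceed $e$.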

4.3 Now, that’s efficient!

The binary search algorithm is very efficient because each comparison eliminates at least one half of the remaining candidates. This means that if we start off with 511 candidates, we’ll end up with at most 255 after one comparison, 127 after two comparisons, 63 after three comparisons, and so on. It’ll take 9 comparisons to confirm that a value $v$ is not in an array of 511 elements.

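Let $n = |a|$, and let $q$ be the number of comparisons needed in the worst case to confirm that $v$ is not in $a$, counting each comparison of $v$ against $a[m]$ once. Then $q = \lceil \log(n + 1) / \log(2) \rceil$. The symbol $\lceil x \rceil$ is the ceiling of $x$, which is the smallest integer that is larger than or equal to $x$. For example, $n = 511$ gives $q = \lceil \log(512) / \log(2) \rceil = 9$.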

Using this formula, looking up a name in a phonebook with 6 billion entries takes at most 33 comparisons!
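The two counts quoted in this section can be checked without building an array, because only the indices matter. The sketch below assumes the worst case for a missing value, in which every comparison sends the search to the upper half (the half that keeps the most candidates). The name `worst_case_probes` is an assumption made for illustration.

```python
def worst_case_probes(n):
    """Count loop passes for a value larger than every element of an n-item array."""
    b, e = 0, n - 1
    probes = 0
    while b <= e:
        m = (b + e) // 2
        probes += 1
        b = m + 1  # pretend a[m] < v on every pass
    return probes


if __name__ == "__main__":
    print(worst_case_probes(511))
    print(worst_case_probes(6_000_000_000))
```

Running it reproduces the figures in the text: 9 for 511 elements and 33 for 6 billion entries.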