Module 0060: Searching in a sorted array

Tak Auyeung, Ph.D.

October 17, 2019

Contents

1 About this module
2 Sorted arrays
3 Linear search
4 Binary search
 4.1 Concept
 4.2 Algorithm
 4.3 Now, that’s efficient!

1 About this module

2 Sorted arrays

The linear search algorithm in module 0054 is the only kind of search algorithm that works on unsorted arrays. In the worst case, all elements in the array must be examined to confirm that a value is not in the array.

With sorted arrays, however, we can make some improvements.

Assume that array a is sorted, and assume n = |a| is the number of elements in the array. Furthermore, let us assume that the array is sorted in a non-decreasing order. This implies that we can have duplicate values in the array. The mathematical way to say an array is sorted in a non-decreasing order is as follows:

a[i] a[i + 1] for i from 0 to n 2. We can also spell this out as a[0] a[1] a[2]a[n 1].

3 Linear search

We can improve our linear search algorithm to work more efficiently. The original algorithm is described as algorithm 1.


Listing 1: Linear search algorithm that works for sorted and unsorted arrays.
1i0 
2while (i<|a|(a[i]v) do 
3  ii+1 
4end while 
5if i=|a| then 
6  // conclude no element in a has a value of v 
7else 
8  // conclude a[i] is the first element with a value of v 
9end if

Note that if we find that v < a[i], then we already know that v < a[i] a[i + 1] a[i + 2]a[n 1] because the array is sorted. This also means that va[i + 1], va[i + 2], all the way up to va[n 1].

This property of a sorted array means that we can exit the loop earlier when we discover that v < a[i]. This is great new! On the other hand, we need to modify the conditional statement because now we can exit with i < |a| and still conclude that value v is not anywhere in the array. The condition that confirms that value v is not in the array should be (i = |a|) (a[i]v).

As a result, the algorithm is modified as in algorithm 2.


Listing 2: Improved linear search algorithm that works only for sorted arrays.
1i0 
2while (i<|a|)(a[i]<v) do 
3  ii+1 
4end while 
5if (i=|a|)(a[i]v) then 
6  // conclude no element in a has a value of v 
7else 
8  // conclude a[i] is the first element in a that has a value of v 
9end if

With this modification, the number of elements to examine is variable when the value v is not in the array. Nonetheless, in the worst case, when v > a[n 1], we still need to examine all n items in the array.

4 Binary search

4.1 Concept

Binary search works only if an array is sorted. The basic idea is to compare the value to find with an element in the middle of the array. Depending on the outcome, we can throw away one half of the candidates, or confirm the value does exist.

Let us assume that we compare v with a[m], the first candidate has an index of b, and the last candidate has an index of e. Then, there are three possible outcomes:

If we pick m to be half way between b and e, then we can throw away at least one half of the candidates after one comparison.

4.2 Algorithm

The binary search algorithm is described in algorithm 3. This algorithm assumes that a is an array with at least one item, and v is the value that we are searching for.


Listing 3: The binary search algorithm
1b0  
2e|a|1  
3repeat  
4  mb+e 2  
5  if a[m]<v then  then 
6    bm+1  
7  else 
8    if v<a[m] then  
9      em1  
10    end if 
11  end if 
12until (b>e)(a[m]=v)  
13if b>e then  
14  // conclude v cannot be found in a 
15else 
16  // conclude v is found in a 
17end if

Let us dissect this algorithm.

4.3 Now, that’s efficient!

The binary search algorithm is very efficient because each comparison eliminates at least one half of the remaining candidates. This means that if we start off with 511 candidates, we’ll end up with 255 after one comparison, 127 after two comparisons, 63 after three comparisons, and etc. It’ll take 9 comparisons that confirm a value v is not in an array of 511 elements.

Let n = |a|, and q be the number of comparisons needed to confirm v is not in a. Then q = log(n)log(2). The symbol x is the ceiling of x, which is the smallest integer that is larger than or equal to x.

Using this formula, to look up a name in a phonebook with 6 billion entries will only take up to 33 comparisons!