Module 0060: Searching in a sorted array
Tak Auyeung, Ph.D.
October 17, 2019
Contents
1 About this module
- Prerequisites: 0054
- Objectives: This module improves the linear search algorithm for sorted arrays. It also introduces binary search, a
very efficient search algorithm for sorted arrays.
2 Sorted arrays
The linear search algorithm in module 0054 is the only kind of search algorithm that works on unsorted arrays. In the worst case, all
elements in the array must be examined to confirm that a value is not in the array.
With sorted arrays, however, we can make some improvements.
Assume that array is
sorted, and assume
is the number of elements in the array. Furthermore, let us assume that the array is sorted in a non-decreasing order. This implies
that we can have duplicate values in the array. The mathematical way to say an array is sorted in a non-decreasing order is as
follows:
for
from 0 to
. We can also
spell this out as .
3 Linear search
We can improve our linear search algorithm to work more efficiently. The original algorithm is described as algorithm
1.
Listing 1: Linear search algorithm that works for sorted and unsorted arrays.
1 2while do 3 4end while 5if then 6 // conclude no element in has a value of 7else 8 // conclude is the first element with a value of 9end if
Note that if we find that , then we
already know that because the array
is sorted. This also means that ,
, all the way
up to .
This property of a sorted array means that we can exit the loop earlier when we discover that
.
This is great new! On the other hand, we need to modify the conditional statement because now we can exit with
and still conclude that value
is not anywhere in the array. The
condition that confirms that value
is not in the array should be .
As a result, the algorithm is modified as in algorithm 2.
Listing 2: Improved linear search algorithm that works only for sorted arrays.
1 2while do 3 4end while 5if then 6 // conclude no element in has a value of 7else 8 // conclude is the first element in that has a value of 9end if
With this modification, the number of elements to examine is variable when the value
is not in the array. Nonetheless,
in the worst case, when , we
still need to examine all
items in the array.
4 Binary search
4.1 Concept
Binary search works only if an array is sorted. The basic idea is to compare the value to find with an element in the middle of
the array. Depending on the outcome, we can throw away one half of the candidates, or confirm the value does
exist.
Let us assume that we compare
with , the first candidate
has an index of , and the last
candidate has an index of .
Then, there are three possible outcomes:
- :
the search is over! We just confirmed that value
can be found in the array.
- :
we can throw away ,
,...
.
- :
we can throw away ,
,...
.
If we pick to be
half way between
and , then
we can throw away at least one half of the candidates after one comparison.
4.2 Algorithm
The binary search algorithm is described in algorithm 3. This algorithm assumes that
is an array with at
least one item, and
is the value that we are searching for.
Listing 3: The binary search algorithm
1
2
3repeat 4
5 if then then 6
7 else 8 if then 9
10 end if 11 end if 12until
13if then 14 // conclude cannot be found in 15else 16 // conclude is found in 17end if
Let us dissect this algorithm.
- Lines 1 and 2 initialize variables
and ,
respectively.
is the index of the first element that can still contain value .
It is initialized to 0 because we need to consider all elements initially. For the same reason,
is initialized to
because that is the index of the last element in the entire array.
- Everything between line 3 and line 12 is the logic of binary search.
- Line 4 computes the value of
so it is half-way between
and .
The symbol
is called the “floor” of .
The floor of
is the largest integer that is less than or equal to .
In this context, we are simply truncating any fractional value.
- Line 5 checks to see if we can remove the first half of the candidates. If ,
then we know that ,
and so we can eliminate all those elements as candidates.
- Line 6: Once we know that we can eliminate the first half of the candidates, we can do so by changing .
Note that we do not change to
to
because .
Instead, we move
to .
This subtle point makes all the differences in terms of terminating the loop.
- Lines 8 and 9 are the counterparts of lines 5 and 6. Instead of getting rid of the first half, these lines checks to see
if we can eliminate the self half.
- Line 12 specifies when we can get out of the loop. There are two reasons. First, when ,
we have no candidates left. Note that when ,
it means that there is one more element to be considered. Second, when ,
there is nothing to check anymore, because we have just found an element of value .
- Line 13 checks to see if value
is found in the array or not. If ,
it means that we exited the loop because we ran out of candidates. Therefore, we can then confirm that value
does not exist in .
4.3 Now, that’s efficient!
The binary search algorithm is very efficient because each comparison eliminates at least one half of the remaining
candidates. This means that if we start off with 511 candidates, we’ll end up with 255 after one comparison, 127
after two comparisons, 63 after three comparisons, and etc. It’ll take 9 comparisons that confirm a value
is not in
an array of 511 elements.
Let , and
be the number of comparisons
needed to confirm
is not in . Then
. The symbol
is the ceiling of
, which is the smallest integer
that is larger than or equal to .
Using this formula, to look up a name in a phonebook with 6 billion entries will only take up to 33 comparisons!