Using the Binary Search Algorithm in Python

I'm a self taught developer and when I'm searching for something in a list or dictionary my go to is the linear search.

I'll create a for loop which will iterate over every item until it finds what I'm looking for.

my_ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
id_of_interest = 10

for my_id in my_ids:
    if my_id == id_of_interest:
        print(f'We found {id_of_interest}')
    else:
        print(f'No match for {my_id}')

# No match for 0
# No match for 1
# No match for 2
# No match for 3
# No match for 4
# No match for 5
# No match for 6
# No match for 7
# No match for 8
# No match for 9
# We found 10

There's nothing wrong with the linear search. With the example above you'll see that the item we're looking for is at the end of the list, which means we had to iterate over every item.

For a small list this is OK but for large lists this can be slow and inefficient.

The Binary Search Algorithm

If a list is sorted in a known order then we can use the binary search algorithm to find items more efficiently.

The list of ids in the example are already sorted in order from lowest to highest.

my_ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
id_of_interest = 10

low = 0
high = len(my_ids) - 1

while low <= high:
    mid = (low + high) // 2
    if my_ids[mid] < id_of_interest:
        low = mid + 1
        print(f'lower. low {low}, high {high}, mid {mid}')
    elif my_ids[mid] > id_of_interest:
        high = mid - 1
        print(f'higher. low {low}, high {high}, mid {mid}')
    else:
        print(f'Found {id_of_interest} at index {mid}.')
        break

# lower. low 6, high 10, mid 5
# lower. low 9, high 10, mid 8
# lower. low 10, high 10, mid 9
# Found 10 at index 10.

As you can see from the output, the item was found on the 4th iteration instead of the 11th iteration for the linear search.

To use the binary search we start by looking at the first item in the middle of the list.

For this example the index will be the same as the value in the list for simplicity.

If you're wondering what // is. It means divide and round down to the nearest whole number.

First Iteration

Once we've calculated the index for the middle of the list we compare that value with the id_of_interest and decide whether to look up or down the list.

The first value in the middle of the list is 5 which is lower than our id_of_interest10.

We set our lower (low) range to 5 + 1 so our lower range is now 6 and the higher range stays the same 10.

Second Iteration

We recalculate the index for the new middle (mid) which will be (6 + 10) // 2 = 8.

When we compare 8 to our id_of_interest10 it's still lower so we add to our lower range so it becomes 8 + 1 = 9.

Third Iteration

We recalculate the new middle which will be (9 + 10) // 2 = 9. When we compare 9 to our id_of_interest it's still lower than 10 so we add 1 to our lower range so it becomes 9 + 1 = 10.

Fourth Iteration

We recalculate the new middle which will be (10 + 10) // 2 = 10. When we compare 10 to our id_of_interest it's neither lower or higher which means we've found what we were looking for. The else clause is hit and the loop is broken.

Conclusion

In this case the binary search was 275% faster than the linear search.

If you're in a situation where you have a large list to search and you know the list is sorted in a particular order then the binary search is worth trying.