How is it possible to do binary search on a doubly-linked list in O(n) time?

13

9

I've heard that it's possible to implement binary search over a doubly-linked list in O(n) time. Accessing a random element of a doubly-linked list takes O(n) time, and binary search accesses O(log n) different elements, so shouldn't the runtime be O(n log n) instead?

templatetypedef

Posted 2013-10-23T23:51:19.380

Reputation: 258 804

You can do a linear search in O(n), so why do a binary search that takes O(nlogn), or any other algorithm that exceeds O(n)? An abstract API defined on bags (non-sets such as arrays or linked lists) with a BinarySearch method should simply implement the version for linked lists as a linear search ... the caller can't tell which algorithm is used, other by timing it and seeing that it isn't actually a pointlessly slow binary search. In effect, binary search on a linked list can be implemented in O(n) by doing a linear search ... the name doesn't mandate what it actually does. – Jim Balter – 2018-11-30T00:20:22.357

The advantage is that while it does O(n) work traversing the list, it only makes O(log n) comparisons. If you have giant elements stored in the list, this could be significantly faster than doing a linear search. – templatetypedef – 2018-11-30T00:22:51.640

Ok, good point ... I've now read your answer to your question. The statement "It's technically correct to say that the runtime of binary search on a doubly-linked list is O(n log n)" is wrong, since you yourself provide an O(n) algorithm with O(logn) comparisons. So what you say in your question that you heard is correct ... " it's possible to implement binary search over a doubly-linked list in O(n) time." ... you should fix the claim at the top of your answer. In any case, thanks for the algorithm and analysis .. I was looking for that. – Jim Balter – 2018-11-30T00:26:46.773

P.S. It also works on singly-linked lists, since you always have the head of the two sublists, and you can find the midpoint using Floyd's hare and tortoise trick (https://www.geeksforgeeks.org/write-a-c-function-to-print-the-middle-of-the-linked-list/).

– Jim Balter – 2018-11-30T00:32:54.647

It’s not actually incorrect to claim that the binary search takes time O(n log n). It’s just not a tight bound. For example, it’s not incorrect for me to claim that I’m at most 1km tall, even though in reality I’m significantly shorter than that. Also, thanks for sharing that link! I have another question posted where I go into the details behind that algorithm. – templatetypedef – 2018-11-30T00:44:45.333

We'll have to agree to disagree (but you're wrong ... if a problem has a solution in O(n), it's not correct that it has a solution in O(nlogn) where it's "not a tight bound". Your analogy is not relevant. You might as well say that every problem has an O(infinity) solution that isn't a tight bound.) – Jim Balter – 2018-11-30T01:03:49.410

The formal definition of big-O notation is that f(n) = O(g(n)) if there’s constants n0 and c where for every n >= n0, we have f(n) <= cg(n). So if f(n) = O(n), then mathematically f(n) = O(n log n) using the same values of n0 and c. – templatetypedef – 2018-11-30T01:52:39.257

log n isn't a constant, so O(n) != O(n log n). You can take this to chat, but I won't join you there. Ciao. – Jim Balter – 2018-11-30T03:21:07.747

Answers

27

It's technically correct to say that the runtime of binary search on a doubly-linked list is O(n log n), but that's not a tight upper bound. Using a slightly better implementation of binary search and a more clever analysis, it's possible to get binary search to run in time O(n).

The basic idea behind binary search is the following:

  • If the list is empty, the element being searched for doesn't exist.
  • Otherwise:
    • Look at the middle element.
    • If it matches the element in question, return it.
    • If it's bigger than the element in question, discard the back half of the list.
    • If it's smaller than the element in question, discard the front half of the list.

A naive implementation of binary search on a doubly-linked list would work by computing the indices to look up on each iteration (just like in the array case), then access each one by starting at the front of the list and scanning forward the appropriate number of steps. This is indeed very slow. If the element being searched for is at the very end of the array, the indices looked up would be n/2, 3n/4, 7n/8, etc. Summing up the work done in the worst case, we get

n / 2 + 3n/4 + 7n/8 + 15n/16 + ... (Θ(log n) terms)

≥ n / 2 + n / 2 + ... + n / 2 (Θ(log n) terms)

= Θ(n log n)

and

n / 2 + 3n/4 + 7n/8 + 15n/16 + ... (Θ(log n) terms)

≤ n + n + ... + n (Θ(log n) terms)

= Θ(n log n)

Therefore, the worst-case time complexity for this algorithm is Θ(n log n).

However, we can speed this up by a factor of Θ(log n) by being more clever with our approach. The reason the previous algorithm is slow is that every time we need to look up an element, we start the search from the beginning of the array. However, we don't need to do this. After looking up the middle element the first time, we're already in the middle of the array, and we know that the next lookup we're going to make will be either at position n / 4 or 3n / 4, which is only distance n / 4 from where we left off (compared to n / 4 or 3n / 4 if we start from the beginning of the array). What if we just trekked out from our stopping position (n / 2) to the next position, rather than restarting at the front of the list?

Here's our new algorithm. Begin by scanning to the middle of the array, which requires n / 2 steps. Then, determine whether to visit the element at the middle of the first half of the array or the middle of the second half of the array. Reaching there from position n / 2 only requires n / 4 total steps. From there, going to the midpoint of the quarter of the array containing the element takes only n / 8 steps, and going from there to the midpoint of the eighth of the array containing the element takes only n / 16 steps, etc. This means that the total number of steps made is given by

n / 2 + n / 4 + n / 8 + n / 16 + ...

= n (1/2 + 1/4 + 1/8 + ...)

≤ n

This follows from the fact that the sum of the infinite geometric series 1/2 + 1/4 + 1/8 + ... is 1. Therefore, the total work done in the worst case only Θ(n), which is much better than the Θ(n log n) worst case from before.

One last detail: why would you ever do this? After all, it already takes O(n) time to search a doubly-linked list for an element. One major advantage of this approach is that even though the runtime is O(n), we only end up doing O(log n) total comparisons (one per step of the binary search). This means that if comparisons are expensive, we might end up doing less work using a binary search than doing a normal linear search, since the O(n) comes from the work done walking the list rather than the work done making comparisons.

templatetypedef

Posted 2013-10-23T23:51:19.380

Reputation: 258 804