Why we write lo+(hi-lo)/2 in binary search?

11

4

I was reading about binary search...I know that the traditional way of finding mid value is like

mid=(hi+lo)/2

But i also see that to avoid overflow mid value is calculated like that

mid=lo+(hi-lo)/2

But why?? I couldn't find the actual reason..Can anyone give me the reason with example?? It is different from other question because other questions didn't have the answer that i wanted with example...

user2291995

Posted 2014-08-29T15:23:20.463

Reputation:

Question was closed 2014-08-29T15:29:07.433

6The reason is in your question, to avoid overflow. – harold – 2014-08-29T15:24:23.020

1This question is off-topic because it's an answer with a question mark. – Quentin – 2014-08-29T15:25:52.337

i didn't get any example..i wanted examples.. – None – 2014-08-29T15:30:19.743

Suppose hi and lo were pointers (or iterators in C++). The sum of two pointers have no meaning. The difference of two pointers does, it's an integer. Adding an integer to a pointer makes senses as well. – user515430 – 2014-08-29T20:18:10.197

Answers

17

Suppose you are searching a 4000000000-element array using 32-bit unsigned int as indexes.

The first step made it appear as though the searched element, if present, would be in the top half. lo's value is 2000000000 and hi's is 4000000000.

hi + lo overflows and produces a value smaller than the intended 6000000000. It actually produces 6000000000-232. As a result, (hi + lo) / 2 is a small value. It is not even between lo and hi!

From then on the search will be wrong (it will probably conclude that the element is absent even if it was there).

By contrast, even with the extreme values in this example, lo + (hi - lo) / 2 always computes an index halfway between hi and lo, as intended by the algorithm.

Pascal Cuoq

Posted 2014-08-29T15:23:20.463

Reputation: 66 586

@ikegami Because I chose an unsigned int type, the addition is not undefined behavior, it is simply defined as producing the wrap-around result. – Pascal Cuoq – 2014-08-29T15:34:02.380

Interesting. Thanks. – ikegami – 2014-08-29T15:34:34.543

This problem was much more likely to surface back when 16-bit architectures ruled the earth. – Mark Ransom – 2014-08-29T15:35:25.197

1

@MarkRansom Google engineers (re-)discovered it for 32-bit in 2006: http://googleresearch.blogspot.fr/2006/06/extra-extra-read-all-about-it-nearly.html

– Pascal Cuoq – 2014-08-29T15:36:23.083

3

Mathematically speaking, they are equivalent.

In computer terms, mid=(hi+lo)/2 has fewer operations, but mid=lo+(hi-lo)/2 is preferred to avoid overflow.

Say the item you are searching are near the end of the array, then hi+lo is nearly 2*size. Since size can be almost as large as your maximum index, 2*size and thus hi+lo can overflow.

ikegami

Posted 2014-08-29T15:23:20.463

Reputation: 259 663

hi + lo can overflow. lo+(hi-lo)/2 will never overflow if lo, hi are positive integers within range and lo <= hi. – becko – 2014-08-29T15:25:42.063

Numeric overflow has nothing to do with the size of addressable space, but with the range that can be represented by the numeric type. – Mike Seymour – 2014-08-29T15:29:01.953

@Mike Seymour, already fixed. – ikegami – 2014-08-29T15:29:14.027