## Why we write lo+(hi-lo)/2 in binary search?

11

4

I was reading about binary search...I know that the traditional way of finding mid value is like

``````mid=(hi+lo)/2
``````

But i also see that to avoid overflow mid value is calculated like that

``````mid=lo+(hi-lo)/2
``````

But why?? I couldn't find the actual reason..Can anyone give me the reason with example?? It is different from other question because other questions didn't have the answer that i wanted with example...

Question was closed 2014-08-29T15:29:07.433

6The reason is in your question, to avoid overflow. – harold – 2014-08-29T15:24:23.020

1This question is off-topic because it's an answer with a question mark. – Quentin – 2014-08-29T15:25:52.337

i didn't get any example..i wanted examples.. – None – 2014-08-29T15:30:19.743

Suppose hi and lo were pointers (or iterators in C++). The sum of two pointers have no meaning. The difference of two pointers does, it's an integer. Adding an integer to a pointer makes senses as well. – user515430 – 2014-08-29T20:18:10.197

## Answers

17

Suppose you are searching a 4000000000-element array using 32-bit `unsigned int` as indexes.

The first step made it appear as though the searched element, if present, would be in the top half. `lo`'s value is `2000000000` and `hi`'s is `4000000000`.

`hi + lo` overflows and produces a value smaller than the intended `6000000000`. It actually produces 6000000000-232. As a result, `(hi + lo) / 2` is a small value. It is not even between `lo` and `hi`!

From then on the search will be wrong (it will probably conclude that the element is absent even if it was there).

By contrast, even with the extreme values in this example, `lo + (hi - lo) / 2` always computes an index halfway between `hi` and `lo`, as intended by the algorithm.

@ikegami Because I chose an `unsigned int` type, the addition is not undefined behavior, it is simply defined as producing the wrap-around result. – Pascal Cuoq – 2014-08-29T15:34:02.380

Interesting. Thanks. – ikegami – 2014-08-29T15:34:34.543

This problem was much more likely to surface back when 16-bit architectures ruled the earth. – Mark Ransom – 2014-08-29T15:35:25.197

1

@MarkRansom Google engineers (re-)discovered it for 32-bit in 2006: http://googleresearch.blogspot.fr/2006/06/extra-extra-read-all-about-it-nearly.html

– Pascal Cuoq – 2014-08-29T15:36:23.083

3

Mathematically speaking, they are equivalent.

In computer terms, `mid=(hi+lo)/2` has fewer operations, but `mid=lo+(hi-lo)/2` is preferred to avoid overflow.

Say the item you are searching are near the end of the array, then `hi+lo` is nearly `2*size`. Since `size` can be almost as large as your maximum index, `2*size` and thus `hi+lo` can overflow.

`hi + lo` can overflow. `lo+(hi-lo)/2` will never overflow if `lo, hi` are positive integers within range and `lo &lt;= hi`. – becko – 2014-08-29T15:25:42.063

Numeric overflow has nothing to do with the size of addressable space, but with the range that can be represented by the numeric type. – Mike Seymour – 2014-08-29T15:29:01.953

@Mike Seymour, already fixed. – ikegami – 2014-08-29T15:29:14.027