## Fixing Binary search bug from Bentley's book (programming pearls: writing correct programs)

9

2

Binary search can be implemented in many ways-recursive, iterative, conditionals, etc. I took this from Bentley's book "Programming pearls: Writing correct programs" which is an iterative implementation, and that includes a bug.

`````` public class BinSearch
{
static int search( int [] A, int K ) {
int l = 0;
int u = A. length -1;
int m;
while ( l <= u ) {
m = (l+u) /2;
if (A[m] < K){
l = m + 1;
} else if (A[m] == K){
return m;
} else {
u = m-1;
}
}
return -1;
}
}
``````

I found a bug in the line m = (l+u) /2; it can lead to overflow. How can we avoid this overflow in this binary search?

Should not `while ( l &lt;= U )` be `while ( l &lt;= u )` instead ? – Machtl – 2013-07-07T06:51:54.050

i took it from the book...programming pearls:writing correct programs. – Jayram Singh – 2013-07-07T06:52:36.743

`U` will be unknown identifier and `u` makes a lot more sense. Have you not tried to compile it? – Machtl – 2013-07-07T07:01:55.220

I changed it @Machtl :) – Jayram Singh – 2013-07-07T07:02:51.130

9

– JeremyDWill – 2013-07-09T20:11:57.947

1Usually set m = l + (u - l) / 2; is just fine. The key principle of binary search is keep precondition unchanged in your binary operation halving the interval. – lulyon – 2013-07-13T09:18:04.323

10

Try the following:

change

`m = (l+u) /2`

to

`m = (u-l) / 2 + l`

The reason why the `(l+u) / 2` can overflow becomes obvious if you consider a very large array of 2^31 - 1 elements (maximum value a signed 32-bit integer can hold). In this case the first iteration is just fine because `2^31 - 1 + 0` is not a big deal but consider the case of `l = m + 1` here. In the second iteration u is still the same and l is `2^31 / 2` so `l + u` will lead to an overflow.

This way we are avoiding the addition of `u + l` by first determining the relative middle between l and u `(u - l) / 2` and then adding the lower number l to it so it becomes absolute. So during the operation `m = (u-l) / 2 + l;` we never exceed the value of u.

To summarize the complete code:

``````public class BinSearch
{
static int search( int [] A, int K )
{
int l = 0;
int u = A. length -1;
int m;

while ( l <= u )
{
m = (u-l) / 2 + l;

if (A[m] < K)
l = m + 1;
else if (A[m] == K)
return m;
else
u = m - 1;
}
return -1;
}
}
``````

Technically this problem can occur on an array with anywhere between `2^30(+1?)` and `2^31-1` elements. – Dukeling – 2013-07-10T13:36:11.333

4

Suppose l and u are int both fall into [0, 2^31-1]. If l, u >= 2^30, then (l+u) >= 2^31 is overflow. To avoid this, use

``````m = l + (u-l)/2;
``````

instead. What is more, it may be more reasonable to write binary search like this:

``````    public class BinSearch
{
static int search( int [] A, int K ) {
int l = -1;             // index lower bound shift left by 1
int u = A.length;       // index upper bound shift right by 1
int m;
while ( l + 1 < u ) {
m = l + (u-l)/2;    // avoid overflow
if (A[m] < K){
l = m;          // keep A[l] < K
} else {
u = m;          // keep A[u] >= K
}
}
if ( (u == A.length) || (A[u] != K) ) return -1;
return u;
}
}
``````

3

As several others have said, the fix is simple, certainly the simplest I've seen for a 100-point bounty! Here is another, which has a nice symmetry, even if it takes a few more clock cycles:

``````m = (l >> 1) + (u >> 1) + (l & u & 1);
``````

You should not malign Bentley for "a bug" until you have better information. When he wrote that article for the ACM (some time in the 1980's I think), he was pseudocoding and writing in 32-bit C; machines with gigabytes of RAM didn't exist. Even if they had, with 4-byte ints, a 32-bit machine can't have an array with more than 2^28 ints. Therefore the highest possible index is 2^28-1. Doubling this value does not cause an in `int` to overflow.

Of course, it's exactly the same with 32-bit Java. You need the broken kludge of 64-bit Java - a language that allows objects of size approaching 2^64 but limits indices to 2^32-1 in order to cause this "bug" to appear.

What you call a bug is a change of operating assumptions. Every program in the universe will manifest some kind of flaw if the environment changes in the right way.

Readers of the book would have included coders for embedded devices using 16-bit integers. I would call it a bug, especially since it is in expository form.

And I suspect Bentley would too. – Matthew Hannigan – 2016-12-28T04:22:59.673

2

Try changing

``````m = (l+u) / 2
``````

to

``````m = l + (u - l) / 2
``````

It is trivial to see that both is equal and also the second statement prevents the overflow.

1

I think you should change u = m-l; to u = m -1.

it's 1 not l. ---------addtion------ cause (l+u) may be greater than 2^31-1 (if the int is 32bits), so it may overflow. so you should change (l+u)/2 to l+((u-l)>>1), and u-l cannot be greater than 2^31-1.

yeah :) changed – Jayram Singh – 2013-07-07T07:04:53.903

1

The iterative implementation binary search indeed has a bug. change `m = (l+u)/2` . As others have mentioned it can lead to integer overflow. Replace that by `m = l + (u-l)/2`.

From experience I have time and again seen buggy binary search implementations. Binary search although a simple concept involving divide and conquer can be difficult to get right. Its easy to change the above `m` assignment. Hope this helps...

Why not "m = l/2 + u/2" ? This would also prevent overflow, isn't it? However, divide operation is performed twice. What do you say? – kunal18 – 2016-06-13T00:46:57.533

2@stalin Consider the case where both L,U are odd. – netanelw – 2016-11-08T13:44:58.803

@netanelw: Yes, I didn't think about that. Thanks! – kunal18 – 2016-11-08T13:46:49.147

0

Although Machtl and others have already written the way to overcome the overflow bug but adding just for the sake of completeness

1. Use int mid = (low + high) >>> 1; faster way
2. And in C and C++ (where we don't have the >>> operator) mid = ((unsigned int)low + (unsigned int)high)) >> 1;