Fixing Binary search bug from Bentley's book (programming pearls: writing correct programs)

9

2

Binary search can be implemented in many ways-recursive, iterative, conditionals, etc. I took this from Bentley's book "Programming pearls: Writing correct programs" which is an iterative implementation, and that includes a bug.

 public class BinSearch 
    {
       static int search( int [] A, int K ) {
          int l = 0;
          int u = A. length -1;
          int m;
          while ( l <= u ) {
              m = (l+u) /2;
              if (A[m] < K){
              l = m + 1;
              } else if (A[m] == K){
                    return m;
              } else {
                    u = m-1;
              }
         }
    return -1;
    }
}

I found a bug in the line m = (l+u) /2; it can lead to overflow. How can we avoid this overflow in this binary search?

Jayram Singh

Posted 2013-06-28T06:47:26.767

Reputation: 10 589

Should not while ( l &lt;= U ) be while ( l &lt;= u ) instead ? – Machtl – 2013-07-07T06:51:54.050

i took it from the book...programming pearls:writing correct programs. – Jayram Singh – 2013-07-07T06:52:36.743

U will be unknown identifier and u makes a lot more sense. Have you not tried to compile it? – Machtl – 2013-07-07T07:01:55.220

I changed it @Machtl :) – Jayram Singh – 2013-07-07T07:02:51.130

9

This issue was identified several years ago - some analysis exists at http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

– JeremyDWill – 2013-07-09T20:11:57.947

1Usually set m = l + (u - l) / 2; is just fine. The key principle of binary search is keep precondition unchanged in your binary operation halving the interval. – lulyon – 2013-07-13T09:18:04.323

Answers

10

Try the following:

change

m = (l+u) /2

to

m = (u-l) / 2 + l

The reason why the (l+u) / 2 can overflow becomes obvious if you consider a very large array of 2^31 - 1 elements (maximum value a signed 32-bit integer can hold). In this case the first iteration is just fine because 2^31 - 1 + 0 is not a big deal but consider the case of l = m + 1 here. In the second iteration u is still the same and l is 2^31 / 2 so l + u will lead to an overflow.

This way we are avoiding the addition of u + l by first determining the relative middle between l and u (u - l) / 2 and then adding the lower number l to it so it becomes absolute. So during the operation m = (u-l) / 2 + l; we never exceed the value of u.

To summarize the complete code:

public class BinSearch 
{
    static int search( int [] A, int K ) 
    {
        int l = 0;
        int u = A. length -1;
        int m;

        while ( l <= u ) 
        {
            m = (u-l) / 2 + l;

            if (A[m] < K)
                l = m + 1;
            else if (A[m] == K)
                return m;
            else
                u = m - 1;
        }
        return -1;
     }
}

Machtl

Posted 2013-06-28T06:47:26.767

Reputation: 641

Technically this problem can occur on an array with anywhere between 2^30(+1?) and 2^31-1 elements. – Dukeling – 2013-07-10T13:36:11.333

4

Suppose l and u are int both fall into [0, 2^31-1]. If l, u >= 2^30, then (l+u) >= 2^31 is overflow. To avoid this, use

m = l + (u-l)/2; 

instead. What is more, it may be more reasonable to write binary search like this:

    public class BinSearch 
    {
        static int search( int [] A, int K ) {
            int l = -1;             // index lower bound shift left by 1
            int u = A.length;       // index upper bound shift right by 1
            int m;
            while ( l + 1 < u ) {
                m = l + (u-l)/2;    // avoid overflow
                if (A[m] < K){
                    l = m;          // keep A[l] < K 
                } else {
                    u = m;          // keep A[u] >= K
                }
            }
            if ( (u == A.length) || (A[u] != K) ) return -1;
            return u;
        }
    }

frankyym

Posted 2013-06-28T06:47:26.767

Reputation: 141

3

As several others have said, the fix is simple, certainly the simplest I've seen for a 100-point bounty! Here is another, which has a nice symmetry, even if it takes a few more clock cycles:

m = (l >> 1) + (u >> 1) + (l & u & 1);

You should not malign Bentley for "a bug" until you have better information. When he wrote that article for the ACM (some time in the 1980's I think), he was pseudocoding and writing in 32-bit C; machines with gigabytes of RAM didn't exist. Even if they had, with 4-byte ints, a 32-bit machine can't have an array with more than 2^28 ints. Therefore the highest possible index is 2^28-1. Doubling this value does not cause an in int to overflow.

Of course, it's exactly the same with 32-bit Java. You need the broken kludge of 64-bit Java - a language that allows objects of size approaching 2^64 but limits indices to 2^32-1 in order to cause this "bug" to appear.

What you call a bug is a change of operating assumptions. Every program in the universe will manifest some kind of flaw if the environment changes in the right way.

Gene

Posted 2013-06-28T06:47:26.767

Reputation: 37 371

Readers of the book would have included coders for embedded devices using 16-bit integers. I would call it a bug, especially since it is in expository form.

And I suspect Bentley would too. – Matthew Hannigan – 2016-12-28T04:22:59.673

2

Try changing

m = (l+u) / 2

to

m = l + (u - l) / 2

It is trivial to see that both is equal and also the second statement prevents the overflow.

Ayberk

Posted 2013-06-28T06:47:26.767

Reputation: 338

1

I think you should change u = m-l; to u = m -1.

it's 1 not l. ---------addtion------ cause (l+u) may be greater than 2^31-1 (if the int is 32bits), so it may overflow. so you should change (l+u)/2 to l+((u-l)>>1), and u-l cannot be greater than 2^31-1.

陆彦帑

Posted 2013-06-28T06:47:26.767

Reputation: 21

yeah :) changed – Jayram Singh – 2013-07-07T07:04:53.903

1

The iterative implementation binary search indeed has a bug. change m = (l+u)/2 . As others have mentioned it can lead to integer overflow. Replace that by m = l + (u-l)/2.

From experience I have time and again seen buggy binary search implementations. Binary search although a simple concept involving divide and conquer can be difficult to get right. Its easy to change the above m assignment. Hope this helps...

Srikar Appalaraju

Posted 2013-06-28T06:47:26.767

Reputation: 48 407

Why not "m = l/2 + u/2" ? This would also prevent overflow, isn't it? However, divide operation is performed twice. What do you say? – kunal18 – 2016-06-13T00:46:57.533

2@stalin Consider the case where both L,U are odd. – netanelw – 2016-11-08T13:44:58.803

@netanelw: Yes, I didn't think about that. Thanks! – kunal18 – 2016-11-08T13:46:49.147

0

Although Machtl and others have already written the way to overcome the overflow bug but adding just for the sake of completeness

  1. Use int mid = (low + high) >>> 1; faster way
  2. And in C and C++ (where we don't have the >>> operator) mid = ((unsigned int)low + (unsigned int)high)) >> 1;

Prakhar

Posted 2013-06-28T06:47:26.767

Reputation: 515