Fast average without division

9

1

I have a binary search loop which gets hit many times in the execution path.

A profiler shows that the division part of the search (finding the middle index given the high and low indices of the search range) is actually the most costly part of the search, by a factor of about 4.

(I think) it is not critical for efficient binary search to find the exact middle value, just a value near the middle which does not have bias in either direction.

Is there a bit-twiddling algorithm to replace mid = (low + high) / 2 with something much faster?

Edit: Language is C#, but the equivalent bit-operation is valid in any language (although it may be of no performance benefit), which is why I left the C# tag off.

Nick

Posted 2009-06-19T21:51:31.677

Reputation: 9 354

I'm skeptical that the finding of the midpoint is the bottleneck here. An integer division by two should be compiled to a right shift. Are high, low, and mid declared as integers? I'd love to see your whole binary search routine. I think we're missing something here. – Nosredna – 2009-06-19T22:57:27.517

It is certainly curious that there is that big of a skew - so I'd be interested to see the whole routine too. – Jonathan Leffler – 2009-06-19T23:46:19.837

@Jonathan Leffler--Yeah, it doesn't make sense to me. Either something's accidentally declared or cast to a float rather than an integer, or the profile is is misleading, or there's just not much in the binary search and there's not much to squeeze out. I'd like to see the compiled output of the routine, too. If the implementation is iterative, the conditionals should be more expensive than the average. If it's recursive, the function call overhead should be more expensive. And doesn't C# have an Array.BinarySearch method already? – Nosredna – 2009-06-19T23:57:15.353

@onebyone.livejournal.com--that's a good idea. I've certainly seen cases where profiles didn't make sense until I looked at the object code generated by the compiler. – Nosredna – 2009-06-20T01:14:13.450

How big are your arrays? Have you tried using a linear search instead? – Michael Myers – 2009-06-19T21:57:21.320

3This really isn't something that can be language agnostic - the details of how fast this sort of operation is are very platform specific. It's entirely possible that if you are dealing with a dynamically typed language that the division is being done in floating point math, or that a big-int structure is being used. It's also expected in most statically typed languages that something like (low + high) / 2 will be automatically optimized to an add and an arithmetic right shift. – Eclipse – 2009-06-19T22:27:19.560

1"just a value near the middle which does not have bias in either direction." Doesn't your integer division by 2 have a bias already? – Nosredna – 2009-06-19T22:51:54.487

Answers

11

int mid = (low + high) >>> 1;

Be advised that using "(low + high) / 2" for midpoint calculations won't work correctly when integer overflow becomes an issue.

Jeff Moser

Posted 2009-06-19T21:51:31.677

Reputation: 16 181

+1 for the right answer, but I do have to comment, "a lot of binary searches are broken" is a bit sensationalist. More like "a lot of binary search implementations contain an overflow bug that occurs with large numbers of items." – Not Sure – 2009-06-19T22:00:21.423

Looks like I updated the text just before you made this comment. Is it better now? :) – Jeff Moser – 2009-06-19T22:01:46.973

2is this actually faster? – Jimmy – 2009-06-19T22:02:14.523

I'm with Jimmy on this one... I'd expect the compiler to be making this optimization already. I'd be very surprised if it's faster. – rmeador – 2009-06-19T22:11:41.417

Shouldn't be any faster with a good compiler, but the fastest way to tell is to try it. – UncleO – 2009-06-19T22:13:06.827

2And isn't it only >> ? – UncleO – 2009-06-19T22:14:25.373

Question doesn't specify language. Not all languages are compiled. Though how anyone could talk about "faster" without even specifying language, I do not know... Personally, I would happily yell at any compiler-writer who didn't optimise division-by-static-constant-2 to right-shift-1, assuming unsigned ints. Right shift of negative values is tricksy in C, and in a binary chop I doubt you need it, so use an unsigned int in case it helps. – Steve Jessop – 2009-06-19T22:27:33.427

In case anyone's wondering: in C/C++, right shift of negative values might be a logical, not arithmetic shift, in which case it plain doesn't work for division. Furthermore, even with arithmetic shift, -3 >> 1 == -2, which isn't the answer you want. If your type is int, then the compiler doesn't know your values are always positive, so it can't just replace division with shift even if you know that it would work. – Steve Jessop – 2009-06-19T22:31:16.010

Language is C#. I'll benchmark on Monday to see if I gained anything with the change. – Nick – 2009-06-19T22:44:58.277

@uncleo, yeah Java has >>> but C# doesn't. – Nosredna – 2009-06-20T02:00:35.590

2After benchmarking, this has significant (4x) performance gain over the division. I used ANTS profiler. – Nick – 2009-06-22T12:53:47.540

18

Here is a bit-hack version of the average that does not suffer from the overflow problem:

unsigned int average (unsigned int x, unsigned int y)
{
  return (x&y)+((x^y)>>1);
}

Nils Pipenbrinck

Posted 2009-06-19T21:51:31.677

Reputation: 64 271

@Nils: Yes indeed, on modern CPUs it is the unpredictable branches of a binary search that are the speed killers. – Zan Lynx – 2009-11-13T05:15:48.223

@nilspipenbrinck can you kindly add an explanation of the solution. I mean it works but how did you find the solution? How can I understand it? – Undefined – 2017-01-14T19:14:50.613

@baltusaj Uh, no. I can't explain how that works within the context of a comment (not having the time). However you're free to ask exactly that as a question. I'm sure the bit-twiddlers around here will be happy to share their knowledge on this topic. – Nils Pipenbrinck – 2017-01-14T21:16:13.577

1Wow. I LOVE that! Of course, I would expect the add, subtract, and shift of the usual solution to be at least as fast. But you've scored major coolness points there! – Nosredna – 2009-06-20T02:12:27.827

IMHO on a modern pc the performance differences between the versions does not makes a difference anymore. – Nils Pipenbrinck – 2009-06-20T02:58:54.770

7

You can use bit shifting and also overcome a possible overflow issue:

low + ((high-low) >> 1)

However I must admit I expect modern compilers and interpreters to do division by 2 (or division by any other constant power of 2) as bit-shifting, so not sure if it will really help - try it out.

Roee Adler

Posted 2009-06-19T21:51:31.677

Reputation: 16 652

4

To further expand on Nils' answer Richard Schroeppel invented this.

http://www.inwap.com/pdp10/hbaker/hakmem/boolean.html#item23

ITEM 23 (Schroeppel):

(A AND B) + (A OR B) = A + B = (A XOR B) + 2 (A AND B).

(A + B)/2 = ((A XOR B) + 2(A AND B))/2
          =  (A XOR B)/2  + (A AND B)
          =  (A XOR B)>>1 + (A AND B)


avg(x,y){return((x^y)>>1)+(x&y);}

(A AND B) + (A OR B) = A + B because A AND B gives the sum of the shared (between A and B) powers of two, A OR B gives both those shared and those that aren't, hence:

(A AND B) + (A OR B) = 
   (sum of shared powers of two) + 
   ((sum of shared powers of two) + (sum of unshared powers of two)) = 
     (sum of shared powers of two) + 
     ((sum of shared powers of two) + (sum of powers of two of A only) + 
     (sum of powers of two of B only)) = 
       ((sum of shared powers of two) + (sum of powers of two of A only)) + 
       ((sum of shared powers of two) + (sum of powers of two of B only)) 
= A + B. 

A XOR B gives a map of those bits that differ between A and B. Hence,

A XOR B = (sum of powers of two of A only) + (sum of powers of two of B only). 

And thus:

2(A AND B) + (A XOR B) = 
       ((sum of shared powers of two) + (sum of powers of two of A only)) + 
       ((sum of shared powers of two) + (sum of powers of two of B only)) 
= A + B.

Kais

Posted 2009-06-19T21:51:31.677

Reputation: 41

That is the same one as Nils's one above, except you switched the terms. – wildplasser – 2012-05-06T11:48:01.677

Here. – Kais – 2012-05-06T14:13:27.220

0

If I recall correctly, there are some cases where using the exact middle of the array can actually be slower. The solution is to randomize the choice of the index where you bisect the array. Equally true of the algorithm for determining the median of an array.

I can't recall the precise details, but I remember hearing in lecture 6 of the MIT algorithms series on iTunes.

duffymo

Posted 2009-06-19T21:51:31.677

Reputation: 267 937

I'd worry about getting too tricky. You want to make sure that you hit them all at the end. Maybe just randomize if the distance between high and low is greater than a certain number. – Nosredna – 2009-06-20T01:20:45.623

I'd check out a good algorithms book to see the details. – duffymo – 2009-06-20T01:41:31.460

0

Try low + ((high - low) / 2)). This should work because you're only taking the average of two numbers. This would reduce the amount of time the algorithm is taking if the binary search list is fairly big, since high - low is much smaller than high + low.

user3917838

Posted 2009-06-19T21:51:31.677

Reputation:

Entirely equivalent to Roee Adler's answer from '09.

– greybeard – 2015-07-15T19:36:49.067

@greybeard I came up with this myself, though. – None – 2015-07-19T16:48:29.307