First occurrence in a binary search

18

7

I'm tinkering with some code and I realized something I never knew. A normal binary search will return a random index in a data set for a key that occurs more than once. How can I modify this code below to return the first occurrence? Is this something people do?

``````//ripped from the JDK
public static int binarySearchValue(InvertedContainer.InvertedIndex[] a, long key) {
return bSearchVal(a, 0, a.length, key);
}

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
int toIndex, long key) {
int low = fromIndex;
int high = toIndex - 1;

while (low <= high) {
int mid = (low + high) >>> 1;
long midVal = a[mid].val;

if (midVal < key)
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else
return mid; // key found
}
}
``````

1

Cool - I get to rep-whore my own question and answer... http://stackoverflow.com/questions/4948162/how-can-i-better-understand-the-one-comparison-per-iteration-binary-search explains a form of the binary search that can find the first item > or >=, or the last item < or <=.

– Steve314 – 2011-07-13T08:57:02.797

Hah, thanks! I'll take a look. Every once in a while I notice something like this and I think 'you know nothing'. – Amir Afghani – 2011-07-13T08:59:35.563

9

Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.

You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.

2This solution has O(N) time complexity as there can be up to N elements with the value you are searching for. – Juan Martinez – 2015-01-06T17:30:57.977

@JuanMartinez: Hence the "efficient enough unless you've got a really large number of equal entries" bit at the end of the answer. – Jon Skeet – 2015-01-06T18:15:23.620

Hi Jon - so I need a while loop in the else clause? I understand the idea, but it seems like it would look kludgish, no? – Amir Afghani – 2011-07-13T08:55:33.657

BTW, I'm a big fan. Thanks for your response so far. – Amir Afghani – 2011-07-13T08:56:49.710

1@Amir: That's certainly one way of doing it. Another is to leave the method as it is (preferably with a name change :) but offer another method which will find the first one, by calling the original method and then performing the while loop (if there was no match). Note that in APIs I've used, the insertion point is returned as a negative value (using `~`) to indicate a failure to find the value, but also indicating the insertion point at the same time. – Jon Skeet – 2011-07-13T08:59:12.803

43

An addition to Jon Skeets post:

The potential faster implementation is actually not hard to implement and adds only 2 lines of code, here is how I'd do it:

``````    if (midVal < key)
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else if (low != mid) //Equal but range is not fully scanned
high = mid; //Set upper bound to current number and rescan
else //Equal and full range is scanned
return mid;
``````

2first item: `mid = (unsigned int) floor((low + high) / 2.0);` last item: `mid = (unsigned int) ceil((low + high) / 2.0);` – Vaibhav Bajpai – 2012-08-12T14:18:23.847

@bezmax i need a little help of yours... i want to know whether this code will work faster than the code in this link https://www.hackerearth.com/notes/searching-code-monk/ please see the code for first occurrence

– Ashwani Kumar Rahul – 2015-12-22T13:41:23.493

@VaibhavBajpai where / how does that fit in? – jedierikb – 2016-07-15T21:05:17.877

1Awesome!!!!!!!! – Amir Afghani – 2011-07-13T23:06:27.577

1How would you change this to work for a last index of instead of a first index of? – Amir Afghani – 2011-07-14T02:08:15.537

1Simply invert the high and low (didnt test if that works): `.... else if (high != mid) low = mid; else return mid;` – bezmax – 2011-07-14T07:19:08.317

3

You can an adapt your existing search algorithm just by having a sharper definition of matching. You can tell that the highlighted 5 in the sequence 1,3,5,5,5,9 is the first one because the number before it (3) is smaller than 5. So if mid lands on array element equal to the the key you only treat it as a match if a[mid-1] is less than key, other equal array elements are treated like greater than elements. Now you algorithm becomes (after including Jon Skeet's suggestion to return negatives for insertion points):

``````public static int binarySearch(int[] a, int key) {
int low=0,high=a.length-1;
while (low<=high) {
int mid=(low+high) >>> 1;
int midVal=a[mid];
if (midVal < key)
low=mid+1;
else if (mid>0 && a[mid-1]>=key) //we already know midval>=key here
high=mid-1;
else if (midVal==key) //found the 1st key
return mid;
else
return ~mid;      //found insertion point
}
return ~(a.length);       //insertion point after everything
}
``````

It uses more comparisons but went faster than Stev314's version in my benchmarks probably because of cache effects.

2

If your data is all integral, then this hack can help. It uses a float array to store values.

``````float array[];    //contains all integral values
int searchValue;

int firstIndex = -(binarySearch(array, (float)searchValue - 0.5F) + 1);
``````

Basically what it does is find the insertion index of a value in between your search value and the integer before it. Since all values are integral, it finds the first occurrence of the search value.

Also this runs is log(n) time.

Example:

``````import java.util.Arrays;

public class BinarySearch {
// considering array elements are integers
float ar[] = new float[] { 1, 2, 3, 3, 4, 4, 5, 9, 9, 12, 12 };

public void returnFirstOccurrence(int key) {
int firstIndex = -(Arrays.binarySearch(ar, key - 0.5F) + 1);
if (ar[firstIndex] != key)
System.out.println("Key doesn't exist");
else
System.out.println("First index of key is " + firstIndex);
}

public static void main(String Args[]) throws Exception {
new BinarySearch().returnFirstOccurrence(9);
}

}
``````

OUTPUT: 7

p.s: I've used this trick on several coding contests and it nicely worked everytime.

Could you explain? How does this get the first index of occurrence? – nawfal – 2014-06-15T21:53:48.890

Binary search implementation in java Collections either returns the index of the number OR if the number is not present it returns the index of position where the number can be inserted. see link). Also I've edited to include an example.

– Arnab Das – 2014-07-15T09:51:27.997

got it. It's not just hacky but extremely hacky :) Not only that it works only for integers, but still you would want a `float[]` to hold them all. In case the client originally has an `int[]`, then creating a new `float[]` might have some cost. You might want to make the two conditions written in bold :) – nawfal – 2014-07-15T10:17:04.497

1As I said, I only use it in contests. I'm a student and don't have any industrial experience but I agree it would be very inappropriate and utterly confusing to use it in a production code. – Arnab Das – 2014-07-23T11:12:17.233

2

You could implement "lower bound" algorithm instead of binary search. This algorithm is used e.g. in C++/STL and its transcript into Java is straightforward. The algorithmic complexity of lower bound is also O(log n) as the binary search. This is better than to use binary search first and than linearly search for the first matching element - this would have worst case behaviour O(n).

1Although this is called a lower bound in the C++ library, it's still a binary search - at least according to my old copy of Algorithms and Data Structures (Modula 2 Edition) by Niklaus Wirth. Maybe it's a matter of opinion, though, where the boundary is between "different algorithm" and "variant of the same algorithm". – Steve314 – 2011-07-13T09:09:22.180

"Binary search" as implemented by many (most?) libraries (e.g. C, C++/STL, Java) does not specify which index is returned when multiple keys are present. This was also the problem in the posed question. "Lower bound" specifies exactly the result. – Jiri Kriz – 2011-07-13T09:21:29.867

Agreed, but good names for library functions aren't necessarily the same as in the algorithms textbook, especially when the library may have several variants. BTW, I didn't mean to claim you're wrong about anything, and I upvoted your answer. – Steve314 – 2011-07-13T09:45:12.787

Thanks @Steve314 :) – Jiri Kriz – 2011-07-13T09:59:32.563

1

here is the solution, i found for getting the lower index of key having multiple occurrences in a sorted array using binary search.

``````int lowerBound(int[] array,int fromIndex, int toIndex, int key)
{
int low = fromIndex-1, high = toIndex;
while (low+1 != high)
{
int mid = (low+high)>>>1;
if (array[mid]< key) low=mid;
else high=mid;
}
int p = high;
if ( p >= toIndex || array[p] != key )
p=-1;//no key found
return p;
}
``````

we have to change a little in this code to work for upper bound, using binary search, so here is the working copy of code.

`````` int upperBound(int[] array,int fromIndex, int toIndex, int key)
{
int low = fromIndex-1, high = toIndex;
while (low+1 != high)
{
int mid = (low+high)>>>1;
if (array[mid]> key) high=mid;
else low=mid;
}
int p = low;
if ( p >= toIndex || array[p] != key )
p=-1;//no key found
return p;
}
``````

1

The following algorithm binary-searches for the first item with a key greater-than-or-equal-to your search key...

``````while (upperbound > lowerbound)
{
testpos = lowerbound + ((upperbound-lowerbound) / 2);

if (item[testpos] >= goal)
{
//  new best-so-far
upperbound = testpos;
}
else
{
lowerbound = testpos + 1;
}
}
``````

This isn't written for Java, which I don't know that well, so it may need a minor tweak. Note that the bounds are half-open (lowerbound is inclusive and upperbound is exclusive) and that this is important for correctness.

This can be adapted to other similar searches - e.g. finding the last key <= the search value.

This is slightly modified from my earlier question-and-answer here.

1while it's trivial to extend, just want to point out that OP wants the first occurrence "equal to", not "greater than or equal to". – nawfal – 2014-06-15T20:41:41.260

0

Here's a variation of the solution in scala. Used pattern-matching and recursion instead of the while loop to get the first occurrence.

``````def binarySearch(arr:Array[Int],key:Int):Int = {
def binSearchHelper(lo:Int,hi:Int,mid:Int):Int = {
if(lo > hi) -1 else {
if(arr(mid) == key) mid else if(arr(mid) > key){
binSearchHelper(lo,mid-1,lo + (((mid-1) - lo)/2))
}else{
binSearchHelper(mid+1,hi,(mid+1) + ((hi - (mid+1))/2))
}
}
}
binSearchHelper(0,arr.size-1,(arr.size-1)/2)
}

def findFirstOccurrence(arr:Array[Int],key:Int):Int = {
val startIdx = binarySearch(arr,key)
startIdx match {
case 0 => 0
case -1 => -1
case _ if startIdx > 0 => {
if(arr(startIdx - 1) < key) startIdx else {
findFirstOccurrence(arr.slice(0,startIdx),key)
}
}
}
}
``````

0

This should do the trick

``````private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
int toIndex, long key) {
int low = fromIndex;
int high = toIndex - 1;
int result = low;
while (low <= high) {
int mid = (low + high) >>> 1;
long midVal = a[mid].val;

if (midVal < key)
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else
{
result = mid;
high = mid -1;
}
}
return result;
``````

}

0

For the last occurrence of the element:

``````static int elementExists(int input[], int element){
int lo=0;
int high = input.length-1;
while(lo<high){
int mid = (lo + high )/2;
if(element >input[mid] ){
lo = mid+1;
}
else if(element < input[mid]){
high= mid-1;
}
else if (high != input.length-1) //Change for the Occurrence check
lo = mid;
else {
return mid;
}
}
return -1;
}
``````

For the first occurrence:

``````else if (lo != mid){
high = mid;
}
``````

0

One approach is to hold an invariant throughout the whole binary search. In your particular case, the invariant would be:

• `array[low] < key`
• `key <= array[high]`

Then you can minimize the gap between low and high using binary search. When `low + 1 == high`, `high` would be the answer. Example code in C++:

``````// check invariant on initial values.
if (array[low] >= key) return low;
if (array[high] < key) return high+1;
// low + 1 < high ensures high is at least low + 2, thus
// mid will always be different from low or high. It will
// stop when low + 1 == high.
while (low + 1 < high) {
int mid = low + (high - low) / 2;
if (array[mid] < key) {
low = mid;   // invariant: array[low] < key
} else {
high = mid;  // invariant: array[high] >= key
}
}
return high;
``````

Key difference between this and your example code is to update `low` and `high` to only `mid` rather than `mid+1` or `mid-1`, because we have checked the value of `array[mid]`, we can guarantee the invariant still holds when updating the boundaries. You need to check the invariant on initial values before starting to search too.

0

I think a simpler approach is storing the latest `mid` index where `xs[mid] == key` into a result variable and then keep running the binary search.

Here's the swift code:

``````func first<T: Comparable>(xs: [T], key: T) -> Int {
var lo = xs.startIndex
var hi = xs.endIndex - 1
var res = -1
while lo <= hi {
let mid = lo + (hi - lo) >> 1
if xs[mid] == key { hi = mid - 1; res = mid }
else if xs[mid] < key { lo = mid + 1}
else if xs[mid] > key { hi = mid - 1 }
}

return res
}
``````

Also, this requires a really small change (just one line) if you were to find the last index of a key.

``````func last<T: Comparable>(xs: [T], key: T) -> Int {
var lo = xs.startIndex
var hi = xs.endIndex - 1
var res = -1
while lo <= hi {
let mid = lo + (hi - lo) >> 1
if xs[mid] == key { lo = mid + 1;  res = mid }
else if xs[mid] < key { lo = mid + 1}
else if xs[mid] > key { hi = mid - 1 }
}

return res
}
``````

0

Try this javascript recursive solution. It's optimal in a sense that it's O(log(N))

``````function solve(A, e) {
function solve (A, start, end, e, bestUntilNow) {
if (start > end) {
if (A[start] === e)
return start
return bestUntilNow
} else {
const mid = start + Math.floor((end - start) / 2)
if (A[mid] === e) {
return solve(A, start, mid - 1, e, mid)
} else if (e < A[mid]) {
return solve(A, start, mid - 1, e, bestUntilNow)
} else {
return solve(A, mid + 1, end, e, bestUntilNow)
}
}
}
return solve(A, 0, A.length, e, -1)
}
``````