Difference between basic binary search for upper bound and lower bound?



In the article http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binarySearch, the author discusses binary search. He makes a distinction between finding the lowest value where something is true, and the highest value where something is false. The array being searched looks something like:

false false false true true

I am curious as to why these two cases are different. Why can't you just find the lowest value which is true, then subtract one to find the highest value which is false?

Edit2: Ok, so I understand lower vs upper bound. Now, I am struggling to understand, when searching for the smallest integer greater than or equal to the query, why we can't just change the if(mid>query) to if(mid>=query) and have it do lower instead of upper bound.

Edit: Here is what the article stated:

"Now we finally get to the code which implements binary search as described in this and the previous section:

binary_search(lo, hi, p):
   while lo < hi:
      mid = lo + (hi-lo)/2
      if p(mid) == true:
         hi = mid
         lo = mid+1

   if p(lo) == false:
      complain                // p(x) is false for all x in S!

   return lo         // lo is the least x for which p(x) is true


If we wanted to find the last x for which p(x) is false, we would devise (using a similar rationale as above) something like:

binary_search(lo, hi, p):
   while lo < hi:
      mid = lo + (hi-lo+1)/2    // note: division truncates
      if p(mid) == true:
         hi = mid-1
         lo = mid

   if p(lo) == true:
      complain                // p(x) is true for all x in S!

   return lo         // lo is the greatest x for which p(x) is false


John Targaryen

Posted 2015-02-08T00:15:03.510

Reputation: 459

2Well, im assuming that binary search implies the set looks something like false .... false true ... true no matter what – John Targaryen – 2015-02-08T00:18:08.977

The article im referring to implies that this is the case if we are performing binary search; i beleive thats also a necessity for binary search to even apply to the situation. – John Targaryen – 2015-02-08T00:27:46.843

@DietmarKühl sure, but couldnt you easily check that with like if(lo==0&amp;&amp;works(lo)==true)return false? – John Targaryen – 2015-02-08T00:29:30.830



The lower and upper bound of a binary search are the lowest and highest position where the value could be inserted without breaking the ordering. (In the C++ standard library, these bounds will be represented by iterators referencing the element before which the value could be inserted, but the concept is not essentially changed.)

Take, for example, a sorted range

1 2 3 4 5 5 5 6 7 9

In a binary search for 3, we will have

   v-- lower bound
1 2 3 4 5 5 5 6 7 9
     ^-- upper bound

And in a binary search for 5:

       v-- lower bound
1 2 3 4 5 5 5 6 7 9
             ^-- upper bound

The lower and upper bound are the same if the element does not exist in the range. In a binary search for 8:

                 v-- lower bound
1 2 3 4 5 5 5 6 7 9
                 ^-- upper bound

The author of the article to which you refer phrases all this in the equivalent terms of "smaller than" and "greater than" so that in a search of 5,

       v-- lower bound
t t t t f f f f f f      <-- smaller than?
1 2 3 4 5 5 5 6 7 9
f f f f f f f t t t      <-- greater than?
             ^-- upper bound

The C++ iterators will, in all these cases, refer to the element directly behind the bound. That is to say:

  • In the search for 3, the iterator returned by std::lower_bound would refer to 3 and the one from std::upper_bound would refer to 4
  • In the search for 5, the iterator returned by std::lower_bound would refer to the first 5 and the one from std::upper_bound would refer to 6
  • In the search for 8, both would refer to 9

This is because the convention in the C++ standard library for insertions is to pass an iterator referring to the element before which the new element should be inserted. For example, after

std::vector<int> vec { 1, 3, 4, 5, 5, 5, 6, 7, 9 };
vec.insert(vec.begin() + 1, 2);

vec would contain 1, 2, 3, 4, 5, 5, 5, 6, 7, 9. std::lower_bound and std::upper_bound follow this convention so that

vec.insert(std::lower_bound(vec.begin(), vec.end(), 5), 5);
vec.insert(std::upper_bound(vec.begin(), vec.end(), 8), 8);

work as desired and leave vec sorted.

More generally, this is an expression of the way ranges are specified in the C++ standard library. The beginning iterator of a range refers to the first element of the range (if any), while the ending iterator refers to the element (if any) directly behind the end of the range. Another way to look at it is that the iterators returned by std::lower_bound and std::upper_bound span the range of elements in the searched range that are equivalent to the searched element.

This range is empty if the element is not in the range, so that lower_bound and upper_bound return the same iterator, and otherwise lower_bound returns an iterator referring to the first element in the searched range that's equivalent to the search value while upper_bound returns an iterator referring to the element (if any) that's directly behind the last such element.


Posted 2015-02-08T00:15:03.510

Reputation: 34 543

Ah, I hadn't considered the case where multiple values are the same as the query. However, in your third example when the element doesn't exist in the range, isn't upper bound 9 and lower bound 7? – John Targaryen – 2015-02-08T00:32:20.917

In C++ standard library terms, the iterators you get from lower_bound and upper_bound would both reference 9 because before this element is both the lowest and highest place where an 8 could be inserted. The place where the element could really be inserted will always be one of the gaps or ends, though. – Wintermute – 2015-02-08T00:35:30.680

1lower_bound and upper_bound act in accordance with general iterator conventions in the stdlib there -- it's the same for vector::insert, where passing vec.begin() + 1 will make it insert the new element before the current second element, and other, similar contexts. This is so that you can pass the result of lower_bound and upper_bound directly to these functions and have them do the right thing. – Wintermute – 2015-02-08T00:39:24.137

1@JoeBob lower_bound is the first element that is not less than 8, and upper_bound is the first element that is greater than 8. In both cases, that is the 9. – Barry – 2015-02-08T00:40:43.410


If the array will always be

false … true …

Then the index before the one you find will always be false unless you find true at index 0. Another boundary case, as mentioned in my comment above, is if you don't find true. Then, the highest false will be the last part of the array.


Posted 2015-02-08T00:15:03.510

Reputation: 8 861

Couldn't you take care of both of these with simple boolean if checks? For instance, if(array[0]==true||array[array.size]==false)return false? Also, how would the change in code fix this problem? – John Targaryen – 2015-02-08T00:33:21.100

@JoeBob That's sort of the point. If x is the index for true, x-1 is not necessarily the bound for false. You need to say if x &gt; 0 &amp;&amp; !array[x-1] (second part optional). – royhowie – 2015-02-08T00:34:57.483


The two algorithms obviously differ in the condition of what should happen if there are is no true or no false value as is actually quite obvious from the code snippet: if you find the lowest value where the value is true and subtract 1 from this position to find the highest value yielding false an incorrect result is produced as there is no such object. Since the algorithms simply target different elements dealing with locating the appropriate element directly rather than having a special case also avoids having to deal with a special case, reducing the amount of code. Since special case code tends to be executed only once for each algorithm invocation it is likely to perform slightly worse than avoiding the special case. This is something which may be worth measuring.

Note that the code example isn't C++ despite the question being tagged C++. As a result it isn't idiomatic C++. The typical approach in C++ to implement something like lower_bound() or upper_bound() is to use appropriate iterators. These algorithms wouldn't "complain" if there is no suitable element as they'd just produce an iterator the appropriate position, i.e., an iterator to the start for std::lower_bound() and a past-the-end iterator for std::upper_bound().

Dietmar Kühl

Posted 2015-02-08T00:15:03.510

Reputation: 124 519

Ah, I tagged it c++ for that exact reason. I wasnt quite sure if lower_bound was supposed to return smallest element grater than query, or largest element smaller than query. Also, I didn't quite understand what you mean by "Since special case code tends to be executed only once for each algorithm invocation it is likely to perform slightly worse than avoiding the special case." How would it perform slightly worse? A single if statement would be the only difference between the two, so the difference would be negligible. – John Targaryen – 2015-02-08T01:36:50.957