155

30

I've seen programmers use the formula

```
mid = start + (end - start) / 2
```

instead of using the simpler formula

```
mid = (start + end) / 2
```

for finding the middle element in the array or list.

Why do they use the former one?

212

There are three reasons.

First of all, `start + (end - start) / 2` works even if you are using pointers, as long as `end - start` doesn't overflow^{1}.

```
int *start = ..., *end = ...;
int *mid = start + (end - start) / 2; // works as expected
int *mid = (start + end) / 2; // type error, won't compile
```
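As a minimal sketch of where the pointer form is useful, here is a binary search over a half-open pointer range. The names `find_sorted` and `demo_values` are hypothetical, introduced only for this illustration, and the input is assumed to be sorted in ascending order.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration data only (hypothetical): a small sorted array. */
static const int demo_values[5] = {1, 3, 5, 7, 9};

/* Hypothetical helper: binary search over the half-open range [start, end)
   of a sorted int array. Returns a pointer to a matching element, or NULL. */
const int *find_sorted(const int *start, const int *end, int key)
{
    assert(start <= end);                 /* precondition: a valid range */
    while (start < end) {
        /* end - start is a ptrdiff_t; adding half of it back to start is
           ordinary pointer arithmetic. (start + end) would not compile. */
        const int *mid = start + (end - start) / 2;
        if (*mid < key)
            start = mid + 1;              /* key can only lie right of mid */
        else if (*mid > key)
            end = mid;                    /* key can only lie left of mid */
        else
            return mid;
    }
    return NULL;                          /* range is empty: not found */
}
```

For example, `find_sorted(demo_values, demo_values + 5, 7)` returns a pointer to the fourth element, and searching for `4` returns `NULL`.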

Second of all, `start + (end - start) / 2` won't overflow if `start` and `end` are large positive numbers. With signed operands, overflow is undefined:

```
int start = 0x7ffffffe, end = 0x7fffffff;
int mid = start + (end - start) / 2; // works as expected
int mid = (start + end) / 2; // overflow... undefined
```

(Note that `end - start` may overflow, but only if `start < 0` or `end < 0`.)

Or with unsigned arithmetic, overflow is defined (the result wraps around) but gives you the wrong answer. However, for unsigned operands, `start + (end - start) / 2` will never overflow as long as `end >= start`.

```
unsigned start = 0xfffffffeu, end = 0xffffffffu;
unsigned mid = start + (end - start) / 2; // works as expected
unsigned mid = (start + end) / 2; // mid = 0x7ffffffe
```
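The same comparison can be written as two small functions. This is a sketch that uses `uint32_t` so the wraparound does not depend on the platform's width of `unsigned`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Naive midpoint: the sum wraps modulo 2^32 before the division.
   The cast keeps the sum 32 bits wide even if int is wider than 32 bits. */
uint32_t mid_naive_u32(uint32_t start, uint32_t end)
{
    return (uint32_t)(start + end) / 2;
}

/* Safe midpoint: end - start cannot wrap when end >= start, and the
   result never exceeds end. */
uint32_t mid_safe_u32(uint32_t start, uint32_t end)
{
    assert(end >= start);                 /* precondition from the text */
    return start + (end - start) / 2;
}
```

With `start = 0xfffffffe` and `end = 0xffffffff`, the naive version yields `0x7ffffffe`, while the safe version yields `0xfffffffe`.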

Finally, you often want to round towards the `start` element.

```
int start = -3, end = 0;
int mid = start + (end - start) / 2; // -2, closer to start
int mid = (start + end) / 2; // -1, surprise!
```
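To make the rounding difference easy to check, the two formulas can be wrapped in functions. This is only a sketch with hypothetical names, and it assumes inputs small enough that neither expression overflows.

```c
#include <assert.h>

/* Rounds toward start when start <= end: (end - start) is non-negative,
   so the truncating division rounds the offset down. */
int mid_toward_start(int start, int end)
{
    assert(start <= end);
    return start + (end - start) / 2;
}

/* Rounds toward zero: C integer division truncates (C99 and later),
   so a negative odd sum is rounded up, away from start. */
int mid_toward_zero(int start, int end)
{
    return (start + end) / 2;
}
```

For the range `-3` to `0` the first function gives `-2` and the second gives `-1`; for a non-negative range such as `3` to `6` both give `4`.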

^{1} According to the C standard, if the result of pointer subtraction is not representable as a `ptrdiff_t`, then the behavior is undefined. However, in practice, this requires allocating a `char` array using at least half the entire address space.

16

We can take a simple example to demonstrate this fact. Suppose in a certain **large** array, we are trying to find the midpoint of the range `[1000, INT_MAX]`. Now, `INT_MAX` is the largest value the `int` data type can store. If even `1` is added to it, the result overflows: formally this is undefined behavior in C, and in practice on two's-complement machines the value typically wraps around and becomes negative.

Here, `start = 1000` and `end = INT_MAX`.

Using the formula `(start + end)/2`, the midpoint will be `(1000 + INT_MAX)/2`. The sum overflows; where it wraps around, it becomes `INT_MIN + 999`, so the result is `(INT_MIN + 999)/2`, which is **negative** and **may cause a segmentation fault** if we try to index using this value.

But using the formula `start + (end - start)/2`, we get:

`1000 + (INT_MAX - 1000)/2` = `1000 + INT_MAX/2 - 500` = `INT_MAX/2 + 500`

which will not overflow.
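The arithmetic above can be checked without ever triggering the overflow by comparing against a wider type. This is a sketch with hypothetical function names; it assumes `long long` is wider than `int`, which holds on mainstream platforms.

```c
#include <assert.h>
#include <limits.h>

/* Safe midpoint for non-negative indices with start <= end. */
int mid_safe_int(int start, int end)
{
    assert(0 <= start && start <= end);   /* keeps end - start in range */
    return start + (end - start) / 2;
}

/* Reference value: the plain average, computed in long long so that
   the sum start + end cannot overflow (assumes long long wider than int). */
long long mid_reference(int start, int end)
{
    return ((long long)start + (long long)end) / 2;
}
```

Calling `mid_safe_int(1000, INT_MAX)` gives `INT_MAX/2 + 500`, which matches `mid_reference(1000, INT_MAX)`.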

15

To add to what others have already said, the first one explains its meaning more clearly to those who are less mathematically minded:

```
mid = start + (end - start) / 2
```

reads as:

mid equals start plus half of the length.

whereas:

```
mid = (start + end) / 2
```

reads as:

mid equals half of start plus end

Which does not seem as clear as the first, at least when expressed like that.

As Kos pointed out, it can also read:

mid equals the average of start and end

Which is clearer but still not, at least in my opinion, as clear as the first.