You have a vector of entries, say `[x1, x2, ..., xN]`

, and you're aware of the fact that the distribution of the queries is given with probability `1/x`

, on the vector you have. This means your queries will take place with that distribution, i.e., on each consult, you'll take element `xN`

with higher probability.

This causes your binary search tree to be balanced considering your labels, but not enforcing any policy on the search. A possible change on this policy would be to relax the constraint of a balanced binary search tree -- smaller to the left of the parent node, greater to the right --, and actually choosing the parent nodes as the ones with higher probabilities, and their child nodes as the two most probable elements.

Notice this **is not** a binary search tree, as you are not dividing your search space by two in every step, but rather a rebalanced tree, with respect to your search pattern distribution. This means you're worst case of search may reach `O(N)`

. For example, having `v = [10, 20, 30, 40, 50, 60]`

:

```
30
/ \
20 50
/ / \
10 40 60
```

Which can be reordered, or, **rebalanced**, using your function `f(x) = 1 / x`

:

```
f([10, 20, 30, 40, 50, 60]) = [0.100, 0.050, 0.033, 0.025, 0.020, 0.016]
sort(v, f(v)) = [10, 20, 30, 40, 50, 60]
```

Into a new *search tree*, that looks like:

```
10 -------------> the most probable of being taken
/ \ leaving v = [[20, 30], [40, 50, 60]]
20 30 ---------> the most probable of being taken
/ \ leaving v = [[40, 50], [60]]
40 50 -------> the most probable of being taken
/ leaving v = [[60]]
60
```

If you search for `10`

, you only need one comparison, but if you're looking for `60`

, you'll perform `O(N)`

comparisons, which does not qualifies this as a binary search. As pointed by @Steve314, the farthest you go from a fully balanced tree, the worse will be your worst case of search.

1Do you mean the distribution of the data, or the distribution of your queries? – krlmlr – 2013-06-01T12:29:02.823

let's say distribution of the queries (not sure if it makes a difference). The fact is that you're more likely to find the answer near start than at the end. – ic3 – 2013-06-01T12:33:25.647

1You may be better off keeping the binary search but adding a cache, i.e., keeping a list of recent queries and checking it for the current query each time. – MegaWidget – 2013-06-01T22:05:56.697

You might be looking for Splay Trees

– recursion.ninja – 2013-12-01T19:17:57.970can you give a usecase where keys in a dictionary have

`1/x`

distribution – arunmoezhi – 2015-05-08T19:42:01.017A Zipfian distribution approximates many real world situations but classically frequencies of words. The frequency of a word is inversely proportional to its rank, as well as to its length. The normal assumption would be thus that the dictionary is in rank order rather than dictionary order. However lexicographic order with length outranking alphabetic will tend to a 1/x distribution. This ordering is used for counting arguments in theory of computation, but there is evidence that human lexical ordering has such properties. – David M W Powers – 2015-11-21T13:41:05.267