C++ STL: Binary Search Tree Implementation?

27

12

Do you know, please, if C++ STL contains a Binary Search Tree (BST) implementation, or if I should construct my own BST object?

In case STL conains no implementation of BST, are there any libraries available?

My goal is to be able to find the desired record as quickly as possible: I have a list of records (it should not be more few thousands.), and I do a per-frame (its a computer game) search in that list. I use unsigned int as an identifier of the record of my interest. Whatever way is the fastest will work best for me.

Bunkai.Satori

Posted 2011-02-22T23:00:53.250

Reputation: 2 029

Do you want a simple BST in specific or a self balancing tree such as an AVL? – Argote – 2011-02-22T23:05:10.163

7Ask about the goal, not the step. Why do you want a binary-search tree? – GManNickG – 2011-02-22T23:05:49.943

@Argote: My wish is to find a way how to find a desired record in the list of 1000's of records as quickly as possible. The identifier I use is unsigned integer. In my opinion, simple but full featured BST (allowing me to insert, delete, traverse) should be all I need. – Bunkai.Satori – 2011-02-22T23:09:12.173

1@Bunkai.Satori: well a BST could work for that, however in the worst case you would be looking at O(n) instead of O(log n) for retrievals (that is, if the tree is completely unbalanced). For what you want std::map or std::set would probably work best (the former if you want to store a key and a value it points to or the latter if you just care about storing keys). – Argote – 2011-02-22T23:11:44.573

@GMan: Let me modify my question a bit, but hopefully, I gave enough information in my comment above: I wish to quickly find a desired record in thousand's of records. I wish to avoid traversing the container from the beginning, if there is faster way of doing what I need. Because this is for computer game, the speed is the most importand for me. – Bunkai.Satori – 2011-02-22T23:12:13.747

@Argote: yes, I know that simple BST woud give me O(n) performance at worst, but if the tree is well sorted and organized, that should be highly unlikely. Would you have any good libraries in mind offering both BST and AVL, in case I decide for AVL at the end? – Bunkai.Satori – 2011-02-22T23:15:04.450

3If speed is your goal, why not use a set of libraries with a hash map instead of a binary tree? – corsiKa – 2011-02-22T23:17:41.653

@glowcoder: hash maps are not known to me, therefore I was focused on binary tree. I will study about hash maps. Thank you for this good tip. – Bunkai.Satori – 2011-02-22T23:20:34.370

2@Bunkai sure thing. Let me offer you a better tip (from someone with a dozen games under his belt) : Don't focus on the speed of the game loop right now. Focus on getting your game playable in a demo state. Little things like "I need a better performing set than this" is an easy thing to fix. Hell, you can use a vector at this stage and you're still fine. Don't let something like "I need this performance boost" stop you from moving forward with your game! – corsiKa – 2011-02-22T23:23:29.373

2@Bunkai: And that's exactly why you should ask about your goals and not the step, so your horizon can expand. :) – GManNickG – 2011-02-22T23:26:02.217

@glowcoder: that is very good advice indeed. Often, I think about not wasting too much time on coding, and moving forward as quickly as possible. However, if there is a ready to use BST or another library availabile, why not to use it. But I have to agree with what you said. Thank you. – Bunkai.Satori – 2011-02-22T23:40:52.310

@GMan: Yes, now I understand what you meant a bit better. :-) – Bunkai.Satori – 2011-02-22T23:41:38.940

Answers

33

What you need is a way to look up some data given a key. With the key being an unsigned int, this gives you several possibilities. Of course, you could use a std::map:

typedef std::map<unsigned int, record_t> my_records;

However, there's other possibilities as well. For example, it's quite likely that a hash map would be even faster than a binary tree. Hash maps are called unordered_map in C++, and are a part of the C++11 standard, likely already supported by your compiler/std lib (check your compiler version and documentation). They were first available in C++TR1 (std::tr1::unordered_map)

If your keys are rather closely distributed, you might even use a simple array and use the key as an index. When it comes to raw speed, nothing would beat indexing into an array. OTOH, if your key distribution is too random, you'd be wasting a lot of space.

If you store your records as pointers, moving them around is cheap, and an alternative might be to keep your data sorted by key in a vector:

typedef std::vector< std::pair<unsigned int, record_t*> > my_records;

Due to its better data locality, which presumably plays nice with processor cache, a simple std::vector often performs better than other data structures which theoretically should have an advantage. Its weak spot is inserting into/removing from the middle. However, in this case, on a 32bit system, this would require moving entries of 2*32bit POD around, which your implementation will likely perform by calling CPU intrinsics for memory move.

sbi

Posted 2011-02-22T23:00:53.250

Reputation: 163 669

+1 It is worth noting more specifically (see Wikipedia "Red-black tree") that most STL implementations of the std::map use red-black trees, which are self balancing BSTs. So that would be the answer to the question, however, I agree with others that in the example, a finite vector of pointers (since you have only a few thousand entries) is much more optimal (O(log(n)) to O(1), can't get better than that as for big O). I work in professional game dev and I heavily rely on std::map. – Kit10 – 2012-02-28T16:23:00.053

1+1 for the most complete answer. It contains all: std::map, hash maps, array indexing, std::vector with the most complete explanations. Moreover, I can perfectly implement array indexing in my case. Let me therefore mark this as the Acceptable Answer. Thanks. – Bunkai.Satori – 2011-02-23T00:02:05.570

@Bunkai: Consider using std::vector when the number of records varies at runtime, or std::array (std::tr1::array or boost::array) if it is fixed. That way you get begin() and end() member functions and all the other STL container stuff. – sbi – 2011-02-23T00:09:05.850

yeah, that is simple and elegant advice. To use indexing, and as you said, it can be either std::vector based or array based. Thanks again. – Bunkai.Satori – 2011-02-23T00:24:53.477

19

std::set and std::map are usually implemented as red-black trees, which are a variant of binary search trees. The specifics are implementation dependent tho.

ltjax

Posted 2011-02-22T23:00:53.250

Reputation: 13 839

@itjax: +1, and thank you for this good answer. You mentioned, that the specifics are implementation dependent. Would you know, if at least, the red-black approach is applied within every implementation? I would like to estimate, how probable it is, that when I port my application on different platform, the performance vill vary strongly. – Bunkai.Satori – 2011-02-22T23:25:06.383

I'd guess that has maps could be considerably faster than binary trees, especially with integer types as keys. – sbi – 2011-02-22T23:28:21.500

1the other major option is an avl-tree which is also a binary search tree. a heap might be an option but it can't guarantee that the operations have the correct complexity. – flownt – 2011-02-22T23:36:11.707

2@Bunkai: The C++ standard provides guarantees about the algorithmic complexity of the container operations. E.g. querying a map has to be logarithmic in the number of elements. Other than that, you should just measure the performance on all relevant target platforms. – Philipp – 2011-02-22T23:42:51.203

@sbi If you talk about real hash maps (unordered_map in the stl), you're right, but if you talk about stl map, it's actually sorted, so it's most likely implemented as a red-black tree – Dinaiz – 2016-11-26T17:44:18.443

@Dinaiz: http://stackoverflow.com/a/5085285/140719 :)

– sbi – 2016-11-27T21:19:37.947

1

STL's set class is typically implemented as a BST. It's not guaranteed (the only thing that is is it's signature, template < class Key, class Compare = less<Key>, class Allocator = allocator<Key> > class set;) but it's a pretty safe bet.

Your post says you want speed (presumably for a tighter game loop).

So why waste time on these slow-as-molasses O(lg n) structures and go for a hash map implementation?

corsiKa

Posted 2011-02-22T23:00:53.250

Reputation: 64 843

1+1 Indirectly there is the guarantee on the complexity of each one of the operations. – David Rodríguez - dribeas – 2011-02-22T23:15:10.660

2For sizes under 10,000, it's not unusual for O(log n) to meet or beat a hash table's O(k). – Fred Nurk – 2011-02-22T23:35:48.167

2In fact, logarithms can be often be approximated by a small constant when doing complexity analysis. Even for n = 1,000,000,000, log(n) is only in the order of ten. – Philipp – 2011-02-22T23:45:22.327

@glowcoder: +1, thanks again. Including you, more people recommended hash map implemnetaion. I will study this subject, but as you have mentioned above, I might go with simple std::vector<> for now. When things work and when I need more performance, I might swithc to somethign more complex, as has maps look to me right now. – Bunkai.Satori – 2011-02-22T23:46:03.697

@Fred, Philipp I absolutely agree with you. Especially when you consider performance depends a lot on the structure of the data at hand. That would be where the "slow-as-molasses" part was supposed to be a humorous way to introduce Bunkai to the idea of hashtables. The best (and only, imo) way to determine which structure is best for you is with realistic (hopefully real) datasets and profiling. – corsiKa – 2011-02-23T00:38:19.510

6"Slow as molasses"? Wow there. O(lg n) is anything but slow! With a billion elements, a billion elements, you can find your darling in 30 fast steps. This is an epic demonstration of keeping your CPU at ease for a long time before your RAM and swap hamsters fall on their knees. I don't know about your computer but my iPod performs 30 steps pretty fast. Besides, do compare with a profiler your std::unordered_map performance with std::map before you jump to conclusions! – wilhelmtell – 2011-02-23T01:12:35.393

0

A clean and simple BST implementation in CPP:

struct node {
   int val;
   node* left;
   node* right;
};

node* createNewNode(int x)
{
    node* nn = new node;
    nn->val = x;
    nn->left  = nullptr;
    nn->right = nullptr;

    return nn;
}

void bstInsert(node* &root, int x)
{
    if(root == nullptr) {
        root = createNewNode(x);
        return;
    }

    if(x < root->val)
    {
        if(root->left == nullptr) {
            root->left = createNewNode(x);
            return;
        } else {
            bstInsert(root->left, x);
        }
    }

    if( x > root->val )
    {
        if(root->right == nullptr) {
            root->right = createNewNode(x);
            return;
        } else {
            bstInsert(root->right, x);
        }
    }
}

int main()
{
     node* root = nullptr;

     int x;
     while(cin >> x) {
         bstInsert(root, x);
     }

     return 0;
}

amarVashishth

Posted 2011-02-22T23:00:53.250

Reputation: 371