The problem with naive binary trees as a data structure for searching a dynamic set is that a binary tree may become imbalanced, that is, one branch may be significantly longer than the other. In the worst case, a binary tree actually degrades to a linked list (all nodes have only a single child). Fortunately there are techniques to balance trees during node insertion / deletion which ensure that worst-case performance is in the same order of cost as the best-case (that is, O(log(N))).
2-3-4 trees are trees whose internal nodes can have 2, 3 or 4 branches (hence the name); leaf nodes are nil. A tree node has 1, 2, or 3 keys, which when sorted define 2, 3, or 4 ranges respectively, and each branch corresponds to one of the ranges. A 2-3-4 tree is actually a special case of a B-Tree.
Example: A node with 2 keys, 13 and 67, has three ranges: [smaller than 13], [between 13 and 67], and [larger than 67]. In this case, we can say that the boundary key for the first range is 13; the second range has two boundary keys (13 and 67); the third range's boundary key is 67.
Note: It is traditional to say that the leaves are nil; another way of looking at it is that the nodes with nil children actually have no branches and are therefore the leaves themselves. I normally prefer this latter as it suits my way of thinking better - that a node always has a representation in memory. However, in this article I use the former notation as it tends to be the de facto standard.
Inserting a node into a 2-3-4 tree involves searching down the tree to find the bottom-most internal node whose total range contains the key. Then, the new key is inserted into that node (not as a new child node). If this would mean that the node now has more than 3 keys, the node is first split by taking the middle of the existing keys and moving it into the parent node (which may need to be split in turn, and so on); the two remaining keys are placed in separate nodes and become children of the parent node - the inserted node will then become a second key in one of these child nodes.
In the case where there a split occurs but there is no parent node into which the middle key can be moved (that is, the node being split is the root node), a new root node is created. In this case the height of the tree increases by one (and this is the only case in which the height of the tree will increase).
So, an added key always goes into an existing node, or causes a split. It should be clear that a split doesn't change the height of any branches (the only exception is splitting the root node, which increases the height of every branch by one), and inserting a key into an existing node clearly doesn't change the height of a branch; therefore, the tree remains perfectly balanced height-wise.
Inserting a node might require a whole series of node-splits to propagate up the tree. This also requires remembering the parent node, either on a stack during an insertion or as part of the data structure itself. This can be avoided by splitting 4-nodes o the way down the tree, so that the final insertion takes place into a node whose parent will definitely not need to be split (because it will have already been split if splitting was necessary).
Deleting a key should not affect this perfect balance. First look at a simple case, deleting a key from a node with nil (i.e. leaf) children. Suppose we are deleting a key K from a such a target node N. If N has 2 or 3 keys, then K can simply be removed from N. However, if K is N's only key, then we need to move another key into N before removing K; usually we can do this by taking a key from an adjacent sibling node (a node with the same parent, which also shares a boundary key with N).
How do we increase the number of siblings? We only need to do this when the parent node has only one key (there is only one sibling), so we can potentially add either 1 or 2 keys to the parent (and therefore add 1 or 2 siblings); the parent can steal a key from one of its siblings (via its own parent) as in any of the above cases (the final case being recursive, as it requires stealing a key from the parent's parent).
We have only solved the problem of deleting keys from nodes with nil child nodes. To delete keys from nodes with non-nil children is not too hard however; it simply requires swapping the key into a node with nil children before doing the deletion. In order to do this without needing to restructure the tree, choose a sibling of N, either to the left or the right; then, go downwards to find the deepest non-nil descendant on the side that is closest to N (right or left, respectively, depending on which sibling was chosen). The key in this descendant can be swapped with the key (K) in N and then K can be deleted from the descendant, using the procedures outlined above.
In the technique just described, it is necessary to be able to find the parent node. In a manner similar to that possible for insertion, this can be avoided by performing the necessary steps (making sure the parent has at least 2 keys) on the way down the tree while searching for the node to delete. This does mean that the tree structure may be modified even if the key cannot be found, though that's unlikely to be a problem.
The main issue with the use of 2-3-4 trees is the node structure is somewhat unwieldy. In particular, splitting a node requires the allocation of at least two nodes.
Red-Black trees are a representation of 2-3-4 trees using a binary tree, where each branch is "coloured" either red or black. Of course the representation of these two colour choices requires a bit for each branch, which must be stored in the nodes.
Note: In practice, a single bit in each node will suffice to indicate the colour of the branch between the node and its parent. In this representation, it can be argued that it is the nodes themselves which are coloured and not the branches at all. However, using 2 bits in each node to mark the branch colour improves locality-of-reference for the insertion and deletion operations.
Certain stipulations are placed on the colouring of a Red-Black tree. In fact, a Red-Black tree is a representation of a 2-3-4 tree! A red branch between two Red-Black tree nodes indicates that those nodes actually belong to a single 2-3-4 tree node. This implies that more than three nodes cannot be joined together directly or indirectly via only red branches. When this rule is obeyed, Red-Black trees are in the same order of efficiency in terms of of time as 2-3-4 trees.
For purposees of simplifying insertion and deletion, an additional requirement, that 4-nodes must be represented in the Red-Black tree by a node connected via red branches to both its children (thus, red branches cannot appear twice in a row). When this situation would otherwise occur, it can be remedied by one or two node rotations. This stipulation makes identification of 4-nodes in the Red-Black tree much easier.
Naturally, the insertion and deletion algorithms for a Red-Black tree are derived from those for a 2-3-4 tree. The main point to note is that when 4-nodes are created some node rotation may be required. What will also be observed is that splitting a 4-node is now a very simple operation, which requires no allocation of new nodes: all that is required is for the middle node branches to be coloured black.
-- Davin McCall, 2007