12.6. B-Trees — CS3 Data Structures & Algorithms (2024)

12.6.1. B-Trees¶

This module presents the B-tree.B-trees are usually attributed to R. Bayer and E. McCreightwho described the B-tree in a 1972 paper.By 1979, B-trees had replaced virtually all large-file accessmethods other than hashing.B-trees, or some variant of B-trees, are the standard fileorganization for applications requiring insertion, deletion, and keyrange searches.They are used to implement most modern file systems.B-trees address effectively all of the major problems encounteredwhen implementing disk-based search trees:

The B-tree is shallow, in part because the tree is always heightbalanced (all leaf nodes are at the same level), and in partbecause the branching factor is quite high.So only a small number of disk blocks are accessed to reach a givenrecord.
Update and search operations affect only those disk blocks on thepath from the root to the leaf node containing the query record.The fewer the number of disk blocks affected during an operation,the less disk I/O is required.
B-trees keep related records (that is, records with similar keyvalues) on the same disk block, which helps to minimize disk I/O onrange searches.
B-trees guarantee that every node in the tree will befull at least to a certain minimum percentage.This improves space efficiency while reducing the typical number ofdisk fetches necessary during a search or update operation.

A B-tree of order \(m\) is defined to havethe following shape properties:

The root is either a leaf or has at least two children.
Each internal node, except for the root, has between\(\lceil m/2 \rceil\) and \(m\) children.
All leaves are at the same level in the tree, so the tree is alwaysheight balanced.

The B-tree is a generalization of the 2-3 tree.Put another way, a 2-3 tree is a B-tree of order three.Normally, the size of a node in the B-tree is chosen to fill a diskblock.A B-tree node implementation typically allows 100 or more children.Thus, a B-tree node is equivalent to a disk block, and a “pointer”value stored in the tree is actually the number of the blockcontaining the child node (usually interpreted as an offset from thebeginning of the corresponding disk file).In a typical application, the B-tree’s access to the disk file will bemanaged using a buffer pooland a block-replacement scheme such as LRU.

Figure 12.6.1 shows a B-tree of order four.Each node contains up to three keys, andinternal nodes have up to four children.

Figure 12.6.1: A B-tree of order four.

Search in a B-tree is a generalization of search in a 2-3 tree.It is an alternating two-step process, beginning with the root node ofthe B-tree.

Perform a binary search on the records in thecurrent node.If a record with the search key is found, then return that record.If the current node is a leaf node and the key is not found,then report an unsuccessful search.
Otherwise, follow the proper branch and repeat the process.

For example, consider a search for the record with key value 47 in thetree of Figure 12.6.1.The root node is examined and the second (right) branch taken.After examining the node at level 1, the third branch is taken to thenext level to arrive at the leaf node containing a record with keyvalue 47.

B-tree insertion is a generalization of 2-3 tree insertion.The first step is to find the leaf node that should contain thekey to be inserted, space permitting.If there is room in this node, then insert the key.If there is not, then split the node into two and promote the middlekey to the parent.If the parent becomes full, then it is split in turn, and its middlekey promoted.

Note that this insertion process is guaranteed to keep all nodes atleast half full.For example, when we attempt to insert into a full internal node of aB-tree of order four, there will now be five children that must bedealt with.The node is split into two nodes containing two keys each, thusretaining the B-tree property.The middle of the five children is promoted to its parent.

12.6.1.1. B+ Trees¶

The previous section mentioned that B-trees are universally usedto implement large-scale disk-based systems.Actually, the B-tree as described in the previous section is almostnever implemented.What is most commonly implemented is a variant of the B-tree,called the \(\mathrm{B}^+\) tree.When greater efficiency is required, a more complicatedvariant known as the \(\mathrm{B}^*\) tree is used.

Consider again the linear index.When the collection of records will not change, a linear indexprovides an extremely efficient way to search.The problem is how to handle those pesky inserts and deletes.We could try to keep the core idea of storing a sorted array-basedlist, but make it more flexible by breaking the list into manageablechunks that are more easily updated.How might we do that?First, we need to decide how big the chunks should be.Since the data are on disk, it seems reasonable to store a chunk thatis the size of a disk block, or a small multiple of the disk blocksize.If the next record to be inserted belongs to a chunk that hasn’tfilled its block then we can just insert it there.The fact that this might cause other records in that chunk to move alittle bit in the array is not important, since this does not causeany extra disk accesses so long as we move data within that chunk.But what if the chunk fills up the entire block that contains it?We could just split it in half.What if we want to delete a record?We could just take the deleted record out of the chunk, but we mightnot want a lot of near-empty chunks.So we could put adjacent chunks together if they have only a smallamount of data between them.Or we could shuffle data between adjacent chunks that together containmore data.The big problem would be how to find the desired chunk when processinga record with a given key.Perhaps some sort of tree-like structure could be used to locate theappropriate chunk.These ideas are exactly what motivate the \(\mathrm{B}^+\) tree.The \(\mathrm{B}^+\) tree is essentially a mechanism for managing a sortedarray-based list, where the list is broken into chunks.

The most significant difference between the \(\mathrm{B}^+\) treeand the BST or the standard B-tree is thatthe \(\mathrm{B}^+\) tree stores records only at the leaf nodes.Internal nodes store key values, but theseare used solely as placeholders to guide the search.This means that internal nodes are significantly different instructure from leaf nodes.Internal nodes store keys to guide the search, associating each keywith a pointer to a child \(\mathrm{B}^+\) tree node.Leaf nodes store actual records, or else keys and pointers to actualrecords in a separate disk file if the \(\mathrm{B}^+\) tree isbeing used purely as an index.Depending on the size of a record as compared to the size of a key,a leaf node in a \(\mathrm{B}^+\) tree of order \(m\) mighthave enough room to store more or less than \(m\) records.The requirement is simply that the leaf nodes store enough records toremain at least half full.The leaf nodes of a \(\mathrm{B}^+\) tree are normallylinked together to form a doubly linked list.Thus, the entire collection of records can be traversed in sortedorder by visiting all the leaf nodes on the linked list.Here is a Java-like pseudocode representation for the\(\mathrm{B}^+\) tree node interface.Leaf node and internal node subclasses would implement this interface.

/** Interface for B+ Tree nodes */public interface BPNode<Key,E> { public boolean isLeaf(); public int numrecs(); public Key[] keys();}

An important implementation detail to note is that whileFigure 12.6.1 shows internal nodes containing threekeys and four pointers, class BPNode is slightly different in thatit stores key/pointer pairs.Figure 12.6.1 shows the \(\mathrm{B}^+\) tree asit is traditionally drawn.To simplify implementation in practice, nodes really doassociate a key with each pointer.Each internal node should be assumed to hold in the leftmost positionan additional key that is less than or equal to any possible key valuein the node’s leftmost subtree.\(\mathrm{B}^+\) tree implementations typically store anadditional dummy record in the leftmost leaf node whose key value isless than any legal key value.

Let’s see in some detail how the simplest \(\mathrm{B}^+\) treeworks.This would be the “\(2-3^+\) tree”, or a \(\mathrm{B}^+\) tree oforder 3.

Settings

Saving...
Server Error
Resubmit

Figure 12.6.2: An example of building a “\(2-3^+\) tree

Next, let’s see how to search.

Settings

Saving...
Server Error
Resubmit

Figure 12.6.3: An example of searching a “\(2-3^+\) tree

Finally, let’s see an example of deleting from the “\(2-3^+\) tree

Settings

Saving...
Server Error
Resubmit

Figure 12.6.4: An example of deleting from a “\(2-3^+\) tree

Now, let’s extend these ideas to a \(\mathrm{B}^+\) tree of higher order.

\(\mathrm{B}^+\) trees are exceptionally good for range queries.Once the first record in the range has been found, the rest of therecords with keys in the range can be accessed by sequentialprocessing of the remaining records in the first node, and thencontinuing down the linked list of leaf nodes as far as necessary.Figure illustrates the \(\mathrm{B}^+\)tree.

Settings

Saving...
Server Error
Resubmit

Figure 12.6.5: An example of search in a B+ tree of order four.Internal nodes must store between two and four children.

Search in a \(\mathrm{B}^+\) tree is nearly identical to search ina regular B-tree, except that the search must always continue to theproper leaf node.Even if the search-key value is found in an internal node, this isonly a placeholder and does not provide access to the actual record.Here is a pseudocode sketch of the \(\mathrm{B}^+\) tree searchalgorithm.

private E findhelp(BPNode<Key,E> rt, Key k) { int currec = binaryle(rt.keys(), rt.numrecs(), k); if (rt.isLeaf()) { if ((((BPLeaf<Key,E>)rt).keys())[currec] == k) { return ((BPLeaf<Key,E>)rt).recs(currec); } else { return null; } } else{ return findhelp(((BPInternal<Key,E>)rt).pointers(currec), k); }}

\(\mathrm{B}^+\) tree insertion is similar to B-tree insertion.First, the leaf \(L\) that should contain the record is found.If \(L\) is not full, then the new record is added, and noother \(\mathrm{B}^+\) tree nodes are affected.If \(L\) is already full, split it in two (dividing the recordsevenly among the two nodes) and promote a copy of theleast-valued key in the newly formed right node.As with the 2-3 tree, promotion might causethe parent to split in turn, perhaps eventually leading to splittingthe root and causing the \(\mathrm{B}^+\) tree to gain a newlevel.\(\mathrm{B}^+\) tree insertion keeps all leaf nodes at equaldepth.Figure illustrates the insertion process throughseveral examples.

Settings

Saving...
Server Error
Resubmit

Figure 12.6.6: An example of building a B+ tree of order four.

Here is a a Java-like pseudocode sketch of the \(\mathrm{B}^+\)tree insert algorithm.

private BPNode<Key,E> inserthelp(BPNode<Key,E> rt, Key k, E e) { BPNode<Key,E> retval; if (rt.isLeaf()) { // At leaf node: insert here return ((BPLeaf<Key,E>)rt).add(k, e); } // Add to internal node int currec = binaryle(rt.keys(), rt.numrecs(), k); BPNode<Key,E> temp = inserthelp( ((BPInternal<Key,E>)root).pointers(currec), k, e); if (temp != ((BPInternal<Key,E>)rt).pointers(currec)) { return ((BPInternal<Key,E>)rt). add((BPInternal<Key,E>)temp); } else{ return rt; }}

Here is an exercise to see if you get the basic idea of\(\mathrm{B}^+\) tree insertion.

To delete record \(R\) from the \(\mathrm{B}^+\) tree,first locate the leaf \(L\) that contains \(R\).If \(L\) is more than half full, then we need only remove \(R\),leaving \(L\) still at least half full.This is demonstrated by Figure .

Settings

Saving...
Server Error
Resubmit

Figure 12.6.7: An example of deletion in a B+ tree of order four.

If deleting a record reduces the number of records in the node belowthe minimum threshold (called an underflow), then we must dosomething to keep the node sufficiently full.The first choice is to look at the node’s adjacent siblings todetermine if they have a spare record that can be used to fill thegap.If so, then enough records are transferred from thesibling so that both nodes have about the same number of records.This is done so as to delay as long as possible the next time when adelete causes this node to underflow again.This process might require that the parent node has its placeholderkey value revised to reflect the true first key value in each node.

If neither sibling can lend a record to the under-full node(call it \(N\)),then \(N\) must give its records to a sibling and be removedfrom the tree.There is certainly room to do this, because the sibling is at mosthalf full (remember that it had no records to contribute to thecurrent node), and \(N\) has become less than half full because itis under-flowing.This merge process combines two subtrees of the parent, which mightcause it to underflow in turn.If the last two children of the root merge together, then the treeloses a level.

Here is a Java-like pseudocode for the \(\mathrm{B}^+\) treedelete algorithm.

/** Delete a record with the given key value, and  return true if the root underflows */private boolean removehelp(BPNode<Key,E> rt, Key k) { int currec = binaryle(rt.keys(), rt.numrecs(), k); if (rt.isLeaf()) { if (((BPLeaf<Key,E>)rt).keys()[currec] == k) { return ((BPLeaf<Key,E>)rt).delete(currec); } else { return false; } } else{ // Process internal node if (removehelp(((BPInternal<Key,E>)rt).pointers(currec), k)) { // Child will merge if necessary return ((BPInternal<Key,E>)rt).underflow(currec); } else { return false; } }}

The \(\mathrm{B}^+\) tree requires that all nodes be at least halffull (except for the root).Thus, the storage utilization must be at least 50%.This is satisfactory for many implementations, but note that keepingnodes fuller will result both inless space required (because there is less empty space in the disk file)and in more efficient processing (fewer blocks on average will be readinto memory because the amount of information in each block is greater).Because B-trees have become so popular, many algorithm designers havetried to improve B-tree performance.One method for doing so is to use the \(\mathrm{B}^+\) treevariant known as the \(\mathrm{B}^*\) tree.The \(\mathrm{B}^*\) tree is identical to the \(\mathrm{B}^+\)tree, except for the rules used to split and merge nodes.Instead of splitting a node in half when it overflows, the\(\mathrm{B}^*\) treegives some records to its neighboring sibling, if possible.If the sibling is also full, then these two nodes split into three.Similarly, when a node underflows, it is combined with its twosiblings, and the total reduced to two nodes.Thus, the nodes are always at least two thirds full. 1

Finally, here is an example of building a B+ Tree of order five. Youcan compare this to the example above of building a tree of order fourwith the same records.

Settings

Saving...
Server Error
Resubmit

Figure 12.6.8: An example of building a B+ tree of degree 5

Click here for a visualization that will let you construct andinteract with a \(\mathrm{B}^+\) tree.This visualization was written by David Galles of the University ofSan Francisco as part of his Data Structure Visualizations package.

1: This concept can be extended further if higher spaceutilization is required.However, the update routines become much more complicated.I once worked on a project where we implemented 3-for-4 nodesplit and merge routines.This gave better performance than the 2-for-3 node split andmerge routines of the \(\mathrm{B}^*\) tree.However, the spitting and merging routines were so complicatedthat even their author could no longer understand themonce they were completed!

12.6.1.2. B-Tree Analysis¶

The asymptotic cost of search, insertion, and deletion ofrecords from B-trees, \(\mathrm{B}^+\) trees, and\(\mathrm{B}^*\) trees is \(\Theta(\log n)\)where \(n\) is the total number of records in the tree.However, the base of the log is the (average) branching factor of thetree.Typical database applications use extremely high branching factors,perhaps 100 or more.Thus, in practice the B-tree and its variants are extremely shallow.

As an illustration, consider a \(\mathrm{B}^+\) tree of order 100and leaf nodes that contain up to 100 records.A B-\(\mathrm{B}^+\) tree with height one (that is, just a singleleaf node) can have at most 100 records.A \(\mathrm{B}^+\) tree with height two (a root internal nodewhose children are leaves) must have at least 100 records(2 leaves with 50 records each).It has at most 10,000 records (100 leaves with 100 records each).A \(\mathrm{B}^+\) tree with height three must have at least 5000records (two second-level nodes with 50 children containing 50 recordseach) and at most one million records (100 second-level nodes with 100full children each).A \(\mathrm{B}^+\) tree with height four must have at least250,000 records and at most 100 million records.Thus, it would require an extremely large database to generatea \(\mathrm{B}^+\) tree of more than height four.

The \(\mathrm{B}^+\) tree split and insert rules guarantee thatevery node (except perhaps the root) is at least half full.So they are on average about 3/4 full.But the internal nodes are purely overhead, since the keys storedthere are used only by the tree to direct search, rather than storeactual data.Does this overhead amount to a significant use of space?No, because once again the high fan-out rate of the tree structuremeans that the vast majority of nodes are leaf nodes.A K-ary tree hasapproximately \(1/K\) of its nodes as internal nodes.This means that while half of a full binary tree’s nodes are internalnodes, in a \(\mathrm{B}^+\) tree of order 100 probably only about\(1/75\) of its nodes are internal nodes.This means that the overhead associated with internal nodes is verylow.

We can reduce the number of disk fetches required for the B-treeeven more by using the following methods.First, the upper levels of the tree can be stored in main memory at alltimes.Because the tree branches so quickly, the top two levels(levels 0 and 1) require relatively little space.If the B-tree is only height four, then at most two disk fetches(internal nodes at level two and leaves at level three) are requiredto reach the pointer to any given record.

A buffer pool could be used to manage nodes of the B-tree.Several nodes of the tree would typically be in main memory at onetime.The most straightforward approach is to use a standard method such asLRU to do node replacement.However, sometimes it might be desirable to “lock” certain nodessuch as the root into the buffer pool.In general, if the buffer pool is even of modest size (say at leasttwice the depth of the tree), no special techniques for nodereplacement will be required because the upper-level nodes willnaturally be accessed frequently.

12.6. B-Trees — CS3 Data Structures & Algorithms (2024)

FAQs

How are B trees implemented? ›

Inserting a new node into a B-tree includes two steps: finding the correct node to insert the key and splitting the node if the node is full (the number of the node's keys is greater than m-1). If the B-Tree is empty: Allocate a root node, and insert the key.

Know More ›

What is the B-tree in Ada? ›

A B-tree is a self-balancing data structure commonly used in computer science for efficient storage and retrieval of large amounts of data. Its balanced nature ensures fast search, insert, and delete operations by maintaining a sorted order of elements and minimizing the height of the tree.

Why do we need a B-tree? ›

B-tree is used for indexing and is a data structure that provides sorted data and allows searches, sequential access, attachments and removals in sorted order.

Keep Reading ›

How to delete in B-tree? ›

1) Search for the value to delete. 2) If the value is in a leaf node, simply delete it from the node, because there are no subtrees to worry about. 3) If underflow happens (i.e., if we break one of the defining rules/properties of a B-tree - every non-root node has to have at least (M-1)/2 keys), rebalance the tree.

Get More Info Here ›

How are trees implemented? ›

A binary tree can be implemented as a list of lists: the head of a list (the value of the first term) is the left child (subtree), while the tail (the list of second and subsequent terms) is the right child (subtree).