B-heap
A B-heap is a binary heap implemented to keep subtrees in a single page. This reduces the number of pages accessed by up to a factor of ten for big heaps when using virtual memory, compared with the traditional implementation.[1] The traditional mapping of elements to locations in an array puts almost every level in a different page.
There are other heap variants which are efficient in computers using virtual memory or caches, such as cache-oblivious algorithms, k-heaps,[2] and van Emde Boas layouts.[3]
Motivation
Traditionally, binary trees are laid out in consecutive memory according to an n -> {2n, 2n+1} rule, meaning that if a node is at position n, its left and right children are at positions 2n and 2n+1 in the array. The root is at position 1. A common operation on binary trees is the vertical traversal: stepping down through the levels of a tree to arrive at a searched node. However, because memory on modern computers is organized into pages of virtual memory, this way of laying out the binary tree can be highly inefficient. As the traversal goes deeper into the tree, the distance to the next node grows exponentially, so every next node retrieved will likely be on a separate memory page. This increases the number of page misses, which are very expensive.
The B-heap solves this problem by laying out child nodes in memory in a different way, trying as much as possible to position subtrees within a single page. Therefore, as a vertical traversal proceeds, several consecutively retrieved nodes will lie in the same page, leading to a low number of page misses.
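To make the cost of the traditional layout concrete, the following minimal Python sketch counts how many distinct pages a walk from a node up to the root touches under the n -> {2n, 2n+1} rule. The function name and the choice of 512 elements per page (for example, 8-byte entries in 4 KiB pages) are illustrative assumptions, not taken from the source.

```python
def pages_on_path_classic(node: int, pagesize: int) -> int:
    """Count distinct pages touched walking from `node` up to the root.

    Assumes the classic layout: root at index 1, parent of n at n // 2,
    and `pagesize` elements per page (hypothetical parameter).
    """
    if node < 1 or pagesize < 1:
        raise ValueError("node and pagesize must be positive")
    pages = set()
    n = node
    while n >= 1:
        pages.add(n // pagesize)  # page holding index n
        n //= 2                   # step up to the parent
    return len(pages)


if __name__ == "__main__":
    # Last node of a 20-level heap (about one million entries).
    print(pages_on_path_classic(2**20 - 1, 512))
```

With these assumed numbers the walk touches 12 pages: the first nine levels share page 0, and each of the remaining eleven levels lands on a page of its own.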
Implementation
In detail, a B-heap can be implemented in the following way. Poul-Henning Kamp[4] gives two options for the layout of the nodes: one in which two positions per page are wasted, but the strict binary structure of the tree is preserved, and another which uses the whole available space of the pages, but has the tree fail to expand for one level upon entering a new page (the nodes on that level have only one child). In either case, an important point is that upon leaving a page, both child nodes are always in a common other page, since in a vertical traversal the algorithm will typically compare both children with the parent to know how to proceed. For this reason, each page can be said to contain two parallel subtrees, interspersed with each other. The pages themselves can be seen as an m-ary tree, and since half of the elements in each page will be leaves (within the page), the "tree of pages" has a splitting factor of pagesize/2.
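The effect of grouping levels into pages can be estimated with a little arithmetic. The sketch below is a rough illustration in Python, assuming that pagesize is a power of two, that the root page holds log2(pagesize) levels (a single root at offset 1), and that every other page holds log2(pagesize) levels in the space-using variant or one level fewer in the strict binary variant. The function name and parameters are hypothetical.

```python
import math


def pages_on_path_bheap(depth: int, pagesize: int, strict_binary: bool = False) -> int:
    """Estimate the pages touched by a root-to-node path of `depth` levels.

    Assumptions (not from the source): pagesize is a power of two >= 8 and
    the root page holds log2(pagesize) levels.
    """
    if pagesize < 8 or pagesize & (pagesize - 1):
        raise ValueError("pagesize must be a power of two >= 8")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    root_levels = pagesize.bit_length() - 1  # log2(pagesize)
    # Space-using pages: offsets {0,1}, {2,3}, {4..7}, ..., one level each.
    # Strict binary pages skip offsets 0 and 1, so they hold one level fewer.
    other_levels = root_levels - 1 if strict_binary else root_levels
    remaining = max(0, depth - root_levels)
    return 1 + math.ceil(remaining / other_levels)


if __name__ == "__main__":
    print(pages_on_path_bheap(20, 512))
    print(pages_on_path_bheap(20, 512, strict_binary=True))
```

Under these assumptions, a 20-level path with 512 elements per page stays within 3 pages in either variant.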
Parent Function
In contrast to the classic array-like layout, the parent function in a B-heap is more complex because the index of a node's parent must be computed differently depending on where in the page it is. Assuming the positions inside a page are labelled from 0 to pagesize - 1, the parent function can be as follows.
Nodes 0 and 1 are only used in the variant that exploits all available space. In this case, both nodes have the same parent, which lies in a different page, and its offset within that page depends only on the current page number. Specifically, assuming pages are numbered from 0 with the root page first, the parent's page number is obtained by subtracting one from the current page number and dividing by the "page tree's" splitting factor, which is pagesize/2, rounding down. To get the right offset within the page, consider that the parent must be one of the leaf nodes of the parent page, so start at offset pagesize/2 and add the remainder of that division, which is the position of the current page among its sibling pages. The first child page of a parent page thus has its parent node at offset pagesize/2.
For nodes 2 and 3, the parent depends on the mode. In space-saving mode, the parents are simply nodes 0 and 1, respectively, so one only needs to subtract 2. In strict-binary-tree mode, on the other hand, these nodes are the roots of the page instead of 0 and 1, and so their common parent is computed the same way as described above.
For all other nodes, their parent will be within the same page, and it is enough to divide their offset within their page by 2, not changing the page number.
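The three cases above can be sketched in Python for the space-using variant. This is an illustration under stated assumptions, not Kamp's implementation: pagesize is a power of two of at least 8, pages are numbered from 0, the first page uses the classic layout with the root at index 1 (index 0 unused), and the parent's page and offset are taken from the quotient and remainder of (page - 1) divided by pagesize/2. The function name is hypothetical.

```python
def bheap_parent(index: int, pagesize: int) -> int:
    """Parent index of `index` in a space-using B-heap layout (sketch).

    Assumes a global index of page * pagesize + offset, with the root at
    index 1 in page 0; these conventions are assumptions of this example.
    """
    if pagesize < 8 or pagesize & (pagesize - 1):
        raise ValueError("pagesize must be a power of two >= 8")
    if index < 2:
        raise ValueError("the root (index 1) has no parent")
    page, offset = divmod(index, pagesize)
    half = pagesize // 2  # splitting factor of the tree of pages
    if page == 0 or offset > 3:
        # Parent is in the same page: halve the offset.
        return page * pagesize + offset // 2
    if offset < 2:
        # Offsets 0 and 1 share one parent: a leaf of the parent page.
        parent_page, sibling_rank = divmod(page - 1, half)
        return parent_page * pagesize + half + sibling_rank
    # Offsets 2 and 3 are the single children of offsets 0 and 1.
    return index - 2


if __name__ == "__main__":
    for i in (8, 9, 10, 11, 13, 40, 41):
        print(i, "->", bheap_parent(i, 8))
```

With pagesize = 8, nodes 8 and 9 (offsets 0 and 1 of page 1) share parent 4, the first leaf of page 0, while nodes 40 and 41 in page 5 share parent 12, the first leaf of page 1.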
References
1. Kamp, Poul-Henning (2020-07-26). "You're Doing It Wrong". ACM Queue.
2. Naor, Dalit; Martel, Charles U.; Matloff, Norman S. (1991). "Performance of Priority Queue Structures in a Virtual Memory Environment". Comput. J. 34 (5): 428–437. doi:10.1093/comjnl/34.5.428.
3. van Emde Boas, P.; Kaas, R.; Zijlstra, E. (1976). "Design and implementation of an efficient priority queue". Mathematical Systems Theory. 10: 99–127. doi:10.1007/BF01683268. S2CID 8105468.
4. Kamp, Poul-Henning. "You're Doing It Wrong". phk.freebsd.dk. Retrieved 2019-06-08.
External links
- Implementations at https://github.com/varnish/Varnish-Cache/blob/master/lib/libvarnish/binary_heap.c and http://phk.freebsd.dk/B-Heap/binheap.c
- Generic heap implementation with B-heap support.
- For more on van Emde Boas layouts see Benjamin Sach Descent into Cache-Oblivion or Cache-oblivious data structures.