B-trees with inserts and deletes: Why free-at-empty is better than merge-at-half

Theodore Johnson, Dennis Shasha

Research output: Contribution to journal (Article)

Abstract

The space utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid to the determination of a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69%. We derive analytically and verify by simulation the utilization of B-tree nodes constructed from a mixture of insert and delete operations. Assuming that nodes only merge (i.e., are freed) when they are empty, we show that the utilization is 39% when the number of inserts is the same as the number of deletes. However, if there are just 5% more inserts than deletes, then the utilization is over 62%. We also calculate the probability of splitting and merging. We derive a simple rule-of-thumb that accurately calculates the probability of splitting. We also model B-trees that merge half-empty nodes. The utilization of merge-at-half B-trees is slightly larger than the utilization of free-at-empty B-trees, but the restructuring rate is much higher. For most purposes, this implies that free-at-empty B-trees are a better implementation choice than merge-at-half B-trees. We present two models for computing B-tree utilization, the more accurate of which remembers items inserted and then deleted in a node.
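
The free-at-empty policy described in the abstract can be explored with a small simulation. The sketch below is not the authors' model or simulator: it is a minimal leaf-level illustration that assumes uniform random keys, deletes that pick a stored key uniformly at random, and a hypothetical function name and parameters (`simulate_utilization`, `capacity`, `warmup_inserts`, `mixed_ops`, `insert_prob`). Interior nodes are ignored, so it only estimates leaf utilization.

```python
import bisect
import random


def simulate_utilization(capacity, warmup_inserts, mixed_ops, insert_prob, seed=0):
    """Return leaf utilization after a random insert/delete workload.

    Hypothetical leaf-level model (an assumption, not the paper's method):
    only leaves are tracked, keys are uniform random floats, and each
    delete removes a currently stored key chosen uniformly at random.
    """
    rng = random.Random(seed)
    leaves = [[]]            # leaves[i] is a sorted list of keys
    lows = [float("-inf")]   # leaf i holds keys in [lows[i], lows[i+1])
    live = []                # every key currently stored, in arbitrary order

    def insert():
        key = rng.random()
        i = bisect.bisect_right(lows, key) - 1   # leaf whose range covers key
        leaf = leaves[i]
        bisect.insort(leaf, key)
        live.append(key)
        if len(leaf) > capacity:                 # overflow: split into halves
            mid = len(leaf) // 2
            right = leaf[mid:]
            del leaf[mid:]
            leaves.insert(i + 1, right)
            lows.insert(i + 1, right[0])

    def delete():
        j = rng.randrange(len(live))
        live[j], live[-1] = live[-1], live[j]    # swap-remove a random key
        key = live.pop()
        i = bisect.bisect_right(lows, key) - 1
        leaves[i].remove(key)
        if not leaves[i] and len(leaves) > 1:    # free-at-empty: drop the leaf
            del leaves[i]
            del lows[i]
            lows[0] = float("-inf")              # first leaf covers the low end

    for _ in range(warmup_inserts):
        insert()
    for _ in range(mixed_ops):
        if not live or rng.random() < insert_prob:
            insert()
        else:
            delete()
    # Utilization: stored keys divided by total leaf capacity.
    return len(live) / (len(leaves) * capacity)


if __name__ == "__main__":
    print("pure inserts:", simulate_utilization(20, 20000, 0, 1.0))
    print("equal inserts and deletes:", simulate_utilization(20, 5000, 100000, 0.5))
```

Under these assumptions, a pure-insert run should land near the 69% figure attributed to Yao, and a run with equal insert and delete probabilities should settle noticeably lower. The exact values depend on the node capacity, workload length, and seed, so they should not be read as a reproduction of the paper's results.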

Original language: English (US)
Pages (from-to): 45-76
Number of pages: 32
Journal: Journal of Computer and System Sciences
Volume: 47
Issue number: 1
DOI: 10.1016/0022-0000(93)90020-W
State: Published - 1993


ASJC Scopus subject areas

  • Computational Theory and Mathematics

Cite this

B-trees with inserts and deletes: Why free-at-empty is better than merge-at-half. / Johnson, Theodore; Shasha, Dennis.

In: Journal of Computer and System Sciences, Vol. 47, No. 1, 1993, p. 45-76.


@article{0f1ac571c4e24e8197a142a319eb0820,
title = "B-trees with inserts and deletes: Why free-at-empty is better than merge-at-half",
abstract = "The space utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid to the determination of a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69{\%}. We derive analytically and verify by simulation the utilization of B-tree nodes constructed from a mixture of insert and delete operations. Assuming that nodes only merge (i.e., are freed) when they are empty, we show that the utilization is 39{\%} when the number of inserts is the same as the number of deletes. However, if there are just 5{\%} more inserts than deletes, then the utilization is over 62{\%}. We also calculate the probability of splitting and merging. We derive a simple rule-of-thumb that accurately calculates the probability of splitting. We also model B-trees that merge half-empty nodes. The utilization of merge-at-half B-trees is slightly larger than the utilization of free-at-empty B-trees, but the restructuring rate is much higher. For most purposes, this implies that free-at-empty B-trees are a better implementation choice than merge-at-half B-trees. We present two models for computing B-tree utilization, the more accurate of which remembers items inserted and then deleted in a node.",
author = "Theodore Johnson and Dennis Shasha",
year = "1993",
doi = "10.1016/0022-0000(93)90020-W",
language = "English (US)",
volume = "47",
pages = "45--76",
journal = "Journal of Computer and System Sciences",
issn = "0022-0000",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR

T1 - B-trees with inserts and deletes

T2 - Why free-at-empty is better than merge-at-half

AU - Johnson, Theodore

AU - Shasha, Dennis

PY - 1993

Y1 - 1993

N2 - The space utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid to the determination of a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69%. We derive analytically and verify by simulation the utilization of B-tree nodes constructed from a mixture of insert and delete operations. Assuming that nodes only merge (i.e., are freed) when they are empty, we show that the utilization is 39% when the number of inserts is the same as the number of deletes. However, if there are just 5% more inserts than deletes, then the utilization is over 62%. We also calculate the probability of splitting and merging. We derive a simple rule-of-thumb that accurately calculates the probability of splitting. We also model B-trees that merge half-empty nodes. The utilization of merge-at-half B-trees is slightly larger than the utilization of free-at-empty B-trees, but the restructuring rate is much higher. For most purposes, this implies that free-at-empty B-trees are a better implementation choice than merge-at-half B-trees. We present two models for computing B-tree utilization, the more accurate of which remembers items inserted and then deleted in a node.

AB - The space utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid to the determination of a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69%. We derive analytically and verify by simulation the utilization of B-tree nodes constructed from a mixture of insert and delete operations. Assuming that nodes only merge (i.e., are freed) when they are empty, we show that the utilization is 39% when the number of inserts is the same as the number of deletes. However, if there are just 5% more inserts than deletes, then the utilization is over 62%. We also calculate the probability of splitting and merging. We derive a simple rule-of-thumb that accurately calculates the probability of splitting. We also model B-trees that merge half-empty nodes. The utilization of merge-at-half B-trees is slightly larger than the utilization of free-at-empty B-trees, but the restructuring rate is much higher. For most purposes, this implies that free-at-empty B-trees are a better implementation choice than merge-at-half B-trees. We present two models for computing B-tree utilization, the more accurate of which remembers items inserted and then deleted in a node.

UR - http://www.scopus.com/inward/record.url?scp=0037917021&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037917021&partnerID=8YFLogxK

U2 - 10.1016/0022-0000(93)90020-W

DO - 10.1016/0022-0000(93)90020-W

M3 - Article

AN - SCOPUS:0037917021

VL - 47

SP - 45

EP - 76

JO - Journal of Computer and System Sciences

JF - Journal of Computer and System Sciences

SN - 0022-0000

IS - 1

ER -