### Abstract

The space utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid to the determination of a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69%. We derive analytically and verify by simulation the utilization of B-tree nodes constructed from a mixture of insert and delete operations. Assuming that nodes only merge (i.e., are freed) when they are empty, we show that the utilization is 39% when the number of inserts is the same as the number of deletes. However, if there are just 5% more inserts than deletes, then the utilization is over 62%. We also calculate the probability of splitting and merging. We derive a simple rule of thumb that accurately calculates the probability of splitting. We also model B-trees that merge half-empty nodes. The utilization of merge-at-half B-trees is slightly larger than the utilization of free-at-empty B-trees, but the restructuring rate is much higher. For most purposes, this implies that free-at-empty B-trees are a better implementation choice than merge-at-half B-trees. We present two models for computing B-tree utilization, the more accurate of which remembers items inserted and then deleted in a node.
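To make the free-at-empty policy concrete, the following is a minimal simulation sketch, not the analytical models from the paper. It tracks only the leaf level of a B-tree, assumes uniformly random keys and uniformly random delete victims, and frees a leaf only when it becomes empty. The function name `simulate_leaf_utilization` and its parameters (`capacity`, `n_ops`, `insert_fraction`, `seed`) are hypothetical names chosen for illustration, and the numbers it produces should not be expected to reproduce the paper's figures exactly.

```python
import bisect
import random


def simulate_leaf_utilization(capacity, n_ops, insert_fraction, seed=0):
    """Return keys stored / (leaves * capacity) for a free-at-empty leaf level.

    Simplifying assumptions (not taken from the paper): only leaves are
    modeled, keys are uniform random floats, and deletes pick a live key
    uniformly at random.
    """
    rng = random.Random(seed)
    leaves = [[]]              # each leaf is a sorted list of keys
    lows = [float("-inf")]     # lows[i] = smallest key leaf i may hold
    keys = []                  # live keys, for choosing delete victims
    live = set()               # same keys, for rejecting duplicates

    for _ in range(n_ops):
        if not keys or rng.random() < insert_fraction:
            # Insert a fresh key into the leaf whose range covers it.
            k = rng.random()
            while k in live:
                k = rng.random()
            i = bisect.bisect_right(lows, k) - 1
            leaf = leaves[i]
            bisect.insort(leaf, k)
            keys.append(k)
            live.add(k)
            if len(leaf) > capacity:
                # Overflow: split the leaf into two roughly equal halves.
                mid = len(leaf) // 2
                right = leaf[mid:]
                del leaf[mid:]
                leaves.insert(i + 1, right)
                lows.insert(i + 1, right[0])
        else:
            # Delete a uniformly random live key (swap-and-pop from the list).
            j = rng.randrange(len(keys))
            keys[j], keys[-1] = keys[-1], keys[j]
            k = keys.pop()
            live.discard(k)
            i = bisect.bisect_right(lows, k) - 1
            leaf = leaves[i]
            leaf.remove(k)
            # Free-at-empty: a leaf is released only when it holds no keys.
            if not leaf and len(leaves) > 1:
                del leaves[i]
                del lows[i]
                if i == 0:
                    lows[0] = float("-inf")  # new first leaf covers the left end

    stored = sum(len(leaf) for leaf in leaves)
    return stored / (capacity * len(leaves))


if __name__ == "__main__":
    print("pure inserts:", simulate_leaf_utilization(20, 20000, 1.0))
    print("equal inserts and deletes:", simulate_leaf_utilization(20, 20000, 0.5))
```

Running it with `insert_fraction=1.0` gives a pure-insert workload that can be compared informally with the 69% figure attributed to Yao, while values near 0.5 approximate a balanced mix of inserts and deletes. A merge-at-half variant would replace the empty-leaf check with a rebalancing step whenever a leaf falls below half full, which is where the higher restructuring rate described above would come from.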

Original language | English (US) |
---|---|
Pages (from-to) | 45-76 |
Number of pages | 32 |
Journal | Journal of Computer and System Sciences |
Volume | 47 |
Issue number | 1 |
DOIs | https://doi.org/10.1016/0022-0000(93)90020-W |
State | Published - 1993 |

### ASJC Scopus subject areas

- Computational Theory and Mathematics

### Cite this

**B-trees with inserts and deletes: Why free-at-empty is better than merge-at-half.** / Johnson, Theodore; Shasha, Dennis.

Research output: Contribution to journal › Article

*Journal of Computer and System Sciences*, vol. 47, no. 1, pp. 45-76. https://doi.org/10.1016/0022-0000(93)90020-W

TY - JOUR

T1 - B-trees with inserts and deletes

T2 - Why free-at-empty is better than merge-at-half

AU - Johnson, Theodore

AU - Shasha, Dennis

PY - 1993

Y1 - 1993

UR - http://www.scopus.com/inward/record.url?scp=0037917021&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037917021&partnerID=8YFLogxK

U2 - 10.1016/0022-0000(93)90020-W

DO - 10.1016/0022-0000(93)90020-W

M3 - Article

AN - SCOPUS:0037917021

VL - 47

SP - 45

EP - 76

JO - Journal of Computer and System Sciences

JF - Journal of Computer and System Sciences

SN - 0022-0000

IS - 1

ER -