B-Tree 数据结构详解及Java代码实现

2019-08-29

加入收藏

B-Tree定义

在计算机科学中，B树（英语：B-tree）是一种自平衡的树，能够保持数据有序。这种数据结构能够让查找数据、顺序访问、插入数据及删除的动作，都在对数时间内完成。

B-Tree的特点

1、树中每个结点最多含有m个孩子（m>=2）；

2、除根结点和叶子结点外，其它每个结点至少有[ceil(m / 2)]个孩子（其中ceil(x)是一个取上限的函数）；

3、若根结点不是叶子结点，则至少有2个孩子（特殊情况：没有孩子的根结点，即根结点为叶子结点，整棵树只有一个根节点）；

4、所有叶子结点都出现在同一层(最底层)，叶子结点为外部结点，保存内容，即key和value。

5、其他结点为内部结点，保存索引，即key和next。

每个节点占用一个盘块的磁盘空间，一个节点上有两个升序排序的关键字和三个指向子树根节点的指针，指针存储的是子节点所在磁盘块的地址。两个关键词划分成的三个范围域对应三个指针指向的子树的数据的范围域。以根节点为例，关键字为17和35，P1指针指向的子树的数据范围为小于17，P2指针指向的子树的数据范围为17~35，P3指针指向的子树的数据范围为大于35。

模拟查找关键字29的过程：

1、根据根节点找到磁盘块1，读入内存。【磁盘I/O操作第1次】

2、比较关键字29在区间（17,35），找到磁盘块1的指针P2。

3、根据P2指针找到磁盘块3，读入内存。【磁盘I/O操作第2次】

4、比较关键字29在区间（26,30），找到磁盘块3的指针P2。

5、根据P2指针找到磁盘块8，读入内存。【磁盘I/O操作第3次】

6、在磁盘块8中的关键字列表中找到关键字29。

分析上面过程，发现需要3次磁盘I/O操作，和3次内存查找操作。由于内存中的关键字是一个有序表结构，可以利用二分法查找提高效率。而3次磁盘I/O操作是影响整个B-Tree查找效率的决定因素。B-Tree相对于AVLTree缩减了节点个数，使每次磁盘I/O取到内存的数据都发挥了作用，从而提高了查询效率。

JAVA代码实现


package tree;
 
 
/**
 * Created by bruce_shan on 2018/7/8 17:08.
 * Description : B-树 也作 B树 java 实现
 */
public class BTree<Key extends Comparable<Key>, Value> {
 
 private static final int M = 4; // B树的阶数
 
 private Node root; // B-tree 的根节点
 private int height; // B-tree 的高度
 private int N; // B-tree 树中键值对的数目
 
 // B-tree 节点类型
 private static final class Node {
 private int m; // number of children
 private Entry[] children = new Entry[M]; // the array of children
 // create a node with k children
 private Node(int k) {
 m = k;
 }
 }
 // B-tree 节点中的元素类型
 private static class Entry {
 private Comparable key;
 private Object val;
 private Node next; // 指向节点中下一元素
 public Entry(Comparable key, Object val, Node next) {
 this.key = key;
 this.val = val;
 this.next = next;
 }
 }
 
 
 /**
 * 初始化空 B-tree树
 */
 public BTree() {
 root = new Node(0);
 }
 
 /**
 * 判断 B-tree 是否是空树
 */
 public boolean isEmpty() {
 return size() == 0;
 }
 
 public int size() {
 return N;
 }
 
 public int height() {
 return height;
 }
 
 /**
 * get操作
 */
 public Value get(Key key) {
 if (key == null) throw new NullPointerException("key must not be null");
 return search(root, key, height);
 }
 
 /**
 * put 操作
 */
 public void put(Key key, Value val) {
 if (key == null) throw new NullPointerException("key must not be null");
 Node u = insert(root, key, val, height);
 N++;
 if (u == null) return;
 
 // need to split root
 Node t = new Node(2);
 t.children[0] = new Entry(root.children[0].key, null, root);
 t.children[1] = new Entry(u.children[0].key, null, u);
 root = t;
 height++;
 }
 
 
 // 搜索操作
 private Value search(Node x, Key key, int ht) {
 Entry[] children = x.children;
 
 // 节点内数组操作 内部遍历
 if (ht == 0) {
 for (int j = 0; j < x.m; j++) {
 if (equals(key, children[j].key)) return (Value) children[j].val;
 }
 }
 
 // 外部定位
 else {
 for (int j = 0; j < x.m; j++) {
 if (j+1 == x.m || less(key, children[j+1].key))
 return search(children[j].next, key, ht-1);
 }
 }
 return null;
 }
 // 插入操作
 private Node insert(Node h, Key key, Value val, int ht) {
 int j;
 Entry t = new Entry(key, val, null);
 
 // 节点内部数组操作
 if (ht == 0) {
 for (j = 0; j < h.m; j++) {
 if (less(key, h.children[j].key)) break;
 }
 }
 // 外部遍历
 else {
 for (j = 0; j < h.m; j++) {
 if ((j+1 == h.m) || less(key, h.children[j+1].key)) {
 Node u = insert(h.children[j++].next, key, val, ht-1);
 if (u == null) return null;
 t.key = u.children[0].key;
 t.next = u;
 break;
 }
 }
 }
 
 for (int i = h.m; i > j; i--)
 h.children[i] = h.children[i-1];
 h.children[j] = t;
 h.m++;
 if (h.m < M) return null;
 else return split(h);
 }
 
 // 分裂节点成两半
 private Node split(Node h) {
 Node t = new Node(M/2);
 h.m = M/2;
 for (int j = 0; j < M/2; j++)
 t.children[j] = h.children[M/2+j];
 return t;
 }
 // 判断两个元素是否相等
 private boolean equals(Comparable k1, Comparable k2) {
 return k1.compareTo(k2) == 0;
 }
 
 // 判断两个元素的大小
 private boolean less(Comparable k1, Comparable k2) {
 return k1.compareTo(k2) < 0;
 }
}

引入B-Tree的原因

首先，包括红黑树是将输入存入内存的一种内部查找树。而B树是前面平衡树算法的扩展，它支持保存在磁盘或者网络上的符号表进行外部查找，这些文件可能比我们以前考虑的输入要大的多（难以存入内存）。

既然内容保存在磁盘中，那么自然会因为树的深度过大而造成磁盘I/O读写过于频繁（磁盘读写速率是有限制的），进而导致查询效率低下。

那么降低树的深度自然很重要了。因此，我们引入了B树，平衡多路查找树。