HashMap的初始容量和加载因子

2019-09-09

注：本文所有代码示例均基于 JDK8。

从源码出发

默认值

通过查看 HashMap 的源码可以得知其默认的初始容量为 16，默认的加载因子为 0.75。

/**
 * The default initial capacity - MUST be a power of two.
 * 默认的初始容量（必须是2的N次幂），默认为2^4＝16
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
 * The load factor used when none specified in constructor.
 * 默认的加载因子为0.75
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

构造方法

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 * or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
		if (initialCapacity < 0)
				throw new IllegalArgumentException("Illegal initial capacity: " +
																					 initialCapacity);
		if (initialCapacity > MAXIMUM_CAPACITY)
				initialCapacity = MAXIMUM_CAPACITY;
		if (loadFactor <= 0 || Float.isNaN(loadFactor))
				throw new IllegalArgumentException("Illegal load factor: " +
																					 loadFactor);
		this.loadFactor = loadFactor;
		this.threshold = tableSizeFor(initialCapacity);
}
/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
		this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
		this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

一般情况下都是通过这三种构造方法来初始化 HashMap 的。通过默认的无参构造方法初始化后 HashMap 的容量就是默认的16，加载因子也是默认的0.75；通过带参数的构造方法可以初始化一个自定义容量和加载因子的 HashMap。通常情况下，加载因子使用默认的0.75就好。

初始容量

HashMap 的容量要求必须是2的N次幂，这样可以提高散列的均匀性，降低 Hash 冲突的风险。但是容量可以通过构造方法传入的，如果我传入一个非2次幂的数进去呢？比如3？传进去也不会干嘛呀，又不报错。。。哈哈哈哈。是的，不会报错的，那是因为 HashMap 自己将这个数转成了一个最接近它的2次幂的数。这个转换的方法是 tableSizeFor(int cap) 。

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
		int n = cap - 1;
		n |= n >>> 1;
		n |= n >>> 2;
		n |= n >>> 4;
		n |= n >>> 8;
		n |= n >>> 16;
		return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

这个方法会将传入的数转换成一个2次幂的数，比如传入的是3，则返回的是4；传入12，则返回的是16。

加载因子

加载因子和 HashMap 的扩容机制有着非常重要的联系，它可以决定在什么时候才进行扩容。HashMap 是通过一个阀值来确定是否扩容，当容量超过这个阀值的时候就会进行扩容，而加载因子正是参与计算阀值的一个重要属性，阀值的计算公式是容量 * 加载因子。如果通过默认构造方法创建 HashMap ，则阀值为 16 * 0.75 = 12 ，就是说当 HashMap 的容量超过12的时候就会进行扩容。

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
int threshold;

这是 HashMap 的 putVal(...) 方法的一个片段，put(...) 方法其实就是调用的这个方法，size 是当前 HashMap 的元素个数，当元素个数＋1后超过了阀值就会调用 resize() 方法进行扩容。

if (++size > threshold)
		resize();

加载因子在一般情况下都最好不要去更改它，默认的0.75是一个非常科学的值，它是经过大量实践得出来的一个经验值。当加载因子设置的比较小的时候，阀值就会相应的变小，扩容次数就会变多，这就会导致 HashMap 的容量使用不充分，还没添加几个值进去就开始进行了扩容，浪费内存，扩容效率还很低；当加载因子设置的又比较大的时候呢，结果又很相反，阀值变大了，扩容次数少了，容量使用率又提高了，看上去是很不错，实际上还是有问题，因为容量使用的越多，Hash 冲突的概率就会越大。所以，选择一个合适的加载因子是非常重要的。

实践检验真理

默认无参构造方法测试

通过默认构造方法创建一个 HashMap，并循环添加13个值