
Is the return value of Object::hashCode the object's memory address?

Date: 2018-04-02

One day a conversation with a colleague somehow turned to how Object::hashCode is implemented, and this article is the result.

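What is there to discuss? Isn't it fine to just take the object's base memory address? It is convenient and efficient. Read on for the discussion.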

When GC happens...

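The JavaDoc describes three constraints on Object::hashCode. One of them requires that the hash code stay the same for as long as the object itself does not change. A plain Object has no mutable state, so its hash code should never change. Java is, as everyone knows, a garbage-collected language. Some GC algorithms, such as copying and mark-compact, move objects, which changes an object's base address. A hashCode derived from the memory address could therefore return a different value after a GC.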
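The constraint is easy to observe from plain Java. The following is a minimal sketch, not a proof: it reads the identity hash of an object, requests a GC, and reads the hash again. The class and method names are made up for illustration, and System.gc() is only a hint, so the object may or may not actually be moved on a given run.

```java
// Sketch: the identity hash of an object is expected to stay the same
// across a GC request. StableIdentityHash is a hypothetical name.
class StableIdentityHash {

    /** Returns true if the identity hash is unchanged after requesting a GC. */
    static boolean hashSurvivesGc() {
        Object obj = new Object();
        int before = System.identityHashCode(obj);
        System.gc(); // only a hint; the JVM decides whether obj is moved
        int after = System.identityHashCode(obj);
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println("hash unchanged after GC: " + hashSurvivesGc());
    }
}
```

System.identityHashCode is used instead of hashCode so that the check still applies to classes that override hashCode.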

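Suppose we use the base address as the hashCode return value anyway. Usually nothing goes wrong, since code rarely calls hashCode directly. That holds until we run into a scenario like this: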

    Object obj = new Object();                 // allocated at 0x02
    Map<Object, String> map = new HashMap<>(); // 16 slots
    map.put(obj, "a1");                        // assume hashed in slot[0x02]
    // after GC, obj moved (0x02 -> 0x20)
    String value = map.get(obj);               // assume hashed in slot[0x00]
    System.out.println("true or false? : " + (value == null)); // ???

We are unlikely to use a bare Object instance as a map key, but with an address-based hashCode the result is alarming: data stored in the map moments ago can no longer be found!
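The scenario cannot be reproduced with a real Object, because HotSpot does not derive the hash from the address, but it can be simulated. The sketch below uses a hypothetical AddressKey class whose hashCode returns a mutable "address" field; changing the field stands in for the GC moving the object. Everything except HashMap itself is an assumption made for the demo.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: what a HashMap lookup would do if hashCode followed a moving address.
class MovedKeyDemo {

    /** Hypothetical key whose hash code is its simulated "address". */
    static final class AddressKey {
        int address; // mutable on purpose: stands in for an object the GC can move

        AddressKey(int address) {
            this.address = address;
        }

        @Override
        public int hashCode() {
            return address;
        }
        // equals is inherited from Object (reference identity), as for a plain Object
    }

    /** Stores a value, "moves" the key, and looks the value up again. */
    static String lookupAfterMove() {
        AddressKey key = new AddressKey(0x02);
        Map<AddressKey, String> map = new HashMap<>(); // 16 buckets by default
        map.put(key, "a1");                            // lands in bucket 0x02
        key.address = 0x20;                            // simulated move: 0x02 -> 0x20
        return map.get(key);                           // now probes bucket 0x00
    }

    public static void main(String[] args) {
        System.out.println("true or false? : " + (lookupAfterMove() == null)); // prints true
    }
}
```

The entry is still in the map, but the lookup probes a different bucket and compares against a different stored hash, so the very same key object no longer finds it.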

Dealing with object movement

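Since objects may move around, reading the memory address on every call does not work, yet the value must not change once it has been generated. So we need a field that stores the Object's hashCode, something like this: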

    class Object {
        private final int _hashCode = _toAddress(this);

        public int hashCode() {
            return _hashCode;
        }
    }

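Perfect: no matter how many times the object is moved, the map keeps working. The drawback is just as obvious: it wastes memory. Every Java class is a subclass of Object, so every instance of every class occupies at least one more word of memory, and in the vast majority of cases the field is never used.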

Saving more space

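From the discussion above, this word cannot be saved if the hashCode constraint is to hold, so ideally it should carry more information, for example by living in the Java object header. First, let's pull some information from OpenJDK (jdk-9+181) to see how thoroughly a single word is put to use: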

    // hotspot/src/share/vm/oops/markOop.hpp
    //  64 bits:
    //  --------
    //  unused:25 hash:31 -->| unused:1   age:4    biased_lock:1 lock:2 (normal object)
    //  JavaThread*:54 epoch:2 unused:1   age:4    biased_lock:1 lock:2 (biased object)
    //  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
    //  size:64 ----------------------------------------------------->| (CMS free block)
    //
    //  unused:25 hash:31 -->| cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && normal object)
    //  JavaThread*:54 epoch:2 cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && biased object)
    //  narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
    //  unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)

    //  - the two lock bits are used to describe three states: locked/unlocked and monitor.
    //
    //    [ptr             | 00]  locked             ptr points to real header on stack
    //    [header      | 0 | 01]  unlocked           regular object header
    //    [ptr             | 10]  monitor            inflated lock (header is wapped out)
    //    [ptr             | 11]  marked             used by markSweep to mark an object
    //                                               not valid at any other time

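One word holds several pieces of information: the hash code, lock-optimization flags, and GC flags. Its meaning is selected mainly by the two lowest bits, and when the object is locked the header even gets copied back and forth. We only care about the hash code here, so let's browse the JVM's memory with the hsdb tool.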

First, write a small demo:

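    package test; // matches the test.Hash name used in the OQL query and hsdb output below

    public class Hash {
        int verbose; // marker value that is easy to spot in a memory dump

        public Hash(int verbose) {
            this.verbose = verbose;
        }

        public static void main(String[] args) throws Exception {
            Hash h1 = new Hash(0x1234);
            Hash h2 = new Hash(0x5678);
            System.out.println("breakpoint 1");

            // The first hashCode() call generates the hash and stores it in the header.
            System.out.println("before gc, h1.hashCode=" + Integer.toHexString(h1.hashCode())
                    + ", h2.hashCode=" + Integer.toHexString(h2.hashCode()));
            System.out.println("breakpoint 2");

            h1 = null;   // make h1 unreachable
            System.gc(); // request a full GC so that h2 is moved
            System.out.println("after gc, h2.hashCode=" + Integer.toHexString(h2.hashCode()));
            System.out.println("breakpoint 3");
        }
    }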

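The code relies on HotSpot's System.gc to trigger a full GC so that h2 is copied to the old generation. Next, run the code under a debugger (Eclipse, IDEA, or any other will do) and set breakpoints at the marked places. To get the expected behavior and make inspection easier, set these JVM options: -XX:+UseSerialGC -Xmx10m -XX:-UseCompressedOops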

Assuming the program has stopped at System.out.println("breakpoint 1"), we can start hsdb and attach to the target process:

    # JDK 8
    java -cp .:$JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.HSDB
    # JDK 9
    jhsdb hsdb

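Once in hsdb, use Tools - Find Object by Query to list all instances with the OQL query select x from test.Hash x, then examine the memory with the various inspectors. After a bit of work it looks something like this: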

[Screenshot: hsdb-usage.png]

    # Hash h1
    hsdb> inspect 0x000000010b33d690
    instance of Oop for test/Hash @ 0x000000010b33d690 @ 0x000000010b33d690 (size = 24)
    _mark: 1
    _metadata._klass: InstanceKlass for test/Hash
    verbose: 4660
    hsdb> mem 0x000000010b33d690 3
    0x000000010b33d690: 0x0000000000000001
    0x000000010b33d698: 0x000000010c000578
    0x000000010b33d6a0: 0x0000000000001234

    # Hash h2
    hsdb> inspect 0x000000010b33d6a8
    instance of Oop for test/Hash @ 0x000000010b33d6a8 @ 0x000000010b33d6a8 (size = 24)
    _mark: 1
    _metadata._klass: InstanceKlass for test/Hash
    verbose: 22136
    hsdb> mem 0x000000010b33d6a8 3
    0x000000010b33d6a8: 0x0000000000000001
    0x000000010b33d6b0: 0x000000010c000578
    0x000000010b33d6b8: 0x0000000000005678

Both objects have a MarkWord of 0x0000000000000001: unlocked, not biased, GC age 0, and no hashCode assigned yet. The class pointer, instance field, and padding that follow are not discussed here.

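Next, let the program run to the second breakpoint, System.out.println("breakpoint 2"). Note that hsdb must detach first, otherwise the debugger cannot work. The program console has now printed: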

    breakpoint 1
    before gc, h1.hashCode=6f2b958e, h2.hashCode=1eb44e46

Attach hsdb again and look at the data. As expected, the hash codes have been written into the corresponding bits: 0x000000 6f2b958e 01 and 0x000000 1eb44e46 01.

    # Hash h1
    hsdb> mem 0x000000010b33d690 3
    0x000000010b33d690: 0x0000006f2b958e01
    0x000000010b33d698: 0x000000010c000578
    0x000000010b33d6a0: 0x0000000000001234

    # Hash h2
    hsdb> mem 0x000000010b33d6a8 3
    0x000000010b33d6a8: 0x0000001eb44e4601
    0x000000010b33d6b0: 0x000000010c000578
    0x000000010b33d6b8: 0x0000000000005678
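To check the reading of these words against the markOop.hpp layout for a normal 64-bit object (lock:2, biased_lock:1, age:4, unused:1, hash:31, counted from the lowest bit), the fields can be pulled apart with shifts and masks. This is a small sketch with a hypothetical MarkWordDecoder class. It only covers the unlocked "normal object" layout, not the biased, locked, or CMS variants.

```java
// Sketch: decode the "normal object" mark word layout quoted from markOop.hpp.
// Bit positions from the low end: lock 0-1, biased_lock 2, age 3-6,
// unused 7, hash 8-38, unused 39-63.
class MarkWordDecoder {

    static int lockBits(long mark) {
        return (int) (mark & 0b11);
    }

    static int biasedLock(long mark) {
        return (int) ((mark >>> 2) & 0b1);
    }

    static int age(long mark) {
        return (int) ((mark >>> 3) & 0xF);
    }

    static int hash(long mark) {
        return (int) ((mark >>> 8) & 0x7FFF_FFFFL); // 31-bit hash field
    }

    public static void main(String[] args) {
        long h1Mark = 0x0000006f2b958e01L; // h1's mark word from the dump above
        System.out.println("lock=" + lockBits(h1Mark)
                + " biased=" + biasedLock(h1Mark)
                + " age=" + age(h1Mark)
                + " hash=" + Integer.toHexString(hash(h1Mark)));
        // prints: lock=1 biased=0 age=0 hash=6f2b958e
    }
}
```

The decoded hash matches the value the program printed for h1, and the same call on 0x0000000000000001 yields a hash of 0, meaning no hash code has been generated yet.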

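Let the program run to the third breakpoint. It prints after gc, h2.hashCode=1eb44e46, so the hash code has not changed. In theory h1 has been collected by now and h2 has been copied to the old generation, so its address has changed. Querying with OQL again gives h2's new address, 0x000000010b5ea220, and the memory there looks like this: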

    # Hash h2
    hsdb> mem 0x000000010b5ea220 3
    0x000000010b5ea220: 0x0000001eb44e4601
    0x000000010b5ea228: 0x000000010c000578
    0x000000010b5ea230: 0x0000000000005678

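The object's data is unchanged, so the previously generated hash code can still be read from the MarkWord 0x000000 1eb44e46 01. So where was h2 copied to? Run the universe command for an overview of the heap: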

    hsdb> universe
    Heap Parameters:
    Gen 0:   eden [0x000000010b200000,0x000000010b20dc68,0x000000010b4b0000) space capacity = 2818048, 2.0022370094476742 used
      from [0x000000010b4b0000,0x000000010b4b0000,0x000000010b500000) space capacity = 327680, 0.0 used
      to   [0x000000010b500000,0x000000010b500000,0x000000010b550000) space capacity = 327680, 0.0 used
    Invocations: 0

    Gen 1:   old  [0x000000010b550000,0x000000010b5eabd0,0x000000010bc00000) space capacity = 7012352, 9.038451007593459 used
    Invocations: 1

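How to read the output: [0x000000010b200000,0x000000010b20dc68,0x000000010b4b0000) describes the address range of one region of the generational heap (eden, survivor, old gen, and so on). The three addresses are the start of the range, the current allocation pointer, and the end of the range. The pre-GC address of h2 (0x000000010b33d6a8) lies in eden, while its post-GC address (0x000000010b5ea220) falls in the old generation.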
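The range check can be done by eye, but a few lines of code make it explicit. The sketch below hard-codes the bounds and addresses from the hsdb output in this article. The class and method names are hypothetical, and the values will differ on any other run.

```java
// Sketch: check which heap region an address falls into, using the
// [start, top, end) triples printed by the universe command.
class HeapRegionCheck {
    // Bounds copied from the universe output in this article.
    static final long EDEN_START = 0x000000010b200000L;
    static final long EDEN_END   = 0x000000010b4b0000L;
    static final long OLD_START  = 0x000000010b550000L;
    static final long OLD_TOP    = 0x000000010b5eabd0L; // allocation pointer after GC
    static final long OLD_END    = 0x000000010bc00000L;

    /** True if addr lies in the half-open range [start, end), compared as unsigned. */
    static boolean inRange(long start, long end, long addr) {
        return Long.compareUnsigned(addr, start) >= 0
                && Long.compareUnsigned(addr, end) < 0;
    }

    public static void main(String[] args) {
        long h2BeforeGc = 0x000000010b33d6a8L;
        long h2AfterGc  = 0x000000010b5ea220L;
        System.out.println("before GC in eden: " + inRange(EDEN_START, EDEN_END, h2BeforeGc));
        System.out.println("after GC in old:   " + inRange(OLD_START, OLD_END, h2AfterGc));
        System.out.println("after GC below old top: " + inRange(OLD_START, OLD_TOP, h2AfterGc));
    }
}
```

The third check confirms that the new address sits below the old generation's allocation pointer, which is where a live, already allocated object should be.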

Summary

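Back to the title: the return value of hashCode is clearly not simply the object's address. The implementation can be found in the OpenJDK source, and the current default is the hashCode=5 variant. If you are curious, try adding -XX:+UnlockExperimentalVMOptions -XX:hashCode=2 and printing the objects' hashCode again.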

    // hotspot/src/share/vm/runtime/synchronizer.cpp
    static inline intptr_t get_next_hash(Thread * Self, oop obj) {
      intptr_t value = 0;
      if (hashCode == 0) {
        // This form uses an unguarded global Park-Miller RNG,
        // so it's possible for two threads to race and generate the same RNG.
        // On MP system we'll have lots of RW access to a global, so the
        // mechanism induces lots of coherency traffic.
        value = os::random();
      } else if (hashCode == 1) {
        // This variation has the property of being stable (idempotent)
        // between STW operations.  This can be useful in some of the 1-0
        // synchronization schemes.
        intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3;
        value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom;
      } else if (hashCode == 2) {
        value = 1;            // for sensitivity testing
      } else if (hashCode == 3) {
        value = ++GVars.hcSequence;
      } else if (hashCode == 4) {
        value = cast_from_oop<intptr_t>(obj);
      } else {
        // Marsaglia's xor-shift scheme with thread-specific state
        // This is probably the best overall implementation -- we'll
        // likely make this the default in future releases.
        unsigned t = Self->_hashStateX;
        t ^= (t << 11);
        Self->_hashStateX = Self->_hashStateY;
        Self->_hashStateY = Self->_hashStateZ;
        Self->_hashStateZ = Self->_hashStateW;
        unsigned v = Self->_hashStateW;
        v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
        Self->_hashStateW = v;
        value = v;
      }

      value &= markOopDesc::hash_mask;
      if (value == 0) value = 0xBAD;
      assert(value != markOopDesc::no_hash, "invariant");
      TEVENT(hashCode: GENERATE);
      return value;
    }
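For readers more at home in Java than in C++, here is a rough port of the default branch (the xor-shift scheme) together with the final masking step. It is a sketch for experimenting, not HotSpot's code: the XorShiftHashSketch class is hypothetical, the four state fields stand in for the per-thread _hashStateX.._hashStateW, the initial values of y, z and w are arbitrary placeholders rather than the JVM's seeds, and the 31-bit mask assumes the 64-bit layout quoted from markOop.hpp.

```java
// Sketch: Java port of the hashCode=5 branch of get_next_hash.
// Java has no unsigned int, so >>> is used where the C++ code shifts
// an unsigned value to the right.
class XorShiftHashSketch {
    private static final int HASH_MASK = 0x7FFF_FFFF; // hash:31 in the mark word

    // Stand-ins for the thread-local _hashStateX/Y/Z/W fields.
    private int x;
    private int y = 0x1234_5678; // arbitrary placeholder, not HotSpot's value
    private int z = 0x0BAD_CAFE; // arbitrary placeholder, not HotSpot's value
    private int w = 0x2468_ACE1; // arbitrary placeholder, not HotSpot's value

    XorShiftHashSketch(int seed) {
        this.x = seed;
    }

    /** Produces the next hash value, never 0 and never wider than 31 bits. */
    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        int value = v & HASH_MASK;
        if (value == 0) {
            value = 0xBAD; // 0 is reserved to mean "no hash generated yet"
        }
        return value;
    }

    /** Helper for quick checks: true if the first n values are all in (0, 2^31). */
    static boolean allInRange(int seed, int n) {
        XorShiftHashSketch gen = new XorShiftHashSketch(seed);
        for (int i = 0; i < n; i++) {
            if (gen.nextHash() <= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        XorShiftHashSketch gen = new XorShiftHashSketch(42);
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(gen.nextHash()));
        }
    }
}
```

Note that nothing in this branch reads the object's address: the value depends only on the generator state, which is why it has to be stored in the object header once generated.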


Original link: https://yq.aliyun.com/articles/575705