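In plain Java, making a custom class serializable is simple: the class only needs to implement the java.io.Serializable interface. In the Hadoop framework, however, the serialization logic has to be written by hand. A custom class that will be used as a key must implement the WritableComparable interface and provide the write(), readFields(), and compareTo() methods (a class used only as a value needs just Writable, that is, write() and readFields()).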
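For comparison, here is a minimal sketch of the plain-Java route mentioned above. The class names `JavaSerializationSketch` and `PlainPerson` are hypothetical and exist only for this illustration. The sketch round-trips an object through `ObjectOutputStream` and `ObjectInputStream` and prints the size of the resulting byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class JavaSerializationSketch {

    // Hypothetical class for illustration only; implementing Serializable is all it takes.
    static final class PlainPerson implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        final String title;

        PlainPerson(String name, int age, String title) {
            this.name = name;
            this.age = age;
            this.title = title;
        }
    }

    // Writes the object graph with Java's built-in serialization.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object from the bytes produced by serialize().
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlainPerson original = new PlainPerson("Charles Wang", 26, "Technical Lead");
        byte[] bytes = serialize(original);
        PlainPerson restored = (PlainPerson) deserialize(bytes);
        System.out.println("serialized length: " + bytes.length);
        System.out.println(restored.name + ", " + restored.age + ", " + restored.title);
    }
}
```

The Java stream carries a header and a class descriptor (class name, field names, field types) in addition to the field values, so it is noticeably larger than the 32 bytes that the Writable version of the same data occupies later in this article.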
The following is a custom serializable class of this kind:
- package com.charles.writable;
-
- import java.io.DataInput;
- import java.io.DataOutput;
- import java.io.IOException;
-
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.io.WritableComparable;
-
- /**
-  * A custom Hadoop-serializable type that describes a person.
-  */
- public class PersonWritable implements WritableComparable<PersonWritable> {
-
-     private Text name;
-     private IntWritable age;
-     private Text title;
-
-     // No-arg constructor: the fields must be non-null before readFields() is called.
-     public PersonWritable() {
-         set("someperson", 0, "sometitle");
-     }
-
-     public PersonWritable(String name, int age, String title) {
-         set(name, age, title);
-     }
-
-     public void set(String name, int age, String title) {
-         this.name = new Text(name);
-
-         // Non-positive ages fall back to 1.
-         age = (age > 0) ? age : 1;
-         this.age = new IntWritable(age);
-
-         this.title = new Text(title);
-     }
-
-     // Serialize by delegating to each field, in a fixed order.
-     @Override
-     public void write(DataOutput out) throws IOException {
-         name.write(out);
-         age.write(out);
-         title.write(out);
-     }
-
-     // Deserialize the fields in exactly the order write() emitted them.
-     @Override
-     public void readFields(DataInput in) throws IOException {
-         name.readFields(in);
-         age.readFields(in);
-         title.readFields(in);
-     }
-
-     // Order by name, then by age, then by title.
-     @Override
-     public int compareTo(PersonWritable pO) {
-         int cmp1 = name.compareTo(pO.name);
-         if (cmp1 != 0) {
-             return cmp1;
-         }
-
-         int cmp2 = age.compareTo(pO.age);
-         if (cmp2 != 0) {
-             return cmp2;
-         }
-
-         return title.compareTo(pO.title);
-     }
-
-     // Combine the hashes of all three fields so that equal objects hash equally.
-     @Override
-     public int hashCode() {
-         return name.hashCode() * 71 + age.hashCode() * 73 + title.hashCode() * 127;
-     }
-
-     @Override
-     public boolean equals(Object o) {
-         if (o instanceof PersonWritable) {
-             PersonWritable pw = (PersonWritable) o;
-             return name.equals(pw.name) && age.equals(pw.age) && title.equals(pw.title);
-         }
-         return false;
-     }
-
-     // Labels in the output: 姓名 = name, 年龄 = age, 头衔 = title.
-     @Override
-     public String toString() {
-         StringBuilder sb = new StringBuilder();
-         sb.append("[");
-         sb.append("姓名: " + name + ",");
-         sb.append("年龄: " + age + ",");
-         sb.append("头衔: " + title);
-         sb.append("]");
-         return sb.toString();
-     }
- }
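The ordering that compareTo() defines (name first, then age, then title) can be sketched without Hadoop on the classpath. In the following illustration, `PersonOrderingSketch` and its nested `Person` class are hypothetical plain-Java stand-ins for PersonWritable, and `String.compareTo` stands in for `Text.compareTo`. The two only agree for ASCII data like the sample below, because Text compares raw UTF-8 bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PersonOrderingSketch {

    // Hypothetical plain-Java stand-in for PersonWritable (no Hadoop types).
    static final class Person {
        final String name;
        final int age;
        final String title;

        Person(String name, int age, String title) {
            this.name = name;
            this.age = age;
            this.title = title;
        }

        @Override
        public String toString() {
            return name + "/" + age + "/" + title;
        }
    }

    // Same precedence as PersonWritable.compareTo(): name, then age, then title.
    static final Comparator<Person> BY_NAME_AGE_TITLE =
            Comparator.comparing((Person p) -> p.name)
                      .thenComparingInt(p -> p.age)
                      .thenComparing(p -> p.title);

    // Returns the people sorted by the comparator, rendered as strings.
    static List<String> sortedLabels(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(BY_NAME_AGE_TITLE);
        List<String> labels = new ArrayList<>();
        for (Person p : copy) {
            labels.add(p.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Bob", 30, "Engineer"),
                new Person("Alice", 41, "Manager"),
                new Person("Alice", 26, "Manager"),
                new Person("Alice", 26, "Architect"));
        // prints [Alice/26/Architect, Alice/26/Manager, Alice/41/Manager, Bob/30/Engineer]
        System.out.println(sortedLabels(people));
    }
}
```

A later field only matters when all earlier fields tie: the two 26-year-old Alices are separated by title, while Bob sorts last on name alone.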
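To make it easy to show the content before and after serialization, we define a utility class. Its methods expose the intermediate product of the process: the byte array that serialization produces and deserialization consumes.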
- package com.charles.writable;
-
- import java.io.ByteArrayInputStream;
- import java.io.ByteArrayOutputStream;
- import java.io.DataInputStream;
- import java.io.DataOutputStream;
- import java.io.IOException;
-
- import org.apache.hadoop.io.Writable;
-
- /**
-  * Helpers that convert between a Writable and its serialized byte array.
-  */
- public class HadoopSerializationUtil {
-
-     // Serialize a Writable into an in-memory byte array.
-     public static byte[] serialize(Writable writable) throws IOException {
-         ByteArrayOutputStream out = new ByteArrayOutputStream();
-         try (DataOutputStream dataout = new DataOutputStream(out)) {
-             writable.write(dataout);
-         }
-         return out.toByteArray();
-     }
-
-     // Populate an existing Writable from a byte array produced by serialize().
-     public static void deserialize(Writable writable, byte[] bytes) throws IOException {
-         ByteArrayInputStream in = new ByteArrayInputStream(bytes);
-         try (DataInputStream datain = new DataInputStream(in)) {
-             writable.readFields(datain);
-         }
-     }
- }
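The same round trip can be tried without Hadoop. The sketch below is an illustration under stated assumptions: `MiniWritable` is a hypothetical stand-in that mirrors the two methods of `org.apache.hadoop.io.Writable`, and `Point` is a made-up type with two int fields. Only the `java.io` data streams are real library classes here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class WritableRoundTripSketch {

    // Hypothetical stand-in with the same two methods as Hadoop's Writable.
    interface MiniWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Made-up example type: two ints, written and read in the same order.
    static final class Point implements MiniWritable {
        int x;
        int y;

        Point() { }

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Object to bytes: the object writes itself into an in-memory stream.
    static byte[] serialize(MiniWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Bytes to object: an existing instance overwrites its own fields.
    static void deserialize(MiniWritable w, byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        Point original = new Point(3, -7);
        byte[] bytes = serialize(original);

        Point restored = new Point();
        deserialize(restored, bytes);

        // Two 4-byte ints, with no class metadata in the stream.
        System.out.println(bytes.length + " bytes -> (" + restored.x + ", " + restored.y + ")");
    }
}
```

Note that deserialization fills in an instance the caller already created. This is why a Writable needs a no-arg constructor, and it also lets one instance be reused for many records.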
Finally, a demo serializes and then deserializes an object of our custom class:
- package com.charles.writable;
-
- import org.apache.hadoop.util.StringUtils;
-
- /**
-  * Serializes a PersonWritable, prints the bytes as hex, then restores it.
-  */
- public class HadoopObjectSerializationDemo {
-
-     public static void main(String[] args) throws Exception {
-
-         // Experiment 1: serialization
-         System.out.println("实验1: 序列化");
-         PersonWritable originalPersonWritable = new PersonWritable("Charles Wang", 26, "Technical Lead");
-         // Class name of the type under test, and the object before serialization.
-         String typeInfo = "被测试的自定义Hadoop可序列化类类型为: " + originalPersonWritable.getClass().getName() + "\n";
-         String primaryPersonWritableInfo = "序列化前对象为: " + originalPersonWritable.toString() + "\n";
-
-         byte[] serializedHadoopValue = HadoopSerializationUtil.serialize(originalPersonWritable);
-         // Length of the serialized byte array, and its content as a hex string.
-         String lengthInfo = "序列化后的字节数组长度为: " + serializedHadoopValue.length + "\n";
-         String serializeValueInfo = "序列化后的值为: " + StringUtils.byteToHexString(serializedHadoopValue) + "\n";
-
-         System.out.println(typeInfo + primaryPersonWritableInfo + lengthInfo + serializeValueInfo + "\n");
-
-         System.out.println();
-
-         // Experiment 2: deserialization
-         System.out.println("实验2:反序列化");
-         PersonWritable restoredPersonWritable = new PersonWritable();
-         // The byte array that is about to be deserialized.
-         String originalByteArrayInfo = "被反序列化的字节数组内容为: " + StringUtils.byteToHexString(serializedHadoopValue) + "\n";
-
-         HadoopSerializationUtil.deserialize(restoredPersonWritable, serializedHadoopValue);
-         // The Writable object after deserialization.
-         String restoredValueInfo = "反序列化之后的Writable对象为: " + restoredPersonWritable.toString();
-         System.out.println(originalByteArrayInfo + restoredValueInfo + "\n");
-     }
- }
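The output is shown below. Its Chinese labels come from the string literals in the code: 实验1: 序列化 is "Experiment 1: serialization", 实验2:反序列化 is "Experiment 2: deserialization", and 姓名, 年龄, 头衔 mean name, age, title. The restored object matches the original, which confirms that our custom Hadoop-serializable class round-trips correctly: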
- 实验1: 序列化
- 被测试的自定义Hadoop可序列化类类型为: com.charles.writable.PersonWritable
- 序列化前对象为: [姓名: Charles Wang,年龄: 26,头衔: Technical Lead]
- 序列化后的字节数组长度为: 32
- 序列化后的值为: 0c436861726c65732057616e670000001a0e546563686e6963616c204c656164
-
-
-
- 实验2:反序列化
- 被反序列化的字节数组内容为: 0c436861726c65732057616e670000001a0e546563686e6963616c204c656164
- 反序列化之后的Writable对象为: [姓名: Charles Wang,年龄: 26,头衔: Technical Lead]
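The 32 bytes in the hex dump above can be decoded by hand, which shows exactly what write() emitted: a length-prefixed UTF-8 string for each Text field and a 4-byte big-endian integer for the IntWritable. The following is a minimal sketch using only `java.io`. The class name `HexDumpDecodeSketch` is hypothetical, and the decoder assumes each string is shorter than 128 bytes, so that Hadoop's variable-length length prefix occupies a single byte.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class HexDumpDecodeSketch {

    // Converts a hex string such as "0c43" into the bytes {0x0c, 0x43}.
    static byte[] fromHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Reads one Text field: a length prefix followed by that many UTF-8 bytes.
    // Assumption: the length is below 128, so the prefix is a single byte.
    static String readShortText(DataInputStream in) throws IOException {
        int length = in.readByte();
        if (length < 0) {
            throw new IllegalArgumentException("multi-byte length prefix is not handled in this sketch");
        }
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Decodes the fields in the order PersonWritable.write() emitted them.
    static String decode(String hex) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fromHex(hex)))) {
            String name = readShortText(in);   // 0c + 12 bytes
            int age = in.readInt();            // 00 00 00 1a
            String title = readShortText(in);  // 0e + 14 bytes
            return "name=" + name + ", age=" + age + ", title=" + title;
        }
    }

    public static void main(String[] args) throws IOException {
        String hex = "0c436861726c65732057616e670000001a0e546563686e6963616c204c656164";
        // prints name=Charles Wang, age=26, title=Technical Lead
        System.out.println(decode(hex));
    }
}
```

The byte count adds up as 1 + 12 bytes for the name, 4 bytes for the age, and 1 + 14 bytes for the title, which gives the reported length of 32.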
This article is reposted from charles_wang888's 51CTO blog. Original link: http://blog.51cto.com/supercharles888/885468. To republish it, please contact the original author.