
MapReduce InputFormat: DBInputFormat

Date: 2015-11-29

I. Background

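To let MapReduce jobs access relational databases (MySQL, Oracle) directly, Hadoop provides two classes, DBInputFormat and DBOutputFormat. DBInputFormat reads the rows of a database table as job input, so the data can be written on to HDFS, and DBOutputFormat writes the result set produced by a MapReduce job into a database table.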

II. Technical Details

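1. DBInputFormat (using MySQL as the example). First create the table:
CREATE TABLE studentinfo (id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(32) NOT NULL);
2. Because version 0.20 does not support DBInputFormat and DBOutputFormat very well, this example uses the 0.19 API to show how the two classes are used.
3. DBInputFormat is used as follows: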

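import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DBInput {

   // Schema of the table read by this job:
   // DROP TABLE IF EXISTS `hadoop`.`studentinfo`;
   // CREATE TABLE studentinfo (
   //   id INTEGER NOT NULL PRIMARY KEY,
   //   name VARCHAR(32) NOT NULL);

   // One row of studentinfo. Writable moves it between tasks,
   // DBWritable maps it to and from JDBC.
   public static class StudentinfoRecord implements Writable, DBWritable {
     int id;
     String name;

     public StudentinfoRecord() {
     }

     public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.name = Text.readString(in);
     }

     public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        Text.writeString(out, this.name);
     }

     // Column indexes follow the order of the field list passed to setInput.
     public void readFields(ResultSet result) throws SQLException {
        this.id = result.getInt(1);
        this.name = result.getString(2);
     }

     public void write(PreparedStatement stmt) throws SQLException {
        stmt.setInt(1, this.id);
        stmt.setString(2, this.name);
     }

     public String toString() {
        return this.id + " " + this.name;
     }
   }

   // Must be static: Hadoop instantiates the mapper by reflection.
   public static class DBInputMapper extends MapReduceBase implements
        Mapper<LongWritable, StudentinfoRecord, LongWritable, Text> {
     public void map(LongWritable key, StudentinfoRecord value,
          OutputCollector<LongWritable, Text> collector, Reporter reporter)
          throws IOException {
        // Emit the row keyed by its id, with "id name" as the text value.
        collector.collect(new LongWritable(value.id), new Text(value.toString()));
     }
   }

   public static void main(String[] args) throws IOException {
     JobConf conf = new JobConf(DBInput.class);
     // Ship the MySQL driver jar (already uploaded to HDFS) to every task.
     DistributedCache.addFileToClassPath(new Path(
          "/lib/mysql-connector-java-5.1.0-bin.jar"), conf);

     conf.setMapperClass(DBInputMapper.class);
     conf.setReducerClass(IdentityReducer.class);

     conf.setMapOutputKeyClass(LongWritable.class);
     conf.setMapOutputValueClass(Text.class);
     conf.setOutputKeyClass(LongWritable.class);
     conf.setOutputValueClass(Text.class);

     conf.setInputFormat(DBInputFormat.class);
     FileOutputFormat.setOutputPath(conf, new Path("/hua01"));
     DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
          "jdbc:mysql://192.168.3.244:3306/hadoop", "hua", "hadoop");
     String[] fields = { "id", "name" };
     // Arguments: record class, table, WHERE conditions (none), ORDER BY column, fields.
     DBInputFormat.setInput(conf, StudentinfoRecord.class, "studentinfo",
          null, "id", fields);

     JobClient.runJob(conf);
   }
}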
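
As a quick check of the Writable half of StudentinfoRecord in the listing above, the sketch below round-trips the two fields through an in-memory byte stream using only the JDK. It is not Hadoop code: the class name RecordRoundTrip is hypothetical, and writeUTF/readUTF stand in for Text.writeString/Text.readString, which use a different byte encoding. What it is meant to illustrate is that readFields has to read the fields in exactly the order in which write emitted them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for StudentinfoRecord that needs no Hadoop jars.
final class RecordRoundTrip {
    int id;
    String name;

    // Same field order as the article's write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name); // assumption: replaces Text.writeString
    }

    // Must mirror write(): the int first, then the string.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF(); // assumption: replaces Text.readString
    }

    // Serializes a record to bytes and reads it back into a fresh instance.
    static RecordRoundTrip roundTrip(int id, String name) {
        try {
            RecordRoundTrip src = new RecordRoundTrip();
            src.id = id;
            src.name = name;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));

            RecordRoundTrip dst = new RecordRoundTrip();
            dst.readFields(new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray())));
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RecordRoundTrip r = roundTrip(7, "alice");
        System.out.println(r.id + " " + r.name);
    }
}
```

If the two methods disagreed on the order, the string bytes would be interpreted as the int and the record would come back corrupted or the read would fail.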

a) The fields of the StudentinfoRecord class correspond to the table columns, and the class implements both the Writable and DBWritable interfaces.

Methods implementing Writable:

     public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.name = Text.readString(in);
     }

     public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        Text.writeString(out, this.name);
     }
Methods implementing DBWritable:

     public void readFields(ResultSet result) throws SQLException {
        this.id = result.getInt(1);
        this.name = result.getString(2);
     }

     public void write(PreparedStatement stmt) throws SQLException {
        stmt.setInt(1, this.id);
        stmt.setString(2, this.name);
     }
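b) The value type passed into the Mapper is StudentinfoRecord.

c) Configure how to connect to the database and read the data of the studentinfo table. In setInput, the fourth argument is an optional WHERE condition (null here) and the fifth is the ORDER BY column.

     DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
          "jdbc:mysql://192.168.3.244:3306/hadoop", "hua", "hadoop");
     String[] fields = { "id", "name" };
     DBInputFormat.setInput(conf, StudentinfoRecord.class, "studentinfo", null, "id", fields);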
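
To make the setInput arguments more concrete, here is a rough sketch of the kind of SELECT statement a record reader could assemble from them for one input split. It is a simplified approximation written for this article, not the Hadoop source: the class SelectQuerySketch and its build method are hypothetical, and the exact SQL text generated by a given Hadoop version may differ. The LIMIT/OFFSET pair is the assumed way of giving each map task its own slice of rows, which is why a stable ORDER BY column matters.

```java
// Hypothetical helper, for illustration only; not part of Hadoop.
final class SelectQuerySketch {

    // Builds a paged SELECT from the same pieces that setInput receives.
    static String build(String table, String[] fields, String conditions,
                        String orderBy, long limit, long offset) {
        StringBuilder q = new StringBuilder("SELECT ");
        q.append(String.join(", ", fields));
        q.append(" FROM ").append(table);
        // conditions may be null, as in the article's example.
        if (conditions != null && !conditions.isEmpty()) {
            q.append(" WHERE (").append(conditions).append(")");
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            q.append(" ORDER BY ").append(orderBy);
        }
        // Assumed split mechanism: each task reads `limit` rows from `offset`.
        q.append(" LIMIT ").append(limit).append(" OFFSET ").append(offset);
        return q.toString();
    }

    public static void main(String[] args) {
        String[] fields = { "id", "name" };
        System.out.println(build("studentinfo", fields, null, "id", 100, 0));
    }
}
```

With the article's arguments and an assumed split of 100 rows starting at row 0, the sketch yields SELECT id, name FROM studentinfo ORDER BY id LIMIT 100 OFFSET 0. The column order in that statement is what readFields(ResultSet) relies on when it calls getInt(1) and getString(2).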

4. DBOutputFormat is used as follows:

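import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DBOutput {

   // Same record class as in the DBInput example.
   public static class StudentinfoRecord implements Writable, DBWritable {
     int id;
     String name;

     public StudentinfoRecord() {
     }

     public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.name = Text.readString(in);
     }

     public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        Text.writeString(out, this.name);
     }

     public void readFields(ResultSet result) throws SQLException {
        this.id = result.getInt(1);
        this.name = result.getString(2);
     }

     // Parameter indexes follow the field order passed to setOutput.
     public void write(PreparedStatement stmt) throws SQLException {
        stmt.setInt(1, this.id);
        stmt.setString(2, this.name);
     }

     public String toString() {
        return this.id + " " + this.name;
     }
   }

   public static class MyReducer extends MapReduceBase implements
        Reducer<LongWritable, Text, StudentinfoRecord, Text> {
     public void reduce(LongWritable key, Iterator<Text> values,
          OutputCollector<StudentinfoRecord, Text> output, Reporter reporter)
          throws IOException {
        // Each value is one input line of the form "id<TAB>name".
        while (values.hasNext()) {
          // "\t" is the tab character; the original "/t" never matched.
          String[] splits = values.next().toString().split("\t");
          StudentinfoRecord r = new StudentinfoRecord();
          r.id = Integer.parseInt(splits[0]);
          r.name = splits[1];
          // DBOutputFormat writes the key to the table; the value is not stored.
          output.collect(r, new Text(r.name));
        }
     }
   }

   public static void main(String[] args) throws IOException {
     JobConf conf = new JobConf(DBOutput.class);
     conf.setInputFormat(TextInputFormat.class);
     conf.setOutputFormat(DBOutputFormat.class);

     FileInputFormat.setInputPaths(conf, new Path("/hua/hua.bcp"));
     DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
          "jdbc:mysql://192.168.3.244:3306/hadoop", "hua", "hadoop");
     // Target table followed by the columns to fill, in parameter order.
     DBOutputFormat.setOutput(conf, "studentinfo", "id", "name");

     conf.setMapperClass(IdentityMapper.class);
     conf.setReducerClass(MyReducer.class);

     JobClient.runJob(conf);
   }
}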
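
The reducer in the listing above expects every input line to hold an id and a name separated by a tab. The small JDK-only sketch below isolates that parsing step; the class TabLineSketch and its parse method are hypothetical names used for illustration. It also shows why the separator has to be written as the escape "\t": the pattern "/t" matches a literal slash followed by the letter t, so it leaves a tab-separated line unsplit.

```java
// Hypothetical helper, for illustration only.
final class TabLineSketch {

    // Splits "id<TAB>name" into two columns. The limit of 2 keeps any
    // further tabs as part of the name instead of dropping them.
    static String[] parse(String line) {
        String[] parts = line.split("\t", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected id<TAB>name: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] cols = parse("1\talice");
        int id = Integer.parseInt(cols[0]);
        System.out.println(id + " " + cols[1]);

        // With the wrong separator the line is returned as a single element.
        System.out.println("1\talice".split("/t").length);
    }
}
```

Validating the column count before calling Integer.parseInt turns a malformed line into a clear error message rather than an ArrayIndexOutOfBoundsException inside the reducer.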

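a) The fields of the StudentinfoRecord class correspond to the table columns, and the class implements both the Writable and DBWritable interfaces, exactly like the StudentinfoRecord class in the DBInputFormat example.

b) The output key type of the Reducer is StudentinfoRecord. DBOutputFormat writes the key to the table; the value is not stored.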

c) Configure how to connect to the database and write the results into the studentinfo table:

     DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
          "jdbc:mysql://192.168.3.244:3306/hadoop", "hua", "hadoop");
     DBOutputFormat.setOutput(conf, "studentinfo", "id", "name");
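
The table name and column list given to setOutput are enough to derive a parameterized INSERT statement, and the sketch below shows one plausible way to do that. It is an approximation for illustration rather than the Hadoop implementation: InsertQuerySketch and build are hypothetical names, and the exact statement text produced by DBOutputFormat may differ in spacing or punctuation.

```java
import java.util.Collections;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class InsertQuerySketch {

    // Builds "INSERT INTO table (c1, c2) VALUES (?, ?)" with one
    // placeholder per column, in the order the columns were given.
    static String build(String table, String... fields) {
        String columns = String.join(", ", fields);
        String marks = String.join(", ",
                Collections.nCopies(fields.length, "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("studentinfo", "id", "name"));
    }
}
```

The placeholder positions line up with write(PreparedStatement) in StudentinfoRecord: parameter 1 receives id and parameter 2 receives name, so the column order in setOutput and the setter order in the record class have to agree.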

III. Summary

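If the MapReduce job fails with java.io.IOException: com.mysql.jdbc.Driver, the usual cause is that the program cannot find the MySQL driver jar. The fix is to make sure every tasktracker can find the driver jar when it runs the MapReduce program.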
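
Before changing the cluster, it can help to confirm that a missing class really is the problem. The following sketch, with the hypothetical class name DriverCheck, simply asks the current JVM whether a class can be loaded. It only tests the classpath of the process it runs in, so it says nothing about a remote tasktracker unless it is run there with the same classpath.

```java
// Hypothetical diagnostic helper, for illustration only.
final class DriverCheck {

    // Returns true when the named class can be loaded by this JVM.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the driver class used throughout the article.
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + " on classpath: " + isOnClasspath(driver));
    }
}
```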

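There are two ways to add the jar:

1. Copy the jar into ${HADOOP_HOME}/lib on every node and restart the cluster. This is the more primitive approach.

2. a) Upload the jar to the cluster: hadoop fs -put mysql-connector-java-5.1.0-bin.jar /lib

b) Before the MapReduce program submits the job, add the statement: DistributedCache.addFileToClassPath(new Path("/lib/mysql-connector-java-5.1.0-bin.jar"), conf);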

Although the examples use the 0.19 API, they also work on 0.20; the only difference is a warning that the methods are deprecated.

Original article: https://yq.aliyun.com/articles/413130
