The Spring-Hadoop Project

Source: http://blog.csdn.net/pelick/article/details/8331798

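Spring-hadoop appears to be part of the Spring Data project (the rest of Spring Data integrates Spring with JDBC, REST, and the mainstream NoSQL stores). So what happens when you combine Spring with Hadoop? Essentially, the configuration of Hadoop components, job deployment, and the like are all brought under Spring's bean management.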

Straight to the Point

Enough talk; let's start with an example. MapReduce has its own "Hello World"-style example, "Word Count". In Spring Hadoop it looks like this:

<hdp:configuration />

<hdp:job id="word-count"
    input-path="/input/" output-path="/output/"
    mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
    reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>

<bean id="runner" class="org.springframework.data.hadoop.mapreduce.JobRunner"
    p:jobs-ref="word-count"/>

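As you can see, both the job's parameters and its submission are managed by the IoC container. If the Mapper or Reducer needs extra parameters, those can be configured as well.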
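
To make the "extra parameters" point concrete, here is a rough sketch of a complete configuration file. It assumes the `hdp` namespace URI and the key=value property syntax in the bodies of `hdp:configuration` and `hdp:job` as I recall them from the Spring for Apache Hadoop reference, so check them against the version you use. The HDFS address and the `wordcount.min.length` key are made up for illustration and are not from the original post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
         http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

  <!-- Cluster-wide Hadoop settings as key=value lines; host and port are placeholders -->
  <hdp:configuration>
    fs.default.name=hdfs://localhost:9000
  </hdp:configuration>

  <!-- Job-specific settings go in the element body (assumed syntax). A mapper or
       reducer could read them from the job Configuration.
       wordcount.min.length is a hypothetical key. -->
  <hdp:job id="word-count"
           input-path="/input/" output-path="/output/"
           mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
           reducer="org.apache.hadoop.examples.WordCount.IntSumReducer">
    wordcount.min.length=3
  </hdp:job>

  <!-- Same runner bean as in the example above -->
  <bean id="runner" class="org.springframework.data.hadoop.mapreduce.JobRunner"
        p:jobs-ref="word-count"/>
</beans>
```

The root element also shows where the `hdp:` and `p:` prefixes used throughout these snippets would be declared.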

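Spring Hadoop also does not require MapReduce programs to be written in Java. Streaming jobs written in other languages plug into the Spring configuration just as seamlessly: these jobs are all objects, and to Spring they are all beans.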

<hdp:streaming id="streaming-env"
    input-path="/input/" output-path="/output/"
    mapper="${path.cat}" reducer="${path.wc}">
  <hdp:cmd-env>
    EXAMPLE_DIR=/home/example/dictionaries/
  </hdp:cmd-env>
</hdp:streaming>
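
The `${path.cat}` and `${path.wc}` placeholders above have to be resolved from somewhere. A minimal sketch, assuming a standard Spring property placeholder and a hypothetical `streaming.properties` file on the classpath (neither the file name nor the values come from the original post):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
         http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

  <!-- streaming.properties is a hypothetical file that might contain:
         path.cat=/bin/cat
         path.wc=/usr/bin/wc -->
  <context:property-placeholder location="classpath:streaming.properties"/>

</beans>
```

With values like these, the streaming job would use `cat` as the mapper and `wc` as the reducer, which is the classic Hadoop Streaming smoke test.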

Other existing Hadoop tools are supported as well, for example Twitter's Scalding (a Scala library for writing MapReduce jobs):

<hdp:tool-runner id="scalding" tool-class="com.twitter.scalding.Tool">
  <hdp:arg value="tutorial/Tutorial1"/>
  <hdp:arg value="--local"/>
</hdp:tool-runner>

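Key Features

- Spring Hadoop lets MapReduce, Streaming, Hive, Pig, and Cascading jobs be run through the Spring container.

- HDFS data can be accessed from JVM scripting languages such as Groovy, JRuby, and Jython.

- Declarative configuration of HBase is supported.

- For clients connecting to Hadoop, it provides powerful Hadoop configuration options and a templating mechanism.

- Support for Hadoop tools, including FsShell and DistCp, is also planned.

In short, the configuration and creation of each Hadoop component are tied into the Spring container and managed in one place.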

More Examples

Here are a few more representative examples.

HBase and Pig:

<hdp:hbase-configuration stop-proxy="false" delete-connection="true">
  foo=bar
</hdp:hbase-configuration>

<!-- create a Pig instance using custom properties
     and execute a script (using given arguments) at startup -->
<hdp:pig properties-location="pig-dev.properties">
  <!-- the script location is not given here; "..." is a placeholder -->
  <script location="...">
    <arguments>electric=tears</arguments>
  </script>
</hdp:pig>

Hive:

<bean id="hive-ds"
    class="org.springframework.jdbc.datasource.SimpleDriverDataSource"
    c:driver-ref="hive-driver" c:url="${hive.url}"/>

<bean id="template" class="org.springframework.jdbc.core.JdbcTemplate"
    c:data-source-ref="hive-ds"/>
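
The Hive snippet refers to a `hive-driver` bean and a `${hive.url}` property that it does not define. A possible definition, assuming the older HiveServer1 JDBC driver (for HiveServer2 the driver class would be `org.apache.hive.jdbc.HiveDriver` and the URL scheme `jdbc:hive2://`); the host, port, and database in the comment are placeholders, not values from the original post:

```xml
<!-- JDBC driver instance injected into hive-ds through c:driver-ref.
     The class name assumes HiveServer1. -->
<bean id="hive-driver" class="org.apache.hadoop.hive.jdbc.HiveDriver"/>

<!-- hive.url must be supplied by a property placeholder; for this driver
     a value would typically look like:
       hive.url=jdbc:hive://localhost:10000/default -->
```

The `c:` prefix used in the snippet is Spring's constructor-argument namespace (`xmlns:c="http://www.springframework.org/schema/c"`), which needs to be declared on the root `beans` element.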

The following example uses Groovy for file operations on HDFS, to show that any JVM-supported language can interact with HDFS.

<hdp:script language="groovy">
  inputPath = "/user/gutenberg/input/word/"
  outputPath = "/user/gutenberg/output/word/"

  if (fsh.test(inputPath)) {
    fsh.rmr(inputPath)
  }

  if (fsh.test(outputPath)) {
    fsh.rmr(outputPath)
  }

  fs.copyFromLocalFile("data/input.txt", inputPath)
</hdp:script>

Summary

For more detailed documentation and usage, see the spring-hadoop project on GitHub.

(End)

不静之心
