Although Hadoop is written in Java, it also provides Hadoop Streaming, an API that lets users write the map and reduce functions in any language.
The key to Hadoop Streaming is that it uses UNIX standard streams as the interface between a program and Hadoop. Any program that can read data from standard input and write data to standard output can therefore be used, via Hadoop Streaming, to implement the map and reduce functions of a MapReduce job in any language.
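The contract described above can be illustrated with a few lines of PHP, independent of the word-count example that follows. This is a hypothetical sketch, not from the original article: it reads records from standard input and emits tab-delimited "key<TAB>value" records on standard output, using a helper function to_record that is my own naming.

```php
#!/usr/local/php/bin/php
<?php
// Turn one input line into a tab-delimited "key<TAB>value" record,
// or null for blank lines. The value here is simply the constant 1.
function to_record($line) {
    $key = trim($line);
    return ($key === '') ? null : $key . chr(9) . '1';
}

// The whole Streaming contract: read STDIN, write records to STDOUT.
while (($line = fgets(STDIN)) !== false) {
    $record = to_record($line);
    if ($record !== null) {
        echo $record, PHP_EOL;
    }
}
?>
```

Any executable that behaves like this, in any language, can be passed to -mapper or -reducer.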
For example:
bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper /usr/local/hadoop/mapper.php -reducer /usr/local/hadoop/reducer.php -input test/* -output out4
The jar that Hadoop Streaming requires is hadoop-streaming-0.20.203.0.jar. There is no hadoop-streaming.jar in the Hadoop root directory, because streaming is a contrib module, so you have to look under the contrib directory. Taking hadoop-0.20.2 as an example, it lives under contrib/streaming/, as in the command above.
-input: the HDFS path of the input files
-output: the HDFS path of the output directory
-mapper: the program implementing the map function
-reducer: the program implementing the reduce function
The mapper function
Create a file mapper.php with the following code:
#!/usr/local/php/bin/php
<?php
$word2count = array();

// Input comes from STDIN (standard input).
// Equivalently: $stdin = fopen("php://stdin", "r");
while (($line = fgets(STDIN)) !== false) {
    // Remove leading and trailing whitespace and lowercase the line.
    $line = strtolower(trim($line));
    // Split the line into words, dropping any empty strings.
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    // Increase the counters.
    foreach ($words as $word) {
        if (!isset($word2count[$word])) {
            $word2count[$word] = 0;
        }
        $word2count[$word] += 1;
    }
}

// Write the results to STDOUT (standard output).
// What we output here will be the input of the reduce step,
// i.e. the input of reducer.php.
foreach ($word2count as $word => $count) {
    // Tab-delimited.
    echo $word, chr(9), $count, PHP_EOL;
}
?>
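The streaming command above also passes a reducer.php, which the article does not show. The following is a hedged sketch of what such a reducer could look like; the function name reduce_lines is my own. It relies on the fact that Hadoop Streaming sorts the mapper output by key before the reduce step, so all "word<TAB>count" lines for the same word arrive consecutively on standard input.

```php
#!/usr/local/php/bin/php
<?php
// Sum consecutive "word<TAB>count" lines into "word<TAB>total" lines.
// Assumes the input is sorted by word, as Hadoop Streaming guarantees.
function reduce_lines($lines) {
    $result = array();
    $current_word = null;
    $current_count = 0;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        list($word, $count) = explode(chr(9), $line);
        if ($word === $current_word) {
            $current_count += (int)$count;
        } else {
            // A new word begins: emit the total for the previous one.
            if ($current_word !== null) {
                $result[] = $current_word . chr(9) . $current_count;
            }
            $current_word = $word;
            $current_count = (int)$count;
        }
    }
    // Emit the total for the last word.
    if ($current_word !== null) {
        $result[] = $current_word . chr(9) . $current_count;
    }
    return $result;
}

// Read all mapper output from STDIN and print the per-word totals.
$input = array();
while (($line = fgets(STDIN)) !== false) {
    $input[] = $line;
}
foreach (reduce_lines($input) as $out) {
    echo $out, PHP_EOL;
}
?>
```

The pair can be tested locally without a Hadoop cluster: pipe a text file through php mapper.php, then sort, then php reducer.php, and check the word counts by hand.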