Hadoop

Multiple Avro output files with the Hadoop Streaming API

Mappers and reducers written against the Hadoop Streaming API generally have no direct control over how many output files are generated in the Hadoop Distributed File System (HDFS). Although it is possible to spawn Hadoop subprocesses that write to HDFS during mapping and reducing, it may be easier and safer to write an output format in Java that writes multiple files. Here is an example output format that writes multiple Avro files.
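The routing idea behind such an output format can be sketched in plain Java, without the Hadoop or Avro libraries. This is only an illustration, not the output format itself: it assumes a hypothetical convention in which the streaming job prefixes each key with the target file name and a colon (fileName:key<TAB>value), and the names RoutingSketch, route, and the default file name "part" are made up for the example. A real output format would keep one Avro writer per file name instead of an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoutingSketch {
    // Hypothetical default file for records whose key has no "fileName:" prefix.
    static final String DEFAULT_FILE = "part";

    // Groups streaming output lines ("key<TAB>value") by the file name
    // encoded in the key prefix. The prefix convention is an assumption
    // made for this sketch, not something Hadoop Streaming defines.
    static Map<String, List<String>> route(List<String> lines) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String line : lines) {
            // Streaming separates key and value with the first tab.
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            String value = tab < 0 ? "" : line.substring(tab + 1);

            // Split the key into the target file name and the real key.
            int colon = key.indexOf(':');
            String file = colon < 0 ? DEFAULT_FILE : key.substring(0, colon);
            String realKey = colon < 0 ? key : key.substring(colon + 1);

            // In a real output format this would append to the writer
            // opened for `file`; here the record is just collected.
            files.computeIfAbsent(file, f -> new ArrayList<>())
                 .add(realKey + "\t" + value);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "users:alice\t42",
            "events:login\t2013-01-01",
            "users:bob\t7",
            "orphan\t0");
        // Print each target file with the records routed to it.
        route(lines).forEach((file, records) ->
            System.out.println(file + " -> " + records));
    }
}
```

With this convention the mapper or reducer script decides the file layout simply by choosing key prefixes, and the Java side stays generic.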