public class SequenceFile extends Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Void> implements FileFormat
A SequenceFile is a type of Scheme, which is a flat file consisting of binary key/value pairs. This is a space- and time-efficient means to store data.

Modifier | Constructor and Description |
---|---|
protected | SequenceFile() Protected for use by TempDfs and other subclasses. |
public | SequenceFile(Fields fields) Creates a new SequenceFile instance that stores the given field names. |

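To make the phrase "a flat file consisting of binary key/value pairs" concrete, here is a minimal sketch of such a layout using only the JDK. It is a toy encoding chosen for illustration and is not Hadoop's actual sequence file format; the class name `KeyValueFlatFileSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a toy binary key/value layout, NOT Hadoop's on-disk format.
final class KeyValueFlatFileSketch {

    // Writes a record count, then each key and value as a length-prefixed string.
    static byte[] write(Map<String, String> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size());
            for (Map.Entry<String, String> entry : records.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the records back in the order they were written.
    static Map<String, String> read(byte[] data) {
        Map<String, String> records = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                records.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, String> records = new LinkedHashMap<>();
        records.put("id", "42");
        records.put("name", "ada");
        System.out.println(read(write(records)));
    }
}
```

A binary layout like this avoids delimiter parsing and escaping on read, which is the general reason such formats tend to be compact and fast compared with text.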
Modifier and Type | Method and Description |
---|---|
java.lang.String | getExtension() |
void | sink(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SinkCall<java.lang.Void,org.apache.hadoop.mapred.OutputCollector> sinkCall) Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput(). |
void | sinkConfInit(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, Tap<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.conf.Configuration conf) Method sinkConfInit initializes this instance as a sink. |
boolean | source(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall) Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values are available. |
void | sourceCleanup(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall) Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall). |
void | sourceConfInit(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, Tap<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.conf.Configuration conf) Method sourceConfInit initializes this instance as a source. |
void | sourcePrepare(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall) Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall). |

Methods inherited from class Scheme: equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, presentSinkFields, presentSinkFieldsInternal, presentSourceFields, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, sinkCleanup, sinkPrepare, sinkWrap, sourceRePrepare, sourceWrap, toString

protected SequenceFile()

@ConstructorProperties(value="fields")
public SequenceFile(Fields fields)
Parameters:
fields - the field names to store

public void sourceConfInit(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, Tap<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.conf.Configuration conf)
Description copied from class: Scheme
This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.
It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".
See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized before use, and Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.
Specified by:
sourceConfInit in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Void>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

public void sinkConfInit(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, Tap<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.conf.Configuration conf)
Description copied from class: Scheme
This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.
It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".
See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.
Specified by:
sinkConfInit in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Void>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

public void sourcePrepare(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
Description copied from class: Scheme
Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
This method is guaranteed to be called once before the first invocation of Scheme.source(FlowProcess, SourceCall).
Be sure to place any initialized objects in the SourceContext so each instance will remain thread-safe.
Overrides:
sourcePrepare in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Void>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall

public boolean source(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall) throws java.io.IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values are available.
It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.
Note this is the only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.
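The calling order described for sourcePrepare, source, and sourceCleanup can be sketched with plain JDK types. Everything below is a hypothetical stand-in and not the Cascading API: `Call` plays the role of SourceCall, its `context` field plays the role of the SourceContext (an `Object[]`, matching the context type in the class signature), and `incoming` plays the role of the Tuple behind SourceCall.getIncomingEntry().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in types for illustration only; these are NOT the Cascading classes.
final class SourceLifecycleSketch {

    // Plays the role of SourceCall: the input, a per-call context, and the tuple to fill.
    static final class Call {
        final Iterator<String[]> input;                   // stands in for getInput()
        Object[] context;                                 // stands in for the SourceContext
        final List<Object> incoming = new ArrayList<>();  // stands in for the incoming Tuple

        Call(Iterator<String[]> input) {
            this.input = input;
        }
    }

    // Called once before the first source() call; state lives in the call, not the scheme.
    static void sourcePrepare(Call call) {
        call.context = new Object[2]; // reusable key and value holders
    }

    // Reads one record into the re-used tuple; returns false once the input is exhausted.
    static boolean source(Call call) {
        if (!call.input.hasNext()) {
            return false;
        }
        String[] record = call.input.next();
        call.context[0] = record[0];
        call.context[1] = record[1];
        call.incoming.clear(); // re-use the existing tuple instance
        call.incoming.add(call.context[0]);
        call.incoming.add(call.context[1]);
        return true;
    }

    // Releases whatever sourcePrepare() created.
    static void sourceCleanup(Call call) {
        call.context = null;
    }

    // Drives the lifecycle: prepare once, source until false, always clean up.
    static List<List<Object>> readAll(List<String[]> records) {
        Call call = new Call(records.iterator());
        List<List<Object>> result = new ArrayList<>();
        sourcePrepare(call);
        try {
            while (source(call)) {
                result.add(new ArrayList<>(call.incoming)); // copy, because the tuple is re-used
            }
        } finally {
            sourceCleanup(call);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"k1", "v1"},
            new String[] {"k2", "v2"});
        System.out.println(readAll(records));
    }
}
```

Keeping the holders on the call object rather than in fields of the scheme is what lets one scheme instance serve several concurrent reads, which is the thread-safety point made for the SourceContext.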
Specified by:
source in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Void>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Returns:
true when a Tuple was successfully read
Throws:
java.io.IOException

public void sourceCleanup(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
Description copied from class: Scheme
Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
Overrides:
sourceCleanup in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Void>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall

public void sink(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SinkCall<java.lang.Void,org.apache.hadoop.mapred.OutputCollector> sinkCall) throws java.io.IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
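The payload rule for failed writes can be illustrated with a small sketch, again using hypothetical stand-ins rather than Cascading classes: `RecordException` plays the role of TapException, and `writeAll` plays the role of whatever framework code invokes the sink and owns the failure trap. The rejection condition (a null first field) is an assumption made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; these are NOT the Cascading classes.
final class TrapRoutingSketch {

    // Plays the role of TapException: it may carry a payload tuple.
    static final class RecordException extends RuntimeException {
        final List<Object> payload; // null when no payload was set

        RecordException(String message, List<Object> payload) {
            super(message);
            this.payload = payload;
        }
    }

    // Hypothetical sink: rejects tuples whose first field is null.
    static void sink(List<Object> tuple, List<List<Object>> output, boolean attachPayload) {
        if (tuple.get(0) == null) {
            List<Object> payload = attachPayload
                ? Arrays.<Object>asList("rejected", tuple.get(1))
                : null;
            throw new RecordException("null key", payload);
        }
        output.add(tuple);
    }

    // Hypothetical caller: routes failures to the trap according to the payload rule.
    static void writeAll(List<List<Object>> tuples,
                         List<List<Object>> output,
                         List<List<Object>> trap,
                         boolean attachPayload) {
        for (List<Object> tuple : tuples) {
            try {
                sink(tuple, output, attachPayload);
            } catch (RecordException e) {
                // Payload set: trap the payload. Not set: trap the incoming tuple.
                trap.add(e.payload != null ? e.payload : tuple);
            }
        }
    }

    public static void main(String[] args) {
        List<List<Object>> tuples = new ArrayList<>();
        tuples.add(Arrays.<Object>asList("k", "v"));
        tuples.add(Arrays.<Object>asList(null, "x"));
        List<List<Object>> output = new ArrayList<>();
        List<List<Object>> trap = new ArrayList<>();
        writeAll(tuples, output, trap, true);
        System.out.println("output=" + output + " trap=" + trap);
    }
}
```

Setting a payload is useful when the sink can offer a more diagnosable record than the raw incoming tuple; leaving it unset preserves the original data for replay.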
public java.lang.String getExtension()
Specified by:
getExtension in interface FileFormat
Copyright © 2007-2017 Cascading Maintainers. All Rights Reserved.