public class TextLine extends CompressorScheme<java.io.LineNumberReader,java.io.PrintWriter> implements FileFormat
A Scheme for plain text files. Files are broken into lines; either a line-feed or a carriage-return signals the end of a line.
By default, this scheme returns a Tuple with two fields, "num" and "line", where "num" is the line number of "line".
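The line-splitting behaviour described above can be illustrated with the JDK alone. The following is a minimal sketch, not TextLine's implementation: the class name `LineNumberSketch` and the `num:line` output format are made up for illustration, and it assumes "num" corresponds to `LineNumberReader.getLineNumber()` read after each line (which makes the first line number 1). Whether TextLine numbers lines from 0 or 1 is not stated on this page.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cascading.
final class LineNumberSketch {

    /** Splits text into lines and pairs each line with a line number. */
    static List<String> readNumbered(String text) {
        List<String> rows = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String line;
            // readLine() treats "\n", "\r" and "\r\n" each as one line terminator.
            while ((line = reader.readLine()) != null) {
                // Assumption: the number is taken after the line has been read.
                rows.add(reader.getLineNumber() + ":" + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Mixed terminators: line-feed, carriage-return, and the pair.
        for (String row : readNumbered("a\nb\rc\r\nd")) {
            System.out.println(row);
        }
    }
}
```

The same reader type, java.io.LineNumberReader, is the source context declared in the class signature, which is why it is used here.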
Many of the constructors take both "sourceFields" and "sinkFields". sourceFields names the fields to be used in place of "num" and "line". sinkFields is a selector and defaults to Fields.ALL. Any available field names can be given if only a subset of the incoming fields should be used.
If a Fields instance with only one field is passed to the constructor as sourceFields, the returned tuples will contain only the "line" value, under the given field name.
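The effect of the number of source fields can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `SourceFieldsSketch` and `toRecord` are hypothetical names, a plain `Map` stands in for a Tuple, and rejecting any field count other than one or two is an assumption (the page only lists a `verify(Fields sourceFields)` method without describing it).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cascading; a Map stands in for a Tuple.
final class SourceFieldsSketch {

    /** Builds the record returned for one input line, given the declared source field names. */
    static Map<String, Object> toRecord(List<String> sourceFields, int num, String line) {
        Map<String, Object> record = new LinkedHashMap<>();
        if (sourceFields.size() == 1) {
            // One field: only the line value is returned, under the given name.
            record.put(sourceFields.get(0), line);
        } else if (sourceFields.size() == 2) {
            // Two fields: the names replace "num" and "line", in that order.
            record.put(sourceFields.get(0), num);
            record.put(sourceFields.get(1), line);
        } else {
            // Assumption: other field counts are invalid.
            throw new IllegalArgumentException("expected one or two source fields, got " + sourceFields.size());
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(toRecord(Arrays.asList("text"), 3, "hello"));
        System.out.println(toRecord(Arrays.asList("offset", "text"), 3, "hello"));
    }
}
```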
Note that TextLine will concatenate all the Tuple values for the selected fields with a TAB delimiter before writing out the line.
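A minimal sketch of this sink-side behaviour, using only the JDK: the selected field values are looked up in order and joined with a TAB. `TabJoinSketch` and `toLine` are hypothetical names, a `Map` stands in for the outgoing tuple entry, and rendering a missing value as the text "null" is an assumption of this sketch, not documented TextLine behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cascading.
final class TabJoinSketch {

    /** Joins the values of the selected fields with a TAB, in selector order. */
    static String toLine(Map<String, ?> entry, List<String> sinkFields) {
        List<String> values = new ArrayList<>();
        for (String field : sinkFields) {
            // Assumption: String.valueOf is used, so a missing value prints as "null".
            values.add(String.valueOf(entry.get(field)));
        }
        return String.join("\t", values);
    }

    public static void main(String[] args) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("id", 7);
        entry.put("name", "ada");
        entry.put("city", "London");
        // Only a subset of the incoming fields is written.
        System.out.println(toLine(entry, Arrays.asList("id", "name")));
    }
}
```

Because the delimiter is a TAB, a value that itself contains a TAB cannot be told apart from a field boundary when the line is read back; the See Also entry TextDelimited is the scheme to look at when fields must be recoverable.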
By default, all text is encoded/decoded as UTF-8. This can be changed via the charsetName constructor argument.
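Why the charset name matters can be shown with plain JDK streams. The sketch below is an assumption-laden illustration, not TextLine's code: `CharsetSketch` and its two methods are hypothetical, and they simply wrap byte streams in the same reader and writer types named in the class signature (LineNumberReader and PrintWriter) using a charset looked up by name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Cascading.
final class CharsetSketch {

    /** Encodes one line, followed by a line-feed, using the named charset. */
    static byte[] encodeLine(String line, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(bytes, Charset.forName(charsetName)));
        writer.print(line);
        writer.print('\n');
        writer.flush();
        return bytes.toByteArray();
    }

    /** Decodes the first line of the given bytes using the named charset. */
    static String decodeFirstLine(byte[] data, String charsetName) {
        try (LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(new ByteArrayInputStream(data), Charset.forName(charsetName)))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        // The accented character takes two bytes in UTF-8 and one in ISO-8859-1.
        System.out.println(encodeLine(text, "UTF-8").length);
        System.out.println(encodeLine(text, "ISO-8859-1").length);
        // Reading with the same charset that wrote the bytes restores the text.
        System.out.println(text.equals(decodeFirstLine(encodeLine(text, "UTF-8"), "UTF-8")));
    }
}
```

Reading bytes with a different charset than the one that wrote them does not fail; it silently produces different characters, so the source and sink sides should agree on the charset name.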
To read or write compressed files, pass a CompressorScheme.Compressor instance to the appropriate constructor. See Compressors for the provided compression algorithms.
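The idea behind a compressor can be sketched with the JDK's GZIP streams: the raw byte stream is wrapped in a compressing or decompressing stream before the text reader or writer is layered on top. This is only an analogy under stated assumptions. The CompressorScheme.Compressor interface and the algorithms offered by Compressors are not reproduced on this page, so `GzipSketch` and its methods are hypothetical, and GZIP is chosen here purely as a familiar example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Cascading.
final class GzipSketch {

    /** Writes lines as UTF-8 text through a GZIP stream and returns the compressed bytes. */
    static byte[] writeCompressed(List<String> lines) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.print(line);
                writer.print('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above also finishes the GZIP trailer.
        return bytes.toByteArray();
    }

    /** Reads lines back by decompressing first and decoding text second. */
    static List<String> readCompressed(byte[] data) {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(new GZIPInputStream(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = writeCompressed(Arrays.asList("first", "second"));
        System.out.println(readCompressed(data));
    }
}
```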
See Also: TextDelimited, Compressors, Serialized Form
Nested classes inherited from class CompressorScheme: CompressorScheme.Compressor
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_CHARSET |
static Fields | DEFAULT_SOURCE_FIELDS |

Fields inherited from class CompressorScheme: compressor, NO_COMPRESSOR
Constructor | Description |
---|---|
TextLine() | Creates a new TextLine instance that sources "num" and "line" fields, and sinks all incoming fields, where "num" is the line number of the line in the input file. |
TextLine(CompressorScheme.Compressor compressor) | Creates a new TextLine instance that sources "num" and "line" fields, and sinks all incoming fields, where "num" is the line number of the line in the input file. |
TextLine(Fields sourceFields) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, CompressorScheme.Compressor compressor) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, CompressorScheme.Compressor compressor, java.lang.String charsetName) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields, CompressorScheme.Compressor compressor) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields, CompressorScheme.Compressor compressor, java.lang.String charsetName) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields, java.lang.String charsetName) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, java.lang.String charsetName) | Creates a new TextLine instance. |
Modifier and Type | Method | Description |
---|---|---|
java.io.LineNumberReader | createInput(java.io.InputStream inputStream) | |
java.io.PrintWriter | createOutput(java.io.OutputStream outputStream) | |
java.lang.String | getCharsetName() | |
java.lang.String | getExtension() | |
void | presentSinkFields(FlowProcess<? extends java.util.Properties> process, Tap tap, Fields fields) | Method presentSinkFields is called after the planner is invoked and all fields are resolved. |
void | presentSourceFields(FlowProcess<? extends java.util.Properties> process, Tap tap, Fields fields) | Method presentSourceFields is called after the planner is invoked and all fields are resolved. |
protected void | setCharsetName(java.lang.String charsetName) | |
void | sink(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall) | Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput(). |
void | sinkCleanup(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall) | Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall). |
void | sinkConfInit(FlowProcess<? extends java.util.Properties> flowProcess, Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap, java.util.Properties conf) | Method sinkConfInit initializes this instance as a sink. |
void | sinkPrepare(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall) | Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall). |
boolean | source(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) | Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available. |
void | sourceCleanup(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) | Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall). |
void | sourceConfInit(FlowProcess<? extends java.util.Properties> flowProcess, Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap, java.util.Properties conf) | Method sourceConfInit initializes this instance as a source. |
void | sourcePrepare(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) | Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall). |
void | sourceRePrepare(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) | Method sourceRePrepare is used to re-initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall) after the Input object has been changed, if needed. |
protected void | verify(Fields sourceFields) | |
Methods inherited from class CompressorScheme: setCompressor, sinkWrap, sourceWrap
Methods inherited from class Scheme: equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, toString
public static final java.lang.String DEFAULT_CHARSET
public static final Fields DEFAULT_SOURCE_FIELDS
public TextLine()
@ConstructorProperties(value="sourceFields") public TextLine(Fields sourceFields)
Parameters:
sourceFields - of Fields
@ConstructorProperties(value={"sourceFields","charsetName"}) public TextLine(Fields sourceFields, java.lang.String charsetName)
Parameters:
sourceFields - of Fields
charsetName - of type String
@ConstructorProperties(value={"sourceFields","sinkFields"}) public TextLine(Fields sourceFields, Fields sinkFields)
Parameters:
sourceFields - of Fields
sinkFields - of Fields
@ConstructorProperties(value={"sourceFields","sinkFields","charsetName"}) public TextLine(Fields sourceFields, Fields sinkFields, java.lang.String charsetName)
Parameters:
sourceFields - of Fields
sinkFields - of Fields
charsetName - of type String
public TextLine(CompressorScheme.Compressor compressor)
@ConstructorProperties(value={"sourceFields","compressor"}) public TextLine(Fields sourceFields, CompressorScheme.Compressor compressor)
Parameters:
sourceFields - of Fields
compressor - of type Compressor
@ConstructorProperties(value={"sourceFields","compressor","charsetName"}) public TextLine(Fields sourceFields, CompressorScheme.Compressor compressor, java.lang.String charsetName)
Parameters:
sourceFields - of Fields
compressor - of type Compressor
charsetName - of type String
@ConstructorProperties(value={"sourceFields","sinkFields","compressor"}) public TextLine(Fields sourceFields, Fields sinkFields, CompressorScheme.Compressor compressor)
Parameters:
sourceFields - of Fields
sinkFields - of Fields
compressor - of type Compressor
@ConstructorProperties(value={"sourceFields","sinkFields","compressor","charsetName"}) public TextLine(Fields sourceFields, Fields sinkFields, CompressorScheme.Compressor compressor, java.lang.String charsetName)
Parameters:
sourceFields - of Fields
sinkFields - of Fields
compressor - of type Compressor
charsetName - of type String
protected void setCharsetName(java.lang.String charsetName)
public java.lang.String getCharsetName()
public java.io.LineNumberReader createInput(java.io.InputStream inputStream)
public java.io.PrintWriter createOutput(java.io.OutputStream outputStream)
public void presentSourceFields(FlowProcess<? extends java.util.Properties> process, Tap tap, Fields fields)
Description copied from class: Scheme
Method presentSourceFields is called after the planner is invoked and all fields are resolved. This method is called after Scheme.retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSourceFields in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap
fields - of type Fields
public void presentSinkFields(FlowProcess<? extends java.util.Properties> process, Tap tap, Fields fields)
Description copied from class: Scheme
Method presentSinkFields is called after the planner is invoked and all fields are resolved. This method is called after Scheme.retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSinkFields in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap
fields - of type Fields
public void sourceConfInit(FlowProcess<? extends java.util.Properties> flowProcess, Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap, java.util.Properties conf)
Description copied from class: Scheme
Method sourceConfInit initializes this instance as a source.
This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.
It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".
See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized before use, and Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.
sourceConfInit in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config
public void sinkConfInit(FlowProcess<? extends java.util.Properties> flowProcess, Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap, java.util.Properties conf)
Description copied from class: Scheme
Method sinkConfInit initializes this instance as a sink.
This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.
It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".
See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.
sinkConfInit in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config
public void sourcePrepare(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Description copied from class: Scheme
Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
This method is guaranteed to be called once before the first invocation of Scheme.source(FlowProcess, SourceCall).
Be sure to place any initialized objects in the SourceContext so each instance will remain thread-safe.
sourcePrepare in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
java.io.IOException
public void sourceRePrepare(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Description copied from class: Scheme
Method sourceRePrepare is used to re-initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall) after the Input object has been changed, if needed.
This method may be called zero or more times. Note Scheme.sourcePrepare(FlowProcess, SourceCall) will always be called before any Scheme.source(FlowProcess, SourceCall) invocation.
sourceRePrepare in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
java.io.IOException
public boolean source(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available.
It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.
Note this is the only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.
source in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of SourceCall
Returns:
true when a Tuple was successfully read
Throws:
java.io.IOException
public void sourceCleanup(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Description copied from class: Scheme
Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
sourceCleanup in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of Process
sourceCall - of type SourceCall
Throws:
java.io.IOException
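The calling order described for sourcePrepare, source and sourceCleanup can be sketched without the library. Everything below is a hypothetical stand-in: `SourceLifecycleSketch`, its nested `Call` class (a simplified substitute for SourceCall with a context slot), and the `drain` driver are invented for illustration, and numbering lines from 1 is an assumption. It only shows the contract stated above: prepare once, call source until it returns false, then clean up.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not part of Cascading.
final class SourceLifecycleSketch {

    /** Simplified substitute for SourceCall: the input, a context slot, and the entry to fill. */
    static final class Call {
        final Reader input;
        LineNumberReader context;               // created in prepare, released in cleanup
        final String[] entry = new String[2];   // slot 0 is "num", slot 1 is "line"

        Call(Reader input) {
            this.input = input;
        }
    }

    /** Called once before the first source() call; state lives on the call, not on a shared object. */
    static void prepare(Call call) {
        call.context = new LineNumberReader(call.input);
    }

    /** Fills the entry with the next record and returns true, or returns false when input is exhausted. */
    static boolean source(Call call) {
        try {
            String line = call.context.readLine();
            if (line == null) {
                return false;
            }
            call.entry[0] = String.valueOf(call.context.getLineNumber());
            call.entry[1] = line;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases whatever prepare() created. */
    static void cleanup(Call call) {
        call.context = null;
    }

    /** Drives the lifecycle in the documented order and collects "num<TAB>line" strings. */
    static List<String> drain(String text) {
        Call call = new Call(new StringReader(text));
        List<String> out = new ArrayList<>();
        prepare(call);
        try {
            while (source(call)) {
                // The entry array is re-used between calls, so copy values out immediately.
                out.add(call.entry[0] + "\t" + call.entry[1]);
            }
        } finally {
            cleanup(call);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain("x\ny"));
    }
}
```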
public void sinkPrepare(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall) throws java.io.IOException
Description copied from class: Scheme
Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).
This method is guaranteed to be called once before the first invocation of Scheme.sink(FlowProcess, SinkCall).
Be sure to place any initialized objects in the SinkContext so each instance will remain thread-safe.
sinkPrepare in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
java.io.IOException
public void sink(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall) throws java.io.IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
public void sinkCleanup(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall) throws java.io.IOException
Description copied from class: Scheme
Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall).
sinkCleanup in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
java.io.IOException
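The advice to keep initialized objects in the per-call context rather than on the scheme itself can be sketched as follows. All names here are hypothetical: `SinkLifecycleSketch`, its nested `Call` class (a simplified substitute for SinkCall), and `writeAll` are invented for illustration, and flushing the writer during cleanup is an assumption of this sketch rather than documented behaviour.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, not part of Cascading.
final class SinkLifecycleSketch {

    /** Simplified substitute for SinkCall: one output plus a per-call context slot. */
    static final class Call {
        final PrintWriter output;
        StringBuilder context; // scratch buffer owned by this call, not by a shared scheme object

        Call(PrintWriter output) {
            this.output = output;
        }
    }

    /** Called once before the first sink() call. */
    static void prepare(Call call) {
        call.context = new StringBuilder();
    }

    /** Writes one tuple as a TAB-delimited line, re-using the buffer from the context. */
    static void sink(Call call, List<?> tuple) {
        StringBuilder buffer = call.context;
        buffer.setLength(0);
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                buffer.append('\t');
            }
            buffer.append(tuple.get(i));
        }
        call.output.print(buffer);
        call.output.print('\n');
    }

    /** Releases what prepare() created; flushing here is an assumption of this sketch. */
    static void cleanup(Call call) {
        call.context = null;
        call.output.flush();
    }

    /** Drives the lifecycle in the documented order and returns everything written. */
    static String writeAll(List<? extends List<?>> tuples) {
        StringWriter target = new StringWriter();
        Call call = new Call(new PrintWriter(target));
        prepare(call);
        try {
            for (List<?> tuple : tuples) {
                sink(call, tuple);
            }
        } finally {
            cleanup(call);
        }
        return target.toString();
    }

    public static void main(String[] args) {
        List<List<Object>> tuples = new ArrayList<>();
        tuples.add(Arrays.<Object>asList(1, "a"));
        tuples.add(Arrays.<Object>asList(2, "b"));
        System.out.print(writeAll(tuples));
    }
}
```

Because the buffer hangs off the call object, two calls running on different threads never share it, which is the point of the thread-safety note in sinkPrepare.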
public java.lang.String getExtension()
getExtension in interface FileFormat
Copyright © 2007-2017 Cascading Maintainers. All Rights Reserved.