public class TextDelimited extends CompressorScheme<java.io.LineNumberReader,java.io.PrintWriter> implements FileFormat
TextDelimited may also be used to skip the "header" in a file, where the header is defined as the very first line in every input file. That is, if the byte offset of the current line from the input is zero (0), that line will be skipped.
If the sink/source fields are set to either Fields.ALL or Fields.UNKNOWN and skipHeader or hasHeader is true, the field names will be retrieved from the header of the file and used during planning. The header will be parsed with the same rules as the body of the file.
By default headers are not skipped.
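The header handling described above can be illustrated with a minimal sketch in plain JDK Java. It does not use or reproduce Cascading's implementation; the class `HeaderSketch` and its methods are hypothetical, and it detects the first line by line number rather than by byte offset.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class HeaderSketch {
    /** Field names taken from the first line, split with the same delimiter as the body. */
    static String[] headerNames(String text, String delimiter) {
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String first = reader.readLine();
            return first == null ? new String[0] : first.split(Pattern.quote(delimiter), -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Body lines, optionally dropping the very first line of the input. */
    static List<String> bodyLines(String text, boolean skipHeader) {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // getLineNumber() is 1 immediately after the first line has been read
                if (skipHeader && reader.getLineNumber() == 1) {
                    continue;
                }
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "id,name\n1,alice\n2,bob\n";
        System.out.println(Arrays.toString(headerNames(text, ",")));
        System.out.println(bodyLines(text, true));
    }
}
```

Only the first line of each input is treated as a header here; later lines that happen to look like a header are passed through as data.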
TextDelimited may also be used to write a "header" in a file. The field names for the header are taken directly from the declared fields. If the declared fields are Fields.ALL or Fields.UNKNOWN, the resolved field names will be used, if any.
By default headers are not written.
If hasHeader is set to true on a constructor, both skipHeader and writeHeader will be set to true.
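As a rough illustration of how the header flags relate, the following plain JDK sketch models a scheme whose single hasHeader flag sets both skipHeader and writeHeader, and which writes the declared field names as the first line when writeHeader is true. `HeaderFlagsSketch` and its members are hypothetical names, not part of Cascading.

```java
import java.util.Arrays;
import java.util.List;

public class HeaderFlagsSketch {
    final boolean skipHeader;
    final boolean writeHeader;

    /** A single hasHeader flag sets both skipHeader and writeHeader. */
    HeaderFlagsSketch(boolean hasHeader) {
        this(hasHeader, hasHeader);
    }

    HeaderFlagsSketch(boolean skipHeader, boolean writeHeader) {
        this.skipHeader = skipHeader;
        this.writeHeader = writeHeader;
    }

    /** Writes rows as delimited text, preceded by the field names if writeHeader is true. */
    String write(String[] fieldNames, List<String[]> rows, String delimiter) {
        StringBuilder out = new StringBuilder();
        if (writeHeader) {
            out.append(String.join(delimiter, fieldNames)).append('\n');
        }
        for (String[] row : rows) {
            out.append(String.join(delimiter, row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(new String[] {"1", "alice"}, new String[] {"2", "bob"});
        System.out.print(new HeaderFlagsSketch(true).write(new String[] {"id", "name"}, rows, "\t"));
    }
}
```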
By default this Scheme is both strict and safe.
Strict means that if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, a Tuple will be returned with null values for the missing fields.
Safe means that if a field cannot be coerced into an expected type, a null will be used for the value. If safe is false, a TapException will be thrown.
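The strict and safe semantics can be modeled with a small plain JDK sketch. This is an illustration under stated assumptions, not Cascading's parser: `StrictSafeSketch` and `ParseException` (a stand-in for TapException) are hypothetical, only String, Integer and Double types are supported, and surplus values on non-strict lines are simply ignored.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class StrictSafeSketch {
    /** Stand-in for TapException in this sketch. */
    static class ParseException extends RuntimeException {
        ParseException(String message) {
            super(message);
        }
    }

    static Object[] parse(String line, String delimiter, Class<?>[] types, boolean strict, boolean safe) {
        String[] raw = line.split(Pattern.quote(delimiter), -1);
        if (strict && raw.length != types.length) {
            throw new ParseException("expected " + types.length + " fields, found " + raw.length);
        }
        Object[] values = new Object[types.length]; // missing fields stay null
        for (int i = 0; i < Math.min(raw.length, types.length); i++) {
            values[i] = coerce(raw[i], types[i], safe);
        }
        return values;
    }

    static Object coerce(String value, Class<?> type, boolean safe) {
        if (type == String.class) {
            return value;
        }
        try {
            if (type == Integer.class) {
                return Integer.valueOf(value);
            }
            if (type == Double.class) {
                return Double.valueOf(value);
            }
        } catch (NumberFormatException e) {
            if (safe) {
                return null; // safe: an uncoercible value becomes null
            }
            throw new ParseException("cannot coerce '" + value + "' to " + type.getSimpleName());
        }
        throw new IllegalArgumentException("unsupported type in this sketch: " + type);
    }

    public static void main(String[] args) {
        Class<?>[] types = {Integer.class, String.class};
        System.out.println(Arrays.toString(parse("1\tbob", "\t", types, true, true)));
        System.out.println(Arrays.toString(parse("1", "\t", types, false, true)));
        System.out.println(Arrays.toString(parse("x\tbob", "\t", types, true, true)));
    }
}
```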
Also by default, quote strings are not searched for, to improve processing speed. If a file is COMMA delimited but may have COMMAs in a value, the whole value should be surrounded by the quote string, typically double quotes (").
Note all empty fields in a line will be returned as null unless coerced into a new type.
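To make the role of the quote string concrete, here is a minimal quote-aware splitter in plain JDK Java. It is a sketch only: DelimitedParser is described as relying on the Java regular expression engine, whereas this hypothetical `QuoteSketch` scans characters, supports only a single-character delimiter and quote, and does not handle escaped quotes inside a value.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSketch {
    /** Splits a line on the delimiter, ignoring delimiters that appear between quote characters. */
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes; // quote characters wrap the value and are dropped
            } else if (c == delimiter && !inQuotes) {
                fields.add(emptyToNull(current));
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(emptyToNull(current));
        return fields;
    }

    /** Empty fields are reported as null, mirroring the note above. */
    private static String emptyToNull(StringBuilder value) {
        return value.length() == 0 ? null : value.toString();
    }

    public static void main(String[] args) {
        System.out.println(split("1,\"Smith, Jane\",,x", ',', '"'));
    }
}
```

Without the surrounding quotes, the value `Smith, Jane` would be split into two fields and the line would no longer match the expected field count.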
This Scheme may source/sink Fields.ALL. When Fields.ALL is given on the constructor, the new instance will automatically default to strict == false, as the number of fields parsed is arbitrary or unknown. A type array may not be given either, so all values will be returned as Strings.
By default, all text is encoded/decoded as UTF-8. This can be changed via the charsetName constructor argument.
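The effect of a charset name on reading and writing can be sketched with plain JDK streams. The helper names below echo the createInput and createOutput methods listed for this class, but the bodies are an assumption about how such wrapping could look, not the actual implementation; `CharsetSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class CharsetSketch {
    /** Wraps a raw input stream in a reader that decodes with the named charset. */
    static LineNumberReader createInput(InputStream in, String charsetName) {
        return new LineNumberReader(new InputStreamReader(in, Charset.forName(charsetName)));
    }

    /** Wraps a raw output stream in a writer that encodes with the named charset. */
    static PrintWriter createOutput(OutputStream out, String charsetName) {
        return new PrintWriter(new OutputStreamWriter(out, Charset.forName(charsetName)));
    }

    /** Writes one line and reads it back using the same charset. */
    static String roundTrip(String line, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter writer = createOutput(bytes, charsetName)) {
            writer.print(line);
            writer.print('\n');
        }
        try (LineNumberReader reader = createInput(new ByteArrayInputStream(bytes.toByteArray()), charsetName)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String line = "caf\u00e9\tna\u00efve";
        System.out.println(roundTrip(line, "UTF-8").equals(line));
        System.out.println(encodedLength("\u00e9", "UTF-8") + " vs " + encodedLength("\u00e9", "ISO-8859-1"));
    }
}
```

The same text occupies a different number of bytes under different charsets, so reader and writer must agree on the charset name for a file to round-trip.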
To override field and line parsing behaviors, subclass DelimitedParser or provide a FieldTypeResolver implementation.
Note that there should be no expectation that TextDelimited, or specifically DelimitedParser, can handle all delimited and quoted combinations reliably. Attempting to do so would impair its performance and maintainability.
Further, it can be safely said that corrupted files will not be supported, for obvious reasons. Corrupted files may result in exceptions or could trigger edge cases in the underlying Java regular expression engine.
A large part of Cascading was designed to help users cleanse data. Thus the recommendation is to create Flows that are responsible for cleansing large data-sets when faced with this problem.
DelimitedParser may be subclassed and extended if necessary.
In order to read or write compressed files, pass a CompressorScheme.Compressor instance to the appropriate constructors. See Compressors for provided compression algorithms.
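Conceptually, a compressor wraps the raw input and output streams before text is read or written. The sketch below shows that idea with the JDK's GZIP streams only; it does not use CompressorScheme.Compressor or Compressors, whose exact API is not shown here, and `CompressionSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionSketch {
    /** Writes lines through a GZIP-wrapped writer and returns the compressed bytes. */
    static byte[] write(List<String> lines) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.print(line);
                writer.print('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads lines back through a GZIP-wrapped reader. */
    static List<String> read(byte[] compressed) {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] compressed = write(Arrays.asList("id\tname", "1\talice"));
        System.out.println(read(compressed));
    }
}
```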
See Also: TextLine, Compressors, Serialized Form
Nested classes/interfaces inherited from class CompressorScheme: CompressorScheme.Compressor
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_CHARSET |
Fields inherited from class CompressorScheme: compressor, NO_COMPRESSOR
Constructor and Description |
---|
TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter. |
TextDelimited(boolean hasHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(boolean hasHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter. |
TextDelimited(boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote. |
TextDelimited(CompressorScheme.Compressor compressor)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter. |
TextDelimited(CompressorScheme.Compressor compressor,
boolean hasHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter. |
TextDelimited(CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote. |
TextDelimited(CompressorScheme.Compressor compressor,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(Fields fields)
Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
boolean strict,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
boolean strict,
java.lang.String quote,
java.lang.Class[] types,
boolean safe,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String charsetName,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor)
Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
boolean strict,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
boolean strict,
java.lang.String quote,
java.lang.Class[] types,
boolean safe,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
java.lang.String charsetName,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
CompressorScheme.Compressor compressor,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
Modifier and Type | Method and Description |
---|---|
java.io.LineNumberReader |
createInput(java.io.InputStream inputStream) |
java.io.PrintWriter |
createOutput(java.io.OutputStream outputStream) |
java.lang.String |
getCharsetName() |
java.lang.String |
getDelimiter()
Method getDelimiter returns the delimiter used to parse fields from the current line of text.
|
java.lang.String |
getExtension() |
java.lang.String |
getQuote()
Method getQuote returns the quote string, if any, used to encapsulate each field in a line of delimited text.
|
protected boolean |
isAppendingFile(SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall,
java.io.OutputStream originalOutput) |
boolean |
isSymmetrical()
Method isSymmetrical returns true if the sink fields equal the source fields. |
void |
presentSinkFields(FlowProcess<? extends java.util.Properties> flowProcess,
Tap tap,
Fields fields)
Method presentSinkFields is called after the planner is invoked and all fields are resolved.
|
void |
presentSourceFields(FlowProcess<? extends java.util.Properties> process,
Tap tap,
Fields fields)
Method presentSourceFields is called after the planner is invoked and all fields are resolved.
|
Fields |
retrieveSourceFields(FlowProcess<? extends java.util.Properties> process,
Tap tap)
Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources.
|
void |
setSinkFields(Fields sinkFields)
Method setSinkFields sets the sinkFields of this Scheme object.
|
void |
setSourceFields(Fields sourceFields)
Method setSourceFields sets the sourceFields of this Scheme object.
|
void |
sink(FlowProcess<? extends java.util.Properties> flowProcess,
SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall)
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput(). |
void |
sinkCleanup(FlowProcess<? extends java.util.Properties> flowProcess,
SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall)
Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall). |
void |
sinkConfInit(FlowProcess<? extends java.util.Properties> flowProcess,
Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap,
java.util.Properties conf)
Method sinkConfInit initializes this instance as a sink.
|
void |
sinkPrepare(FlowProcess<? extends java.util.Properties> flowProcess,
SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall)
Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall). |
boolean |
source(FlowProcess<? extends java.util.Properties> flowProcess,
SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall)
Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available. |
void |
sourceCleanup(FlowProcess<? extends java.util.Properties> flowProcess,
SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall)
Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall). |
void |
sourceConfInit(FlowProcess<? extends java.util.Properties> flowProcess,
Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap,
java.util.Properties conf)
Method sourceConfInit initializes this instance as a source.
|
void |
sourcePrepare(FlowProcess<? extends java.util.Properties> flowProcess,
SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall)
Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall). |
void |
sourceRePrepare(FlowProcess<? extends java.util.Properties> flowProcess,
SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall)
Method sourceRePrepare is used to re-initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall) after the Input object has been changed, if needed. |
Methods inherited from class CompressorScheme: setCompressor, sinkWrap, sourceWrap
Methods inherited from class Scheme: equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, setNumSinkParts, toString
public static final java.lang.String DEFAULT_CHARSET
public TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
@ConstructorProperties(value={"hasHeader","delimiter"}) public TextDelimited(boolean hasHeader, java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"hasHeader","delimiter","quote"}) public TextDelimited(boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"hasHeader","delimitedParser"}) public TextDelimited(boolean hasHeader, DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value="delimitedParser") public TextDelimited(DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
This constructor will set skipHeader and writeHeader values to true.
delimitedParser - of type DelimitedParser
@ConstructorProperties(value="fields") public TextDelimited(Fields fields)
fields - of type Fields
@ConstructorProperties(value={"fields","delimiter"}) public TextDelimited(Fields fields, java.lang.String delimiter)
fields - of type Fields
delimiter - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","delimiter","types"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe","charsetName"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe, java.lang.String charsetName)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","delimiter","quote"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","charsetName"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.String charsetName)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","strict","quote","types","safe"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, boolean strict, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","strict","quote","types","safe","charsetName"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, boolean strict, java.lang.String quote, java.lang.Class[] types, boolean safe, java.lang.String charsetName)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimitedParser"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","hasHeader","delimitedParser"}) public TextDelimited(Fields fields, boolean hasHeader, DelimitedParser delimitedParser)
fields - of type Fields
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","charsetName","delimitedParser"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String charsetName, DelimitedParser delimitedParser)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
charsetName - of type String
delimitedParser - of type DelimitedParser
@ConstructorProperties(value="compressor") public TextDelimited(CompressorScheme.Compressor compressor)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
compressor - of type Compressor, see Compressors
@ConstructorProperties(value={"compressor","hasHeader","delimiter"}) public TextDelimited(CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
compressor - of type Compressor, see Compressors
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"compressor","hasHeader","delimiter","quote"}) public TextDelimited(CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
compressor - of type Compressor, see Compressors
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"compressor","hasHeader","delimitedParser"}) public TextDelimited(CompressorScheme.Compressor compressor, boolean hasHeader, DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
compressor - of type Compressor, see Compressors
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"compressor","delimitedParser"}) public TextDelimited(CompressorScheme.Compressor compressor, DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
This constructor will set skipHeader and writeHeader values to true.
compressor - of type Compressor, see Compressors
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","compressor"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor)
fields - of type Fields
compressor - of type Compressor, see Compressors
@ConstructorProperties(value={"fields","compressor","delimiter"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, java.lang.String delimiter)
fields - of type Fields
compressor - of type Compressor, see Compressors
delimiter - of type String
@ConstructorProperties(value={"fields","compressor","hasHeader","delimiter"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter)
fields - of type Fields
compressor - of type Compressor, see Compressors
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","delimiter"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, java.lang.String delimiter)
fields - of type Fields
compressor - of type Compressor, see Compressors
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","compressor","delimiter","types"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, java.lang.String delimiter, java.lang.Class[] types)
fields
- of type Fieldscompressor
- of type Compressor, see Compressors
delimiter
- of type Stringtypes
- of type Class[]
@ConstructorProperties(value={"fields","compressor","hasHeader","delimiter","types"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter, java.lang.Class[] types)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
hasHeader
- of type boolean
delimiter
- of type String
types
- of type Class[]
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","delimiter","types"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.Class[] types)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
skipHeader
- of type boolean
writeHeader
- of type boolean
delimiter
- of type String
types
- of type Class[]
@ConstructorProperties(value={"fields","compressor","delimiter","quote","types"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
delimiter
- of type String
quote
- of type String
types
- of type Class[]
@ConstructorProperties(value={"fields","compressor","hasHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
hasHeader
- of type boolean
delimiter
- of type String
quote
- of type String
types
- of type Class[]
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
skipHeader
- of type boolean
writeHeader
- of type boolean
delimiter
- of type String
quote
- of type String
types
- of type Class[]
@ConstructorProperties(value={"fields","compressor","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
delimiter
- of type String
quote
- of type String
types
- of type Class[]
safe
- of type boolean
@ConstructorProperties(value={"fields","compressor","hasHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
hasHeader
- of type boolean
delimiter
- of type String
quote
- of type String
types
- of type Class[]
safe
- of type boolean
@ConstructorProperties(value={"fields","compressor","hasHeader","delimiter","quote","types","safe","charsetName"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe, java.lang.String charsetName)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
hasHeader
- of type boolean
delimiter
- of type String
quote
- of type String
types
- of type Class[]
safe
- of type boolean
charsetName
- of type String
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
skipHeader
- of type boolean
writeHeader
- of type boolean
delimiter
- of type String
quote
- of type String
types
- of type Class[]
safe
- of type boolean
@ConstructorProperties(value={"fields","compressor","delimiter","quote"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, java.lang.String delimiter, java.lang.String quote)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
delimiter
- of type String
quote
- of type String
@ConstructorProperties(value={"fields","compressor","hasHeader","delimiter","quote"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
hasHeader
- of type boolean
delimiter
- of type String
quote
- of type String
@ConstructorProperties(value={"fields","compressor","hasHeader","delimiter","quote","charsetName"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.String charsetName)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
hasHeader
- of type boolean
delimiter
- of type String
quote
- of type String
charsetName
- of type String
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","delimiter","strict","quote","types","safe"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, boolean strict, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
skipHeader
- of type boolean
writeHeader
- of type boolean
delimiter
- of type String
strict
- of type boolean
quote
- of type String
types
- of type Class[]
safe
- of type boolean
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","delimiter","strict","quote","types","safe","charsetName"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, boolean strict, java.lang.String quote, java.lang.Class[] types, boolean safe, java.lang.String charsetName)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
skipHeader
- of type boolean
writeHeader
- of type boolean
delimiter
- of type String
strict
- of type boolean
quote
- of type String
types
- of type Class[]
safe
- of type boolean
charsetName
- of type String
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","delimitedParser"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
writeHeader
- of type boolean
delimitedParser
- of type DelimitedParser
@ConstructorProperties(value={"fields","compressor","hasHeader","delimitedParser"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean hasHeader, DelimitedParser delimitedParser)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
hasHeader
- of type boolean
delimitedParser
- of type DelimitedParser
@ConstructorProperties(value={"fields","compressor","skipHeader","writeHeader","charsetName","delimitedParser"}) public TextDelimited(Fields fields, CompressorScheme.Compressor compressor, boolean skipHeader, boolean writeHeader, java.lang.String charsetName, DelimitedParser delimitedParser)
fields
- of type Fields
compressor
- of type Compressor, see Compressors
skipHeader
- of type boolean
writeHeader
- of type boolean
charsetName
- of type String
delimitedParser
- of type DelimitedParser
public java.lang.String getCharsetName()
public java.lang.String getDelimiter()
public java.lang.String getQuote()
public java.io.LineNumberReader createInput(java.io.InputStream inputStream)
public java.io.PrintWriter createOutput(java.io.OutputStream outputStream)
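To illustrate the reader and writer types returned by createInput and createOutput, the sketch below layers a codec and a charset under a LineNumberReader and a PrintWriter. It is a minimal sketch and not Cascading's implementation: the class StreamWrappingSketch, the helper roundTripFirstLine, the use of GZIP, and the explicit Charset argument are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Cascading: shows one way to layer
// decompression and charset handling under the reader and writer types.
final class StreamWrappingSketch {

  // Assumed shape of createInput: decompress, decode characters, track line numbers.
  static LineNumberReader createInput(InputStream raw, Charset charset) throws IOException {
    return new LineNumberReader(new InputStreamReader(new GZIPInputStream(raw), charset));
  }

  // Assumed shape of createOutput: encode characters, then compress.
  static PrintWriter createOutput(OutputStream raw, Charset charset) throws IOException {
    return new PrintWriter(new OutputStreamWriter(new GZIPOutputStream(raw), charset));
  }

  // Writes the given lines through createOutput, then reads the first one back.
  static String roundTripFirstLine(String... lines) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (PrintWriter writer = createOutput(bytes, StandardCharsets.UTF_8)) {
        for (String line : lines)
          writer.println(line);
      } // closing the writer also finishes the GZIP stream
      try (LineNumberReader reader =
             createInput(new ByteArrayInputStream(bytes.toByteArray()), StandardCharsets.UTF_8)) {
        return reader.readLine();
      }
    } catch (IOException exception) {
      throw new UncheckedIOException(exception);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTripFirstLine("id\tname", "1\tada"));
  }
}
```

The writer must be closed before the bytes are read back, otherwise the compressed stream is incomplete.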
public void setSinkFields(Fields sinkFields)
Scheme
setSinkFields
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
sinkFields
- the sinkFields of this Scheme object.
public void setSourceFields(Fields sourceFields)
Scheme
setSourceFields
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
sourceFields
- the sourceFields of this Scheme object.
public boolean isSymmetrical()
Scheme
Returns true if the sink fields equal the source fields. That is, this scheme sources the same fields as it sinks.
isSymmetrical
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
public Fields retrieveSourceFields(FlowProcess<? extends java.util.Properties> process, Tap tap)
Scheme
The FlowProcess presents all known properties resolved by the current planner.
The tap instance is the parent Tap for this Scheme instance.
retrieveSourceFields
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
process
- of type FlowProcess
tap
- of type Tap
public void presentSourceFields(FlowProcess<? extends java.util.Properties> process, Tap tap, Fields fields)
Scheme
This method is called after Scheme.retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSourceFields
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
process
- of type FlowProcess
tap
- of type Tap
fields
- of type Fields
public void presentSinkFields(FlowProcess<? extends java.util.Properties> flowProcess, Tap tap, Fields fields)
Scheme
This method is called after Scheme.retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSinkFields
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
tap
- of type Tap
fields
- of type Fields
public void sourceConfInit(FlowProcess<? extends java.util.Properties> flowProcess, Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap, java.util.Properties conf)
Scheme
This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.
It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".
See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized before use, and Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.
sourceConfInit
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
tap
- of type Tap
conf
- of type Config
public void sourcePrepare(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Scheme
Initializes resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
This method is guaranteed to be called once before the first invocation of Scheme.source(FlowProcess, SourceCall).
Be sure to place any initialized objects in the SourceContext so each instance will remain thread-safe.
sourcePrepare
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
sourceCall
- of type SourceCall
java.io.IOException
public void sourceRePrepare(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Scheme
Initializes resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall) after the Input object has been changed, if needed.
This method may be called zero or more times. Note Scheme.sourcePrepare(FlowProcess, SourceCall) will always be called before any Scheme.source(FlowProcess, SourceCall) invocation.
sourceRePrepare
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
sourceCall
- of type SourceCall
java.io.IOException
public boolean source(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Scheme
Reads a new record from SourceCall.getInput() and populates the available Tuple via SourceCall.getIncomingEntry(), returning true on success or false if no more values are available.
It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.
Note this is the only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.
source
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
sourceCall
- of type SourceCall
true when a Tuple was successfully read
java.io.IOException
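The read contract described for source (populate the handed-over record, return true on success, false at end of input) can be sketched without any Cascading types. This is an illustration under stated assumptions, not the library's code: SourceLoopSketch, its source and readAll methods, and the use of a List<String> in place of a Tuple are all hypothetical, and quote handling and type coercion are left out.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the source side of a delimited text scheme.
final class SourceLoopSketch {

  // Reads the next record into 'tuple', re-using the instance handed over by the caller.
  // Returns true when a record was read, false when the input is exhausted.
  static boolean source(LineNumberReader input, boolean skipHeader, String delimiter,
                        List<String> tuple) throws IOException {
    String line = input.readLine();

    // getLineNumber() is 1 immediately after the first line has been read,
    // so this skips only the very first line of the input.
    if (line != null && skipHeader && input.getLineNumber() == 1)
      line = input.readLine();

    if (line == null)
      return false;

    tuple.clear();
    // Limit -1 keeps trailing empty fields; the delimiter is treated literally.
    for (String value : line.split(Pattern.quote(delimiter), -1))
      tuple.add(value.isEmpty() ? null : value); // empty fields become null

    return true;
  }

  // Drives source() until it reports that no more values are available.
  static List<List<String>> readAll(String text, boolean skipHeader, String delimiter) {
    List<List<String>> records = new ArrayList<>();
    List<String> tuple = new ArrayList<>();
    try (LineNumberReader input = new LineNumberReader(new StringReader(text))) {
      while (source(input, skipHeader, delimiter, tuple))
        records.add(new ArrayList<>(tuple)); // copy, because 'tuple' is re-used
    } catch (IOException exception) {
      throw new UncheckedIOException(exception);
    }
    return records;
  }

  public static void main(String[] args) {
    System.out.println(readAll("id,name\n1,ada\n2,\n", true, ","));
  }
}
```

Because the same list is re-used between calls, the caller copies it before storing it, which mirrors the note that a handed-over Tuple is only safe to modify inside this call.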
public void sourceCleanup(FlowProcess<? extends java.util.Properties> flowProcess, SourceCall<java.io.LineNumberReader,java.io.InputStream> sourceCall) throws java.io.IOException
Scheme
Destroys resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
sourceCleanup
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
sourceCall
- of type SourceCall
java.io.IOException
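The ordering of sourcePrepare, source, and sourceCleanup, together with the advice to keep initialized objects in the call's context, can be sketched as follows. This is a simplified model and not Cascading's API: the Call class stands in for SourceCall, and SourceLifecycleSketch with its static methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the prepare / source / cleanup lifecycle.
final class SourceLifecycleSketch {

  // Stand-in for SourceCall: carries the input and a per-call context slot.
  static final class Call {
    final Iterator<String> input;
    StringBuilder context; // set by sourcePrepare, cleared by sourceCleanup

    Call(List<String> lines) {
      this.input = lines.iterator();
    }
  }

  // Called once before the first source() call: per-call resources go into
  // the call's context rather than into shared fields.
  static void sourcePrepare(Call call) {
    call.context = new StringBuilder();
  }

  // Consumes one value, re-using the buffer created in sourcePrepare.
  static boolean source(Call call) {
    if (!call.input.hasNext())
      return false;
    call.context.setLength(0);
    call.context.append(call.input.next().trim());
    return true;
  }

  // Releases whatever sourcePrepare created.
  static void sourceCleanup(Call call) {
    call.context = null;
  }

  // Runs the full lifecycle over the given lines and collects what was read.
  static List<String> run(List<String> lines) {
    Call call = new Call(lines);
    List<String> seen = new ArrayList<>();
    sourcePrepare(call);
    try {
      while (source(call))
        seen.add(call.context.toString());
    } finally {
      sourceCleanup(call); // cleanup runs even if source() throws
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(run(List.of(" a ", "b")));
  }
}
```

Keeping the buffer on the call object means two calls running concurrently never share mutable state, which is the point of the thread-safety note on sourcePrepare.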
public void sinkConfInit(FlowProcess<? extends java.util.Properties> flowProcess, Tap<java.util.Properties,java.io.InputStream,java.io.OutputStream> tap, java.util.Properties conf)
Scheme
This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.
It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".
See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.
sinkConfInit
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
tap
- of type Tap
conf
- of type Config
public void sinkPrepare(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall)
Scheme
Initializes resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).
This method is guaranteed to be called once before the first invocation of Scheme.sink(FlowProcess, SinkCall).
Be sure to place any initialized objects in the SinkContext so each instance will remain thread-safe.
sinkPrepare
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
sinkCall
- of type SinkCall
protected boolean isAppendingFile(SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall, java.io.OutputStream originalOutput)
public void sink(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall) throws java.io.IOException
Scheme
Writes the Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
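The write side can be sketched in the same spirit: each outgoing record is joined with the delimiter, values containing the delimiter are wrapped in the quote string, and an optional header line is written first. This is a minimal sketch and not the library's implementation: SinkSketch, format, and write are hypothetical names, a List stands in for a Tuple, and embedded quote characters are not escaped.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the sink side of a delimited text scheme.
final class SinkSketch {

  // Formats one record: null becomes an empty field, and a value that contains
  // the delimiter is wrapped in the quote string (when a quote is configured).
  static String format(List<?> values, String delimiter, String quote) {
    return values.stream()
      .map(value -> value == null ? "" : value.toString())
      .map(text -> quote != null && text.contains(delimiter) ? quote + text + quote : text)
      .collect(Collectors.joining(delimiter));
  }

  // Writes an optional header followed by every record, one per line.
  static String write(List<String> header, List<? extends List<?>> tuples,
                      String delimiter, String quote) {
    StringWriter buffer = new StringWriter();
    PrintWriter writer = new PrintWriter(buffer);
    if (header != null)
      writer.print(format(header, delimiter, quote) + "\n");
    for (List<?> tuple : tuples)
      writer.print(format(tuple, delimiter, quote) + "\n");
    writer.flush();
    return buffer.toString();
  }

  public static void main(String[] args) {
    List<List<?>> tuples = new ArrayList<>();
    tuples.add(Arrays.asList(1, "a,b"));
    System.out.print(write(Arrays.asList("id", "name"), tuples, ",", "\""));
  }
}
```

Passing null as the header models the default, where no header line is written.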
public void sinkCleanup(FlowProcess<? extends java.util.Properties> flowProcess, SinkCall<java.io.PrintWriter,java.io.OutputStream> sinkCall)
Scheme
Destroys resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall).
sinkCleanup
in class Scheme<java.util.Properties,java.io.InputStream,java.io.OutputStream,java.io.LineNumberReader,java.io.PrintWriter>
flowProcess
- of type FlowProcess
sinkCall
- of type SinkCall
public java.lang.String getExtension()
getExtension
in interface FileFormat
Copyright © 2007-2017 Cascading Maintainers. All Rights Reserved.