public class TextDelimited extends TextLine
TextDelimited is a sub-class of TextLine. It provides direct support for delimited text files, like TAB (\t) or COMMA (,) delimited files. It also optionally allows for quoted values.
TextDelimited may also be used to skip the "header" in a file, where the header is defined as the very first line in every input file. That is, if the byte offset of the current line from the input is zero (0), that line will be skipped.
If the sink/source fields are set to either Fields.ALL or Fields.UNKNOWN and skipHeader or hasHeader is true, the field names will be retrieved from the header of the file and used during planning. The header will be parsed with the same rules as the body of the file.
By default headers are not skipped.
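The header rules above can be illustrated with a small plain-Java sketch. It is not Cascading's implementation: the class `HeaderSketch` and its methods are hypothetical, and it assumes newline-terminated UTF-8 lines. It only shows the idea that the line at byte offset zero is treated as the header and split with the same delimiter as the body.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cascading.
class HeaderSketch {
    /** Field names taken from the header line, split with the same delimiter as the body. */
    static String[] headerFields(String firstLine, String delimiter) {
        return firstLine.split(Pattern.quote(delimiter), -1);
    }

    /** Returns the body lines, dropping the line at byte offset 0 when skipHeader is true. */
    static List<String> readBody(List<String> lines, boolean skipHeader) {
        List<String> body = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            boolean isHeader = offset == 0;
            if (!(skipHeader && isHeader)) {
                body.add(line);
            }
            // Assumption: each line is followed by a single newline byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return body;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id\tname", "1\talice", "2\tbob");
        System.out.println(Arrays.toString(headerFields(lines.get(0), "\t")));
        System.out.println(readBody(lines, true).size());
    }
}
```

With `skipHeader` set to false, the same input yields all three lines, which matches the default of not skipping headers.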
TextDelimited may also be used to write a "header" in a file. The field names for the header are taken directly from the declared fields. Or, if the declared fields are Fields.ALL or Fields.UNKNOWN, the resolved field names will be used, if any.
By default headers are not written.
If hasHeader is set to true on a constructor, both skipHeader and writeHeader will be set to true.
By default this Scheme is both strict and safe.
Strict means that if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, then a Tuple will be returned with null values for the missing fields.
Safe means that if a field cannot be coerced into an expected type, a null will be used for the value. If safe is false, a TapException will be thrown.
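The strict and safe semantics can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the DelimitedParser code: `StrictSafeSketch` is a hypothetical class, `IllegalStateException` stands in for TapException, and dropping surplus fields in non-strict mode is a choice made for this sketch only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cascading.
class StrictSafeSketch {
    /** Strict: a wrong field count is an error. Non-strict: missing fields become null. */
    static String[] split(String line, String delimiter, int expected, boolean strict) {
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        if (strict && parts.length != expected) {
            // Stand-in for TapException.
            throw new IllegalStateException(
                "expected " + expected + " fields, found " + parts.length);
        }
        // Pads with null when short; surplus fields are dropped (sketch assumption).
        return Arrays.copyOf(parts, expected);
    }

    /** Safe: a value that cannot be coerced becomes null. Unsafe: coercion failure is an error. */
    static Integer toInteger(String value, boolean safe) {
        if (value == null || value.isEmpty()) {
            return null;
        }
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            if (safe) {
                return null;
            }
            throw new IllegalStateException("could not coerce: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a\tb", "\t", 3, false)));
        System.out.println(toInteger("not-a-number", true));
    }
}
```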
Also by default, quote strings are not searched for, to improve processing speed. If a file is COMMA delimited but may have COMMAs in a value, the whole value should be surrounded by the quote string, typically double quotes (").
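To make the quoting rule concrete, here is a minimal sketch of quote-aware splitting. It uses a simple character loop rather than the regular expressions DelimitedParser relies on, it does not handle escaped quotes, and the class `QuoteSplitSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cascading.
class QuoteSplitSketch {
    /** Splits on the delimiter, except where it appears between quote characters. */
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes; // quote characters toggle state and are dropped
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1,\"Doe, Jane\",NY", ',', '"'));
    }
}
```

Without the surrounding quotes, the same line would split into four fields instead of three, which is the failure mode the paragraph above warns about.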
Note all empty fields in a line will be returned as null unless coerced into a new type.
This Scheme may source/sink Fields.ALL. When given on the constructor, the new instance will automatically default to strict == false, as the number of fields parsed is arbitrary or unknown. A type array may not be given either, so all values will be returned as Strings.
By default, all text is encoded/decoded as UTF-8. This can be changed via the charsetName constructor argument.
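Why the charset name matters can be seen with plain JDK calls. The sketch below is independent of Cascading and the class `CharsetSketch` is hypothetical. It decodes the same bytes with two charset names and shows that only the matching one recovers the original text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cascading.
class CharsetSketch {
    /** Decodes raw bytes using the charset identified by name, e.g. "UTF-8". */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "caf\u00e9" written as ISO-8859-1 bytes.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, "ISO-8859-1").equals("caf\u00e9"));
        System.out.println(decode(latin1, "UTF-8").equals("caf\u00e9"));
    }
}
```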
To override field and line parsing behaviors, sub-class DelimitedParser or provide a FieldTypeResolver implementation.
Note that there should be no expectation that TextDelimited, or specifically DelimitedParser, can handle all delimited and quoted combinations reliably. Attempting to do so would impair its performance and maintainability.
Further, it can be safely said any corrupted files will not be supported for obvious reasons. Corrupted files may result in exceptions or could cause edge cases in the underlying Java regular expression engine.
A large part of Cascading was designed to help users cleanse data. Thus the recommendation is to create Flows that are responsible for cleansing large data-sets when faced with the problem.
DelimitedParser may be sub-classed and extended if necessary.
See Also: TextLine, Serialized Form
Nested classes/interfaces inherited from class TextLine: TextLine.Compress
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_CHARSET |
protected DelimitedParser | delimitedParser Field delimitedParser |
Fields inherited from class TextLine: DEFAULT_SOURCE_FIELDS
Constructor and Description |
---|
TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using TAB as the default delimiter. |
TextDelimited(boolean hasHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(boolean hasHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using the given delimiter. |
TextDelimited(boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using the given delimiter and quote. |
TextDelimited(DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(Fields fields)
Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
boolean strict,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
boolean strict,
java.lang.String quote,
java.lang.Class[] types,
boolean safe,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String charsetName,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
boolean writeHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.Class[] types,
boolean safe,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote,
java.lang.String charsetName)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
java.lang.String delimiter,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
java.lang.String delimiter,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
java.lang.String delimiter,
java.lang.String quote,
java.lang.Class[] types,
boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
|
TextDelimited(TextLine.Compress sinkCompression,
boolean hasHeader,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(TextLine.Compress sinkCompression,
boolean hasHeader,
java.lang.String delimiter,
java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using the given delimiter and quote. |
TextDelimited(TextLine.Compress sinkCompression,
DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing
Fields.UNKNOWN , sinking
Fields.ALL and using the given delimitedParser instance for parsing. |
Modifier and Type | Method and Description |
---|---|
java.lang.String |
getDelimiter()
Method getDelimiter returns the delimiter used to parse fields from the current line of text.
|
java.lang.String |
getExtension() |
java.lang.String |
getQuote()
Method getQuote returns the quote string, if any, used to encapsulate each field in a line of delimited text.
|
boolean |
isSymmetrical()
Method isSymmetrical returns
true if the sink fields equal the source fields. |
void |
presentSinkFields(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess,
Tap tap,
Fields fields)
Method presentSinkFields is called after the planner is invoked and all fields are resolved.
|
void |
presentSourceFields(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess,
Tap tap,
Fields fields)
Method presentSourceFields is called after the planner is invoked and all fields are resolved.
|
Fields |
retrieveSourceFields(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess,
Tap tap)
Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically
update the fields it sources.
|
void |
setSinkFields(Fields sinkFields)
Method setSinkFields sets the sinkFields of this Scheme object.
|
void |
setSourceFields(Fields sourceFields)
Method setSourceFields sets the sourceFields of this Scheme object.
|
void |
sink(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess,
SinkCall<java.lang.Object[],org.apache.hadoop.mapred.OutputCollector> sinkCall)
Method sink writes out the given
Tuple found on SinkCall.getOutgoingEntry() to
the SinkCall.getOutput() . |
void |
sinkPrepare(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess,
SinkCall<java.lang.Object[],org.apache.hadoop.mapred.OutputCollector> sinkCall)
Method sinkPrepare is used to initialize resources needed during each call of
Scheme.sink(cascading.flow.FlowProcess, SinkCall) . |
boolean |
source(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess,
SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
Method source will read a new "record" or value from
SourceCall.getInput() and populate
the available Tuple via SourceCall.getIncomingEntry() and return true
on success or false if no more values available. |
void |
sourcePrepare(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess,
SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
Method sourcePrepare is used to initialize resources needed during each call of
Scheme.source(cascading.flow.FlowProcess, SourceCall) . |
protected void |
writeHeader(SinkCall<java.lang.Object[],org.apache.hadoop.mapred.OutputCollector> sinkCall) |
Methods inherited from class TextLine: getCharsetName, getSinkCompression, makeEncodedString, setCharsetName, setSinkCompression, sinkConfInit, sourceCleanup, sourceConfInit, sourceHandleInput, verify
Methods inherited from class Scheme: equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, setNumSinkParts, sinkCleanup, sinkWrap, sourceRePrepare, sourceWrap, toString
public static final java.lang.String DEFAULT_CHARSET
protected final DelimitedParser delimitedParser
public TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
@ConstructorProperties(value={"hasHeader","delimiter"}) public TextDelimited(boolean hasHeader, java.lang.String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"hasHeader","delimiter","quote"}) public TextDelimited(boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"hasHeader","delimitedParser"}) public TextDelimited(boolean hasHeader, DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value="delimitedParser") public TextDelimited(DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
This constructor will set skipHeader and writeHeader values to true.
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"sinkCompression","hasHeader","delimitedParser"}) public TextDelimited(TextLine.Compress sinkCompression, boolean hasHeader, DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
sinkCompression - of type Compress
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"sinkCompression","delimitedParser"}) public TextDelimited(TextLine.Compress sinkCompression, DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
This constructor will set skipHeader and writeHeader values to true.
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"sinkCompression","hasHeader","delimiter","quote"}) public TextDelimited(TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value="fields") public TextDelimited(Fields fields)
fields - of type Fields
@ConstructorProperties(value={"fields","delimiter"}) public TextDelimited(Fields fields, java.lang.String delimiter)
fields - of type Fields
delimiter - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","delimiter","types"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe","charsetName"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe, java.lang.String charsetName)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","delimiter"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, java.lang.String delimiter)
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","sinkCompression","delimiter","types"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","types"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","types"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.Class[] types)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","delimiter","types","safe"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, java.lang.String delimiter, java.lang.Class[] types, boolean safe)
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","types","safe"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.Class[] types, boolean safe)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","types","safe","charsetName"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.Class[] types, boolean safe, java.lang.String charsetName)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","types","safe"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.Class[] types, boolean safe)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","delimiter","quote"}) public TextDelimited(Fields fields, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote"}) public TextDelimited(Fields fields, boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote","charsetName"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.String charsetName)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
charsetName - of type String
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","quote"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, TextLine.Compress sinkCompression, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","hasHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean hasHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
sinkCompression - of type Compress
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","strict","quote","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, boolean strict, java.lang.String quote, java.lang.Class[] types, boolean safe)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimiter","strict","quote","types","safe","charsetName"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String delimiter, boolean strict, java.lang.String quote, java.lang.Class[] types, boolean safe, java.lang.String charsetName)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimitedParser"})
public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","hasHeader","delimitedParser"})
public TextDelimited(Fields fields, boolean hasHeader, DelimitedParser delimitedParser)
fields - of type Fields
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","delimitedParser"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","writeHeader","charsetName","delimitedParser"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, boolean writeHeader, java.lang.String charsetName, DelimitedParser delimitedParser)
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
writeHeader - of type boolean
charsetName - of type String
delimitedParser - of type DelimitedParser
public java.lang.String getDelimiter()
public java.lang.String getQuote()
public boolean isSymmetrical()
Description copied from class: Scheme
Returns true if the sink fields equal the source fields. That is, this scheme sources the same fields as it sinks.
Overrides: isSymmetrical in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Object[]>
public void setSinkFields(Fields sinkFields)
Description copied from class: Scheme
Sets the sinkFields of this Scheme object.
Overrides: setSinkFields in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Object[]>
sinkFields - the sinkFields of this Scheme object.
public void setSourceFields(Fields sourceFields)
Description copied from class: Scheme
Sets the sourceFields of this Scheme object.
Overrides: setSourceFields in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Object[]>
sourceFields - the sourceFields of this Scheme object.
public Fields retrieveSourceFields(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, Tap tap)
Description copied from class: Scheme
The FlowProcess presents all known properties resolved by the current planner. The tap instance is the parent Tap for this Scheme instance.
Overrides: retrieveSourceFields in class Scheme<org.apache.hadoop.conf.Configuration,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,java.lang.Object[],java.lang.Object[]>
flowProcess - of type FlowProcess
tap - of type Tap
public void presentSourceFields(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, Tap tap, Fields fields)
Description copied from class: Scheme
This method is called after Scheme.retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).
Overrides: presentSourceFields in class TextLine
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields
public void presentSinkFields(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, Tap tap, Fields fields)
Description copied from class: Scheme
This method is called after Scheme.retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).
Overrides: presentSinkFields in class TextLine
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields
public void sourcePrepare(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
Description copied from class: Scheme
Initializes resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
This method is guaranteed to be called once before the first invocation of Scheme.source(FlowProcess, SourceCall).
Be sure to place any initialized objects in the SourceContext so each instance will remain thread-safe.
Overrides: sourcePrepare in class TextLine
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
public boolean source(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SourceCall<java.lang.Object[],org.apache.hadoop.mapred.RecordReader> sourceCall) throws java.io.IOException
Description copied from class: Scheme
Reads a new record or value from SourceCall.getInput(), populates the available Tuple via SourceCall.getIncomingEntry(), and returns true on success or false if no more values are available.
It is acceptable to set a new Tuple instance on the incomingEntry TupleEntry, or to simply reuse the existing instance.
Note that this is the only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.
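The prepare-once, read-until-false contract described for sourcePrepare and source can be illustrated without Cascading itself. The following is a minimal sketch in plain Java, not the TextDelimited implementation: SourceContractSketch, Context, BadRecordException, and readAll are hypothetical names, with Context standing in for the per-call context and BadRecordException standing in for a TapException that carries a payload.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-ins: none of these types are part of Cascading.
class SourceContractSketch {

    /** Plays the role of TapException: carries the offending values as a payload. */
    static class BadRecordException extends RuntimeException {
        final String[] payload;

        BadRecordException(String message, String[] payload) {
            super(message);
            this.payload = payload;
        }
    }

    /** Plays the role of the per-call context: built once, reused on every read. */
    static class Context {
        final Pattern splitter;
        final String[] tuple; // reused across calls, like the incoming Tuple

        Context(String delimiter, int fieldCount) {
            this.splitter = Pattern.compile(Pattern.quote(delimiter));
            this.tuple = new String[fieldCount];
        }
    }

    /** Reads one record into ctx.tuple; returns false when the input is exhausted. */
    static boolean source(BufferedReader input, Context ctx) throws IOException {
        String line = input.readLine();
        if (line == null) {
            return false; // no more values available
        }
        String[] values = ctx.splitter.split(line, -1);
        if (values.length != ctx.tuple.length) {
            throw new BadRecordException("expected " + ctx.tuple.length + " fields", values);
        }
        System.arraycopy(values, 0, ctx.tuple, 0, values.length);
        return true;
    }

    /** Drives source() until it returns false, diverting bad records to a trap list. */
    static List<String> readAll(String text, String delimiter, int fieldCount, List<String> trap) {
        Context ctx = new Context(delimiter, fieldCount); // the "prepare" step, done once
        BufferedReader input = new BufferedReader(new StringReader(text));
        List<String> records = new ArrayList<>();
        try {
            while (true) {
                try {
                    if (!source(input, ctx)) {
                        break;
                    }
                    // Copy the values out before the shared tuple is overwritten.
                    records.add(String.join("|", ctx.tuple));
                } catch (BadRecordException e) {
                    trap.add(String.join("|", e.payload));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> trap = new ArrayList<>();
        List<String> records = readAll("a,b\nc\nd,e\n", ",", 2, trap);
        System.out.println(records + " trapped=" + trap);
    }
}
```

Keeping the compiled pattern and the reusable array in the context object, rather than in fields of the enclosing class, mirrors the advice to place initialized objects in the context so that concurrent readers do not share mutable state.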
public void sinkPrepare(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SinkCall<java.lang.Object[],org.apache.hadoop.mapred.OutputCollector> sinkCall) throws java.io.IOException
Description copied from class: Scheme
Initializes resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).
This method is guaranteed to be called once before the first invocation of Scheme.sink(FlowProcess, SinkCall).
Be sure to place any initialized objects in the SinkContext so each instance will remain thread-safe.
Overrides: sinkPrepare in class TextLine
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws: java.io.IOException
protected void writeHeader(SinkCall<java.lang.Object[],org.apache.hadoop.mapred.OutputCollector> sinkCall) throws java.io.IOException
Throws: java.io.IOException
public void sink(FlowProcess<? extends org.apache.hadoop.conf.Configuration> flowProcess, SinkCall<java.lang.Object[],org.apache.hadoop.mapred.OutputCollector> sinkCall) throws java.io.IOException
Description copied from class: Scheme
Writes the Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
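The relationship between sinkPrepare, writeHeader, and sink can be sketched in plain Java as well. This is an illustration under stated assumptions rather than the TextDelimited implementation: SinkContractSketch, Context, and writeAll are hypothetical names, a java.io.Writer stands in for the sink output, and rendering null values as empty fields is a choice made for this sketch only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-ins: none of these types are part of Cascading.
class SinkContractSketch {

    /** Plays the role of the per-call sink context: created once in the prepare step. */
    static class Context {
        final Writer output;
        final String delimiter;

        Context(Writer output, String delimiter) {
            this.output = output;
            this.delimiter = delimiter;
        }
    }

    /** The "prepare" step: builds the context and optionally writes the header line. */
    static Context sinkPrepare(Writer output, String delimiter, String[] fieldNames,
                               boolean writeHeader) throws IOException {
        Context ctx = new Context(output, delimiter);
        if (writeHeader) {
            sink(ctx, fieldNames); // in this sketch the header is formatted like a body line
        }
        return ctx;
    }

    /** Writes one tuple as a delimited line; this sketch renders null as an empty field. */
    static void sink(Context ctx, Object[] tuple) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < tuple.length; i++) {
            if (i > 0) {
                line.append(ctx.delimiter);
            }
            if (tuple[i] != null) {
                line.append(tuple[i]);
            }
        }
        ctx.output.write(line.append('\n').toString());
    }

    /** Convenience driver: prepares once, then sinks every tuple in order. */
    static String writeAll(String[] fieldNames, boolean writeHeader, Object[][] tuples) {
        StringWriter out = new StringWriter();
        try {
            Context ctx = sinkPrepare(out, ",", fieldNames, writeHeader);
            for (Object[] tuple : tuples) {
                sink(ctx, tuple);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Object[][] tuples = { { 1, "ann" }, { 2, null } };
        System.out.print(writeAll(new String[] { "id", "name" }, true, tuples));
    }
}
```

Writing the header inside the prepare step means it is emitted exactly once per output, before any tuple, which matches the guarantee that the prepare method runs once before the first call to sink.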
public java.lang.String getExtension()
Specified by: getExtension in interface FileFormat
Overrides: getExtension in class TextLine
Copyright © 2007-2017 Cascading Maintainers. All Rights Reserved.