java.lang.Object
  cascading.tap.Tap<Config,Input,Output>
public abstract class Tap<Config,Input,Output>
A Tap represents the physical data source or sink in a connected Flow. That is, a source Tap is the head end of a connected Pipe and Tuple stream, and a sink Tap is the tail end. Tap types are used to manage files from a local disk, distributed disk, remote storage like Amazon S3, or via FTP. A Tap abstracts out the complexity of connecting to these types of data sources.
A Tap takes a Scheme instance, which is used to identify the type of resource (text file, binary file, etc.). A Tap is responsible for how the resource is reached.
A Tap is not given an explicit name by design. This is so a given Tap instance can be re-used in different Flows that may expect a source or sink by a different logical name, but are the same physical resource. If a tap had a name other than its path, which would be used for the tap identity? If the name, then two Tap instances with different names but the same path could interfere with one another.
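The identity argument above can be illustrated with a small, self-contained sketch. `PathTap` and `TapIdentityDemo` are hypothetical stand-ins, not Cascading classes, and the paths are made up; the sketch only assumes that equality is derived from the resource identifier rather than from a logical name.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a Tap: identity comes only from the resource path.
final class PathTap {
    private final String identifier;

    PathTap(String identifier) {
        this.identifier = Objects.requireNonNull(identifier);
    }

    String getIdentifier() {
        return identifier;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof PathTap
            && identifier.equals(((PathTap) other).identifier);
    }

    @Override
    public int hashCode() {
        return identifier.hashCode();
    }
}

final class TapIdentityDemo {
    // Counts the distinct physical resources among the given taps.
    static int distinctResources(PathTap... taps) {
        Set<PathTap> unique = new HashSet<>();
        for (PathTap tap : taps) {
            unique.add(tap);
        }
        return unique.size();
    }

    public static void main(String[] args) {
        // Two flows may bind the same path under different logical names;
        // the taps still compare equal because only the path matters.
        PathTap boundAsLogs = new PathTap("/data/events");   // hypothetical path
        PathTap boundAsEvents = new PathTap("/data/events");
        PathTap users = new PathTap("/data/users");
        System.out.println(distinctResources(boundAsLogs, boundAsEvents, users));
    }
}
```

Because the logical name never enters `equals` or `hashCode`, two bindings of the same path cannot be mistaken for different resources.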
Constructor Summary | |
---|---|
protected | Tap() |
protected | Tap(Scheme<Config,Input,Output,?,?> scheme) |
protected | Tap(Scheme<Config,Input,Output,?,?> scheme, SinkMode sinkMode) |
Method Summary | |
---|---|
boolean | commitResource(Config conf): Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed. |
abstract boolean | createResource(Config conf): Method createResource creates the underlying resource. |
abstract boolean | deleteResource(Config conf): Method deleteResource deletes the resource represented by this instance. |
boolean | equals(java.lang.Object object) |
void | flowConfInit(Flow<Config> flow): Method flowConfInit allows this Tap instance to initialize itself in context of the given Flow instance. |
ConfigDef | getConfigDef(): Returns a ConfigDef instance that allows for local properties to be set and made available via a resulting FlowProcess instance when the tap is invoked. |
java.lang.String | getFullIdentifier(Config conf): Method getFullIdentifier returns a fully qualified resource identifier. |
abstract java.lang.String | getIdentifier(): Method getIdentifier returns a String representing the resource this Tap instance represents. |
abstract long | getModifiedTime(Config conf): Method getModifiedTime returns the date this resource was last modified. |
Scheme<Config,Input,Output,?,?> | getScheme(): Method getScheme returns the scheme of this Tap object. |
Fields | getSinkFields(): Method getSinkFields returns the sinkFields of this Tap object. |
SinkMode | getSinkMode(): Method getSinkMode returns the SinkMode of this Tap object. |
Fields | getSourceFields(): Method getSourceFields returns the sourceFields of this Tap object. |
ConfigDef | getStepConfigDef(): Returns a ConfigDef instance that allows for process level properties to be set and made available via a resulting FlowProcess instance when the tap is invoked. |
java.lang.String | getTrace(): Method getTrace returns the trace of this object. |
boolean | hasConfigDef(): Returns true if there are properties in the configDef instance. |
int | hashCode() |
boolean | hasProcessConfigDef(): Returns true if there are properties in the processConfigDef instance. |
boolean | isEquivalentTo(FlowElement element) |
boolean | isKeep(): Method isKeep indicates whether the resource represented by this instance should be kept if it already exists when the Flow is started. |
boolean | isReplace(): Method isReplace indicates whether the resource represented by this instance should be deleted if it already exists when the Flow is started. |
boolean | isSink(): Method isSink returns true if this Tap instance can be used as a sink. |
boolean | isSource(): Method isSource returns true if this Tap instance can be used as a source. |
boolean | isTemporary(): Method isTemporary returns true if this Tap is temporary (used for intermediate results). |
boolean | isUpdate(): Method isUpdate indicates whether the resource represented by this instance should be updated if it already exists. |
TupleEntryIterator | openForRead(FlowProcess<Config> flowProcess): Method openForRead opens the resource represented by this Tap instance. |
abstract TupleEntryIterator | openForRead(FlowProcess<Config> flowProcess, Input input): Method openForRead opens the resource represented by this Tap instance. |
TupleEntryCollector | openForWrite(FlowProcess<Config> flowProcess): Method openForWrite opens the resource represented by this Tap instance. |
abstract TupleEntryCollector | openForWrite(FlowProcess<Config> flowProcess, Output output): Method openForWrite opens the resource represented by this Tap instance. |
Scope | outgoingScopeFor(java.util.Set<Scope> incomingScopes): Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement. |
void | presentSinkFields(FlowProcess<Config> flowProcess, Fields fields) |
void | presentSourceFields(FlowProcess<Config> flowProcess, Fields fields) |
Fields | resolveFields(Scope scope): Method resolveFields returns the actual field names represented by the given Scope. |
Fields | resolveIncomingOperationFields(Scope incomingScope): Method resolveIncomingOperationFields resolves the incoming scopes to the actual incoming operation field names. |
abstract boolean | resourceExists(Config conf): Method resourceExists returns true if the path represented by this instance exists. |
Fields | retrieveSinkFields(FlowProcess<Config> flowProcess): A hook for allowing a Scheme to lazily retrieve its sink fields. |
Fields | retrieveSourceFields(FlowProcess<Config> flowProcess): A hook for allowing a Scheme to lazily retrieve its source fields. |
boolean | rollbackResource(Config conf): Method rollbackResource allows the underlying resource to be notified when any write processing has failed or was stopped so that any cleanup may be started. |
protected void | setScheme(Scheme<Config,Input,Output,?,?> scheme) |
void | sinkConfInit(FlowProcess<Config> flowProcess, Config conf): Method sinkConfInit initializes this instance as a sink. |
void | sourceConfInit(FlowProcess<Config> flowProcess, Config conf): Method sourceConfInit initializes this instance as a source. |
static Tap[] | taps(Tap... taps): Convenience function to make an array of Tap instances. |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
protected Tap()
protected Tap(Scheme<Config,Input,Output,?,?> scheme)
protected Tap(Scheme<Config,Input,Output,?,?> scheme, SinkMode sinkMode)
Method Detail |
---|
public static Tap[] taps(Tap... taps)
Convenience function to make an array of Tap instances.
Parameters:
taps - of type Tap
protected void setScheme(Scheme<Config,Input,Output,?,?> scheme)
public Scheme<Config,Input,Output,?,?> getScheme()
public java.lang.String getTrace()
public void flowConfInit(Flow<Config> flow)
Method flowConfInit allows this Tap instance to initialize itself in context of the given Flow instance.
This method is guaranteed to be called before the Flow is started and the FlowListener.onStarting(cascading.flow.Flow) event is fired.
This method will be called once per Flow, and before the sourceConfInit(FlowProcess, Config) and sinkConfInit(FlowProcess, Config) methods.
Parameters:
flow - of type Flow
public void sourceConfInit(FlowProcess<Config> flowProcess, Config conf)
Method sourceConfInit initializes this instance as a source.
This method may be called more than once if this Tap instance is used outside the scope of a Flow instance, or if it participates multiple times in a given Flow or across different Flows in a Cascade.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
Parameters:
flowProcess
conf - of type Config
Throws:
IOException - on resource initialization failure.
public void sinkConfInit(FlowProcess<Config> flowProcess, Config conf)
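Method sinkConfInit initializes this instance as a sink.
This method may be called more than once if this Tap instance is used outside the scope of a Flow instance, or if it participates multiple times in a given Flow or across different Flows in a Cascade.
Note this method will be called in context of this Tap being used as a traditional 'sink' and as a 'trap'.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
Parameters:
flowProcess
conf - of type Config
Throws:
IOException - on resource initialization failure.
public abstract java.lang.String getIdentifier()
Method getIdentifier returns a String representing the resource this Tap instance represents.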
public Fields getSourceFields()
public Fields getSinkFields()
public abstract TupleEntryIterator openForRead(FlowProcess<Config> flowProcess, Input input) throws java.io.IOException
Method openForRead opens the resource represented by this Tap instance.
The input value may be null. If so, subclasses must inquire with the underlying Scheme via Scheme.sourceConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper input type and instantiate it before calling super.openForRead().
Note the returned iterator will return the same instance of TupleEntry on every call, thus a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.
Parameters:
flowProcess
input
Throws:
java.io.IOException
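The warning about the reused TupleEntry instance is easy to overlook, so here is a minimal, self-contained sketch of the pitfall. `RowEntry`, `ReusingIterator`, and `EntryCopyDemo` are hypothetical stand-ins for TupleEntry and TupleEntryIterator, not Cascading classes; the sketch only assumes that the iterator hands back one shared, mutable object on every call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for TupleEntry: a mutable holder reused by the iterator.
final class RowEntry {
    String value;

    RowEntry(String value) {
        this.value = value;
    }

    RowEntry copy() {
        return new RowEntry(value);
    }
}

// Hypothetical stand-in for TupleEntryIterator: returns the SAME RowEntry each call.
final class ReusingIterator implements Iterator<RowEntry> {
    private final String[] source;
    private final RowEntry shared = new RowEntry(null);
    private int next = 0;

    ReusingIterator(String... source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return next < source.length;
    }

    @Override
    public RowEntry next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        shared.value = source[next++]; // overwritten in place, no new allocation
        return shared;
    }
}

final class EntryCopyDemo {
    // Pitfall: every stored element is the same object, holding the last value read.
    static List<String> collectWithoutCopy(Iterator<RowEntry> it) {
        List<RowEntry> kept = new ArrayList<>();
        while (it.hasNext()) {
            kept.add(it.next());
        }
        return values(kept);
    }

    // Fix: copy each entry before storing it in the collection.
    static List<String> collectWithCopy(Iterator<RowEntry> it) {
        List<RowEntry> kept = new ArrayList<>();
        while (it.hasNext()) {
            kept.add(it.next().copy());
        }
        return values(kept);
    }

    private static List<String> values(List<RowEntry> entries) {
        List<String> out = new ArrayList<>();
        for (RowEntry entry : entries) {
            out.add(entry.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectWithoutCopy(new ReusingIterator("a", "b", "c"))); // [c, c, c]
        System.out.println(collectWithCopy(new ReusingIterator("a", "b", "c")));    // [a, b, c]
    }
}
```

Reusing one instance avoids an allocation per record, which is why the copy is left to callers that actually need to retain entries.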
public TupleEntryIterator openForRead(FlowProcess<Config> flowProcess) throws java.io.IOException
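Method openForRead opens the resource represented by this Tap instance.
Note the returned iterator will return the same instance of TupleEntry on every call, thus a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.
Parameters:
flowProcess
Throws:
java.io.IOException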
public abstract TupleEntryCollector openForWrite(FlowProcess<Config> flowProcess, Output output) throws java.io.IOException
Method openForWrite opens the resource represented by this Tap instance.
The output value may be null. If so, subclasses must inquire with the underlying Scheme via Scheme.sinkConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper output type and instantiate it before calling super.openForWrite().
Parameters:
flowProcess
output
Throws:
java.io.IOException
public TupleEntryCollector openForWrite(FlowProcess<Config> flowProcess) throws java.io.IOException
Method openForWrite opens the resource represented by this Tap instance.
Parameters:
flowProcess
Throws:
java.io.IOException
public Scope outgoingScopeFor(java.util.Set<Scope> incomingScopes)
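Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement.
Specified by:
outgoingScopeFor in interface FlowElement
Parameters:
incomingScopes - of type Set
public Fields retrieveSourceFields(FlowProcess<Config> flowProcess)
A hook for allowing a Scheme to lazily retrieve its source fields.
Parameters:
flowProcess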
public void presentSourceFields(FlowProcess<Config> flowProcess, Fields fields)
public Fields retrieveSinkFields(FlowProcess<Config> flowProcess)
public void presentSinkFields(FlowProcess<Config> flowProcess, Fields fields)
public Fields resolveIncomingOperationFields(Scope incomingScope)
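Method resolveIncomingOperationFields resolves the incoming scopes to the actual incoming operation field names.
Specified by:
resolveIncomingOperationFields in interface FlowElement
Parameters:
incomingScope - of type Scope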
public Fields resolveFields(Scope scope)
Method resolveFields returns the actual field names represented by the given Scope.
Specified by:
resolveFields in interface FlowElement
Parameters:
scope - of type Scope
public java.lang.String getFullIdentifier(Config conf)
Method getFullIdentifier returns a fully qualified resource identifier.
Parameters:
conf - of type Config
public abstract boolean createResource(Config conf) throws java.io.IOException
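Method createResource creates the underlying resource.
Parameters:
conf - of type Config
Throws:
java.io.IOException - when there is an error making directories
public abstract boolean deleteResource(Config conf) throws java.io.IOException
Method deleteResource deletes the resource represented by this instance.
Parameters:
conf - of type Config
Throws:
java.io.IOException - when the resource cannot be deleted
public boolean commitResource(Config conf) throws java.io.IOException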
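Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed.
See rollbackResource(Object) to handle cleanup in the face of failures.
This method is invoked once "client side" and not in the cluster, if any.
Parameters:
conf
Throws:
java.io.IOException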
public boolean rollbackResource(Config conf) throws java.io.IOException
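Method rollbackResource allows the underlying resource to be notified when any write processing has failed or was stopped so that any cleanup may be started.
See commitResource(Object) to handle cleanup when the write has successfully completed.
This method is invoked once "client side" and not in the cluster, if any.
Parameters:
conf
Throws:
java.io.IOException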
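The pairing of commitResource and rollbackResource can be sketched as a driver that notifies the sink exactly once after the write finishes. This is an illustration under stated assumptions, not how Cascading schedules these calls: `ResourceHooks`, `RecordingSink`, and `WriteLifecycleDemo` are hypothetical names, the Config argument is omitted, and the hooks drop the checked java.io.IOException that the real methods declare so the example stays short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two notification hooks on a sink Tap.
interface ResourceHooks {
    boolean commitResource();

    boolean rollbackResource();
}

// Records which hook was called so the outcome can be inspected.
final class RecordingSink implements ResourceHooks {
    final List<String> events = new ArrayList<>();

    @Override
    public boolean commitResource() {
        events.add("commit");
        return true;
    }

    @Override
    public boolean rollbackResource() {
        events.add("rollback");
        return true;
    }
}

final class WriteLifecycleDemo {
    // Runs the write, then notifies the sink once:
    // commit if the write succeeded, rollback if it failed.
    static boolean runWrite(ResourceHooks sink, Runnable write) {
        try {
            write.run();
        } catch (RuntimeException failure) {
            sink.rollbackResource();
            return false;
        }
        return sink.commitResource();
    }

    public static void main(String[] args) {
        RecordingSink good = new RecordingSink();
        runWrite(good, () -> { });
        System.out.println(good.events); // [commit]

        RecordingSink bad = new RecordingSink();
        runWrite(bad, () -> { throw new IllegalStateException("write failed"); });
        System.out.println(bad.events); // [rollback]
    }
}
```

Keeping both hooks on the client side, as the text describes, means cleanup logic runs in one place regardless of how many cluster tasks performed the write.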
public abstract boolean resourceExists(Config conf) throws java.io.IOException
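Method resourceExists returns true if the path represented by this instance exists.
Parameters:
conf - of type Config
Throws:
java.io.IOException - when the status cannot be determined
public abstract long getModifiedTime(Config conf) throws java.io.IOException
Method getModifiedTime returns the date this resource was last modified.
Parameters:
conf - of type Config
Throws:
java.io.IOException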
public SinkMode getSinkMode()
Method getSinkMode returns the SinkMode of this Tap object.
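public boolean isKeep()
Method isKeep indicates whether the resource represented by this instance should be kept if it already exists when the Flow is started.
public boolean isReplace()
Method isReplace indicates whether the resource represented by this instance should be deleted if it already exists when the Flow is started.
public boolean isUpdate()
Method isUpdate indicates whether the resource represented by this instance should be updated if it already exists.
public boolean isSink()
Method isSink returns true if this Tap instance can be used as a sink.
public boolean isSource()
Method isSource returns true if this Tap instance can be used as a source.
public boolean isTemporary()
Method isTemporary returns true if this Tap is temporary (used for intermediate results).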
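The keep, replace, and update predicates describe what should happen to a sink resource that already exists. The sketch below shows one way a caller might branch on them. `Mode`, `ModeTap`, and `SinkPreparation` are hypothetical stand-ins, the action strings are invented for illustration, and the branching is an assumption about a caller, not Cascading's planner logic.

```java
// Hypothetical mirror of the three sink behaviours named by the predicates.
enum Mode { KEEP, REPLACE, UPDATE }

// Hypothetical stand-in for a Tap that exposes its mode through predicates.
final class ModeTap {
    private final Mode mode;

    ModeTap(Mode mode) {
        this.mode = mode;
    }

    boolean isKeep() {
        return mode == Mode.KEEP;
    }

    boolean isReplace() {
        return mode == Mode.REPLACE;
    }

    boolean isUpdate() {
        return mode == Mode.UPDATE;
    }
}

final class SinkPreparation {
    // Decides what to do with the sink resource before writing starts.
    static String planFor(ModeTap tap, boolean resourceExists) {
        if (!resourceExists) {
            return "create";
        }
        if (tap.isReplace()) {
            return "delete-then-create";
        }
        if (tap.isUpdate()) {
            return "update-in-place";
        }
        return "keep-existing";
    }

    public static void main(String[] args) {
        System.out.println(planFor(new ModeTap(Mode.REPLACE), true));
        System.out.println(planFor(new ModeTap(Mode.KEEP), true));
        System.out.println(planFor(new ModeTap(Mode.UPDATE), false));
    }
}
```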
public ConfigDef getConfigDef()
Returns a ConfigDef instance that allows for local properties to be set and made available via a resulting FlowProcess instance when the tap is invoked.
Any properties set on the configDef will not show up in any Flow or FlowStep process level configuration, but will override any of those values as seen by the current Tap instance method call where a FlowProcess is provided, except for the sourceConfInit(cascading.flow.FlowProcess, Object) and sinkConfInit(cascading.flow.FlowProcess, Object) methods.
That is, the *confInit methods are called before any ConfigDef is applied, so any values placed into a ConfigDef instance will not be visible to them.
Specified by:
getConfigDef in interface FlowElement
public boolean hasConfigDef()
Returns true if there are properties in the configDef instance.
Specified by:
hasConfigDef in interface FlowElement
public ConfigDef getStepConfigDef()
Returns a ConfigDef instance that allows for process level properties to be set and made available via a resulting FlowProcess instance when the tap is invoked.
Any properties set on the stepConfigDef will not show up in any Flow configuration, but will show up in the current process FlowStep (in Hadoop the MapReduce jobconf). Any value set in the stepConfigDef will be overridden by the tap local getConfigDef() instance.
Use this method to tweak properties in the process step this tap instance is planned into.
Note the *confInit methods are called before any ConfigDef is applied, so any values placed into a ConfigDef instance will not be visible to them.
Specified by:
getStepConfigDef in interface FlowElement
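The visibility rules for the two ConfigDef instances amount to a layered lookup, which the following self-contained sketch models with plain maps. `LayeredConfig` and the property key are hypothetical, and the class is not part of Cascading; it only encodes the precedence stated above: tap-local values override step values, step values override flow values, and the *confInit methods see neither ConfigDef layer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the three configuration layers described in the text.
final class LayeredConfig {
    private final Map<String, String> flow = new HashMap<>();
    private final Map<String, String> step = new HashMap<>();     // getStepConfigDef() values
    private final Map<String, String> tapLocal = new HashMap<>(); // getConfigDef() values

    LayeredConfig setFlow(String key, String value) {
        flow.put(key, value);
        return this;
    }

    LayeredConfig setStep(String key, String value) {
        step.put(key, value);
        return this;
    }

    LayeredConfig setTapLocal(String key, String value) {
        tapLocal.put(key, value);
        return this;
    }

    // Value seen by tap methods that receive a FlowProcess: local, then step, then flow.
    String seenByTap(String key) {
        if (tapLocal.containsKey(key)) {
            return tapLocal.get(key);
        }
        return seenByStep(key);
    }

    // Value seen in the step-level configuration: tap-local values never leak here.
    String seenByStep(String key) {
        if (step.containsKey(key)) {
            return step.get(key);
        }
        return flow.get(key);
    }

    // Value seen by the *confInit methods: no ConfigDef has been applied yet.
    String seenByConfInit(String key) {
        return flow.get(key);
    }

    public static void main(String[] args) {
        String key = "example.buffer.size"; // hypothetical property name
        LayeredConfig config = new LayeredConfig()
            .setFlow(key, "flow")
            .setStep(key, "step")
            .setTapLocal(key, "local");
        System.out.println(config.seenByTap(key));      // local
        System.out.println(config.seenByStep(key));     // step
        System.out.println(config.seenByConfInit(key)); // flow
    }
}
```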
public boolean hasProcessConfigDef()
Returns true if there are properties in the processConfigDef instance.
Specified by:
hasProcessConfigDef in interface FlowElement
public boolean isEquivalentTo(FlowElement element)
Specified by:
isEquivalentTo in interface FlowElement
public boolean equals(java.lang.Object object)
Overrides:
equals in class java.lang.Object
public int hashCode()
Overrides:
hashCode in class java.lang.Object