cascading.tap.hadoop
Class TemplateTap

java.lang.Object
  extended by cascading.tap.Tap<Config,java.lang.Void,Output>
      extended by cascading.tap.SinkTap<JobConf,OutputCollector>
          extended by cascading.tap.hadoop.TemplateTap
All Implemented Interfaces:
FlowElement, java.io.Serializable

public class TemplateTap
extends SinkTap<JobConf,OutputCollector>

Class TemplateTap can be used to write tuple streams out to sub-directories based on the values in the Tuple instance.

The constructor takes a Hfs Tap and a Formatter format syntax String. This allows Tuple values at given positions to be used as directory names. Note that Hadoop can only sink to directories, and all files in those directories are "part-xxxxx" files.
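
As a rough illustration of how a Formatter format String can turn Tuple values into a sub-directory name, the sketch below uses only java.lang.String.format. The class TemplatePathSketch and its resolve helper are hypothetical names invented for this example and are not part of Cascading. Joining the result to the parent path with "/" is an assumption about how the base path and template combine.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: mimics mapping Tuple values to a sub-directory with a Formatter template. */
class TemplatePathSketch {

    /**
     * Hypothetical helper, not a Cascading method: formats the tuple values with
     * the template and appends the result to the parent (base) path.
     */
    static String resolve(String parentPath, String pathTemplate, List<?> tupleValues) {
        // Values are consumed by position, in the same order as the format specifiers.
        String subDirectory = String.format(pathTemplate, tupleValues.toArray());
        return parentPath + "/" + subDirectory;
    }

    public static void main(String[] args) {
        // prints "output/US-2011"
        System.out.println(resolve("output", "%s-%s", Arrays.asList("US", 2011)));
        // prints "output/DE/2010"; a "/" in the template yields nested directories
        System.out.println(resolve("output", "%s/%s", Arrays.asList("DE", 2010)));
    }
}
```

In this sketch every distinct combination of values produces its own directory, which is why the number of simultaneously open outputs can grow quickly.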

openTapsThreshold limits the number of files that may be open for output at one time. The default is 300 files. Each time the threshold is exceeded, 10% of the least recently used open files are closed.
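
The eviction policy described above can be sketched with an access-ordered java.util.LinkedHashMap. This is a minimal illustration and not the TemplateTap implementation: the class OpenTapsSketch, its write method, and the use of StringBuilder in place of real output files are all hypothetical, and computing the batch as 10% of the threshold (at least one) is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: least-recently-used closing of open outputs once a threshold is exceeded. */
class OpenTapsSketch {
    private final int threshold;
    // accessOrder = true keeps entries ordered from least to most recently used.
    private final Map<String, StringBuilder> open = new LinkedHashMap<>(16, 0.75f, true);
    // Paths that have been closed, in the order they were closed.
    final List<String> closed = new ArrayList<>();

    OpenTapsSketch(int threshold) {
        this.threshold = threshold;
    }

    void write(String path, String line) {
        StringBuilder out = open.get(path); // get() marks the path as recently used
        if (out == null) {
            out = new StringBuilder();
            open.put(path, out);
            if (open.size() > threshold) {
                closeLeastRecentlyUsed();
            }
        }
        out.append(line).append('\n');
    }

    private void closeLeastRecentlyUsed() {
        // Assumption: the batch is 10% of the threshold, and never less than one.
        int toClose = Math.max(1, threshold / 10);
        Iterator<String> paths = open.keySet().iterator();
        while (toClose-- > 0 && paths.hasNext()) {
            closed.add(paths.next());
            paths.remove();
        }
    }

    int openCount() {
        return open.size();
    }

    public static void main(String[] args) {
        OpenTapsSketch sketch = new OpenTapsSketch(10);
        for (int i = 0; i < 10; i++) {
            sketch.write("p" + i, "x");
        }
        sketch.write("p0", "y");  // touch p0 so that p1 becomes the least recently used
        sketch.write("p10", "x"); // eleventh distinct path exceeds the threshold
        System.out.println(sketch.closed); // prints [p1]
    }
}
```

A lower threshold trades fewer open file handles for more frequent closing and reopening of outputs.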

See Also:
Serialized Form

Nested Class Summary
static class TemplateTap.TemplateScheme
           
 
Constructor Summary
TemplateTap(Hfs parent, java.lang.String pathTemplate)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, Fields pathFields)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, Fields pathFields, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, Fields pathFields, SinkMode sinkMode)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, Fields pathFields, SinkMode sinkMode, boolean keepParentOnDelete)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, Fields pathFields, SinkMode sinkMode, boolean keepParentOnDelete, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, SinkMode sinkMode)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, SinkMode sinkMode, boolean keepParentOnDelete)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, java.lang.String pathTemplate, SinkMode sinkMode, boolean keepParentOnDelete, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
 
Method Summary
 boolean commitResource(JobConf conf)
          Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed.
 boolean createResource(JobConf conf)
          Method createResource creates the underlying resource.
 boolean deleteResource(JobConf conf)
          Method deleteResource deletes the resource represented by this instance.
 boolean equals(java.lang.Object object)
           
 java.lang.String getIdentifier()
          Method getIdentifier returns a String representing the resource this Tap instance represents.
 long getModifiedTime(JobConf conf)
          Method getModifiedTime returns the date this resource was last modified.
 int getOpenTapsThreshold()
          Method getOpenTapsThreshold returns the openTapsThreshold of this TemplateTap object.
 Tap getParent()
          Method getParent returns the parent Tap of this TemplateTap object.
 java.lang.String getPathTemplate()
          Method getPathTemplate returns the pathTemplate Formatter format String of this TemplateTap object.
 int hashCode()
           
 TupleEntryCollector openForWrite(FlowProcess<JobConf> flowProcess, OutputCollector output)
          Method openForWrite opens the resource represented by this Tap instance.
 boolean resourceExists(JobConf conf)
          Method resourceExists returns true if the path represented by this instance exists.
 boolean rollbackResource(JobConf conf)
          Method rollbackResource allows the underlying resource to be notified when any write processing has failed or was stopped so that any cleanup may be started.
 java.lang.String toString()
           
 
Methods inherited from class cascading.tap.SinkTap
getSourceFields, isSource, openForRead, sourceConfInit
 
Methods inherited from class cascading.tap.Tap
flowConfInit, getConfigDef, getFullIdentifier, getScheme, getSinkFields, getSinkMode, getStepConfigDef, getTrace, hasConfigDef, hasProcessConfigDef, isEquivalentTo, isKeep, isReplace, isSink, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveFields, resolveIncomingOperationFields, retrieveSinkFields, retrieveSourceFields, setScheme, sinkConfInit, taps
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

Parameters:
parent - of type Hfs
pathTemplate - of type String

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

openTapsThreshold limits the number of files that may be open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
openTapsThreshold - of type int

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","sinkMode"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         SinkMode sinkMode)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

Parameters:
parent - of type Hfs
pathTemplate - of type String
sinkMode - of type SinkMode

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","sinkMode","keepParentOnDelete"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         SinkMode sinkMode,
                                         boolean keepParentOnDelete)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when deleteResource(org.apache.hadoop.mapred.JobConf) is called, typically an issue when used inside a Cascade.

Parameters:
parent - of type Hfs
pathTemplate - of type String
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","sinkMode","keepParentOnDelete","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         SinkMode sinkMode,
                                         boolean keepParentOnDelete,
                                         int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when deleteResource(org.apache.hadoop.mapred.JobConf) is called, typically an issue when used inside a Cascade.

openTapsThreshold limits the number of files that may be open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean
openTapsThreshold - of type int

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         Fields pathFields)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. The pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
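
To make the role of pathFields concrete, the sketch below models a tuple as a Map from field name to value and selects values in the order the selector names them. PathFieldsSketch, select, and pathFor are hypothetical names for this illustration and do not reflect the Cascading Fields API. The example field names (country, year, name, amount) are likewise invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: path fields are selected and ordered independently of the sink fields. */
class PathFieldsSketch {

    /** Returns the values of the named fields, in the order the names are given. */
    static List<Object> select(Map<String, Object> tuple, List<String> fieldNames) {
        List<Object> values = new ArrayList<>();
        for (String name : fieldNames) {
            values.add(tuple.get(name));
        }
        return values;
    }

    /** Builds the sub-directory name from the selected path fields only. */
    static String pathFor(Map<String, Object> tuple, String pathTemplate, List<String> pathFields) {
        return String.format(pathTemplate, select(tuple, pathFields).toArray());
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("country", "US");
        tuple.put("year", 2011);
        tuple.put("name", "widget");
        tuple.put("amount", 3);

        // The selector reorders the fields: year first, then country.
        // prints "2011/US"
        System.out.println(pathFor(tuple, "%s/%s", Arrays.asList("year", "country")));
        // Only these values would be written to the file; country and year appear only in the path.
        // prints "[widget, 3]"
        System.out.println(select(tuple, Arrays.asList("name", "amount")));
    }
}
```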

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         Fields pathFields,
                                         int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. The pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

openTapsThreshold limits the number of files that may be open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
openTapsThreshold - of type int

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","sinkMode"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         Fields pathFields,
                                         SinkMode sinkMode)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. The pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
sinkMode - of type SinkMode

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","sinkMode","keepParentOnDelete"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         Fields pathFields,
                                         SinkMode sinkMode,
                                         boolean keepParentOnDelete)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. The pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when deleteResource(org.apache.hadoop.mapred.JobConf) is called, typically an issue when used inside a Cascade.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","sinkMode","keepParentOnDelete","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                         java.lang.String pathTemplate,
                                         Fields pathFields,
                                         SinkMode sinkMode,
                                         boolean keepParentOnDelete,
                                         int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. The pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when deleteResource(org.apache.hadoop.mapred.JobConf) is called, typically an issue when used inside a Cascade.

openTapsThreshold limits the number of files that may be open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean
openTapsThreshold - of type int

Method Detail

getParent

public Tap getParent()
Method getParent returns the parent Tap of this TemplateTap object.

Returns:
the parent (type Tap) of this TemplateTap object.

getPathTemplate

public java.lang.String getPathTemplate()
Method getPathTemplate returns the pathTemplate Formatter format String of this TemplateTap object.

Returns:
the pathTemplate (type String) of this TemplateTap object.

getIdentifier

public java.lang.String getIdentifier()
Description copied from class: Tap
Method getIdentifier returns a String representing the resource this Tap instance represents.

Often, if the tap accesses a filesystem, the identifier is nothing more than the path to the file or directory. In other cases it may be a URL or URI representing a connection string or remote resource.

Any two Tap instances having the same value for the identifier are considered equal.

Specified by:
getIdentifier in class Tap<JobConf,java.lang.Void,OutputCollector>
Returns:
String
See Also:
Tap.getIdentifier()

getOpenTapsThreshold

public int getOpenTapsThreshold()
Method getOpenTapsThreshold returns the openTapsThreshold of this TemplateTap object.

Returns:
the openTapsThreshold (type int) of this TemplateTap object.

openForWrite

public TupleEntryCollector openForWrite(FlowProcess<JobConf> flowProcess,
                                        OutputCollector output)
                                 throws java.io.IOException
Description copied from class: Tap
Method openForWrite opens the resource represented by this Tap instance.

The output value may be null. If so, subclasses must query the underlying Scheme via Scheme.sinkConfInit(cascading.flow.FlowProcess, Tap, Object) to obtain the proper output type and instantiate it before calling super.openForWrite().

Specified by:
openForWrite in class Tap<JobConf,java.lang.Void,OutputCollector>
Returns:
TupleEntryCollector
Throws:
java.io.IOException - when

createResource

public boolean createResource(JobConf conf)
                       throws java.io.IOException
Description copied from class: Tap
Method createResource creates the underlying resource.

Specified by:
createResource in class Tap<JobConf,java.lang.Void,OutputCollector>
Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
java.io.IOException - when there is an error making directories
See Also:
Tap.createResource(Object)

deleteResource

public boolean deleteResource(JobConf conf)
                       throws java.io.IOException
Description copied from class: Tap
Method deleteResource deletes the resource represented by this instance.

Specified by:
deleteResource in class Tap<JobConf,java.lang.Void,OutputCollector>
Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
java.io.IOException - when the resource cannot be deleted
See Also:
Tap.deleteResource(Object)

commitResource

public boolean commitResource(JobConf conf)
                       throws java.io.IOException
Description copied from class: Tap
Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed.

See Tap.rollbackResource(Object) to handle cleanup in the face of failures.

This method is invoked once "client side" and not in the cluster, if any.

This is an experimental API and is subject to refinement.

Overrides:
commitResource in class Tap<JobConf,java.lang.Void,OutputCollector>
Returns:
returns true if successful
Throws:
java.io.IOException

rollbackResource

public boolean rollbackResource(JobConf conf)
                         throws java.io.IOException
Description copied from class: Tap
Method rollbackResource allows the underlying resource to be notified when any write processing has failed or was stopped so that any cleanup may be started.

See Tap.commitResource(Object) to handle cleanup when the write has successfully completed.

This method is invoked once "client side" and not in the cluster, if any.

This is an experimental API and is subject to refinement.

Overrides:
rollbackResource in class Tap<JobConf,java.lang.Void,OutputCollector>
Returns:
returns true if successful
Throws:
java.io.IOException

resourceExists

public boolean resourceExists(JobConf conf)
                       throws java.io.IOException
Description copied from class: Tap
Method resourceExists returns true if the path represented by this instance exists.

Specified by:
resourceExists in class Tap<JobConf,java.lang.Void,OutputCollector>
Parameters:
conf - of type JobConf
Returns:
true if the underlying resource already exists
Throws:
java.io.IOException - when the status cannot be determined
See Also:
Tap.resourceExists(Object)

getModifiedTime

public long getModifiedTime(JobConf conf)
                     throws java.io.IOException
Description copied from class: Tap
Method getModifiedTime returns the date this resource was last modified.

Specified by:
getModifiedTime in class Tap<JobConf,java.lang.Void,OutputCollector>
Parameters:
conf - of type JobConf
Returns:
The date this resource was last modified.
Throws:
java.io.IOException
See Also:
Tap.getModifiedTime(Object)

equals

public boolean equals(java.lang.Object object)
Overrides:
equals in class Tap<JobConf,java.lang.Void,OutputCollector>

hashCode

public int hashCode()
Overrides:
hashCode in class Tap<JobConf,java.lang.Void,OutputCollector>

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object


Copyright © 2007-2011 Concurrent, Inc. All Rights Reserved.