java.lang.Object
  cascading.tap.Tap<JobConf,RecordReader,OutputCollector>
    cascading.tap.hadoop.Hfs
public class Hfs
Class Hfs is the base class for all Hadoop file system access. Hfs may only be used with the HadoopFlowConnector when creating Hadoop executable Flow instances.
Use Dfs or Lfs for resources specific to the Hadoop Distributed File System or the local file system, respectively.
Use the Hfs class if the 'kind' of resource is unknown at design time. To use it, prefix a scheme to the 'stringPath': hdfs://... denotes Dfs, and file://... denotes Lfs.
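The scheme-to-tap mapping described above can be illustrated with a minimal sketch that uses only java.net.URI. It is not Cascading code: the class SchemeKind and the method classify are hypothetical names, and the fallback label for a path without a scheme is an assumption.

```java
import java.net.URI;

// Hypothetical helper, not part of Cascading: reports which tap flavor
// a stringPath's URI scheme would denote according to the text above.
class SchemeKind {
    static String classify(String stringPath) {
        String scheme = URI.create(stringPath).getScheme();
        if (scheme == null) {
            // Assumption: a path without a scheme resolves against the default file system.
            return "default file system";
        }
        switch (scheme) {
            case "hdfs":
                return "Dfs";
            case "file":
                return "Lfs";
            default:
                return "other (" + scheme + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("hdfs://namenode:8020/logs/input")); // Dfs
        System.out.println(classify("file:///tmp/input"));               // Lfs
        System.out.println(classify("/logs/input"));                     // default file system
    }
}
```

The host name and paths in main are placeholders chosen for the example.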
Call setTemporaryDirectory(java.util.Map, String) to use a temporary file directory path other than the current Hadoop default path.
By default, Cascading on Hadoop assumes that any source or sink Tap using the file:// URI scheme intends to read files from the local client filesystem (for example, when using the Lfs Tap) where the Hadoop job jar is started. It will therefore force any MapReduce jobs reading or writing file:// resources to run in Hadoop "local mode" so that the file can be read.
To change this behavior, call setLocalModeScheme(java.util.Map, String) to set a different scheme value, or set it to "none" to disable the behavior entirely for the case where the file to be read is available on every Hadoop processing node at the exact same path.
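The local-mode rule described above can be sketched as a small stand-alone check. This is an illustration under stated assumptions, not the library's implementation: the class LocalModeCheck, the method forcesLocalMode, and the property key are all hypothetical (the real key is whatever Hfs.LOCAL_MODE_SCHEME holds).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the decision described above; not Cascading code.
class LocalModeCheck {
    // Placeholder key for this example only; the real key is the value of Hfs.LOCAL_MODE_SCHEME.
    static final String KEY = "example.localmode.scheme";

    // Returns true when a job touching stringPath would be forced into Hadoop local mode.
    static boolean forcesLocalMode(Map<Object, Object> properties, String stringPath) {
        // "file" is the default scheme value; "none" disables the check entirely.
        String localScheme = String.valueOf(properties.getOrDefault(KEY, "file"));
        if ("none".equals(localScheme)) {
            return false;
        }
        String scheme = URI.create(stringPath).getScheme();
        // equalsIgnoreCase(null) is false, so paths without a scheme never match.
        return localScheme.equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<>();
        System.out.println(forcesLocalMode(properties, "file:///data/in"));         // true
        System.out.println(forcesLocalMode(properties, "hdfs://namenode/data/in")); // false
        properties.put(KEY, "none");
        System.out.println(forcesLocalMode(properties, "file:///data/in"));         // false
    }
}
```

Setting the value to "none" only makes sense when the file exists at the same path on every processing node, as the description notes.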
Field Summary | |
---|---|
static java.lang.String | LOCAL_MODE_SCHEME: Field LOCAL_MODE_SCHEME |
static java.lang.String | TEMPORARY_DIRECTORY: Field TEMPORARY_DIRECTORY |
Constructor Summary | |
---|---|
protected | Hfs() |
public | Hfs(Fields fields, java.lang.String stringPath): Deprecated. |
public | Hfs(Fields fields, java.lang.String stringPath, boolean replace): Deprecated. |
public | Hfs(Fields fields, java.lang.String stringPath, SinkMode sinkMode): Deprecated. |
protected | Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme) |
public | Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, java.lang.String stringPath): Constructor Hfs creates a new Hfs instance. |
public | Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, java.lang.String stringPath, boolean replace): Deprecated. |
public | Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, java.lang.String stringPath, SinkMode sinkMode): Constructor Hfs creates a new Hfs instance. |
Method Summary | |
---|---|
boolean | commitResource(JobConf conf): Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed. |
boolean | createResource(JobConf conf): Method createResource creates the underlying resource. |
boolean | deleteResource(JobConf conf): Method deleteResource deletes the resource represented by this instance. |
boolean | equals(java.lang.Object object) |
long | getBlockSize(JobConf conf): Method getBlockSize returns the blocksize specified by the underlying file system for this resource. |
java.lang.String[] | getChildIdentifiers(JobConf conf): Method getChildIdentifiers returns an array of child identifiers if this resource is a directory. |
protected FileSystem | getDefaultFileSystem(JobConf jobConf) |
java.net.URI | getDefaultFileSystemURIScheme(JobConf jobConf): Method getDefaultFileSystemURIScheme returns the URI scheme for the default Hadoop FileSystem. |
protected FileSystem | getFileSystem(JobConf jobConf) |
java.lang.String | getFullIdentifier(JobConf conf): Method getFullIdentifier returns a fully qualified resource identifier. |
java.lang.String | getIdentifier(): Method getIdentifier returns a String representing the resource this Tap instance represents. |
protected static java.lang.String | getLocalModeScheme(JobConf conf, java.lang.String defaultValue) |
long | getModifiedTime(JobConf conf): Method getModifiedTime returns the date this resource was last modified. |
Path | getPath() |
int | getReplication(JobConf conf): Method getReplication returns the replication specified by the underlying file system for this resource. |
long | getSize(JobConf conf): Method getSize returns the size of the file referenced by this tap. |
static java.lang.String | getTemporaryDirectory(java.util.Map<java.lang.Object,java.lang.Object> properties): Method getTemporaryDirectory returns the configured temporary directory from the given properties object. |
static Path | getTempPath(JobConf conf) |
java.net.URI | getURIScheme(JobConf jobConf) |
int | hashCode() |
boolean | isDirectory(JobConf conf): Method isDirectory returns true if the underlying resource represents a directory or folder instead of an individual file. |
protected java.lang.String | makeTemporaryPathDirString(java.lang.String name) |
protected java.net.URI | makeURIScheme(JobConf jobConf) |
TupleEntryIterator | openForRead(FlowProcess<JobConf> flowProcess, RecordReader input): Method openForRead opens the resource represented by this Tap instance. |
TupleEntryCollector | openForWrite(FlowProcess<JobConf> flowProcess, OutputCollector output): Method openForWrite opens the resource represented by this Tap instance. |
boolean | resourceExists(JobConf conf): Method resourceExists returns true if the path represented by this instance exists. |
static void | setLocalModeScheme(java.util.Map<java.lang.Object,java.lang.Object> properties, java.lang.String scheme): Method setLocalModeScheme provides a means to change the scheme value used to detect when a MapReduce job should be run in Hadoop local mode. |
protected void | setStringPath(java.lang.String stringPath) |
static void | setTemporaryDirectory(java.util.Map<java.lang.Object,java.lang.Object> properties, java.lang.String tempDir): Method setTemporaryDirectory sets the temporary directory on the given properties object. |
protected void | setUriScheme(java.net.URI uriScheme) |
void | sinkConfInit(FlowProcess<JobConf> process, JobConf conf): Method sinkConfInit initializes this instance as a sink. |
void | sourceConfInit(FlowProcess<JobConf> process, JobConf conf): Method sourceConfInit initializes this instance as a source. |
java.lang.String | toString() |
Methods inherited from class cascading.tap.Tap |
---|
flowConfInit, getConfigDef, getScheme, getSinkFields, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hasProcessConfigDef, isEquivalentTo, isKeep, isReplace, isSink, isSource, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveFields, resolveIncomingOperationFields, retrieveSinkFields, retrieveSourceFields, rollbackResource, setScheme, taps |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String TEMPORARY_DIRECTORY
public static final java.lang.String LOCAL_MODE_SCHEME
Constructor Detail |
---|
protected Hfs()

@ConstructorProperties(value="scheme")
protected Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme)

@Deprecated
@ConstructorProperties(value={"fields","stringPath"})
public Hfs(Fields fields, java.lang.String stringPath)
Parameters:
fields - of type Fields
stringPath - of type String

@Deprecated
@ConstructorProperties(value={"fields","stringPath","replace"})
public Hfs(Fields fields, java.lang.String stringPath, boolean replace)
Parameters:
fields - of type Fields
stringPath - of type String
replace - of type boolean

@Deprecated
@ConstructorProperties(value={"fields","stringPath","sinkMode"})
public Hfs(Fields fields, java.lang.String stringPath, SinkMode sinkMode)
Parameters:
fields - of type Fields
stringPath - of type String
sinkMode - of type SinkMode

@ConstructorProperties(value={"scheme","stringPath"})
public Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, java.lang.String stringPath)
Constructor Hfs creates a new Hfs instance.
Parameters:
scheme - of type Scheme
stringPath - of type String

@Deprecated
@ConstructorProperties(value={"scheme","stringPath","replace"})
public Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, java.lang.String stringPath, boolean replace)
Parameters:
scheme - of type Scheme
stringPath - of type String
replace - of type boolean

@ConstructorProperties(value={"scheme","stringPath","sinkMode"})
public Hfs(Scheme<JobConf,RecordReader,OutputCollector,?,?> scheme, java.lang.String stringPath, SinkMode sinkMode)
Constructor Hfs creates a new Hfs instance.
Parameters:
scheme - of type Scheme
stringPath - of type String
sinkMode - of type SinkMode

Method Detail |
---|
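public static void setTemporaryDirectory(java.util.Map<java.lang.Object,java.lang.Object> properties, java.lang.String tempDir)
Method setTemporaryDirectory sets the temporary directory on the given properties object.
Parameters:
properties - of type Map

public static java.lang.String getTemporaryDirectory(java.util.Map<java.lang.Object,java.lang.Object> properties)
Method getTemporaryDirectory returns the configured temporary directory from the given properties object.
Parameters:
properties - of type Map

public static void setLocalModeScheme(java.util.Map<java.lang.Object,java.lang.Object> properties, java.lang.String scheme)
Method setLocalModeScheme provides a means to change the scheme value used to detect when a MapReduce job should be run in Hadoop local mode. By default the value is "file"; set it to "none" to disable the behavior entirely.
Parameters:
properties - of type Map

protected static java.lang.String getLocalModeScheme(JobConf conf, java.lang.String defaultValue)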
protected void setStringPath(java.lang.String stringPath)
protected void setUriScheme(java.net.URI uriScheme)
public java.net.URI getURIScheme(JobConf jobConf)
protected java.net.URI makeURIScheme(JobConf jobConf)
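public java.net.URI getDefaultFileSystemURIScheme(JobConf jobConf)
Method getDefaultFileSystemURIScheme returns the URI scheme for the default Hadoop FileSystem.
Parameters:
jobConf - of type JobConf

protected FileSystem getDefaultFileSystem(JobConf jobConf)

protected FileSystem getFileSystem(JobConf jobConf)

public java.lang.String getIdentifier()
Description copied from class: Tap
Method getIdentifier returns a String representing the resource this Tap instance represents.
Overrides:
getIdentifier in class Tap<JobConf,RecordReader,OutputCollector>

public Path getPath()

public java.lang.String getFullIdentifier(JobConf conf)
Description copied from class: Tap
Method getFullIdentifier returns a fully qualified resource identifier.
Overrides:
getFullIdentifier in class Tap<JobConf,RecordReader,OutputCollector>
Parameters:
conf - of type Config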
public void sourceConfInit(FlowProcess<JobConf> process, JobConf conf)
Description copied from class: Tap
Method sourceConfInit initializes this instance as a source. This method may be called more than once if this Tap instance is used outside the scope of a Flow instance or if it participates multiple times in a given Flow or across different Flows in a Cascade.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
Overrides:
sourceConfInit in class Tap<JobConf,RecordReader,OutputCollector>
Parameters:
conf - of type JobConf @throws IOException on resource initialization failure.

public void sinkConfInit(FlowProcess<JobConf> process, JobConf conf)
Description copied from class: Tap
Method sinkConfInit initializes this instance as a sink. This method may be called more than once if this Tap instance is used outside the scope of a Flow instance or if it participates multiple times in a given Flow or across different Flows in a Cascade.
Note this method will be called in the context of this Tap being used as a traditional 'sink' and as a 'trap'.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
Overrides:
sinkConfInit in class Tap<JobConf,RecordReader,OutputCollector>
Parameters:
conf - of type JobConf @throws IOException on resource initialization failure.

public TupleEntryIterator openForRead(FlowProcess<JobConf> flowProcess, RecordReader input) throws java.io.IOException
Description copied from class: Tap
Method openForRead opens the resource represented by this Tap instance. The input value may be null; if so, sub-classes must inquire with the underlying Scheme via Scheme.sourceConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper input type and instantiate it before calling super.openForRead().
Note the returned iterator will return the same instance of TupleEntry on every call, thus a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.
Overrides:
openForRead in class Tap<JobConf,RecordReader,OutputCollector>
Throws:
java.io.IOException
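
The note above about the iterator returning the same instance on every call is easy to get wrong, so here is a minimal stand-alone sketch of the pitfall. It does not use Cascading: Entry is a hypothetical stand-in for a mutable object such as TupleEntry, and reusingIterator and collect are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demonstration of a reused-instance iterator; not Cascading code.
class ReusedEntryDemo {
    // Stand-in for a mutable entry that the iterator overwrites in place.
    static final class Entry {
        String value;

        Entry copy() {
            Entry duplicate = new Entry();
            duplicate.value = value;
            return duplicate;
        }
    }

    // Returns an iterator that hands back one shared Entry, mutated on each call to next().
    static Iterator<Entry> reusingIterator(List<String> values) {
        final Iterator<String> source = values.iterator();
        final Entry shared = new Entry();
        return new Iterator<Entry>() {
            public boolean hasNext() {
                return source.hasNext();
            }

            public Entry next() {
                shared.value = source.next();
                return shared;
            }
        };
    }

    // Stores every entry in a list, optionally copying first, then reports the stored values.
    static List<String> collect(List<String> values, boolean copy) {
        List<Entry> stored = new ArrayList<>();
        Iterator<Entry> iterator = reusingIterator(values);
        while (iterator.hasNext()) {
            Entry entry = iterator.next();
            stored.add(copy ? entry.copy() : entry);
        }
        List<String> result = new ArrayList<>();
        for (Entry entry : stored) {
            result.add(entry.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(collect(input, false)); // [c, c, c]: every slot aliases the shared entry
        System.out.println(collect(input, true));  // [a, b, c]: each slot holds its own copy
    }
}
```

With a real TupleEntryIterator the same reasoning applies: copy the TupleEntry or its Tuple before adding it to a collection.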

public TupleEntryCollector openForWrite(FlowProcess<JobConf> flowProcess, OutputCollector output) throws java.io.IOException
Description copied from class: Tap
Method openForWrite opens the resource represented by this Tap instance. The output value may be null; if so, sub-classes must inquire with the underlying Scheme via Scheme.sinkConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper output type and instantiate it before calling super.openForWrite().
Overrides:
openForWrite in class Tap<JobConf,RecordReader,OutputCollector>
Throws:
java.io.IOException

public boolean createResource(JobConf conf) throws java.io.IOException
Description copied from class: Tap
Method createResource creates the underlying resource.
Overrides:
createResource in class Tap<JobConf,RecordReader,OutputCollector>
Parameters:
conf - of type JobConf
Throws:
java.io.IOException - when there is an error making directories

public boolean deleteResource(JobConf conf) throws java.io.IOException
Description copied from class: Tap
Method deleteResource deletes the resource represented by this instance.
Overrides:
deleteResource in class Tap<JobConf,RecordReader,OutputCollector>
Parameters:
conf - of type JobConf
Throws:
java.io.IOException - when the resource cannot be deleted

public boolean commitResource(JobConf conf) throws java.io.IOException
Description copied from class: Tap
Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed.
See Tap.rollbackResource(Object) to handle cleanup in the face of failures.
This method is invoked once "client side" and not in the cluster, if any.
Overrides:
commitResource in class Tap<JobConf,RecordReader,OutputCollector>
Throws:
java.io.IOException
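
public boolean resourceExists(JobConf conf) throws java.io.IOException
Description copied from class: Tap
Method resourceExists returns true if the path represented by this instance exists.
Overrides:
resourceExists in class Tap<JobConf,RecordReader,OutputCollector>
Parameters:
conf - of type JobConf
Throws:
java.io.IOException - when the status cannot be determined

public boolean isDirectory(JobConf conf) throws java.io.IOException
Method isDirectory returns true if the underlying resource represents a directory or folder instead of an individual file.
Parameters:
conf - of type JobConf
Throws:
java.io.IOException

public long getSize(JobConf conf) throws java.io.IOException
Method getSize returns the size of the file referenced by this tap.
Parameters:
conf - of type JobConf
Throws:
java.io.IOException

public long getBlockSize(JobConf conf) throws java.io.IOException
Method getBlockSize returns the blocksize specified by the underlying file system for this resource.
Parameters:
conf - of type JobConf
Throws:
java.io.IOException

public int getReplication(JobConf conf) throws java.io.IOException
Method getReplication returns the replication specified by the underlying file system for this resource.
Parameters:
conf - of type JobConf
Throws:
java.io.IOException

public java.lang.String[] getChildIdentifiers(JobConf conf) throws java.io.IOException
Method getChildIdentifiers returns an array of child identifiers if this resource is a directory.
This method will skip Hadoop log directories (_log).
Parameters:
conf - of type JobConf
Throws:
java.io.IOException

public long getModifiedTime(JobConf conf) throws java.io.IOException
Description copied from class: Tap
Method getModifiedTime returns the date this resource was last modified.
Overrides:
getModifiedTime in class Tap<JobConf,RecordReader,OutputCollector>
Parameters:
conf - of type Config
Throws:
java.io.IOException
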
public static Path getTempPath(JobConf conf)
protected java.lang.String makeTemporaryPathDirString(java.lang.String name)

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

public boolean equals(java.lang.Object object)
Overrides:
equals in class Tap<JobConf,RecordReader,OutputCollector>

public int hashCode()
Overrides:
hashCode in class Tap<JobConf,RecordReader,OutputCollector>