File Transfer
- Typical data flow for a file-based application
- The File Transfer Service
- Code example: Upload of specified files from local drive
- Code example: How to get a Shared Access Signature token
- Uploading from a local directory
- Uploading to an Azure blob container
- Selecting Files for Transfer
- File Transfer Events
- ShouldOverwrite Callback Function
- Upload Retry Mechanism
- Defining file transfers for a WorkUnit
- Worker-controlled file transfer
Typical data flow for a file-based application
A typical data flow for a file-based application using OneCompute is:
- The client application uploads input files to a blob container
- The client application creates and submits a job
- The OneCompute Worker Host retrieves the input files from the blob container and invokes the application Worker
- The application Worker completes the analysis
- The OneCompute Worker Host uploads output files to a blob container
- The client application retrieves the result files from the blob container
OneCompute provides support for all of the above steps in various ways. It is important to note that the client application is in full control of this data flow:
- It selects which input files should be available for each WorkUnit. The OneCompute Worker Host will download these files before the Worker is invoked.
- It selects which output files should be uploaded to the blob container upon completion of each WorkUnit. The OneCompute Worker Host will upload these files after the Worker completes execution.
The File Transfer Service
The file transfer service is designed to support the transfer of files between a local file system and remote file storage. More specifically, when running in Azure, the remote file storage will be Azure Blob storage. Azure Blob storage is a service for storing large amounts of unstructured object data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. OneCompute's BlobStorageFileTransferService implements the IFileTransferService interface and provides the ability to upload files to Azure Blob storage.
Code example: Upload of specified files from local drive
Below is an example of uploading image files to Azure Blob storage using the BlobStorageFileTransferService. It gathers all JPEG image files from the Public\Pictures folder on the user's local drive and uploads them to the Images directory inside a blob container.
public async Task UploadJpegFilesAsync()
{
var sourceDirectory = @"C:\Users\Public\Pictures";
// Get a list of JPEG images from the source directory.
var sourceFiles = Directory.GetFiles(sourceDirectory, "*.jpg").ToList();
// Define the destination in Blob storage to copy the image files to.
var blobContainerDestinationSpecification = new BlobDirectorySpecification
{
// URL of the container within Azure Blob Storage, including a
// Shared Access Signature (SAS) granting write permissions.
ContainerUrl = GetContainerUrl(),
Directory = "Images"
};
// Create a file transfer specification with file system source and Blob storage destination.
var fileTransferSpecification = new FileTransferSpecification
{
SourceSpecification = new FileSystemDirectorySpecification
{
Directory = sourceDirectory
},
SelectedFiles = sourceFiles,
DestinationSpecification = blobContainerDestinationSpecification
};
// Create an array of file transfer specifications.
var fileTransferSpecifications = new [] { fileTransferSpecification };
// Create the file transfer service and upload the files.
var fileTransferService = new BlobStorageFileTransferService();
await fileTransferService.UploadFilesAsync(fileTransferSpecifications);
}
Code example: How to get a Shared Access Signature (SAS) token
The URL for the container consists of a Storage Resource URI and a Shared Access Signature (SAS) token. BlobStorageHelper can be used to generate the SAS token and append it to the Storage Resource URI, or this can be done explicitly, as shown below.
public string GetContainerUrl()
{
// Create a blob client, with a blob service endpoint Uri.
var blobClient = new CloudBlobClient(new Uri("MyBaseUri"));
// Get the cloud blob container from the container name.
var container = blobClient.GetContainerReference("MyContainerName");
// Create the SAS constraints, with an expiry time and permissions.
var sasConstraints = new SharedAccessBlobPolicy
{
SharedAccessExpiryTime = DateTime.UtcNow.AddDays(1),
Permissions = SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Read | SharedAccessBlobPermissions.Create
};
// Generate a signature and return the Url.
var sasBlobToken = container.GetSharedAccessSignature(sasConstraints);
return $"{container.Uri.AbsoluteUri}{sasBlobToken}";
}
Uploading from a local directory
To upload from a local directory, a FileSystemDirectorySpecification should be assigned to the FileTransferSpecification.SourceSpecification property:
// Create a data transfer specification with source files and Blob storage destination.
var dataTransferSpecification = new FileTransferSpecification
{
SourceSpecification = new FileSystemDirectorySpecification
{
Directory = sourceDirectory
},
SelectedFiles = sourceFiles,
DestinationSpecification = ...
};
The FileSystemDirectorySpecification.Directory determines the location of each file at the destination: a file located in a subdirectory of the specified Directory is transferred to the same relative location at the destination. For example, with Directory set to C:\Data, the file C:\Data\Results\a.txt ends up at Results\a.txt under the destination directory.
Uploading to an Azure blob container
To upload to an Azure Blob container, a BlobDirectorySpecification should be assigned to the FileTransferSpecification.Destination property.
// Define the destination in Blob storage to copy the image files to.
var blobContainerDestinationSpecification = new BlobDirectorySpecification
{
// URL of the container within Azure Blob Storage, including a
// Shared Access Signature (SAS) granting write permissions.
ContainerUrl = GetContainerUrl(),
Directory = "Images"
};
// Create a file transfer specification with source files and Blob storage destination.
var dataTransferSpecification = new FileTransferSpecification
{
SourceSpecification = new FileSystemDirectorySpecification
{
Directory = sourceDirectory
},
SelectedFiles = sourceFiles,
DestinationSpecification = blobContainerDestinationSpecification
};
Selecting Files for Transfer
File globbing patterns can be used to specify which files should and should not be transferred. The FileTransferSpecification has two properties that determine which files to transfer:
- FileTransferSpecification.SelectedFiles: which files to include in the file transfer
- FileTransferSpecification.ExcludedFiles: which files to exclude from the file transfer
Both properties are lists of strings that can contain either specific file paths or paths containing globbing patterns.
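As an illustrative sketch, selection and exclusion can be combined as shown below. The directory name and patterns are hypothetical, and the exact globbing syntax supported (for example recursive `**` patterns) is defined by the service:

```csharp
// Transfer every text file in the source directory tree,
// except temporary files. (Hypothetical paths and patterns.)
var fileTransferSpecification = new FileTransferSpecification
{
    SourceSpecification = new FileSystemDirectorySpecification
    {
        Directory = @"C:\Work\Output"
    },
    SelectedFiles = new List<string> { @"**\*.txt" },
    ExcludedFiles = new List<string> { @"**\*.tmp" },
    DestinationSpecification = blobContainerDestinationSpecification
};
```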
File Transfer Events
The IFileTransferService interface prescribes the following events:
- FileTransferred: Occurs when a file was successfully transferred
- FileFailed: Occurs when the transfer of a file failed
- FileSkipped: Occurs when the transfer of a file is skipped
All events are of type EventHandler<DataTransferEventArgs>.
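The events can be subscribed to as ordinary .NET events. A minimal sketch is shown below; the members of DataTransferEventArgs are not covered here, so the handlers only log that each event occurred:

```csharp
var fileTransferService = new BlobStorageFileTransferService();

// Log each outcome as files are processed.
fileTransferService.FileTransferred += (sender, e) => Console.WriteLine("File transferred.");
fileTransferService.FileFailed += (sender, e) => Console.WriteLine("File transfer failed.");
fileTransferService.FileSkipped += (sender, e) => Console.WriteLine("File transfer skipped.");
```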
ShouldOverwrite Callback Function
The IFileTransferService interface enables the calling code to set the ShouldOverwrite callback function, which controls which files should be overwritten during file transfer. By default, all files are overwritten; by setting this callback, the calling code can decide when files are overwritten.
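A minimal sketch of setting the callback is shown below. The exact delegate signature is defined by IFileTransferService, so the parameter list used here is an assumption:

```csharp
var fileTransferService = new BlobStorageFileTransferService
{
    // Hypothetical sketch: never overwrite existing files at the destination.
    // The actual callback signature is defined by IFileTransferService.
    ShouldOverwrite = (sourceFile, destinationFile) => false
};
```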
Upload Retry Mechanism
The BlobStorageFileTransferService has a retry mechanism that retries failed file transfers a configurable number of times. The following properties control the retry behavior:
- MaxNumberOfRequestRetries: Controls the maximum number of retries at the lowest level of transfer (the HTTP level).
- MaxNumberOfUploadRetries: Controls the maximum number of retries at a higher level. When the file transfer fails at the lowest transfer level, the transfer is restarted, after a short delay, from the latest checkpoint, i.e. the last byte of the file that was successfully transferred.
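Assuming these limits are settable properties on the service instance, they can be adjusted as sketched below; the values are illustrative only:

```csharp
var fileTransferService = new BlobStorageFileTransferService
{
    // Up to 3 retries of an individual HTTP request.
    MaxNumberOfRequestRetries = 3,
    // Up to 5 restarts of the upload from the latest checkpoint.
    MaxNumberOfUploadRetries = 5
};
```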
Defining file transfers for a WorkUnit
Transfers of input and output files for a WorkUnit can be defined by the client application. This enables all input files to be transferred from blob storage to the local file system of the compute node before the application Worker executes and transfer of selected output files from the local file system of the compute node to blob storage after the Worker has completed execution. This reduces the responsibility of the Worker, simplifies implementation of the Worker and leaves the client application in full control of the data flow.
Defining transfer of input files for the WorkUnit
The WorkUnit.InputFileSpecifications property can be used by the client application to define which input files should be transferred from a blob container to the working directory of the Worker before the processing of the WorkUnit starts. See the code examples and Selecting Files for Transfer section above for more information.
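As a sketch, assuming InputFileSpecifications takes the same FileTransferSpecification type shown earlier, with the blob container as source and the file system as destination (since this is a download), an input file specification could look like this. Directory names and file patterns are hypothetical:

```csharp
var workUnit = new WorkUnit
{
    InputFileSpecifications = new List<FileTransferSpecification>
    {
        new FileTransferSpecification
        {
            // Download from this directory in the blob container...
            SourceSpecification = new BlobDirectorySpecification
            {
                ContainerUrl = GetContainerUrl(),
                Directory = "InputFiles"
            },
            SelectedFiles = new List<string> { "*.dat" },
            // ...into the working directory of the Worker.
            DestinationSpecification = new FileSystemDirectorySpecification
            {
                Directory = "."
            }
        }
    }
};
```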
Defining transfer of input files for the WorkUnit using resource files
An alternative way of specifying input files for a WorkUnit is to list each input file as an Azure Batch Resource File. The WorkUnit.InputFiles property takes a list of resource file URIs. The following code example shows how to construct a list of resource file URIs from a list of blob names (relative paths of files within a container):
IEnumerable<string> GetResourceFiles(CloudBlobContainer container, IEnumerable<string> blobNames)
{
// Create the SAS constraints, with an expiry time and permissions.
var sasConstraints = new SharedAccessBlobPolicy
{
SharedAccessExpiryTime = DateTime.UtcNow.AddDays(1),
Permissions = SharedAccessBlobPermissions.Read
};
foreach(var blobName in blobNames)
{
// Generate a SAS token for the blob and yield the blob URL with the token appended.
var blob = container.GetBlockBlobReference(blobName);
var sasBlobToken = blob.GetSharedAccessSignature(sasConstraints);
yield return blob.Uri + sasBlobToken;
}
}
This method can be used to set the InputFiles property on a WorkUnit:
IEnumerable<string> blobNames = .... // The list of blobs to retrieve
var wu = new WorkUnit
{
InputFiles = GetResourceFiles(blobContainer, blobNames).ToList()
};
Defining transfer of output files for the WorkUnit
The WorkUnit.OutputFileSpecifications property can be used by the client application to define which output files should be transferred to a blob container from the working directory of the Worker after the processing of the WorkUnit has completed. See the code examples and Selecting Files for Transfer section above for more information.
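As with input files, and assuming OutputFileSpecifications takes the same FileTransferSpecification type shown earlier, the directions are reversed for output: the file system is the source and the blob container the destination. A sketch with hypothetical directory names and patterns:

```csharp
workUnit.OutputFileSpecifications = new List<FileTransferSpecification>
{
    new FileTransferSpecification
    {
        // Upload from the working directory of the Worker...
        SourceSpecification = new FileSystemDirectorySpecification
        {
            Directory = "."
        },
        SelectedFiles = new List<string> { "*.log", "results.*" },
        // ...to this directory in the blob container.
        DestinationSpecification = new BlobDirectorySpecification
        {
            ContainerUrl = GetContainerUrl(),
            Directory = "ResultFiles"
        }
    }
};
```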
Worker-controlled file transfer
The alternative to letting the client application specify the files to be transferred for each work unit is to let the application Worker handle file transfers itself. To support worker-controlled file transfers, the following must be done:
- The client application must pass the blob container URI (with SAS token) to the worker, e.g. in a WorkUnit property
- The application Worker must provide a constructor that allows injection of the IFileTransferService interface into the worker
Transfer the blob container URI in a work unit property
The following application client code illustrates how to retrieve the container URI and transfer it in a work unit property:
// Set the container URI in an application specific property:
WorkUnit wu;
... // Application specific work unit initialization
wu["MyContainerUri"] = GetContainerUrl(); // Retrieve the container URL and set it in a work unit property
This enables the application worker to retrieve the container URI from the work unit and use it to download input files from the container and/or upload output files to the container.
Implement an importing Worker constructor
The following code illustrates how to implement an importing constructor that supports injection of the IFileTransferService interface to the application worker:
public class MyApplicationWorker : IWorker
{
private readonly IFileTransferService fileTransferService;
[System.Composition.ImportingConstructor]
public MyApplicationWorker(IFileTransferService fts)
{
this.fileTransferService = fts;
}
}
Doing file transfer in the Worker
The following worker code illustrates how to control the transfer of result files from the application worker:
public async Task<object> ExecuteAsync(IWorkerExecutionStatusNotificationService statusNotification, IWorkUnit workUnit, IEnumerable<Result> dependencyResults)
{
// Do all the work
.....
// Transfer result files
var containerUri = (string)workUnit["MyContainerUri"];
// Define the destination in Blob storage to copy the image files to.
var blobContainerDestinationSpecification = new BlobDirectorySpecification
{
// URL of the container within Azure Blob Storage, including a
// Shared Access Signature (SAS) granting write permissions.
ContainerUrl = containerUri,
Directory = "ResultFiles"
};
// Create a file transfer specification with source files and Blob storage destination.
var dataTransferSpecification = new FileTransferSpecification
{
SourceSpecification = new FileSystemDirectorySpecification
{
Directory = sourceDirectory
},
SelectedFiles = sourceFiles,
DestinationSpecification = blobContainerDestinationSpecification
};
await this.fileTransferService.UploadFilesAsync(new[] { dataTransferSpecification });
// Return an application-specific result.
return null;
}
See the preceding paragraphs for more information on how to use the file transfer service.