Workflows as code

The Python client makes it straightforward to run workflows that are more complex than DAGs or simple parameter sweeps, for example where the results of one job (or group of jobs) determine the input parameters of the next job(s).
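
As a rough illustration of such a feedback loop, the sketch below picks the parameters for each round from the results of the previous round. The function run_job is a local stand-in, not part of the Python client: in a real workflow it would submit a job and read back a numeric result. The objective, step size and number of rounds are likewise assumptions chosen for illustration.

```python
def run_job(value):
    """Stand-in for submitting a job and reading back a numeric result.

    Hypothetical: a real workflow would use the Python client here.
    """
    return (value - 3.0) ** 2


def refine(start, step, rounds):
    """Search for the input value giving the smallest result."""
    best_value, best_result = start, run_job(start)
    for _ in range(rounds):
        # The next group of jobs depends on the best result so far.
        candidates = [best_value - step, best_value + step]
        results = [(run_job(v), v) for v in candidates]
        result, value = min(results)
        if result < best_result:
            best_value, best_result = value, result
        else:
            step /= 2.0  # no improvement, so narrow the search
    return best_value, best_result


best_value, best_result = refine(start=0.0, step=1.0, rounds=10)
print(best_value, best_result)
```

A workflow of this kind cannot be written down as a fixed DAG in advance, because the inputs of later jobs are only known once earlier jobs have finished.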

Passing input parameters and files to jobs

A basic requirement is to be able to pass input parameters or input files to jobs. This can be done programmatically in a variety of ways.

Parameters in command line arguments

For applications that accept input parameters on the command line, specify the appropriate values when constructing the task, e.g.

...
task.command = '/usr/local/blender/blender -b classroom/classroom.blend -o frame_### -f %d' % frame
...

Here the variable frame can be adjusted as necessary.
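
As a minimal sketch of varying frame, the command strings can be generated in a loop. The helper name build_command and the frame range are illustrative assumptions, not part of the client; each resulting string would be assigned to task.command as in the snippet above.

```python
# Hypothetical helper (not part of the Python client): build the Blender
# command for a single frame, using the same template as above.
def build_command(frame):
    return '/usr/local/blender/blender -b classroom/classroom.blend -o frame_### -f %d' % frame


# Illustrative frame range: one command per frame, one task per command.
commands = [build_command(frame) for frame in range(1, 4)]
for command in commands:
    print(command)
```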

Small input files

If a small input file needs to be different for each job, the appropriate input files can easily be constructed and passed to the jobs using the Python client. As an example, suppose we need to use a small input file sample.in with content as follows:

variable nsteps equal 200 # simulation steps

and we want to adjust the value 200 for different jobs. This can be done using the InputFile class, e.g.

...
contents = 'variable nsteps equal %d # simulation steps' % nsteps
job.input_files.append(InputFile('sample.in', contents))
...
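
To show how the contents might be generated for several jobs at once, here is a small sketch. The helper name sample_contents and the list of step counts are assumptions for illustration; each string would then be wrapped in an InputFile and appended to its job as shown above.

```python
# Hypothetical helper: render the contents of sample.in for a given
# number of simulation steps.
def sample_contents(nsteps):
    return 'variable nsteps equal %d # simulation steps' % nsteps


# Illustrative values: one input file per job.
step_counts = [100, 200, 400]
contents_by_steps = {n: sample_contents(n) for n in step_counts}

for n, contents in contents_by_steps.items():
    print(n, '->', contents)
```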

Large input files

A large input file should be uploaded to object storage, with the job configured to retrieve it automatically:

...
artifact = Artifact('largefile.h5')       # Specify the object name
artifact.upload('/path/to/largefile.h5')  # Specify the path to the file and upload it
job.artifacts.append(artifact)            # Add the artifact to the job
...

Retrieving output from jobs

Another basic requirement is to be able to retrieve output from jobs. Again, this can be done in several ways.

Extracting information from job standard output

In some cases there may be information we want to extract from the standard output (or error) from a job. This can be obtained easily using the stdout() (or stderr()) methods of the Job class.

Note that if the main application itself doesn’t write the required information into standard output, one option is to add another task to the job which reads an output file from the main application and writes the required information into standard output. An advantage of this is that large output files won’t need to be moved to the machine executing the Python client.
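
Once the standard output is available as text, a value can be pulled out with a regular expression. The sketch below assumes that the output has already been obtained as a string (for example from stdout(), whose exact return type is not stated here); the sample text, the label "Final energy" and the helper name extract_value are invented for illustration.

```python
import re


def extract_value(stdout_text, label):
    """Return the first number following 'label:' in the text, or None."""
    pattern = re.escape(label) + r'\s*:\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)'
    match = re.search(pattern, stdout_text)
    return float(match.group(1)) if match else None


# Invented sample output standing in for the text returned by a job.
sample_stdout = "step 200 done\nFinal energy: -1.25e2\n"
energy = extract_value(sample_stdout, 'Final energy')
print(energy)
```

Returning None when the label is missing lets the calling workflow decide whether to retry, skip or fail the job.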

Accessing output files

Files generated by jobs which are specified as output files (i.e. uploaded to object storage) can be downloaded as files or loaded into memory. For example, assuming job refers to a completed job:

# Load the output file output1.txt into the variable output1
output1 = job.get_output_file('output1.txt')

# Save the output file output2.txt to a file
job.get_output_file('output2.txt', save_as='/tmp/output2.txt')

Examples