Complex workflows
Multiple job factories
A single workflow can include multiple job factories, each applied to different jobs. For example:
{
  "name": "multiple-job-factories",
  "jobs": [
    {
      "resources": {
        "nodes": 1,
        "cpus": 1,
        "memory": 1,
        "disk": 10
      },
      "tasks": [
        {
          "image": "busybox",
          "runtime": "singularity",
          "cmd": "echo A $frame"
        }
      ],
      "name": "renderA"
    },
    {
      "resources": {
        "nodes": 1,
        "cpus": 1,
        "memory": 1,
        "disk": 10
      },
      "tasks": [
        {
          "image": "busybox",
          "runtime": "singularity",
          "cmd": "echo B $frame"
        }
      ],
      "name": "renderB"
    }
  ],
  "factories": [
    {
      "name": "render-frames-A",
      "type": "parameterSweep",
      "jobs": [
        "renderA"
      ],
      "parameters": [
        {
          "name": "frame",
          "start": 1,
          "end": 4,
          "step": 1
        }
      ]
    },
    {
      "name": "render-frames-B",
      "type": "parameterSweep",
      "jobs": [
        "renderB"
      ],
      "parameters": [
        {
          "name": "frame",
          "start": 1,
          "end": 3,
          "step": 1
        }
      ]
    }
  ]
}
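Before submitting a workflow like this it can be useful to check that every factory only references job names defined in the jobs list. A minimal sketch in Python, assuming the description above has been saved as workflow.json (the filename is purely illustrative):

import json

# Load the workflow description shown above ("workflow.json" is an illustrative filename).
with open("workflow.json") as f:
    workflow = json.load(f)

# Collect the job names defined in the "jobs" list.
defined_jobs = {job["name"] for job in workflow["jobs"]}

# Every job referenced by a factory must be one of the defined jobs.
for factory in workflow.get("factories", []):
    for job_name in factory["jobs"]:
        if job_name not in defined_jobs:
            raise ValueError(f"factory '{factory['name']}' references undefined job '{job_name}'")

print("all factories reference defined jobs")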
When this workflow is submitted, we can see that the different job factories have been applied to the appropriate jobs:
$ prominence list
ID     NAME                              CREATED              STATUS  ELAPSED  IMAGE    CMD
53242  multiple-job-factories/renderA/0  2021-10-02 07:27:50  idle             busybox  echo A $frame
53243  multiple-job-factories/renderA/1  2021-10-02 07:27:50  idle             busybox  echo A $frame
53244  multiple-job-factories/renderA/2  2021-10-02 07:27:52  idle             busybox  echo A $frame
53245  multiple-job-factories/renderA/3  2021-10-02 07:27:52  idle             busybox  echo A $frame
53246  multiple-job-factories/renderB/0  2021-10-02 07:27:55  idle             busybox  echo B $frame
53247  multiple-job-factories/renderB/1  2021-10-02 07:27:55  idle             busybox  echo B $frame
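The number of jobs each parameterSweep factory creates follows directly from its start, end and step values, with the end value included: frames 1 to 4 give four renderA jobs and frames 1 to 3 give three renderB jobs, as shown in the listing above. A minimal Python sketch of this expansion, purely for illustration and not PROMINENCE's actual implementation:

# Illustrative expansion of a parameterSweep: values run from start to end inclusive.
def sweep_values(start, end, step):
    value = start
    while value <= end:
        yield value
        value += step

# The two factories from the workflow above, each applied to a different job.
factories = {
    "render-frames-A": {"job": "renderA", "start": 1, "end": 4, "step": 1},
    "render-frames-B": {"job": "renderB", "start": 1, "end": 3, "step": 1},
}

for factory_name, spec in factories.items():
    # Each generated job appears as workflow-name/job-name/index in 'prominence list',
    # with the swept value available to the job as $frame.
    for index, frame in enumerate(sweep_values(spec["start"], spec["end"], spec["step"])):
        print(f"{factory_name}: multiple-job-factories/{spec['job']}/{index}  frame={frame}")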
A single workflow can of course contain different types of job factories.
Combining job factories and DAGs
It is possible to define a workflow involving both job factories and dependencies between jobs. In the example workflow description below we use a job factory to run 3 process jobs; once these have all completed, a merge job is run.
{
  "name": "factory-dag-workflow",
  "jobs": [
    {
      "resources": {
        "nodes": 1,
        "cpus": 1,
        "memory": 1,
        "disk": 10
      },
      "tasks": [
        {
          "image": "busybox",
          "runtime": "singularity",
          "cmd": "echo $id"
        }
      ],
      "name": "process"
    },
    {
      "resources": {
        "nodes": 1,
        "cpus": 1,
        "memory": 1,
        "disk": 10
      },
      "tasks": [
        {
          "image": "busybox",
          "runtime": "singularity",
          "cmd": "echo merge"
        }
      ],
      "name": "merge"
    }
  ],
  "factories": [
    {
      "name": "processing",
      "type": "parameterSweep",
      "jobs": [
        "process"
      ],
      "parameters": [
        {
          "name": "id",
          "start": 1,
          "end": 3,
          "step": 1
        }
      ]
    }
  ],
  "dependencies": {
    "process": ["merge"]
  }
}
In the dependencies section of the workflow description, each parent job is listed along with its children. In this case jobs with the name process are run before the job with the name merge. As expected, initially only the process jobs are created and start running:
$ prominence list
ID     NAME                            CREATED              STATUS   ELAPSED     IMAGE    CMD
53250  factory-dag-workflow/process/0  2021-10-02 07:34:24  running  0+00:00:01  busybox  echo $id
53251  factory-dag-workflow/process/1  2021-10-02 07:34:24  running  0+00:00:01  busybox  echo $id
53252  factory-dag-workflow/process/2  2021-10-02 07:34:26  running  0+00:00:01  busybox  echo $id
Once all of these have completed, any dependent jobs will start, in this case the merge job.
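Because the process job is expanded by a factory, the single process-to-merge dependency means that every generated process job must finish before merge can start. A minimal Python sketch of how the declared edge expands across the factory-generated jobs, illustrative only and not PROMINENCE's internal logic:

# The dependency from the workflow above: parent -> list of children.
dependencies = {"process": ["merge"]}

# Jobs created by the 'processing' factory, as seen in 'prominence list'.
expanded = {"process": ["process/0", "process/1", "process/2"]}

for parent, children in dependencies.items():
    for instance in expanded.get(parent, [parent]):
        for child in children:
            print(f"{instance} must complete before {child}")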