Job factories

The following types of job factories are available:

  • parameter sweep: a set of jobs is created by varying one or more parameters through a range of values
  • zip: a set of jobs is created from multiple lists, where the i-th job contains the i-th element from each list
  • repeat: runs the same job multiple times

In all cases a set of jobs is created by substituting a range of values into a template job. Substitutions can be made in:

  • The command to be executed
  • Environment variables
  • Artifacts
  • Output filenames or directories

If you want to carry out a parameter study using parameters generated externally, zip is the most appropriate factory type.

When a workflow using a job factory is submitted to PROMINENCE, the individual jobs are created automatically. The job names take the form <workflow name>/<job name>/<id>, where <id> is an integer.

Note: Not all jobs will be created immediately as there is a limit to the number of idle jobs that can exist for any individual workflow. The remaining jobs will be created as the idle jobs start running.

Parameter sweep

In this case, numeric values are generated from a start point, an end point and an increment (step) provided by the user.

Here is an example fragment which would need to be included in a workflow description:

"factories": [
  {
    "type": "parameterSweep",
    "name": "sweep",
    "jobs": [
      "<job-name>"
    ],
    "parameters": [
      {
        "name": "frame",
        "start": 1,
        "end": 4,
        "step": 1
      }
    ]
  }
]

Here we specify the factory to be of type parameterSweep. The range of values used to create the jobs is defined in parameters, and the name of each parameter is given by name. In this example the parameter frame is varied from start up to at most end in increments of step, giving the values 1, 2, 3 and 4. The list jobs contains the names of the jobs to which the factory is applied.
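The rule "from start up to at most end in increments of step" can be sketched in Python as below. This is an illustration of the described semantics, not PROMINENCE's own code; the function name sweep_values is hypothetical, and requiring a positive step is an assumption (a non-positive step would never terminate). Decimal is used so that fractional steps such as 0.5 do not accumulate floating-point error; whether PROMINENCE formats whole numbers as 4 or 4.0 is not stated here.

```python
from decimal import Decimal


def sweep_values(start, end, step):
    """Return start, start + step, ... up to and including end (never past it).

    Hypothetical helper illustrating the parameterSweep rule; assumes step > 0.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    s, e, d = Decimal(str(start)), Decimal(str(end)), Decimal(str(step))
    values = []
    current = s
    while current <= e:
        # Keep whole numbers as int (e.g. 4, not 4.0); fractional values as float
        values.append(int(current) if current == current.to_integral_value() else float(current))
        current += d
    return values


print(sweep_values(1, 4, 1))    # [1, 2, 3, 4]
print(sweep_values(3, 6, 0.5))  # [3, 3.5, 4, 4.5, 5, 5.5, 6]
print(sweep_values(1, 4, 2))    # [1, 3]  (5 would exceed end)
```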

Jobs can obtain the value of the parameter through substitutions or environment variables. If a job's command includes $frame or ${frame}, it is replaced by the appropriate value. The environment variable PROMINENCE_PARAMETER_frame, containing this value, is also available to the job.
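Both placeholder forms, $frame and ${frame}, happen to match the syntax of Python's string.Template, so the substitution and the environment variable naming can be illustrated as follows. The helpers render_command and parameter_env are hypothetical; in particular, leaving unknown placeholders such as $HOME untouched (via safe_substitute) is an assumption about how PROMINENCE treats other shell variables in a command.

```python
from string import Template


def render_command(cmd, params):
    """Replace $name / ${name} placeholders with parameter values (hypothetical helper)."""
    # safe_substitute leaves placeholders that are not parameters (e.g. $HOME) as-is
    return Template(cmd).safe_substitute({k: str(v) for k, v in params.items()})


def parameter_env(params):
    """Build the PROMINENCE_PARAMETER_<name> environment variables for one job."""
    return {f"PROMINENCE_PARAMETER_{k}": str(v) for k, v in params.items()}


params = {"frame": 3}
print(render_command("echo $frame ${frame}.png $HOME", params))  # echo 3 3.png $HOME
print(parameter_env(params))  # {'PROMINENCE_PARAMETER_frame': '3'}
```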

Additional parameters can be included in order to carry out multi-dimensional parameter sweeps. For example:

"factories": [
  {
    "type": "parameterSweep",
    "name": "sweep",
    "jobs": [
      "<job-name>"
    ],
    "parameters":[
      {
        "name": "x",
        "start": 1,
        "end": 4,
        "step": 1
      },
      {
        "name": "y",
        "start": 2,
        "end": 5,
        "step": 1
      },
      {
        "name": "z",
        "start": 3,
        "end": 6,
        "step": 0.5
      }
    ]
  }
]

In a multi-dimensional parameter sweep, each parameter must have a unique name.
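Assuming that a multi-dimensional sweep creates one job for every combination of parameter values (a Cartesian product, which this section implies but does not state outright), the x/y/z example above would expand to 4 × 4 × 7 = 112 jobs. A minimal sketch of that expansion, including the unique-name requirement, is below; expand_sweep is a hypothetical name, and the ordering produced by itertools.product is not necessarily the order in which PROMINENCE assigns job ids.

```python
from decimal import Decimal
from itertools import product


def sweep_values(start, end, step):
    """Values from start up to and including end, in increments of step (assumes step > 0)."""
    if step <= 0:
        raise ValueError("step must be positive")
    s, e, d = Decimal(str(start)), Decimal(str(end)), Decimal(str(step))
    values, current = [], s
    while current <= e:
        # Keep whole numbers as int (e.g. 4, not 4.0); fractional values as float
        values.append(int(current) if current == current.to_integral_value() else float(current))
        current += d
    return values


def expand_sweep(parameters):
    """Return one dict of parameter values per job, assuming a Cartesian product."""
    names = [p["name"] for p in parameters]
    if len(set(names)) != len(names):
        raise ValueError("parameter names must be unique")
    axes = [sweep_values(p["start"], p["end"], p["step"]) for p in parameters]
    return [dict(zip(names, combo)) for combo in product(*axes)]


parameters = [
    {"name": "x", "start": 1, "end": 4, "step": 1},
    {"name": "y", "start": 2, "end": 5, "step": 1},
    {"name": "z", "start": 3, "end": 6, "step": 0.5},
]
jobs = expand_sweep(parameters)
print(len(jobs))  # 112
print(jobs[0])    # {'x': 1, 'y': 2, 'z': 3}
```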

Here is a simple, complete example of a workflow containing a 1D parameter sweep:

{
  "name": "ps-workflow",
  "jobs": [
    {
      "resources": {
        "nodes": 1,
        "cpus": 1,
        "memory": 1,
        "disk": 10
      },
      "tasks": [
        {
          "image": "busybox",
          "runtime": "singularity",
          "cmd": "echo $frame"
        }
      ],
      "name": "render"
    }
  ],
  "factories": [
    {
      "name": "render-frames",
      "type": "parameterSweep",
      "jobs": [
        "render"
      ],
      "parameters":[
        {
          "name": "frame",
          "start": 1,
          "end": 4,
          "step": 1
        }
      ]
    }
  ]
}


When running prominence describe for a job generated by a job factory workflow, the JSON job description will contain a parameters section specifying the values of the parameters used for that particular job, e.g.

"parameters": {
  "x1": 0.4,
  "x2": 0.6
}
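If you capture the JSON printed by prominence describe, the parameter values can be read back programmatically, for example when post-processing results. The sketch below assumes that parameters is a top-level key of the job description, as the fragment above suggests; the helper job_parameters and the abbreviated description string are hypothetical and for illustration only.

```python
import json


def job_parameters(description_json):
    """Return the parameters section of a job description, or {} if absent."""
    description = json.loads(description_json)
    # Assumption: "parameters" is a top-level key of the job description
    return description.get("parameters", {})


# Hypothetical, heavily abbreviated job description used only for illustration
example = '{"name": "ps-workflow/render/1", "parameters": {"x1": 0.4, "x2": 0.6}}'
print(job_parameters(example))  # {'x1': 0.4, 'x2': 0.6}
```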

If you wish to explicitly specify each value to be used, rather than specifying start and end values and a step, use a zip (described below) rather than a parameter sweep.
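When the values come from an external source, for instance a non-uniform spacing that start, end and step cannot express, a small script can write the zip factory fragment for you. The helper zip_factory is hypothetical; its output follows the structure of the zip fragment shown in the Zip section, and rejecting lists of different lengths is an assumption (this section does not say what PROMINENCE does with unequal lists).

```python
import json


def zip_factory(name, jobs, **values):
    """Build a zip factory entry from explicitly listed values per parameter."""
    lengths = {len(v) for v in values.values()}
    if len(lengths) > 1:
        # Assumption: every parameter must supply one value per job
        raise ValueError("all value lists must have the same length")
    return {
        "name": name,
        "type": "zip",
        "jobs": list(jobs),
        "parameters": [{"name": k, "values": list(v)} for k, v in values.items()],
    }


# Log-spaced values, which a start/end/step sweep cannot produce
factory = zip_factory("example", ["<job-name>"], x=[0.001, 0.01, 0.1, 1.0])
print(json.dumps({"factories": [factory]}, indent=2))
```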

Zip

A set of jobs is created by substituting a range of values into a template job. The values to be used are specified in the form of lists. If multiple parameters are provided, the i-th job is provided with the i-th element from each list. The name comes from Python’s zip function.

Here’s an example fragment which would need to be included in a workflow description:

"factories": [
  {
    "name": "example",
    "jobs": [
      "<job-name>"
    ],
    "type": "zip",
    "parameters":[
      {
        "name": "start_value",
        "values": [
          0, 1, 2, 3
        ]
      },
      {
        "name": "end_value",
        "values": [
          8, 9, 10, 11
        ]
      }
    ]
  }
]

Here we specify the factory to be of type zip. The range of values used to create the jobs is defined in parameters. The name of each parameter is given by name and a list of values for each parameter is provided. In this example 4 jobs would be created, with:

  • start_value = 0, end_value = 8
  • start_value = 1, end_value = 9
  • start_value = 2, end_value = 10
  • start_value = 3, end_value = 11
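Because the behaviour mirrors Python's zip, the four jobs listed above can be reproduced directly. In this sketch expand_zip is a hypothetical name; note that Python's zip silently stops at the shortest list, so the explicit length check is an added assumption rather than documented PROMINENCE behaviour.

```python
def expand_zip(parameters):
    """Return one dict per job: the i-th job takes the i-th value of every parameter."""
    names = [p["name"] for p in parameters]
    columns = [p["values"] for p in parameters]
    if len({len(c) for c in columns}) > 1:
        # Python's zip would truncate silently; fail loudly instead (assumption)
        raise ValueError("all value lists must have the same length")
    return [dict(zip(names, row)) for row in zip(*columns)]


parameters = [
    {"name": "start_value", "values": [0, 1, 2, 3]},
    {"name": "end_value", "values": [8, 9, 10, 11]},
]
for job in expand_zip(parameters):
    print(job)
# {'start_value': 0, 'end_value': 8}
# {'start_value': 1, 'end_value': 9}
# {'start_value': 2, 'end_value': 10}
# {'start_value': 3, 'end_value': 11}
```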

Repeat

A set of identical jobs is created. The parameter number specifies the number of jobs to create from the template. Example:

"factories": [
  {
    "name": "example",
    "jobs": [
      "<job-name>"
    ],
    "type": "repeat",
    "number": 10
  }
]

Here 10 instances of the job <job-name> will be created (example is the name of the factory, not of the job).