
Job status

Listing jobs

By default, the list command shows all active jobs, i.e. jobs which are idle or running:

prominence list

output

ID     STATUS   IMAGE                       CMD     ARGS
3101   idle     alahiff/testpi
3103   idle     alahiff/cherab-jet:latest   python  batch_make_sensitivity_matrix.py 0 59
3104   idle     ikester/blender:latest      blender -b classroom/classroom.blend -o frame_### -f 1
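
If you want to post-process this table in a script, one option is to split each row at the character offsets of the header columns. The sketch below uses a hypothetical helper (`parse_job_table` is not part of the Prominence client). It assumes the columns stay aligned with the header, as they are in the example above, and that empty CMD and ARGS fields simply leave a row short. For anything beyond a quick check, the JSON printed by the describe command is probably a sturdier input.

```python
import re

def parse_job_table(text):
    """Split `prominence list` output into one dict per job (hypothetical helper).

    Assumes every column starts at the same character offset as its header,
    as in the example output; this is not a documented, stable format.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    # (start offset, name) for each header column: ID, STATUS, IMAGE, CMD, ARGS
    columns = [(m.start(), m.group()) for m in re.finditer(r"\S+", header)]
    jobs = []
    for line in lines[1:]:
        job = {}
        for i, (start, name) in enumerate(columns):
            # The last column (ARGS) runs to the end of the line
            end = columns[i + 1][0] if i + 1 < len(columns) else None
            job[name] = line[start:end].strip()
        jobs.append(job)
    return jobs

sample = """\
ID     STATUS   IMAGE                       CMD     ARGS
3101   idle     alahiff/testpi
3103   idle     alahiff/cherab-jet:latest   python  batch_make_sensitivity_matrix.py 0 59
"""

for job in parse_job_table(sample):
    print(job["ID"], job["STATUS"], job["CMD"] or "-")
```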

It’s also possible to request a list of jobs using a constraint on the labels associated with each job. For example, if you submitted a group of jobs with a label name=run5, the following would list all such jobs:

prominence list --all --constraint name=run5

Here the --all option means that both active (i.e. idle or running) and completed jobs will be listed.
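
When driving the CLI from a script, it can help to build the argument list in one place. This is a minimal sketch that uses only the list options described above (--all and --constraint); the function names are hypothetical, and `list_jobs` assumes the `prominence` executable is on your PATH.

```python
import subprocess

def list_jobs_command(constraint=None, include_completed=False):
    """Build argv for `prominence list` from the documented options (hypothetical helper)."""
    argv = ["prominence", "list"]
    if include_completed:
        argv.append("--all")  # list both active and completed jobs
    if constraint is not None:
        argv += ["--constraint", constraint]  # label constraint, e.g. "name=run5"
    return argv

def list_jobs(constraint=None, include_completed=False):
    """Run the command and return its output as text (needs the prominence CLI on PATH)."""
    result = subprocess.run(
        list_jobs_command(constraint, include_completed),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(list_jobs_command("name=run5", include_completed=True))
```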

Describing a job

To get more information about an individual job, use the describe command, for example:

prominence describe 345

output

[
  {
    "id": 345,
    "status": "created",
    "resources": {
      "cpus": 1,
      "disk": 10,
      "memory": 1,
      "nodes": 1,
      "walltime": 720
    },
    "tasks": [
      {
        "image": "alahiff/testpi",
        "runtime": "singularity"
      }
    ],
    "events": {
      "createTime": "2019-06-18T10:16:36"
    }
  }
]
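
Because describe prints JSON, its output can be loaded directly with Python's standard `json` module. The sketch below parses the example output above (a JSON list containing one job object) and prints a one-line summary. `summarize_job` is a hypothetical helper, and the field names are simply those that appear in the example.

```python
import json

# The example describe output above, as text
describe_output = """
[
  {
    "id": 345,
    "status": "created",
    "resources": {"cpus": 1, "disk": 10, "memory": 1, "nodes": 1, "walltime": 720},
    "tasks": [{"image": "alahiff/testpi", "runtime": "singularity"}],
    "events": {"createTime": "2019-06-18T10:16:36"}
  }
]
"""

def summarize_job(job):
    """One-line summary built from fields shown in the example output."""
    res = job["resources"]
    images = ",".join(task["image"] for task in job.get("tasks", []))
    return (f'{job["id"]}: {job["status"]} '
            f'(cpus={res["cpus"]}, memory={res["memory"]}, images={images})')

for job in json.loads(describe_output):  # the example output is a JSON list of jobs
    print(summarize_job(job))
```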

To show information about completed jobs, both the list and describe commands accept a --completed option. For example, to list the last 2 completed jobs:

prominence list --completed --last 2

output

ID     STATUS      IMAGE                       CMD          ARGS
2980   completed   alahiff/tensorflow:1.11.0   python       models-1.11/official/mnist/mnist.py --export_dir mnist_saved_model
2982   completed   alahiff/tensorflow:1.11.0   python       models-1.11/official/mnist/mnist.py --export_dir mnist_saved_model

Note that jobs which have completed, or have been removed for some reason, may still be visible for a short time without the --completed option.

Completed jobs

The JSON descriptions of completed jobs contain additional information. This may include:

  • status: current job status (idle, running, completed, failed, deleted, or killed)
  • statusReason: for jobs in a terminal state other than completed, this may give the reason for the current status.
  • createTime: date & time when the job was created by the user.
  • startTime: date & time when the job started running.
  • endTime: date & time when the job ended.
  • site: the site where the job was executed.
  • maxMemoryUsageKB: the maximum total memory usage of the job, summed over all processes (note that this is not available for jobs running on remote HTC or HPC resources).
  • retries: the number of job retries attempted.
  • provisionedResources: the number of CPU cores, memory (in GB), disk (in GB) and number of nodes provisioned for the job.
  • cpu: details of the CPU used to run the job (clock, model, and vendor).
  • runtimeVersion: the versions of singularity and udocker used (the udocker version is given in the form <version>/<tarball_release>).
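
The event timestamps and status values above are enough to work out how long a job waited and ran, and whether it needs a closer look. This is a sketch under a few assumptions: timestamps are read as naive times in either of the two formats used on this page, `statusReason` is assumed to be a top-level key (its position in the JSON is not shown here), and the helper names are hypothetical.

```python
from datetime import datetime

# Status values listed above; idle and running are the only non-terminal ones
TERMINAL_STATES = {"completed", "failed", "deleted", "killed"}

def job_timings(job):
    """Return (wait_seconds, run_seconds); either is None if a timestamp is missing."""
    events = job.get("events", {})
    # fromisoformat (Python 3.7+) accepts both "2019-06-18T10:16:36" and
    # "2022-04-28 20:03:06", the two styles used on this page
    created, started, ended = (
        datetime.fromisoformat(events[key]) if key in events else None
        for key in ("createTime", "startTime", "endTime")
    )
    wait = (started - created).total_seconds() if created is not None and started is not None else None
    run = (ended - started).total_seconds() if started is not None and ended is not None else None
    return wait, run

def failure_reason(job):
    """Return statusReason for jobs in a terminal state other than completed."""
    if job.get("status") in TERMINAL_STATES - {"completed"}:
        # Assumption: statusReason is a top-level key and may be absent
        return job.get("statusReason", "no reason given")
    return None

example = {
    "status": "completed",
    "events": {
        "createTime": "2022-04-28 20:03:06",
        "startTime": "2022-04-28 20:04:06",
        "endTime": "2022-04-28 20:04:12",
    },
}
print(job_timings(example), failure_reason(example))
```

For the completed-job example on this page, this gives a wait of 60 seconds and a run time of 6 seconds.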

The following information is also provided for each task:

  • retries: the number of task retries attempted.
  • exitCode: the exit code returned by the user’s job. This is usually 0 for success.
  • imagePullTime: time taken to pull the container image. If a cached image from a previous task was used, this will be -1.
  • imagePullStatus: image pull status, e.g. completed.
  • imageSha256: SHA256 sum of the container image.
  • wallTimeUsage: wall time used by the task.
  • cpuTimeUsage: CPU time used by the task. For a task using multiple CPUs this can be larger than the wall time.
  • maxResidentSetSizeKB: maximum resident set size (in KB) of the largest process.
  • stageInTime: total time to stage-in any files.
  • stageOutTime: total time to stage-out any files.

For example:

{
  "id": 61,
  "status": "completed",
  "resources": {
    "cpus": 1,
    "disk": 10,
    "memory": 1,
    "nodes": 1
  },
  "tasks": [
    {
      "image": "eoscprominence/testpi",
      "runtime": "singularity"
    }
  ],
  "events": {
    "createTime": "2022-04-28 20:03:06",
    "startTime": "2022-04-28 20:04:06",
    "endTime": "2022-04-28 20:04:12"
  },
  "execution": {
    "site": "SLURM-CSD3",
    "provisionedResources": {
      "cpus": 1,
      "disk": 154,
      "memory": 1,
      "nodes": 1
    },
    "cpu": {
      "clock": "3039.367",
      "model": "Intel(R) Xeon(R) Platinum 8276 CPU @ 2.20GHz",
      "vendor": "GenuineIntel"
    },
    "runtimeVersion": {
      "singularity": "3.8.7-1.el7"
    },
    "tasks": [
      {
        "exitCode": 0,
        "retries": 0,
        "imagePullTime": 1.424,
        "imageSha256": "c694b456f242e484753a83107e47ca29d5a08e784c45f4225b1758713ad2e236",
        "wallTimeUsage": 0.3726,
        "cpuTimeUsage": 0.3326,
        "maxResidentSetSizeKB": 38384
      }
    ]
  }
}
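
The per-task fields can be combined into simple health checks. The sketch below uses values from the example above; the helper name is hypothetical, and it assumes that wallTimeUsage and cpuTimeUsage are in the same unit and that CPU efficiency is best normalised by the provisioned CPU count.

```python
def task_summaries(job):
    """Yield one summary dict per entry in job["execution"]["tasks"] (hypothetical helper)."""
    execution = job["execution"]
    cpus = execution["provisionedResources"]["cpus"]
    for index, task in enumerate(execution["tasks"]):
        wall = task.get("wallTimeUsage")
        cpu = task.get("cpuTimeUsage")
        rss_kb = task.get("maxResidentSetSizeKB")
        yield {
            "task": index,
            # exitCode is usually 0 for success
            "succeeded": task.get("exitCode") == 0,
            # imagePullTime is -1 when a cached image from a previous task was used
            "cached_image": task.get("imagePullTime") == -1,
            # cpuTimeUsage can exceed wall time for multi-CPU tasks, so divide by
            # the provisioned CPU count (an assumption, not a documented metric)
            "cpu_efficiency": cpu / (wall * cpus) if cpu is not None and wall else None,
            "max_rss_mb": rss_kb / 1024 if rss_kb is not None else None,
        }

# Trimmed copy of the "execution" section of the example above
example_job = {
    "execution": {
        "provisionedResources": {"cpus": 1, "disk": 154, "memory": 1, "nodes": 1},
        "tasks": [
            {
                "exitCode": 0,
                "retries": 0,
                "imagePullTime": 1.424,
                "wallTimeUsage": 0.3726,
                "cpuTimeUsage": 0.3326,
                "maxResidentSetSizeKB": 38384,
            }
        ],
    }
}

for summary in task_summaries(example_job):
    print(summary)
```

For the example job this reports a successful task that pulled its image (rather than reusing a cached one), with a CPU efficiency of roughly 0.89 (0.3326 / 0.3726).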