CHPC - Known issue: Slurm jobs stuck in completing (CG) status – Incident details

Known issue: Slurm jobs stuck in completing (CG) status

Resolved
Degraded performance
Started 9 months ago, lasted 28 days

Affected

General Environment (GE)

Degraded performance from 6:00 AM to 10:02 PM

HPC clusters

Degraded performance from 6:00 AM to 10:02 PM

Updates
  • Resolved
    This incident has been resolved.
  • Identified
    Slurm jobs are getting stuck in the completing (CG) state, which also interferes with new jobs starting. We are aware of the issue and are working on a resolution. A stuck job cannot be cleared from a node without affecting the other jobs on that node, so in some instances we are waiting for the running jobs to finish.
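
For readers who want to check whether their own jobs are affected, the following is a minimal sketch, not a CHPC-provided tool. It assumes the standard `squeue` client is on the PATH and asks it for jobs in the CG state using a pipe-delimited format (`%i` job ID, `%u` user, `%T` state, `%N` node list). The function names `parse_squeue` and `completing_jobs` are hypothetical, chosen only for illustration.

```python
import subprocess
from collections import defaultdict


def parse_squeue(text):
    """Group completing jobs by node list.

    Expects lines of the form 'jobid|user|state|nodelist', as produced by
    the squeue format string used in completing_jobs() below.
    """
    by_node = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, user, state, nodes = line.split("|", 3)
        # Keep only jobs that squeue reports as COMPLETING (short code CG).
        if state == "COMPLETING":
            by_node[nodes].append((job_id, user))
    return dict(by_node)


def completing_jobs():
    """Run squeue and return {nodelist: [(job_id, user), ...]} for CG jobs."""
    result = subprocess.run(
        ["squeue", "--noheader", "--states=CG", "--format=%i|%u|%T|%N"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)
```

Calling `completing_jobs()` on a login node would return a dictionary keyed by node list, which makes it easy to see which nodes hold stuck jobs. Clearing those jobs is an administrator action and is not covered by this sketch.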