CHPC - Issues with Open OnDemand jobs not starting – Incident details

Issues with Open OnDemand jobs not starting

Resolved
Major outage
Started 3 days ago
Lasted about 12 hours

Affected

General Environment (GE)

Major outage from 6:30 AM to 6:07 PM

Open OnDemand

Major outage from 6:30 AM to 6:07 PM

Updates
  • Resolved

    CHPC staff have resolved the issue with Open OnDemand jobs. The cause was a large number of jobs generating heavy input/output (I/O) load, which increased the time required to load the environment. As a result, many jobs took a significant amount of time to start, and many started with incomplete environments and failed immediately.

  • Investigating

    The CHPC is aware of issues with Open OnDemand (OOD) jobs remaining in a "Starting" state without providing an option to connect. Users have reported issues with VS Code Server, RStudio Server, and ParaView, and this issue likely affects other OOD applications. This was first observed late in the evening on February 2 and persists into the morning of February 3.