CHPC - Known issue: Many notchpeak nodes are currently offline due to an issue with power distribution units – Incident details

Known issue: Many notchpeak nodes are currently offline due to an issue with power distribution units

Resolved
Partial outage 15 %
Started about 2 months ago
Lasted about 1 hour

Affected

HPC

Partial outage from 4:55 PM to 6:06 PM

Updates
  • Resolved
    This incident has been resolved. All nodes have returned to service.

  • Monitoring
    Most nodes have returned to service. Some nodes have long boot times and are not in service yet, though they should be soon if there are no further issues.

  • Identified
    The issue is related to the power distribution units that support the racks containing the offline nodes. Staff are addressing it.

  • Investigating
    We are currently investigating this incident.