CHPC - Issues with Protected Environment resources – Incident details


Issues with Protected Environment resources

Resolved
Major outage
Started 22 days ago
Lasted about 7 hours

Affected

Protected Environment (PE)

Degraded performance from 2:30 PM to 5:22 PM, Major outage from 5:22 PM to 7:07 PM, Degraded performance from 7:07 PM to 9:10 PM

HPC clusters

Degraded performance from 2:30 PM to 5:22 PM, Major outage from 5:22 PM to 7:07 PM, Degraded performance from 7:07 PM to 9:10 PM

Open OnDemand

Degraded performance from 2:30 PM to 5:22 PM, Major outage from 5:22 PM to 7:07 PM, Degraded performance from 7:07 PM to 9:10 PM

Windows servers

Degraded performance from 2:30 PM to 5:22 PM, Major outage from 5:22 PM to 7:07 PM, Degraded performance from 7:07 PM to 9:10 PM

Virtual machines (VMs)

Degraded performance from 2:30 PM to 5:22 PM, Major outage from 5:22 PM to 7:07 PM, Degraded performance from 7:07 PM to 9:10 PM

Computational servers, independent of clusters

Degraded performance from 2:30 PM to 5:22 PM, Major outage from 5:22 PM to 7:07 PM, Degraded performance from 7:07 PM to 9:10 PM

Updates
  • Resolved

    CHPC staff have determined that the issue with the Protected Environment has been resolved. Systems have remained accessible since the update earlier this afternoon. If you continue to encounter issues with a resource in the Protected Environment, please contact the CHPC at helpdesk@chpc.utah.edu. Thank you for your patience.

  • Update

    We have identified the cause of the problems, and most services in the PE are available. However, individual systems may still have lingering problems caused by the earlier loss of network connectivity to the file systems. If your service has issues, please contact helpdesk@chpc.utah.edu.

  • Monitoring

    Issues in the Protected Environment are attributable to packet loss. CHPC staff have stopped replication between the General Environment VAST and Protected Environment VAST, which significantly reduced the packet loss. Services appear to be responsive again. Logins and services in the PE should begin working. CHPC staff will continue to monitor the situation.

  • Update

    CHPC staff are continuing to investigate the issue. Based on user reports, this incident's impact is being updated to an outage rather than degraded performance.

  • Investigating

    The CHPC is aware of issues with the Protected Environment that are causing degraded performance. Staff are working to identify and correct the issue. We will provide updates as we learn more.