There was some service disruption last night and this afternoon on both UV1s (cosmos and universe). This is apparently due to some users' jobs not dying properly and turning into zombie processes that slow down or crash the system.

*NB* The DiRAC system Cosmos2 has been unaffected by this.


We have opened a case with our scheduler vendor and hope to find out what is going wrong and how to prevent it in future.

If you have trouble sshing to cosmos, please try universe (and vice versa).

Sorry to those whose jobs were disrupted.





** 13/1/14 **

This happened again on Saturday. We are investigating on universe today, so universe will be unavailable. Please log in via cosmos instead.

** update **

Universe is now open again. The cause of the problem is still unclear.