Checkpointing


Message boards : Unix/Linux : Checkpointing

MarkJ
Joined: 27 Jun 12
Posts: 129
Credit: 62,715,476
RAC: 81
Message 292 - Posted: 7 Oct 2012, 0:42:54 UTC

Last modified: 7 Oct 2012, 0:43:52 UTC
I had a machine crash. When I restarted the VM, it had a task with an elapsed time of 30 minutes, but progress was at 0.000%. That suggests it didn't resume from its checkpoint. Does the app even write checkpoints and resume from them?

Since it still wasn't moving after I tried a suspend/resume and waited 10 minutes, I decided to abort it. Link to task here
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 294 - Posted: 7 Oct 2012, 10:42:50 UTC
Checkpointing should work fine. I tried many different situations where the computation crashed, and it resumed correctly each time. Perhaps some files were corrupted, or you had a kind of crash I couldn't simulate. If this has happened only once, there is no need to worry about it. A problem like this can occasionally happen.
Amauri

Joined: 21 Jun 12
Posts: 13
Credit: 1,456,117
RAC: 1,197
Message 295 - Posted: 7 Oct 2012, 22:45:41 UTC - in response to Message 294.  

Last modified: 7 Oct 2012, 22:56:05 UTC
Checkpoints are not frequent at the start; the first ones can take more than 10 minutes to appear. So there's nothing to worry about, just leave the task running.
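
To illustrate why a crash before the first checkpoint leaves a task at 0.000% after a restart, here is a minimal Python sketch of interval-based checkpointing. It is not the project's actual application code: the function names, the JSON checkpoint file, and the `checkpoint_every` and `crash_at` parameters are all assumptions made up for this example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a crash never leaves a half-written checkpoint.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"step": 0}


def run(path, total_steps, checkpoint_every, crash_at=None):
    """Resume from the last checkpoint and work up to total_steps (hypothetical workload)."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated crash")
        # ... one unit of real work would happen here ...
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["step"]
```

In this sketch, a crash at step 20 with `checkpoint_every=30` leaves no checkpoint file, so the next start begins again from step 0 even though time has already been spent. A crash at step 75 would resume from step 60, the last checkpoint written.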