Posts by rhb
1)
(Message 2183)
Posted 13 Dec 2013 by rhb Post: I would recommend you check the manufacturer's specs for your specific CPU model. I can assure you that 35 deg C is generally quite low for almost all devices. Common running temperatures are in the range 45 - 65 deg C, and many (not all) can run reliably up to 90 deg C. Generally a component will start to malfunction long before it is permanently damaged by heat. On the other hand, all the chips in your computer may wear out somewhat sooner if they run very hot, but in today's world it is hard to wear them out before they become obsolete.
2)
(Message 335)
Posted 26 Oct 2012 by rhb Post: Other tasks are running fine. I think the errant task likely did not complete but failed during processing, because its time is significantly shorter than that of any of the other tasks. I have no idea why no open files or memory map showed up, but that must have been a false appearance anyway, because the task printed to stderr when it was aborted.
3)
(Message 334)
Posted 25 Oct 2012 by rhb Post: I'm quite certain the task was in the process of finishing up, as you suggest. I would have thought the client would send an abort request to the task, but all we see is the "no heartbeat" message. That message occurred less than a minute before the task was reported (4pm EDT, 8pm UTC). I suggest that either the client terminates tasks by failing to send a heartbeat, or (more likely?) the task failed to get the request to abort but was aware immediately that no heartbeat was present. It is also possible that the client sent a signal, which the task caught but reported as "no heartbeat" instead. If so, the lack of heartbeat might have persisted for a long time, as you suggest. I don't know the IPC design of BOINC, but it probably doesn't matter. The task appears to have got stuck while exiting for unknown reasons, possibly a race condition. I did stop and continue the task before aborting it, hoping that might shake something up. In any case, I suspect the error is random and not likely to happen again. I will release the others one by one just in case.
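The "no heartbeat" behaviour discussed in this post can be illustrated with a generic watchdog. This is only a minimal sketch of the general technique, not BOINC's actual IPC design, which the post itself says is unknown. The names `HeartbeatWatchdog`, `run_task`, `poll_heartbeat` and `do_work_step` are hypothetical; the 30 second timeout is taken from the stderr message quoted in the original report.

```python
import time


class HeartbeatWatchdog:
    """Remembers when the last heartbeat arrived and reports a timeout.

    Hypothetical helper for illustration; not part of any BOINC API.
    """

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self):
        """Record that a heartbeat was just received."""
        self._last_beat = self._clock()

    def seconds_since_beat(self):
        return self._clock() - self._last_beat

    def expired(self):
        """True once more than timeout_s has passed without a heartbeat."""
        return self.seconds_since_beat() > self.timeout_s


def run_task(watchdog, poll_heartbeat, do_work_step, max_steps):
    """Run work steps, exiting early if the heartbeat stops arriving.

    poll_heartbeat() returns True when a heartbeat is available;
    do_work_step() performs one unit of work. Both are assumed callbacks.
    """
    for _ in range(max_steps):
        if poll_heartbeat():
            watchdog.beat()
        if watchdog.expired():
            return f"No heartbeat for {watchdog.timeout_s:.0f} sec - exiting"
        do_work_step()
    return "finished"
```

In a design like this the worker only notices a missing heartbeat when it next checks the watchdog, so a worker that is blocked elsewhere (for example while shutting down) could report the timeout late or not at all. That is one way a task could appear stuck, though whether it applies here is speculation.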
4)
(Message 331)
Posted 24 Oct 2012 by rhb Post: Task 88762 was sleeping and not using any CPU time after using about 3 hours of CPU. I looked at it with the system monitor, and it had no open files and no memory map. Perhaps it got stuck as it was about ready to exit. I will suspend the project for now, in case the others I have queued are similar. Let me know if some tasks are known to be bad, or if there is anything I can do to help in solving the problem. If it appears to be just my system, I will try the others and see what happens.

Name: ps_121018_9005_232_0
Workunit: 36604
Created: 18 Oct 2012 | 5:16:13 UTC
Sent: 21 Oct 2012 | 6:01:13 UTC
Received: 24 Oct 2012 | 19:50:31 UTC
Server state: Over
Outcome: Computation error
Client state: Aborted by user
Exit status: 203 (0xcb) Unknown error number
Computer ID: 362
Report deadline: 31 Oct 2012 | 18:01:13 UTC
Run time: 63,090.56
CPU time: 10,696.53
Validate state: Invalid
Credit: 0.00
Application version: Period Search Application v101.00

Stderr output:
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
<stderr_txt>
15:49:49 (17224): No heartbeat from core client for 30 sec - exiting
</stderr_txt>
]]>
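As a quick sanity check on the figures in this report, the arithmetic can be worked through in a few lines of Python. This is a sketch under stated assumptions: that the run time and CPU time are in seconds, that the stderr timestamp 15:49:49 is local time on 24 Oct 2012, and that local time is EDT (UTC-4), as the "4pm EDT, 8pm UTC" remark in the follow-up post suggests.

```python
from datetime import datetime, timedelta

# Values copied from the task report; units assumed to be seconds.
run_time_s = 63090.56
cpu_time_s = 10696.53

run_hours = run_time_s / 3600        # wall-clock hours the task existed
cpu_hours = cpu_time_s / 3600        # hours of CPU actually consumed
cpu_fraction = cpu_time_s / run_time_s

# Assumption: the stderr timestamp is local EDT, i.e. UTC-4, on 24 Oct 2012.
exit_local = datetime(2012, 10, 24, 15, 49, 49)
exit_utc = exit_local + timedelta(hours=4)
received_utc = datetime(2012, 10, 24, 19, 50, 31)
gap = received_utc - exit_utc

print(f"CPU time: {cpu_hours:.2f} h")           # about 2.97 h
print(f"Run time: {run_hours:.1f} h")           # about 17.5 h
print(f"CPU / run time: {cpu_fraction:.0%}")    # about 17%
print(f"Exit message to report: {gap.total_seconds():.0f} s")  # 42 s
```

Under those assumptions the CPU time matches the "about 3 hours" in the post, and the 42 second gap agrees with the "no heartbeat" message appearing less than a minute before the task was reported.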