Cast string "array.id" to integer, makeClusterFunctionTorque with SSH #53
dagola wants to merge 58 commits into
In a multi-node cluster site only one node may be able to accept Torque commands. If this node is accessible via SSH, BatchJobs can run on any other node tunnelling the Torque command to that node.
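A minimal sketch of the tunnelling idea, not the code from this pull request: the helper name buildTorqueCall, the host "master" and the user "alice" are all hypothetical, and only base R functions (shQuote, system2) are used.

```r
# Hypothetical helper, not part of BatchJobs: build the program and arguments
# for a Torque command, optionally tunnelled through SSH to the submit node.
buildTorqueCall = function(cmd, args = character(0L), ssh.host = NULL) {
  if (is.null(ssh.host)) {
    # Local case: the current node accepts Torque commands directly.
    return(list(program = cmd, args = args))
  }
  # Remote case: ssh hands its trailing argument to the remote shell as one
  # string, so each argument is quoted for the remote shell first ...
  remote = paste(c(cmd, shQuote(args, type = "sh")), collapse = " ")
  # ... and the whole remote command is quoted again for the local shell.
  list(program = "ssh", args = c(ssh.host, shQuote(remote, type = "sh")))
}

# "master" and "alice" are made-up values for illustration.
tunnelled = buildTorqueCall("qselect", c("-u", "alice"), ssh.host = "master")
# system2(tunnelled$program, tunnelled$args) would run qselect on "master".
```

With default settings system2 returns the exit status of the program it ran, and ssh exits with the status of the remote command (or 255 if ssh itself fails), so a non-zero return value can in principle be checked the same way in both cases.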
When array jobs are submitted, each sub-job gets array.id[] as its batch id. However, array.id[] does not appear in the qselect output. As a result, waitForJobs stops after 5 sleeps, because matching the internal batch ids against listJobs returns an empty set.
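The mismatch can be illustrated with a few lines of base R. All ids below are made up for illustration, and the shape of the scheduler listing is an assumption; the point is only that an id the listing never contains can never match.

```r
# Hypothetical batch ids as stored internally for three sub-jobs of one array.
internal.ids = c("1234[]", "1234[]", "1234[]")

# Hypothetical listing of jobs currently known to the scheduler.
listed.ids = c("1233", "1234[1]", "1234[2]", "1234[3]")

# Matching by exact string comparison finds nothing, so every sub-job
# looks as if it had already disappeared from the queue.
still.known = intersect(internal.ids, listed.ids)
length(still.known)
```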

Do you login on the master via ssh to call qselect?

Yes, I do. Please see a17396e: in

I'm amazed that this works! We've to check this on some more systems though.

Why should it not work?
I need to double check that exit codes are correctly forwarded and quoting is correct.
@berndbischl Your opinion?

That's true, a shared file system is still needed.

If Bernd does not have any objections, I'm afraid the SSH stuff will not make it into the next release because I do not have enough time to test this. But I would pull your changes after October 20th and try to generalize it for other cluster functions as well.

@mllg
… file systems it can happen that the file is not available instantaneously
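One common way to cope with a file that shows up late is to poll for it with a timeout. The following is only a sketch under that assumption; the helper name waitForFile and its defaults are hypothetical and not taken from the commit.

```r
# Hypothetical helper: poll until 'path' exists or 'timeout' seconds have passed.
# Returns TRUE if the file appeared in time, FALSE otherwise.
waitForFile = function(path, timeout = 10, sleep = 0.5) {
  deadline = Sys.time() + timeout
  repeat {
    if (file.exists(path)) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(sleep)
  }
}

# Usage with a temporary file that already exists.
tmp = tempfile()
file.create(tmp)
found = waitForFile(tmp)
```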
# Conflicts: # R/clusterFunctionsTorque.R
Change to slurm scheduler
asInt expects a numeric input value. array.id as output from Sys.getenv is a string.
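For the array.id cast, a minimal base R sketch could look as follows. The environment variable name PBS_ARRAYID and the helper readArrayId are assumptions made for illustration; the commit message only states that the value comes from Sys.getenv and is therefore a string.

```r
# Hypothetical helper: read the array id from the environment and cast it.
# PBS_ARRAYID is an assumed variable name, not confirmed by the source.
readArrayId = function(var = "PBS_ARRAYID") {
  # Sys.getenv always returns a character value (NA here if the variable is unset).
  raw = Sys.getenv(var, unset = NA_character_)
  # Cast to integer; non-numeric or missing input becomes NA.
  id = suppressWarnings(as.integer(raw))
  if (is.na(id)) stop(sprintf("Variable '%s' does not hold an integer: '%s'", var, raw))
  id
}

# Simulate the scheduler for this demo.
Sys.setenv(PBS_ARRAYID = "7")
array.id = readArrayId()
```

After the cast, array.id is a proper integer and can be passed on to a check such as asInt, which expects numeric input.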