In QMC -> Configuration system -> Schedulers, there is an "Advanced" section that is used to configure the throughput of concurrent tasks.
IMPORTANT TO REMEMBER
- A reload node can run up to NumberOfCores - 2 concurrent reloads.
- For example, if the reload node has 8 CPU cores, it can run at most 6 concurrent reloads.
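The rule of thumb above can be written as a one-line calculation. This is only a sketch of the arithmetic: the function name is hypothetical, and the floor of 1 for very small machines is an assumption that the rule itself does not state.

```python
def max_concurrent_reloads(number_of_cores: int) -> int:
    """Rule of thumb from the text: NumberOfCores - 2.

    The lower bound of 1 is an assumption for nodes with 2 or fewer cores;
    the original rule does not say what to do in that case.
    """
    return max(1, number_of_cores - 2)


print(max_concurrent_reloads(8))  # prints 6
```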
Use "Max concurrent reloads" to limit the number of tasks that can run at the same time on the current node. By default, it is set to 4, which means only 4 tasks can run at the same time on this node.
When the 5th task comes in:
- It is queued in order of arrival.
- The queue has a timeout setting, which removes a queued task once the timeout limit is reached. By default, it is set to 30 minutes. In that case, if none of the first 4 tasks finishes within 30 minutes after the 5th task comes in, the 5th task is cancelled due to timeout. It then has to be triggered again (manually or at the next scheduled time slot).
- Once one of the 4 running tasks finishes, the 5th task is executed on this node, provided it has not timed out.
- If you set the timeout to 0, queued tasks never time out.
- It is suggested not to run more than 10 concurrent reloads at once, unless the scheduler node has an extreme amount of resources dedicated to it.
- If the node is low on memory or cores, some tasks may start but take several hours to complete. Adding resources to the node running the reloads is suggested.
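The queueing behaviour described above can be illustrated with a small simulation. This is a simplified model for reasoning about the settings, not the actual scheduler logic: the function name, the task tuples, and the choice to treat a wait of exactly the timeout as still acceptable are all assumptions.

```python
import heapq


def simulate_queue(tasks, max_concurrent=4, timeout_minutes=30):
    """Simulate one node's reload slots.

    tasks: list of (name, arrival_minute, duration_minutes), sorted by arrival.
    timeout_minutes=0 means a queued task never times out.
    Returns {name: ("started", minute)} or {name: ("timed_out", minute)}.
    """
    busy_until = []  # min-heap of the minutes at which occupied slots free up
    results = {}
    for name, arrival, duration in tasks:
        # Release slots whose tasks have finished by the time this one arrives.
        while busy_until and busy_until[0] <= arrival:
            heapq.heappop(busy_until)
        if len(busy_until) < max_concurrent:
            start = arrival  # a slot is free, run immediately
        else:
            start = heapq.heappop(busy_until)  # earliest minute a slot frees up
            if timeout_minutes and start - arrival > timeout_minutes:
                # The task waited too long: it is dropped and the slot stays taken.
                heapq.heappush(busy_until, start)
                results[name] = ("timed_out", arrival + timeout_minutes)
                continue
        heapq.heappush(busy_until, start + duration)
        results[name] = ("started", start)
    return results


# Four 45-minute reloads fill the node; a fifth arrives 5 minutes later.
tasks = [("t1", 0, 45), ("t2", 0, 45), ("t3", 0, 45), ("t4", 0, 45), ("t5", 5, 10)]
print(simulate_queue(tasks)["t5"])                     # ('timed_out', 35)
print(simulate_queue(tasks, timeout_minutes=0)["t5"])  # ('started', 45)
```

With the default values, the fifth task is dropped because no slot frees up within 30 minutes of its arrival. With the timeout set to 0, it simply waits until the first slot becomes free.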
On a multi-node deployment, tasks will be balanced from the master node to any node(s) designated as slaves.
It is highly advisable to check whether the central node, in the Schedulers section of the QMC, is set to "Master and Slave" or "Master". When set to "Master", it sends all reload jobs to the reload/scheduler nodes, as it should. However, if the central node is set to "Master and Slave", the central node will also perform reloads itself, which is not recommended.
In a three-node deployment (one master and two slaves) where each node is set to "Max concurrent reloads" = 4 and "Engine timeout" = 30:
- The master node receives a new task execution request.
- The master node checks the resource availability on each of the slave nodes.
- The master node assigns the task to the node that is least busy at the moment of triggering*.
*For now, the balance check considers only CPU and memory usage, NOT the queue on each scheduler node. So even if a node still has free slots (for example, 2 tasks running) but its system usage is high (such as CPU at 95%), the incoming task will be assigned to another node whose slots are all taken (4 tasks running) but whose system usage is lower (say, CPU at 60%).
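The selection step can be sketched as follows. The text only says that CPU and memory usage are compared and that the queue is ignored. How the two values are combined is not documented here, so the scoring below (the higher of the two percentages) is purely an assumption, as are the `NodeStatus` type, the node names, and the memory figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    name: str
    cpu_percent: float
    memory_percent: float
    running_tasks: int  # shown for context only; it is NOT used in the choice


def pick_least_busy(nodes: List[NodeStatus]) -> NodeStatus:
    """Pick the node with the lowest system usage.

    Assumed score: the higher of CPU and memory usage. The number of running
    or queued tasks is deliberately ignored, matching the behaviour described.
    """
    return min(nodes, key=lambda n: max(n.cpu_percent, n.memory_percent))


nodes = [
    NodeStatus("node-a", cpu_percent=95, memory_percent=70, running_tasks=2),
    NodeStatus("node-b", cpu_percent=60, memory_percent=65, running_tasks=4),
]
print(pick_least_busy(nodes).name)  # prints node-b
```

Here node-b receives the task even though all 4 of its slots are in use, so the task will sit in that node's queue and is subject to the timeout.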