- Type: New Feature
- Resolution: Fixed
- Priority: Major
- Affects Version/s: master, 4.3.2
- Component/s: Planner: Transfer Module
The following would be a nice performance improvement which we have talked about in the past.
For workflows with a large number of inputs, the idea is that we can do the transfers in parallel by spreading them across a set of stage-in jobs that run concurrently. However, Pegasus currently does a poor job of balancing the inputs across those jobs. For example, for an 8-degree Montage workflow with 20 stage-in jobs:
$ for FILE in `ls stage_in_local_local_*.in | sort`; do COUNT=`cat $FILE | grep http | wc -l`; echo "$FILE $COUNT"; done
stage_in_local_local_0_0.in 692
stage_in_local_local_0_1.in 689
stage_in_local_local_0_2.in 637
stage_in_local_local_0_3.in 638
stage_in_local_local_2_0.in 1
stage_in_local_local_3_0.in 1
stage_in_local_local_5_0.in 4
stage_in_local_local_5_1.in 4
stage_in_local_local_5_2.in 4
stage_in_local_local_5_3.in 4
stage_in_local_local_6_0.in 4
stage_in_local_local_6_1.in 4
stage_in_local_local_6_2.in 4
stage_in_local_local_6_3.in 4
stage_in_local_local_8_0.in 1
stage_in_local_local_9_0.in 1
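To make the imbalance concrete, here is a minimal sketch of what an even split could look like. It is an illustration only, not how the planner works: the function name `split_evenly` and the `input_N` placeholder names are hypothetical, and the sketch assumes any input may go to any stage-in job, ignoring whatever per-level or dependency constraints the planner may need to respect.

```python
def split_evenly(inputs, n_jobs):
    """Deal inputs round-robin into n_jobs buckets.

    Bucket sizes differ by at most one. Hypothetical helper, not Pegasus code.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    buckets = [[] for _ in range(n_jobs)]
    for index, item in enumerate(inputs):
        buckets[index % n_jobs].append(item)
    return buckets


# Per-file transfer counts copied from the listing above.
observed = [692, 689, 637, 638, 1, 1, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1]
total = sum(observed)

# Placeholder names standing in for the real transfer entries (assumption).
inputs = [f"input_{i}" for i in range(total)]
sizes = [len(bucket) for bucket in split_evenly(inputs, len(observed))]

print("observed min/max:", min(observed), max(observed))
print("balanced min/max:", min(sizes), max(sizes))
```

With the counts from the listing, the same number of jobs would each carry 168 or 169 transfers instead of anywhere between 1 and 692.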