Data reuse algorithm should consider file locations while cascading deletion upwards


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 4.6.0, 4.5.3
    • Affects Version/s: master, 4.5.2
    • Component/s: Pegasus Planner
    • Labels: None

      From Chris Eklund:

      The new code and settings did the trick for reusing files from multiple directories, but the reducer still does not prune previously successful jobs whose output files were later deleted by cleanup jobs. I expected the following jobs, which succeeded in the earlier run, to be pruned: RealignerTargetCreator_RealignerTargetCreator1, IndelRealigner_IndelRealigner1, BaseRecalibrator_BaseRecalibrator1firstPass and BaseRecalibrator_BaseRecalibrator1secondPass. Here is the relevant debug output from the planner:

      2015.09.21 10:33:02.249 PDT: [DEBUG] Reducing the workflow
      2015.09.21 10:33:02.249 PDT: [INFO] event.pegasus.reduce dax.id RecalibrateAndRealignDax_0 - STARTED
      2015.09.21 10:33:02.250 PDT: [DEBUG] Jobs whose o/p files already exist
      2015.09.21 10:33:02.250 PDT: [DEBUG] HaplotypeCaller_HaplotypeCallermale1
      2015.09.21 10:33:02.250 PDT: [DEBUG] PrintReads_PrintReads1
      2015.09.21 10:33:02.251 PDT: [DEBUG] AnalyzeCovariates_AnalyzeCovariates1
      2015.09.21 10:33:02.251 PDT: [DEBUG] Jobs whose o/p files already exist - DONE
      2015.09.21 10:33:02.253 PDT: [DEBUG] Marking node for removal from the workflow AnalyzeCovariates_AnalyzeCovariates1
      2015.09.21 10:33:02.254 PDT: [DEBUG] CombineGVCFs_CombineGVCFs1 will not be deleted as not as child CombineGVCFs_CombineGVCFsacross0 is not marked for deletion
      2015.09.21 10:33:02.254 PDT: [DEBUG] HaplotypeCaller_HaplotypeCaller1 will not be deleted as not as child CombineGVCFs_CombineGVCFs1 is not marked for deletion
      2015.09.21 10:33:02.254 PDT: [DEBUG] Marking node for removal from the workflow HaplotypeCaller_HaplotypeCallermale1
      2015.09.21 10:33:02.254 PDT: [DEBUG] Marking node for removal from the workflow PrintReads_PrintReads1
      2015.09.21 10:33:02.254 PDT: [DEBUG] BaseRecalibrator_BaseRecalibrator1firstPass will not be deleted as not as child BaseRecalibrator_BaseRecalibrator1secondPass is not marked for deletion
      2015.09.21 10:33:02.255 PDT: [DEBUG] IndelRealigner_IndelRealigner1 will not be deleted as not as child BaseRecalibrator_BaseRecalibrator1secondPass is not marked for deletion
      2015.09.21 10:33:02.255 PDT: [DEBUG] RealignerTargetCreator_RealignerTargetCreator1 will not be deleted as not as child IndelRealigner_IndelRealigner1 is not marked for deletion
      2015.09.21 10:33:02.256 PDT: [DEBUG] Removing node from the workflow AnalyzeCovariates_AnalyzeCovariates1
      2015.09.21 10:33:02.256 PDT: [DEBUG] Removing node from the workflow HaplotypeCaller_HaplotypeCallermale1
      2015.09.21 10:33:02.256 PDT: [DEBUG] Removing node from the workflow PrintReads_PrintReads1
      2015.09.21 10:33:02.256 PDT: [INFO] Nodes/Jobs Deleted from the Workflow during reduction
      2015.09.21 10:33:02.257 PDT: [INFO] AnalyzeCovariates_AnalyzeCovariates1
      2015.09.21 10:33:02.257 PDT: [INFO] HaplotypeCaller_HaplotypeCallermale1
      2015.09.21 10:33:02.257 PDT: [INFO] PrintReads_PrintReads1
      2015.09.21 10:33:02.257 PDT: [INFO] Nodes/Jobs Deleted from the Workflow during reduction - DONE
      2015.09.21 10:33:02.257 PDT: [INFO] event.pegasus.reduce dax.id RecalibrateAndRealignDax_0 (0.008 seconds) – FINISHED
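
The log above reflects a job-based cascade: a parent is kept as soon as any one of its children is not marked for deletion. To make the requested file-aware behaviour concrete, here is a minimal Python sketch of one possible rule. It is not the Pegasus planner's implementation (the planner is written in Java), and the `Job` model, the `prunable_jobs` function and the `available` set (a stand-in for files registered in a replica catalog at any location) are all hypothetical. The assumed rule is that a job can be pruned if every file it writes either already exists somewhere or is read only by jobs that are themselves pruned.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job model: a name plus the logical files it reads and writes."""
    name: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def _not_needed(lfn, available, consumers, pruned):
    # A file need not be regenerated if it already exists at some location,
    # or if every job that reads it is being pruned anyway. A file that no
    # job reads (a final output) must already exist.
    if lfn in available:
        return True
    readers = consumers.get(lfn, set())
    return bool(readers) and readers <= pruned


def prunable_jobs(jobs, available):
    """Return the names of jobs that can be removed from the workflow.

    jobs: list of Job objects
    available: logical file names known to exist at any location
               (a stand-in for a replica catalog lookup)
    """
    consumers = defaultdict(set)  # file name -> names of jobs that read it
    for job in jobs:
        for lfn in job.inputs:
            consumers[lfn].add(job.name)

    pruned = set()
    changed = True
    while changed:  # repeat until no more jobs can be pruned (fixpoint)
        changed = False
        for job in jobs:
            if job.name in pruned or not job.outputs:
                continue
            if all(_not_needed(f, available, consumers, pruned) for f in job.outputs):
                pruned.add(job.name)
                changed = True
    return pruned


if __name__ == "__main__":
    # A -> B -> C chain where cleanup already deleted a.out and b.out.
    chain = [
        Job("A", outputs={"a.out"}),
        Job("B", inputs={"a.out"}, outputs={"b.out"}),
        Job("C", inputs={"b.out"}, outputs={"c.out"}),
    ]
    print(sorted(prunable_jobs(chain, available={"c.out"})))  # ['A', 'B', 'C']

    # D still has to run and reads a.out, so A is kept unless a.out exists somewhere.
    with_d = chain + [Job("D", inputs={"a.out"}, outputs={"d.out"})]
    print(sorted(prunable_jobs(with_d, available={"c.out"})))  # ['B', 'C']
    print(sorted(prunable_jobs(with_d, available={"c.out", "a.out"})))  # ['A', 'B', 'C']
```

Under this rule, a chain of previously successful jobs whose intermediate files were cleaned up can still be pruned, provided no job that will actually run needs those files. A job that must run keeps its producers only when the files it reads cannot be found at any location. This issue does not say whether Pegasus adopted exactly this rule.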

    • Assignee: Karan Vahi
    • Reporter: Karan Vahi
    • Watchers: 1
