InPlace cleanup deletes an intermediate output file before it is consumed by the child job

      Email from Tobias Tikaa@USC
      I hope you are doing well. I am currently trying to modify my pipeline so that Pegasus cleans up files that are no longer needed, so I removed "--cleanup none" from the script that runs pegasus-plan. However, Pegasus removes an intermediate file that is created by one job, and the whole run crashes because the next job needs that file.

      What should I do to get this to work? I have attached a picture of the jobs (cleanup_PegasusVM_level_3 is the job that deletes the file I need) and the folder from the work directory that Pegasus generates.

      The name of the file is EC000284.intervals_from_target_creator.intervals, and it is created by GATKTargetCreatorJob.

      I get the following log from pegasus-analyzer:

      =========================IndelRealigner_IndelRealigner1=========================

      last state: POST_SCRIPT_FAILED
      site: PegasusVM
      submit file: IndelRealigner_IndelRealigner1.sub
      output file: IndelRealigner_IndelRealigner1.out.001
      error file: IndelRealigner_IndelRealigner1.err.001

      ------------------------------Task #1 - Summary-------------------------------

      site : PegasusVM
      hostname : unknown
      executable : /usr/java/jdk1.8.0_51/jre/bin/java
      arguments : -Xmx4g -jar /home/bcpipeline/software/gatk/default/GenomeAnalysisTK.jar -T IndelRealigner -R reference.fa -I EC000284.bam -targetIntervals EC000284.intervals_from_target_creator.intervals -o EC000284.realigned.bam -log bySample/EC000284/EC000284.indelRealigner.log -known knownIndels.0.vcf.gz -known knownIndels.1.vcf.gz
      exitcode : 1
      working dir : /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016

      ----Task #1 - bc::IndelRealigner:1.0 - IndelRealigner1 - Kickstart stderr-----

      INFO 11:57:22,769 HelpFormatter - --------------------------------------------------------------------------------
      INFO 11:57:22,773 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
      INFO 11:57:22,774 HelpFormatter - Copyright (c) 2010 The Broad Institute
      INFO 11:57:22,775 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
      INFO 11:57:22,778 HelpFormatter - Program Args: -T IndelRealigner -R reference.fa -I EC000284.bam -targetIntervals EC000284.intervals_from_target_creator.intervals -o EC000284.realigned.bam -log bySample/EC000284/EC000284.indelRealigner.log -known knownIndels.0.vcf.gz -known knownIndels.1.vcf.gz
      INFO 11:57:22,784 HelpFormatter - Executing as bcpipeline@localhost.localdomain on Linux 2.6.32-504.30.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16.
      INFO 11:57:22,785 HelpFormatter - Date/Time: 2015/08/04 11:57:22
      INFO 11:57:22,785 HelpFormatter - --------------------------------------------------------------------------------
      INFO 11:57:22,786 HelpFormatter - --------------------------------------------------------------------------------
      INFO 11:57:23,495 GenomeAnalysisEngine - Strictness is SILENT
      INFO 11:57:23,639 GenomeAnalysisEngine - Downsampling Settings: No downsampling
      INFO 11:57:23,680 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
      WARNING: BAM index file /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016/EC000284.bam.bai is older than BAM /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016/EC000284.bam
      INFO 11:57:23,784 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.10
      WARN 11:57:24,090 IndexDictionaryUtils - Track knownAlleles doesn't have a sequence dictionary built in, skipping dictionary validation
      WARN 11:57:24,094 IndexDictionaryUtils - Track knownAlleles2 doesn't have a sequence dictionary built in, skipping dictionary validation
      INFO 11:57:24,307 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
      INFO 11:57:24,313 GenomeAnalysisEngine - Done preparing for traversal
      INFO 11:57:24,313 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
      INFO 11:57:24,313 ProgressMeter - | processed | time | per 1M | | total | remaining
      INFO 11:57:24,314 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
      INFO 11:57:25,643 GATKRunReport - Uploaded run statistics report to AWS S3

      ERROR ------------------------------------------------------------------------------------------
      ERROR A USER ERROR has occurred (version 3.4-0-g7e26428):
      ERROR
      ERROR This means that one or more arguments or inputs in your command are incorrect.
      ERROR The error message below tells you what is the problem.
      ERROR
      ERROR If the problem is an invalid argument, please check the online documentation guide
      ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
      ERROR
      ERROR Visit our website and forum for extensive documentation and answers to
      ERROR commonly asked questions http://www.broadinstitute.org/gatk
      ERROR
      ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
      ERROR
      ERROR MESSAGE: Couldn't read file /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016/EC000284.intervals_from_target_creator.intervals because The interval file does not exist.
      ERROR ------------------------------------------------------------------------------------------
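
In DAG terms, the failure above is a cleanup job that can run before every job using the file has finished: a file may only be deleted by a cleanup job that is a descendant of every job that reads or writes it. The sketch below checks that condition on a toy model. It is a minimal illustration, not Pegasus's InPlace cleanup algorithm; the job/file model, the helper names (users_of, descendants, unsafe_users) and both edge sets are assumptions, with job and file names borrowed from the report. BUGGY_EDGES is a hypothetical guess at the shape of the failing workflow, since the actual planned DAG is not shown here.

```python
from collections import deque

# Toy workflow model (an assumption for illustration, not Pegasus's data model):
# each job lists the logical files it reads ("in") and writes ("out").
INTERVALS = "EC000284.intervals_from_target_creator.intervals"
CLEANUP = "cleanup_PegasusVM_level_3"

JOBS = {
    "GATKTargetCreatorJob": {"in": {"EC000284.bam"}, "out": {INTERVALS}},
    "IndelRealigner_IndelRealigner1": {
        "in": {"EC000284.bam", INTERVALS},
        "out": {"EC000284.realigned.bam"},
    },
    CLEANUP: {"in": set(), "out": set()},
}

# Hypothetical parent -> children edges. In BUGGY_EDGES the cleanup job waits
# only for the producer of the intervals file, not for its consumer.
BUGGY_EDGES = {
    "GATKTargetCreatorJob": ["IndelRealigner_IndelRealigner1", CLEANUP],
}
FIXED_EDGES = {
    "GATKTargetCreatorJob": ["IndelRealigner_IndelRealigner1"],
    "IndelRealigner_IndelRealigner1": [CLEANUP],
}


def users_of(jobs, lfn):
    """Return every job that reads or writes the logical file lfn."""
    return {name for name, j in jobs.items() if lfn in j["in"] or lfn in j["out"]}


def descendants(edges, start):
    """Return all jobs reachable from start by following parent -> child edges."""
    seen, queue = set(), deque([start])
    while queue:
        for child in edges.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def unsafe_users(jobs, edges, cleanup_job, lfn):
    """Return users of lfn that cleanup_job is not guaranteed to run after.

    Any job returned here may still need lfn when cleanup_job deletes it.
    """
    return {u for u in users_of(jobs, lfn) if cleanup_job not in descendants(edges, u)}


print(unsafe_users(JOBS, BUGGY_EDGES, CLEANUP, INTERVALS))  # {'IndelRealigner_IndelRealigner1'}
print(unsafe_users(JOBS, FIXED_EDGES, CLEANUP, INTERVALS))  # set()
```

With BUGGY_EDGES the realigner is reported as unsafe, which matches the "interval file does not exist" error; adding the IndelRealigner_IndelRealigner1 -> cleanup_PegasusVM_level_3 edge, as in FIXED_EDGES, makes the deletion safe.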

            Assignee:
            Karan Vahi
            Reporter:
            Karan Vahi
            Watchers:
            1