Pegasus / PM-969

InPlace cleanup deletes an intermediate output file before it is consumed by the child job


Details

    Description

      Email from Tobias Tikaa@USC
      I hope you are doing well. I am currently trying to modify my pipeline so that Pegasus cleans up files that are no longer needed, so I removed "--cleanup none" from the script that runs pegasus-plan. However, Pegasus removes an intermediate file that is created by one job, and the whole run crashes because that file is needed by the next job.

      What should I do to get this to work? I have attached a picture of the jobs (it is cleanup_PegasusVM_level_3 that deletes the file I need) and the folder from the work directory that Pegasus generates.

      The name of the file is EC000284.intervals_from_target_creator.intervals, and it is created by GATKTargetCreatorJob.


      I get the following log from pegasus-analyzer:


      =========================IndelRealigner_IndelRealigner1=========================

       last state: POST_SCRIPT_FAILED
             site: PegasusVM
      submit file: IndelRealigner_IndelRealigner1.sub
      output file: IndelRealigner_IndelRealigner1.out.001
       error file: IndelRealigner_IndelRealigner1.err.001

      -------------------------------Task #1 - Summary--------------------------------

      site : PegasusVM
      hostname : unknown
      executable : /usr/java/jdk1.8.0_51/jre/bin/java
      arguments : -Xmx4g -jar /home/bcpipeline/software/gatk/default/GenomeAnalysisTK.jar -T IndelRealigner -R reference.fa -I EC000284.bam -targetIntervals EC000284.intervals_from_target_creator.intervals -o EC000284.realigned.bam -log bySample/EC000284/EC000284.indelRealigner.log -known knownIndels.0.vcf.gz -known knownIndels.1.vcf.gz
      exitcode : 1
      working dir : /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016

      -----Task #1 - bc::IndelRealigner:1.0 - IndelRealigner1 - Kickstart stderr------

      INFO 11:57:22,769 HelpFormatter - --------------------------------------------------------------------------------
      INFO 11:57:22,773 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
      INFO 11:57:22,774 HelpFormatter - Copyright (c) 2010 The Broad Institute
      INFO 11:57:22,775 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
      INFO 11:57:22,778 HelpFormatter - Program Args: -T IndelRealigner -R reference.fa -I EC000284.bam -targetIntervals EC000284.intervals_from_target_creator.intervals -o EC000284.realigned.bam -log bySample/EC000284/EC000284.indelRealigner.log -known knownIndels.0.vcf.gz -known knownIndels.1.vcf.gz
      INFO 11:57:22,784 HelpFormatter - Executing as bcpipeline@localhost.localdomain on Linux 2.6.32-504.30.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16.
      INFO 11:57:22,785 HelpFormatter - Date/Time: 2015/08/04 11:57:22
      INFO 11:57:22,785 HelpFormatter - --------------------------------------------------------------------------------
      INFO 11:57:22,786 HelpFormatter - --------------------------------------------------------------------------------
      INFO 11:57:23,495 GenomeAnalysisEngine - Strictness is SILENT
      INFO 11:57:23,639 GenomeAnalysisEngine - Downsampling Settings: No downsampling
      INFO 11:57:23,680 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
      WARNING: BAM index file /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016/EC000284.bam.bai is older than BAM /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016/EC000284.bam
      INFO 11:57:23,784 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.10
      WARN 11:57:24,090 IndexDictionaryUtils - Track knownAlleles doesn't have a sequence dictionary built in, skipping dictionary validation
      WARN 11:57:24,094 IndexDictionaryUtils - Track knownAlleles2 doesn't have a sequence dictionary built in, skipping dictionary validation
      INFO 11:57:24,307 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
      INFO 11:57:24,313 GenomeAnalysisEngine - Done preparing for traversal
      INFO 11:57:24,313 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
      INFO 11:57:24,313 ProgressMeter - | processed | time | per 1M | | total | remaining
      INFO 11:57:24,314 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
      INFO 11:57:25,643 GATKRunReport - Uploaded run statistics report to AWS S3
      ##### ERROR ------------------------------------------------------------------------------------------
      ##### ERROR A USER ERROR has occurred (version 3.4-0-g7e26428):
      ##### ERROR
      ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
      ##### ERROR The error message below tells you what is the problem.
      ##### ERROR
      ##### ERROR If the problem is an invalid argument, please check the online documentation guide
      ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
      ##### ERROR
      ##### ERROR Visit our website and forum for extensive documentation and answers to
      ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
      ##### ERROR
      ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
      ##### ERROR
      ##### ERROR MESSAGE: Couldn't read file /test_disk/shared-scratch/bcpipeline/pegasus/RecalibrateAndRealignDax/run0016/EC000284.intervals_from_target_creator.intervals because The interval file does not exist.
      ##### ERROR ------------------------------------------------------------------------------------------
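
The log above is consistent with a cleanup job that is not ordered after every job that reads the file it deletes. The following is only a minimal sketch of that ordering check, not Pegasus's actual cleanup algorithm. The job and file names come from this report, but the edge list, the `consumers` and `cleanups` mappings, and both function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def descendants(edges, start):
    """Return every job reachable from `start` via parent -> child edges."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    seen, stack = set(), [start]
    while stack:
        job = stack.pop()
        for child in children[job]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def unsafe_cleanups(edges, consumers, cleanups):
    """List (cleanup_job, file, consumer) triples where the cleanup job is
    not guaranteed to run after a job that reads the file."""
    problems = []
    for cleanup_job, files in cleanups.items():
        for name in files:
            for consumer in consumers.get(name, ()):
                if cleanup_job not in descendants(edges, consumer):
                    problems.append((cleanup_job, name, consumer))
    return sorted(problems)


INTERVALS = "EC000284.intervals_from_target_creator.intervals"

# Hypothetical edges: the cleanup job depends only on the producer,
# not on the job that consumes the intervals file.
edges = [
    ("GATKTargetCreatorJob", "IndelRealigner_IndelRealigner1"),
    ("GATKTargetCreatorJob", "cleanup_PegasusVM_level_3"),
]
consumers = {INTERVALS: ["IndelRealigner_IndelRealigner1"]}
cleanups = {"cleanup_PegasusVM_level_3": [INTERVALS]}

for problem in unsafe_cleanups(edges, consumers, cleanups):
    print("cleanup may run too early:", problem)
```

In this sketch, adding an edge from IndelRealigner_IndelRealigner1 to cleanup_PegasusVM_level_3 makes the check return an empty list, which is the ordering a correct in-place cleanup would need.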

          People

            vahi Karan Vahi