monitord replay against mysql with registration jobs

      Hi Karan,

      I'm running into an issue when I run pegasus-monitord in replay mode on some post-processing workflows. It looks like it's trying to add all of my output files to a database, but they've been added already:

      2017-08-23 12:55:39,852:ERROR:Pegasus.db.workflow_loader.WorkflowLoader(107): Insert failed for event <class 'Pegasus.db.schema.RCPFN'>:

      • wf_id : 5240
      • wf_uuid : f8419003-ec55-4f2f-81e5-8088c5f443a2
      • pfn_id : None
      • lfn : RotD_UCSB_4911_267_4.rotd
      • site : shock
      • pfn : gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_267_4.rotd
      • lfn_id : 1984029
      • event : stampede.rc.pfn
        : (IntegrityError) (1062, "Duplicate entry '1984029-gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberS' for key 'UNIQUE_PFN'") 'INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (%s, %s, %s)' (1984029L, 'gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_267_4.rotd', 'shock')

      2017-08-23 12:55:39,885:ERROR:Pegasus.db.workflow_loader.WorkflowLoader(107): Insert failed for event <class 'Pegasus.db.schema.RCPFN'>:

      • wf_id : 5240
      • wf_uuid : f8419003-ec55-4f2f-81e5-8088c5f443a2
      • pfn_id : None
      • lfn : RotD_UCSB_4911_254_19.rotd
      • site : shock
      • pfn : gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_254_19.rotd
      • lfn_id : 1984030
      • event : stampede.rc.pfn
        : (IntegrityError) (1062, "Duplicate entry '1984030-gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberS' for key 'UNIQUE_PFN'") 'INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (%s, %s, %s)' (1984030L, 'gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_254_19.rotd', 'shock')

      The problem is that it does this for all ~28k files in each workflow, so the replay takes a very long time. Is there a way to disable this, since it seems like the files have been inserted already? Is it necessary for getting statistics, which is what I'm interested in? Thanks!

      -Scott
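
      For anyone reading this later, the failure mode in the log can be reproduced in miniature. The sketch below is not Pegasus code. It uses an in-memory SQLite database as a stand-in for MySQL, and it assumes the UNIQUE_PFN key covers (lfn_id, pfn), which is what the truncated "Duplicate entry" value suggests but does not prove. The helper names build_catalog and replay are hypothetical. Replaying the same rows a second time fails once per row, while an insert that ignores duplicates (INSERT OR IGNORE in SQLite; MySQL spells it INSERT IGNORE) skips them quietly.

```python
import sqlite3


def build_catalog():
    """Create a simplified rc_pfn table in memory.

    Assumption: the real UNIQUE_PFN key covers (lfn_id, pfn); the actual
    Pegasus schema may differ.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rc_pfn ("
        " pfn_id INTEGER PRIMARY KEY,"
        " lfn_id INTEGER NOT NULL,"
        " pfn TEXT NOT NULL,"
        " site TEXT,"
        " UNIQUE (lfn_id, pfn))"
    )
    return conn


def replay(conn, rows, ignore_duplicates=False):
    """Insert rows one at a time and return how many inserts failed."""
    verb = "INSERT OR IGNORE" if ignore_duplicates else "INSERT"
    sql = verb + " INTO rc_pfn (lfn_id, pfn, site) VALUES (?, ?, ?)"
    failures = 0
    for row in rows:
        try:
            conn.execute(sql, row)
        except sqlite3.IntegrityError:
            # Same class of error as MySQL's 1062 "Duplicate entry".
            failures += 1
    return failures


# Illustrative rows only; the pfn values are placeholders, not real paths.
ROWS = [
    (1984029, "gsiftp://example.org/data/a.rotd", "shock"),
    (1984030, "gsiftp://example.org/data/b.rotd", "shock"),
]
```

      Calling replay(conn, ROWS) on a fresh catalog reports no failures; calling it again reports one failure per row, which matches the per-file errors above. With ignore_duplicates=True the second pass reports none and the table is unchanged.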

            Assignee:
            Karan Vahi
            Reporter:
            Scott Callaghan
            Archiver:
            Rajiv Mayani
