monitord replay against mysql with registration jobs

      Hi Karan,

      I'm running into an issue when I run pegasus-monitord in replay mode on some post-processing workflows. It looks like it's trying to add all of my output files to a database, but they've been added already:

      2017-08-23 12:55:39,852:ERROR:Pegasus.db.workflow_loader.WorkflowLoader(107): Insert failed for event <class 'Pegasus.db.schema.RCPFN'>:

      • wf_id : 5240
      • wf_uuid : f8419003-ec55-4f2f-81e5-8088c5f443a2
      • pfn_id : None
      • lfn : RotD_UCSB_4911_267_4.rotd
      • site : shock
      • pfn : gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_267_4.rotd
      • lfn_id : 1984029
      • event : stampede.rc.pfn
        : (IntegrityError) (1062, "Duplicate entry '1984029-gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberS' for key 'UNIQUE_PFN'") 'INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (%s, %s, %s)' (1984029L, 'gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_267_4.rotd', 'shock')

      2017-08-23 12:55:39,885:ERROR:Pegasus.db.workflow_loader.WorkflowLoader(107): Insert failed for event <class 'Pegasus.db.schema.RCPFN'>:

      • wf_id : 5240
      • wf_uuid : f8419003-ec55-4f2f-81e5-8088c5f443a2
      • pfn_id : None
      • lfn : RotD_UCSB_4911_254_19.rotd
      • site : shock
      • pfn : gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_254_19.rotd
      • lfn_id : 1984030
      • event : stampede.rc.pfn
        : (IntegrityError) (1062, "Duplicate entry '1984030-gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberS' for key 'UNIQUE_PFN'") 'INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (%s, %s, %s)' (1984030L, 'gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_254_19.rotd', 'shock')

      The issue is that it does this for all ~28k files in each workflow, so every one of those inserts fails with the duplicate-key error above and the replay takes a very long time. Is there a way to disable this, since it seems like the files have been inserted already? Is it necessary for getting statistics, which is what I'm interested in? Thanks!

      -Scott
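
      For anyone reproducing this outside Pegasus, here is a minimal sketch of the failure mode the log shows. It is not the monitord code. It uses an in-memory SQLite database rather than MySQL, the helper names (make_db, replay_insert, replay_insert_ignore) are hypothetical, and the column set of the unique key is an assumption, since the error message truncates the key value. Only the table name, the three inserted columns, and the sample rows come from the log above.

```python
import sqlite3

# Sample rows taken from the log above: (lfn_id, pfn, site).
ROWS = [
    (1984029, "gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_267_4.rotd", "shock"),
    (1984030, "gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_254_19.rotd", "shock"),
]


def make_db():
    """Create a stand-in rc_pfn table with a unique key (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rc_pfn ("
        " pfn_id INTEGER PRIMARY KEY,"
        " lfn_id INTEGER NOT NULL,"
        " pfn TEXT NOT NULL,"
        " site TEXT,"
        " UNIQUE (lfn_id, pfn, site))"  # assumption: stands in for UNIQUE_PFN
    )
    return conn


def replay_insert(conn, rows):
    """Plain INSERT per row; return how many rows the unique key rejected."""
    failed = 0
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (?, ?, ?)", row
            )
        except sqlite3.IntegrityError:
            failed += 1  # same class of error as the 1062 entries in the log
    return failed


def replay_insert_ignore(conn, rows):
    """Idempotent variant: duplicates are skipped silently; return row count."""
    conn.executemany(
        "INSERT OR IGNORE INTO rc_pfn (lfn_id, pfn, site) VALUES (?, ?, ?)", rows
    )
    return conn.execute("SELECT COUNT(*) FROM rc_pfn").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    print("first load, rejected:", replay_insert(conn, ROWS))
    print("replay, rejected:", replay_insert(conn, ROWS))
    print("rows after idempotent replay:", replay_insert_ignore(conn, ROWS))
```

      The first load succeeds and the second rejects every row, which matches what the replay appears to be doing against a database that was already populated. MySQL has a comparable INSERT IGNORE statement, but whether monitord can be told to skip or tolerate these inserts is exactly the open question in this ticket.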

            Assignee:
            vahi Karan Vahi
            Reporter:
            scottcal@usc.edu Scott Callaghan
            Watchers:
            3