Pegasus / PM-1218

monitord replay against mysql with registration jobs

      Description

      Hi Karan,

      I'm running into an issue when I run pegasus-monitord in replay mode on some post-processing workflows. It looks like it's trying to add all of my output files to a database, but they've been added already:

      2017-08-23 12:55:39,852:ERROR:Pegasus.db.workflow_loader.WorkflowLoader(107): Insert failed for event <class 'Pegasus.db.schema.RCPFN'>:
        * wf_id : 5240
        * wf_uuid : f8419003-ec55-4f2f-81e5-8088c5f443a2
        * pfn_id : None
        * lfn : RotD_UCSB_4911_267_4.rotd
        * site : shock
        * pfn : gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_267_4.rotd
        * lfn_id : 1984029
        * event : stampede.rc.pfn
       : (IntegrityError) (1062, "Duplicate entry '1984029-gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberS' for key 'UNIQUE_PFN'") 'INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (%s, %s, %s)' (1984029L, 'gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_267_4.rotd', 'shock')
      2017-08-23 12:55:39,885:ERROR:Pegasus.db.workflow_loader.WorkflowLoader(107): Insert failed for event <class 'Pegasus.db.schema.RCPFN'>:
        * wf_id : 5240
        * wf_uuid : f8419003-ec55-4f2f-81e5-8088c5f443a2
        * pfn_id : None
        * lfn : RotD_UCSB_4911_254_19.rotd
        * site : shock
        * pfn : gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_254_19.rotd
        * lfn_id : 1984030
        * event : stampede.rc.pfn
       : (IntegrityError) (1062, "Duplicate entry '1984030-gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberS' for key 'UNIQUE_PFN'") 'INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (%s, %s, %s)' (1984030L, 'gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/RotD_UCSB_254_19.rotd', 'shock')

      The issue is that it does this for all ~28k files for each workflow, and as a result takes forever. Is there a way to disable this, since it seems like the files have been inserted already? Is it necessary for getting statistics, which is what I'm interested in? Thanks!

      -Scott
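
For readers unfamiliar with MySQL error 1062, the failure mode in the log above can be reproduced in isolation. The following is a minimal sketch, not Pegasus code: it uses an in-memory SQLite database instead of MySQL, and the table definition is an assumption reconstructed only from the column names and the `UNIQUE_PFN` key name visible in the log. The helper names (`make_db`, `replay_strict`, `replay_tolerant`) are hypothetical. It shows that re-inserting already-loaded rows raises one integrity error per row, and that an insert-if-absent statement (SQLite's `INSERT OR IGNORE`, comparable to MySQL's `INSERT IGNORE`) skips them quietly.

```python
import sqlite3


def make_db():
    """Create an in-memory table with an assumed (lfn_id, pfn) unique key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rc_pfn ("
        " pfn_id INTEGER PRIMARY KEY,"
        " lfn_id INTEGER NOT NULL,"
        " pfn TEXT NOT NULL,"
        " site TEXT,"
        " CONSTRAINT UNIQUE_PFN UNIQUE (lfn_id, pfn))"
    )
    return conn


def replay_strict(conn, rows):
    """Plain INSERT per row; return how many rows the unique key rejected."""
    failures = 0
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO rc_pfn (lfn_id, pfn, site) VALUES (?, ?, ?)", row
            )
        except sqlite3.IntegrityError:
            # Same class of error as the "Duplicate entry" lines in the log.
            failures += 1
    return failures


def replay_tolerant(conn, rows):
    """Insert-if-absent; return how many rows were actually added."""
    count = "SELECT COUNT(*) FROM rc_pfn"
    before = conn.execute(count).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO rc_pfn (lfn_id, pfn, site) VALUES (?, ?, ?)", rows
    )
    return conn.execute(count).fetchone()[0] - before


if __name__ == "__main__":
    base = "gsiftp://hpc-transfer.usc.edu/home/scec-04/tera3d/CyberShake/data/PPFiles/UCSB/4911/"
    rows = [
        (1984029, base + "RotD_UCSB_267_4.rotd", "shock"),
        (1984030, base + "RotD_UCSB_254_19.rotd", "shock"),
    ]
    conn = make_db()
    print("first load, rejected:", replay_strict(conn, rows))
    print("replay, rejected:", replay_strict(conn, rows))
    print("tolerant replay, added:", replay_tolerant(conn, rows))
```

Whether pegasus-monitord offers an equivalent option, or whether these rows are needed for statistics, is exactly the open question in this report; the sketch only illustrates why a replay against an already-populated database produces one error per file.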

            People

            • Assignee: Karan Vahi (vahi)
            • Reporter: Scott Callaghan (scottcal@usc.edu)