Motivation

...

  • url=https://analytics.broadinstitute.org/Metrics?type=vvp
    the endpoint the file is sent to

  • filepathRegex=(.*)/(.*AspAllOutputQC.csv)
    which file names should be picked up

  • outer_columns=field1=$1,field2=$2,field3=XYZ
    parses fields out of the filepathRegex match and injects them into the JSON (useful when bits of data are encoded in the file name)

  • dryrun=true
    runs all the delta capturing and regex parsing and shows the data without actually pushing the file

  • delta=<delta specification>
    specifies how files are picked

    • pick one or more specific files
      delta=FILES_CSV /seq/tableau_files/VVPVolumeQC/20221205_RACK_QC_083303_AspAllOutputQC.csv

    • pick files timestamped between two given timestamps
      delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN 2019-10-19 13:11:46 AND 2019-10-19 13:11:46

    • pick files timestamped in the last 10 hours
      delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN -10h AND NOW

    • pick files timestamped between a timestamp persisted in a file and now (production setup)
      delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN /seq/tableau_files/VVPVolumeQC/VVP_etl_timestamp.txt AND NOW
      The file contains only a timestamp (for example 2019-10-19 13:11:46) and must be created manually
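
The parameters above can be illustrated with a minimal Python sketch. It is not the actual implementation: the function names are hypothetical, and it assumes that filepathRegex must match the whole path, that "$N" in outer_columns refers to capture group N, and that a delta bound is either NOW, a relative offset such as -10h, an absolute timestamp, or a path to a file holding one timestamp.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def outer_columns_for(path, filepath_regex, outer_columns):
    """Return the outer-column dict for a path, or None if the path does not match."""
    # Assumption: the regex has to match the entire path.
    match = re.fullmatch(filepath_regex, path)
    if match is None:
        return None
    fields = {}
    for pair in outer_columns.split(","):
        name, _, value = pair.partition("=")
        # Assumption: "$N" means capture group N; anything else is a literal.
        if re.fullmatch(r"\$\d+", value):
            fields[name] = match.group(int(value[1:]))
        else:
            fields[name] = value
    return fields


def resolve_bound(bound, now):
    """Turn one side of TIMESTAMPED BETWEEN ... AND ... into a datetime."""
    bound = bound.strip()
    if bound == "NOW":
        return now
    relative = re.fullmatch(r"-(\d+)h", bound)
    if relative:
        return now - timedelta(hours=int(relative.group(1)))
    if bound.startswith("/"):
        # Assumption: a path names a file that holds a single timestamp.
        bound = Path(bound).read_text().strip()
    return datetime.strptime(bound, TS_FORMAT)
```

Under these assumptions, the path /seq/tableau_files/VVPVolumeQC/20221205_RACK_QC_083303_AspAllOutputQC.csv yields field1=/seq/tableau_files/VVPVolumeQC, field2=20221205_RACK_QC_083303_AspAllOutputQC.csv and field3=XYZ, because the greedy first group stops at the last slash.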

...

Other ways of automating the “push” are also possible. For example, a Perl script can push directly using the "Requests" library; all necessary parameters can be inferred from the curl commands above.

Caveats

  • JSON queries appear to be affected by Oracle 19.8.0 Bug 31532339 (ORA-600 [koksccda1]).
    DBAs are working to address this by upgrading SEQPROD to v19.15.0.

  • The “analytics” VM is on our private network, so it cannot be accessed directly from Google Cloud.
    However, an on-prem script can easily read from Google Cloud and push to the “analytics” VM.