Motivation
...
url=https://analytics.broadinstitute.org/Metrics?type=vvp
Where the file should be sent.
filepathRegex=(.*)/(.*AspAllOutputQC.csv)
Which file names should be picked.
outer_columns=field1=$1,field2=$2,field3=XYZ
Parses fields out of the filepathRegex capture groups ($1, $2, ...) and injects them into the JSON (useful when bits of data are encoded in the filename).
dryrun=true
Runs all the delta capturing and regex parsing and shows the data without actually pushing the file.
delta=<delta specification>
Specifies how files are picked:
Pick one (or multiple) specific files:
delta=FILES_CSV /seq/tableau_files/VVPVolumeQC/20221205_RACK_QC_083303_AspAllOutputQC.csv
Pick files timestamped between two given timestamps:
delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN 2019-10-19 13:11:46 AND 2019-10-19 13:11:46
Pick files timestamped in the last 10 hours:
delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN -10h AND NOW
Pick files timestamped between a timestamp persisted in a file and now (production setup):
delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN /seq/tableau_files/VVPVolumeQC/VVP_etl_timestamp.txt AND NOW
The file contains only a timestamp (for example 2019-10-19 13:11:46) and should be created manually.
Adjustment: shift the start of the delta back to mitigate clock-discrepancy problems:
delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN /seq/tableau_files/VVPVolumeQC/VVP_etl_timestamp.txt MINUS 5 MINUTES AND NOW
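The filepathRegex and outer_columns parameters can be illustrated with a minimal Python sketch. It assumes the following behavior, which may not match the tool's actual implementation:
- $n refers to the n-th capture group of filepathRegex, matched against the full file path.
- A value without a $n placeholder (such as XYZ) is copied literally.
- The extracted fields are merged into each JSON record.

The function names and the example row are hypothetical. Note also that the unescaped dot in AspAllOutputQC.csv matches any character; AspAllOutputQC\.csv would be stricter.

```python
from __future__ import annotations

import json
import re


def parse_outer_columns(spec: str) -> dict[str, str]:
    """Split 'field1=$1,field2=$2,field3=XYZ' into {'field1': '$1', ...}."""
    columns: dict[str, str] = {}
    for item in spec.split(","):
        # Split on the first '=' only, so the value itself may contain '='.
        name, _, template = item.partition("=")
        columns[name.strip()] = template.strip()
    return columns


def extract_outer_values(path: str, filepath_regex: str, outer_columns: str) -> dict[str, str] | None:
    """Match the full path against filepath_regex and resolve $n placeholders.

    Returns None when the path does not match, i.e. the file would not be picked.
    """
    path_match = re.fullmatch(filepath_regex, path)
    if path_match is None:
        return None
    values: dict[str, str] = {}
    for name, template in parse_outer_columns(outer_columns).items():
        # Assumption: $n is the n-th capture group; other text (e.g. XYZ) is kept literally.
        # A group that did not participate in the match resolves to an empty string.
        values[name] = re.sub(r"\$(\d+)", lambda g: path_match.group(int(g.group(1))) or "", template)
    return values


path = "/seq/tableau_files/VVPVolumeQC/20221205_RACK_QC_083303_AspAllOutputQC.csv"
outer = extract_outer_values(path, r"(.*)/(.*AspAllOutputQC.csv)", "field1=$1,field2=$2,field3=XYZ")

row = {"example_column": "value"}  # hypothetical row parsed from the CSV
print(json.dumps({**row, **outer}))
```

With these inputs, the greedy first group stops at the last slash. As a result, field1 resolves to /seq/tableau_files/VVPVolumeQC, field2 to the bare file name, and field3 to the literal XYZ.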
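The FILES IN FOLDER delta forms above can be sketched in Python to show how each bound might be resolved into a time window. The sketch rests on several assumptions:
- "Timestamped" means the file's modification time.
- Both bounds are inclusive.
- Relative offsets such as -10h count back from NOW.
- All timestamps are local time in the 2019-10-19 13:11:46 format.

resolve_bound and pick_files are hypothetical names, and FILES_CSV is not handled. The filepathRegex filter, which would also exclude VVP_etl_timestamp.txt itself, is assumed to run separately.

```python
from __future__ import annotations

import os
import re
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the example timestamp 2019-10-19 13:11:46
UNITS = {"m": "minutes", "h": "hours", "d": "days"}  # assumed suffixes for offsets like -10h

DELTA_RE = re.compile(r"FILES IN FOLDER (\S+) TIMESTAMPED BETWEEN (.+) AND (.+)")


def resolve_bound(expr: str, now: datetime) -> datetime:
    """Turn one side of 'BETWEEN <start> AND <end>' into a local, naive datetime."""
    expr = expr.strip()
    # Optional clock-discrepancy adjustment, e.g. '<bound> MINUS 5 MINUTES'.
    adjusted = re.fullmatch(r"(.+?)\s+MINUS\s+(\d+)\s+MINUTES", expr)
    if adjusted:
        return resolve_bound(adjusted.group(1), now) - timedelta(minutes=int(adjusted.group(2)))
    if expr == "NOW":
        return now
    # Relative offset counted back from NOW, e.g. '-10h'.
    relative = re.fullmatch(r"-(\d+)([mhd])", expr)
    if relative:
        return now - timedelta(**{UNITS[relative.group(2)]: int(relative.group(1))})
    # Path to a file that holds a single persisted timestamp (production setup).
    if os.path.isfile(expr):
        with open(expr) as fh:
            return datetime.strptime(fh.read().strip(), TS_FORMAT)
    # Otherwise an absolute timestamp.
    return datetime.strptime(expr, TS_FORMAT)


def pick_files(delta: str, now: datetime | None = None) -> list[str]:
    """Return files in the folder whose modification time lies in [start, end]."""
    m = DELTA_RE.fullmatch(delta.strip())
    if m is None:
        raise ValueError(f"unsupported delta: {delta!r}")  # e.g. FILES_CSV is not covered here
    if now is None:
        now = datetime.now()
    folder = m.group(1)
    start = resolve_bound(m.group(2), now)
    end = resolve_bound(m.group(3), now)
    picked = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        mtime = datetime.fromtimestamp(os.path.getmtime(path))
        if start <= mtime <= end:
            picked.append(path)  # filepathRegex filtering would happen afterwards
    return picked
```

For example, pick_files("FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN -10h AND NOW") would return the files modified in the last 10 hours, under the assumptions above. Resolving MINUS against the start bound widens the window backwards. A file written shortly before a slightly skewed persisted timestamp is then still picked.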
The final production-grade command would look like this:
...