DEPRECATED - go to FilePusher - push tsv/csv metrics-files to Analytics
Main development work in https://gpinfojira.broadinstitute.org/jira/browse/RPT-5663

...

Code Block
/broad/software/free/Linux/redhat_6_x86_64/pkgs/jdk1.8.0_121/bin/java -cp /home/unix/analytics/TigerETL3/remote-assembly-1.0.jar analytics.tiger.remote.FilePusher


2. Add parameters:

dryrun

TRUE if you are just testing (will still show output)
FALSE if you want to push files to the metrics service.

url

The call to the metrics web service. Configure it with your destination table and any "outer columns" (columns whose values are not parsed out of your files, such as the filename).

(More documentation TBD)
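As a rough illustration of how the pieces of the url parameter fit together, the sketch below assembles the URL used in the full example further down this page. The variable names (`base_url`, `destination_table`, `outer_columns`) are made up for this sketch and are not part of FilePusher; only the resulting string is taken from this page.

```shell
#!/bin/bash
# Values copied from the full example on this page.
base_url='http://analytics:8090/Metrics'
destination_table='ANALYTICS.VVP_QC'

# Single quotes keep $1 literal, so the text "$1" is passed through
# unchanged instead of being expanded by the shell.
outer_columns='FILENAME=$1,RUNMODE=Production'

# Assemble the url argument (variable names are hypothetical).
url="${base_url}?type=FlatFileMetrics&destination_table=${destination_table}&outer_columns=${outer_columns}"

echo "url=${url}"
```

When the argument is passed to FilePusher, quote it (for example `"url=${url}"`) so the shell does not interpret the `&` characters.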

filepathRegex (optional; defaults to * if left blank)

Regex that selects which files to push. Capturing groups in the regex can also extract data from the filenames.

e.g. (.*)/(.*AspAllOutputQC.csv) or

(.*)/([^/]*)_tableau_metrics.txt

where the starting folder is specified in the delta parameter (below).

Capturing groups must be referenced in the outer columns of the url parameter (for example, outer_columns=FILENAME=$1).
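Before scheduling a push, it can help to check what a regex captures from a sample path. The sketch below does that with `grep -E` and `sed -E` as a stand-in. It assumes the regex is matched against the whole file path, and FilePusher's own regex engine may behave differently in edge cases. The sample path is the one from the FILES_CSV example on this page.

```shell
#!/bin/bash
# Stand-in check using POSIX extended regex; this is NOT FilePusher itself.
path='/home/unix/analytics/SK-3PJO_tableau_metrics.txt'
regex='(.*)/([^/]*)_tableau_metrics.txt'

# Test for a match first: sed would print the input unchanged on a non-match.
if printf '%s\n' "$path" | grep -Eq "^${regex}\$"; then
    # '#' is the sed delimiter because the regex itself contains '/'.
    group1=$(printf '%s\n' "$path" | sed -E "s#^${regex}\$#\1#")
    group2=$(printf '%s\n' "$path" | sed -E "s#^${regex}\$#\2#")
    echo "group 1: ${group1}"
    echo "group 2: ${group2}"
else
    echo "no match: ${path}"
fi
```

For this sample path, the first group holds the folder and the second holds the part of the filename before `_tableau_metrics.txt`.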

delta

Tells the service where to look for files.

Use FILES_CSV to manually list files

FILES_CSV /home/unix/analytics/SK-3PJO_tableau_metrics.txt,XYZ-File.txt
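If the file list is generated by a script, the FILES_CSV value can be built by joining the paths with commas. A minimal sketch follows; the function name `join_files_csv` is hypothetical, and it assumes that no path contains a comma.

```shell
#!/bin/bash
# Join all arguments with commas (hypothetical helper, not part of FilePusher).
join_files_csv() {
    # "$*" joins the arguments using the first character of IFS;
    # the subshell keeps the IFS change local to this function.
    ( IFS=,; printf '%s' "$*" )
}

csv=$(join_files_csv /home/unix/analytics/SK-3PJO_tableau_metrics.txt XYZ-File.txt)
delta="FILES_CSV ${csv}"

echo "delta=${delta}"
```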

Use FILES IN FOLDER to specify a folder location (your regular expression above will target files more specifically)

FILES IN FOLDER /seq/tableau_files/VVPVolumeQC

TIMESTAMPED defines a relative or absolute time range (optional)

push files modified in the last 10 hours:

FILES IN FOLDER /home/unix/analytics TIMESTAMPED BETWEEN -10h AND NOW


push files modified after 2019-10-19 13:11:46:

FILES IN FOLDER /home/unix/analytics TIMESTAMPED BETWEEN 2019-10-19 13:11:46 AND NOW

push files modified between the previous FilePusher run and now. This requires a helper txt file that you create once with an initial timestamp. Each time FilePusher runs, it updates the file with the current time, which then serves as the start of the delta for the next run.

FILES IN FOLDER /home/unix/analytics TIMESTAMPED BETWEEN /home/unix/analytics/mytimestamp.txt AND NOW
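The helper file has to exist before the first run. The sketch below seeds it only when it is missing, so an existing start time is never overwritten. The page does not state the format of the file contents; this sketch assumes it is the same `YYYY-MM-DD HH:MM:SS` form as the absolute timestamp example above. The function name `init_timestamp_helper` is hypothetical.

```shell
#!/bin/bash
# Seed the timestamp helper file if it does not exist yet.
# ASSUMPTION: the helper holds a single timestamp formatted like the
# absolute example on this page ("2019-10-19 13:11:46").
init_timestamp_helper() {
    helper_file=$1
    if [ ! -f "$helper_file" ]; then
        date '+%Y-%m-%d %H:%M:%S' > "$helper_file"
    fi
}

# Demo in a scratch directory; a real helper would live at a path such as
# /home/unix/analytics/mytimestamp.txt.
demo_dir=$(mktemp -d)
init_timestamp_helper "${demo_dir}/mytimestamp.txt"
cat "${demo_dir}/mytimestamp.txt"
```

Seeding with the current time means the first run only picks up files modified after the helper was created. To include older files, write an earlier timestamp into the file by hand.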

3. Put it in a Unix Script

...

Full examples:

The Unix script below pushes production VVP files from the directory /seq/tableau_files/VVPVolumeQC using a timestamp helper file. The script is located at \\neon-cifs\home_unix\scripts\vvp_prod_push.sh and runs as a scheduled cron job. The same applies to the other File Pusher scripts: uploadDragenDemuxStats.sh and smartseq_push.sh.

Code Block
#!/bin/bash
echo "Running production file pusher..."

/broad/software/free/Linux/redhat_6_x86_64/pkgs/jdk1.8.0_121/bin/java \
  -cp /home/unix/analytics/TigerETL3/remote-assembly-1.0.jar \
  analytics.tiger.remote.FilePusher \
  'dryrun=false' \
  'url=http://analytics:8090/Metrics?type=FlatFileMetrics&destination_table=ANALYTICS.VVP_QC&outer_columns=FILENAME=$1,RUNMODE=Production' \
  'filepathRegex=(.*)/(.*AspAllOutputQC.csv)' \
  'delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN /seq/tableau_files/VVPVolumeQC/VVPtimestamp.txt AND NOW'

...