Main development work is tracked in https://gpinfojira.broadinstitute.org/jira/browse/RPT-5663
This is a tool to help push files to the analytics Metrics web-service.
1. Start with:
/broad/software/free/Linux/redhat_6_x86_64/pkgs/jdk1.8.0_121/bin/java -cp /home/unix/analytics/TigerETL3/remote-assembly-1.0.jar analytics.tiger.remote.FilePusher
2. Add parameters:
dryrun | TRUE if you are just testing (will still show output) |
url | The call to the Metrics web service, configured with your destination table and any “outer columns” (columns that are not parsed out of your files, such as filename). (More documentation TBD) |
filepathRegex (optional; defaults to * if left blank) | Regex that selects which files to push. Capturing groups can also extract data from the filenames, e.g. (.*)/(.*AspAllOutputQC.csv) in the full example below.
The starting folder is specified in the delta parameter (below). |
delta | Tells the service where to look for files.
Use FILES_CSV to manually list files.
Use FILES IN FOLDER to specify a folder location (the regular expression above then targets files more specifically).
TIMESTAMPED (optional) defines a relative or absolute time range. It can, for example, push files modified in the last 10 hours, or push files modified between the previous file pusher run and now.
The second form requires a helper txt file to be created with an initial timestamp. The file pusher updates this file with the current time on each run, so it serves as the start of the delta for the next run (see TIMESTAMPED BETWEEN ... AND NOW in the full example below).
|
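
Because the timestamp helper file has to exist before the first run and has to be updated on every run, a short preflight check can catch a missing, empty, or read-only helper before anything is pushed. The following is a minimal sketch, not part of FilePusher: the function name check_timestamp_helper is hypothetical, and it deliberately does not validate the timestamp format, which is not documented on this page. It also assumes the script runs as the same user that runs the file pusher.

```shell
#!/bin/bash
# Hypothetical preflight helper (not part of FilePusher).
# Returns 0 only if the timestamp helper file exists, is non-empty,
# and is writable by the current user.
check_timestamp_helper() {
    local helper="$1"
    if [ ! -f "$helper" ]; then
        echo "Timestamp helper not found: $helper" >&2
        return 1
    fi
    if [ ! -s "$helper" ]; then
        echo "Timestamp helper is empty (it needs an initial timestamp): $helper" >&2
        return 1
    fi
    if [ ! -w "$helper" ]; then
        echo "Timestamp helper is not writable, so it cannot be updated after the run: $helper" >&2
        return 1
    fi
    return 0
}
```

In a wrapper script this could be called as check_timestamp_helper /path/to/helper.txt || exit 1 just before the java command.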
3. Put it in a Unix Script
4. Optional - Get Nasko to encode it into a URL
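
Steps 1 to 3 can be sketched as one small wrapper that keeps each parameter as its own quoted argument and prints the resulting command for review before it is run. This is only an illustration: the names JAVA_BIN, JAR, CMD, and build_filepusher_cmd are hypothetical, and the parameter values are borrowed from the production example on this page, with dryrun switched to TRUE and the delta reduced to the folder form.

```shell
#!/bin/bash
# Sketch only: variable and function names are hypothetical.
JAVA_BIN=/broad/software/free/Linux/redhat_6_x86_64/pkgs/jdk1.8.0_121/bin/java
JAR=/home/unix/analytics/TigerETL3/remote-assembly-1.0.jar

# Fill the global array CMD with the java invocation plus one argument per parameter.
build_filepusher_cmd() {
    local dryrun="$1" url="$2" regex="$3" delta="$4"
    CMD=("$JAVA_BIN" -cp "$JAR" analytics.tiger.remote.FilePusher
         "dryrun=$dryrun" "url=$url" "filepathRegex=$regex" "delta=$delta")
}

# Single quotes keep $1 and & literal so the shell does not interpret them.
build_filepusher_cmd TRUE \
    'http://analytics:8090/Metrics?type=FlatFileMetrics&destination_table=ANALYTICS.VVP_QC&outer_columns=FILENAME=$1,RUNMODE=Production' \
    '(.*)/(.*AspAllOutputQC.csv)' \
    'FILES IN FOLDER /seq/tableau_files/VVPVolumeQC'

# Print the command in a shell-quoted, copy-pasteable form for review.
printf '%q ' "${CMD[@]}"; echo

# Uncomment to actually run it once the printed command looks right.
# "${CMD[@]}"
```

Holding the command in an array avoids quoting problems with the spaces in the delta value and the & characters in the url.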
Full examples:
Unix script to look at production VVP files in the directory /seq/tableau_files/VVPVolumeQC using a timestamp helper
#!/bin/bash
echo "Running production file pusher..."
/broad/software/free/Linux/redhat_6_x86_64/pkgs/jdk1.8.0_121/bin/java -cp /home/unix/analytics/TigerETL3/remote-assembly-1.0.jar analytics.tiger.remote.FilePusher \
  'dryrun=false' \
  'url=http://analytics:8090/Metrics?type=FlatFileMetrics&destination_table=ANALYTICS.VVP_QC&outer_columns=FILENAME=$1,RUNMODE=Production' \
  'filepathRegex=(.*)/(.*AspAllOutputQC.csv)' \
  'delta=FILES IN FOLDER /seq/tableau_files/VVPVolumeQC TIMESTAMPED BETWEEN /seq/tableau_files/VVPVolumeQC/VVPtimestamp.txt AND NOW'
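
To get a feel for what the capturing groups in filepathRegex pick up, the same pattern can be tried against a sample path in bash before any real run. This is only an approximation: FilePusher is a Java program and presumably uses Java regex syntax, which differs from bash's POSIX extended regex for some constructs, although this particular pattern behaves the same in both. The function name preview_groups and the sample file name are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the first two capture groups of a regex for one path.
preview_groups() {
    local regex="$1" path="$2"
    if [[ "$path" =~ $regex ]]; then
        echo "group1=${BASH_REMATCH[1]}"
        echo "group2=${BASH_REMATCH[2]}"
    else
        echo "no match"
    fi
}

# The sample file name below is made up; only the folder comes from this page.
preview_groups '(.*)/(.*AspAllOutputQC.csv)' \
    '/seq/tableau_files/VVPVolumeQC/plate1_AspAllOutputQC.csv'
```

Here group 1 is the folder and group 2 is the file name. Note that the unescaped . before csv matches any character; writing \.csv would restrict the match to a literal dot.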