...
```sql
CREATE TABLE ANALYTICS.BQMS_EXPLODED_SAMPLES (
    KEY                 VARCHAR2(100),
    SUMMARY             VARCHAR2(100),
    STATUS              VARCHAR2(50),
    REPORTER            VARCHAR2(60),
    PRODUCT_AND_PROCESS VARCHAR2(100),
    SAMPLE_ID           VARCHAR2(50),
    UPDATED             DATE
)
/
CREATE INDEX ANALYTICS.BQMS_EXPLODED_SAMPLES_IDX1 ON ANALYTICS.BQMS_EXPLODED_SAMPLES (KEY ASC)
/
GRANT ALL ON ANALYTICS.BQMS_EXPLODED_SAMPLES TO analyticsetl
/
```
Step 3: Register a new JqlTask in
...
“/home/unix/analytics/TigerETL3/jqlTasks.conf” file
...
Step 4: Test your ETL in manual mode
```
/home/unix/analytics/TigerETL3/runEtlAgent.sh Task db=analyticsetl \
    'task=jqlTask(taskName=bqms_exploded_samples)' \
    'delta=MillisDelta.manual(2021-mar-21 00:00:00,2021-mar-22 00:00:00)'
```
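Typing the time window by hand gets tedious when you repeat manual tests. The sketch below builds the delta argument for the previous calendar day in the same "2021-mar-21 00:00:00" style as the command above. It assumes GNU date and that MillisDelta.manual accepts any day written in that format; the helper name fmt_day is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumptions: GNU date, lowercase month abbreviations accepted by
# MillisDelta.manual as in the example above). fmt_day is a hypothetical helper.
set -euo pipefail

# Format any date GNU date -d understands as e.g. "2021-mar-21 00:00:00".
fmt_day() {
  LC_ALL=C date -d "$1" '+%Y-%b-%d 00:00:00' | tr '[:upper:]' '[:lower:]'
}

start=$(fmt_day yesterday)
end=$(fmt_day today)
delta="delta=MillisDelta.manual(${start},${end})"
echo "$delta"

# Uncomment to run the manual-mode test for that window:
# /home/unix/analytics/TigerETL3/runEtlAgent.sh Task db=analyticsetl \
#     'task=jqlTask(taskName=bqms_exploded_samples)' "$delta"
```

The whole delta argument stays in one double-quoted word, just as the single-quoted argument does in the original command, so the space inside each timestamp is not split by the shell.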
RUN3 is a variable you should have defined in your TigerEnvironment (it is used as $RUN3 in the steps below). If it gives you trouble, replace it with the full path /home/unix/analytics/TigerETL3/runEtlAgent.sh, as in the command above.
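If RUN3 is not defined yet, a minimal sketch for a bash startup file such as ~/.bashrc follows. It reuses the value from the RUN3= line of the crontab in Step 7. Which file your TigerEnvironment actually reads, and the helper name check_run3, are assumptions for illustration.

```shell
# Sketch: keep an existing RUN3 if already set, otherwise default to the
# TigerETL3 agent script (same value as the RUN3= line in the crontab).
export RUN3="${RUN3:-/home/unix/analytics/TigerETL3/runEtlAgent.sh}"

# Hypothetical helper: report whether RUN3 points at an executable file.
check_run3() {
  if [ -x "$RUN3" ]; then
    echo "RUN3 -> $RUN3"
  else
    echo "RUN3 ($RUN3) is not executable; use the full path instead" >&2
    return 1
  fi
}
```

Running check_run3 in a new shell before the delta-driven test tells you quickly whether $RUN3 will resolve.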
Step 5: Prepare a delta-tracker
...
Step 6: Test your ETL in delta-driven mode
```
$RUN3 Task db=analyticsetl 'task=jqlTask(taskName=bqms_exploded_samples)' 'delta=MillisDelta.loadFromDb'
```
Step 7: Schedule a cronjob to reach full automation
To avoid interfering with production, you may want to put the job in your private crontab first. Open it for editing with `crontab -e` and fill in a template like this:
```
HOST=analytics
MAILTO=atanas@broadinstitute.org
# cron does not expand variables in these assignments, so list PATH entries explicitly
PATH=/bin:/broad/software/free/Linux/redhat_7_x86_64/pkgs/jdk1.8.0_121/bin:/usr/bin
TIGER_HOME=/home/unix/analytics/TigerETL3
SPARK_HOME=/local/spark-2.3.1-bin-hadoop2.7
RUN=/home/unix/analytics/TigerETL/runEtlAgent2
RUN3=/home/unix/analytics/TigerETL3/runEtlAgent.sh
# +--------- Minute (0-59)                   | Output Dumper: >/dev/null 2>&1
# | +------- Hour (0-23)                     | Multiple Values Use Commas: 3,12,47
# | | +----- Day Of Month (1-31)             | Do every X intervals: */X -> Example: */15 * * * * Is every 15 minutes
# | | | +--- Month (1-12)                    | Aliases: @reboot -> Run once at startup; @hourly -> 0 * * * *;
# | | | | +- Day Of Week (0-6) (Sunday = 0)  | @daily -> 0 0 * * *; @weekly -> 0 0 * * 0; @monthly -> 0 0 1 * *;
# | | | | |                                  | @yearly -> 0 0 1 1 *;
# * * * * * COMMAND
#
YOUR-CRONJOB-HERE
```
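As a hypothetical example, YOUR-CRONJOB-HERE could be replaced with a line that runs the delta-driven command every 15 minutes, using the task name from Step 4. The 15-minute schedule is an assumption; choose whatever cadence the data needs. $RUN3 comes from the RUN3= assignment in the same crontab, since cron passes those assignments to the job's environment and /bin/sh expands the variable in the command.

```
# Hypothetical schedule: delta-driven load every 15 minutes
*/15 * * * * $RUN3 Task db=analyticsetl 'task=jqlTask(taskName=bqms_exploded_samples)' 'delta=MillisDelta.loadFromDb'
```

Any output the job prints is mailed to the MAILTO address, which is a convenient way to watch the trial runs.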
Let it run like this for a couple of days, then move your cronjob to the production crontab.
HAPPY END
Some thoughts:
The so-called “JQL explosion” (splitting a given field, say “SampleIDs”, into individual items and combining each item with the rest of the ticket's fields) is convenient, but very wasteful: every non-exploded field is duplicated once for each sample found. This could lead to performance problems.
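To make the duplication concrete, here is a small sketch of an explosion in bash. The ticket values, the comma-separated sample list and the "|" output separator are illustrative assumptions, not the actual BQMS/JQL format, and explode_ticket is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of a "JQL explosion": one ticket with a multi-valued sample field
# becomes one row per sample, and every other field is copied into each row.
# Field layout, "," input delimiter and "|" output separator are assumptions.

explode_ticket() {
  local key=$1 summary=$2 status=$3 samples=$4
  local -a ids
  local id
  IFS=',' read -r -a ids <<< "$samples"   # split the sample list on commas
  for id in "${ids[@]}"; do
    id=${id//[[:space:]]/}                # strip whitespace from each ID
    [ -n "$id" ] || continue              # skip empty items
    printf '%s|%s|%s|%s\n' "$key" "$summary" "$status" "$id"
  done
}

explode_ticket "BQMS-1" "Low yield" "Open" "SM-A1, SM-A2, SM-A3"
```

The three sample IDs produce three rows, each repeating the key, summary and status; a ticket with N samples repeats its non-exploded fields N times, which is the waste described above.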
...