Jira Extraction API v2 (JQL-ETL Edition)

Motivation

Jira data extracted via JqlTableauExtractService is currently being recorded into Tableau hyper-extracts. Unfortunately, Tableau's ability to deal properly with delta-driven updates is very limited (it can only work on datasets with a timeUpdated field), which has encouraged Analytics engineers to implement "grab-everything" jobs with no time-based filter at all. As a result these jobs keep growing in size and runtime every day, indefinitely, and Jira might start showing signs of (unnecessary) overload.

It would be much better to reroute the Jira extract into a regular datamart in the DB, which offers many options for advanced delta-driven ETL handling.

Step 1: Use Einstein to cook up your NormalizedUrl (aka jqlLink)

Step 2: Create a corresponding table in the DB

 

CREATE TABLE ANALYTICS.BQMS_EXPLODED_SAMPLES (
  KEY                 VARCHAR2(100),
  SUMMARY             VARCHAR2(100),
  STATUS              VARCHAR2(50),
  REPORTER            VARCHAR2(60),
  PRODUCT_AND_PROCESS VARCHAR2(100),
  SAMPLE_ID           VARCHAR2(50),
  UPDATED             DATE
)
/
CREATE INDEX ANALYTICS.BQMS_EXPLODED_SAMPLES_IDX1 ON BQMS_EXPLODED_SAMPLES(KEY ASC)
/
GRANT ALL ON BQMS_EXPLODED_SAMPLES TO analyticsetl
/
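A quick sanity check after creation (illustrative only - just confirms the new table exists and is queryable):

SELECT COUNT(*) FROM ANALYTICS.BQMS_EXPLODED_SAMPLES
/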

Step 3: Register a new JqlTask in the "/home/unix/analytics/TigerETL3/jqlTasks.conf" file
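The exact entry syntax is defined by TigerETL3, so the safest approach is to open the file, copy an existing jqlTask entry, and adapt its taskName, jqlLink, and target table:

less /home/unix/analytics/TigerETL3/jqlTasks.conf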

Step 4: Test your ETL in manual mode

 

/home/unix/analytics/TigerETL3/runEtlAgent.sh Task db=analyticsetl 'task=jqlTask(taskName=bqms_exploded_samples)' 'delta=MillisDelta.manual(2021-mar-21 00:00:00,2021-mar-22 00:00:00)'

$RUN3 is a variable you should have defined in your TigerEnvironment and can stand in for the full runEtlAgent.sh path above (if it gives you trouble, just use /home/unix/analytics/TigerETL3/runEtlAgent.sh as shown).

Step 5: Prepare a delta-tracker

Run the following INSERT statement and commit. Make sure you plug your TASK_NAME into the appropriate place. This tracker will drive your ETL in automatic delta-driven mode; pick the timestamp from which you want your ETL to start off.

INSERT INTO cognos.etl_property VALUES('analytics.tiger.agents.JqlTask.YOUR_TASK_NAME_HERE','2000-Jan-01 00:00:00')
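For example, for the bqms_exploded_samples task tested in Step 4, picking the end of the manual window as the starting point:

INSERT INTO cognos.etl_property VALUES('analytics.tiger.agents.JqlTask.bqms_exploded_samples','2021-Mar-22 00:00:00')
/
COMMIT
/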

Step 6: Test your ETL in delta-driven mode
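The exact invocation isn't shown on this page; below is a sketch, assuming that leaving out the manual delta argument makes the agent read its window from the tracker created in Step 5:

/home/unix/analytics/TigerETL3/runEtlAgent.sh Task db=analyticsetl 'task=jqlTask(taskName=bqms_exploded_samples)'

If the run succeeds, the timestamp stored in cognos.etl_property should advance accordingly - the easiest way to confirm the tracker is being picked up.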

 

Step 7: Schedule a cronjob to reach full automation

In order to avoid interference with production, you might want to put it in your private crontab first:

crontab -e
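A hypothetical entry (the schedule and the log path are placeholders; pick whatever cadence fits your data volume):

# hypothetical schedule: run the bqms_exploded_samples extract hourly at minute 15
15 * * * * /home/unix/analytics/TigerETL3/runEtlAgent.sh Task db=analyticsetl 'task=jqlTask(taskName=bqms_exploded_samples)' >> "$HOME/bqms_exploded_samples.log" 2>&1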

Have it running like this for a couple of days and then move your cronjob to the production crontab.

HAPPY END

Some thoughts:

  1. The so-called "JQL-explosion" (splitting a given field - say "SampleIDs" - into items and combining them with the rest of the fields of the given ticket) seems convenient; however, it is very wasteful - all non-exploded fields get duplicated as many times as there are samples. This could lead to performance problems.

An alternative, normalized approach worth looking into is having two ETL tasks:

  • The 1st ETL would take care of all non-exploded fields. There would be no explosion (and hence no duplication), and "key" would naturally be a primary key.

  • The 2nd ETL would include only two fields (key and Sample). This would produce a very lean table holding all (key → Sample) links.

Both ETLs could be placed in the same task in the jqlTasks.conf file.

These two tables would be nicely equipped with indices/PKs, so JOINs should perform fast - see the sketch below.
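A sketch of what the normalized pair could look like (table and index names here are hypothetical; column types reuse the ones from Step 2):

CREATE TABLE ANALYTICS.BQMS_TICKETS (
  KEY                 VARCHAR2(100) PRIMARY KEY,
  SUMMARY             VARCHAR2(100),
  STATUS              VARCHAR2(50),
  REPORTER            VARCHAR2(60),
  PRODUCT_AND_PROCESS VARCHAR2(100),
  UPDATED             DATE
)
/
CREATE TABLE ANALYTICS.BQMS_TICKET_SAMPLES (
  KEY       VARCHAR2(100),
  SAMPLE_ID VARCHAR2(50)
)
/
CREATE INDEX ANALYTICS.BQMS_TICKET_SAMPLES_IDX1 ON BQMS_TICKET_SAMPLES(KEY ASC)
/
-- reassembling the exploded view via a JOIN:
SELECT t.KEY, t.SUMMARY, t.STATUS, t.REPORTER, t.PRODUCT_AND_PROCESS, s.SAMPLE_ID, t.UPDATED
FROM ANALYTICS.BQMS_TICKETS t
JOIN ANALYTICS.BQMS_TICKET_SAMPLES s ON s.KEY = t.KEY
/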

