Motivation
...
ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW table (in SEQPROD db) is defined like this
Column name | Data type | Description |
---|---|---|
PDO | VARCHAR2 | PDO |
PDO_SAMPLE | VARCHAR2 | PDO SAMPLE |
DATA_TYPE | VARCHAR2 | Exome, WGS |
REQUEST_TYPE | NUMBER | "RequestFinalAggregation", "RequestTopOff", "RequestTopOffGroup", "RequestPoolCreated", "RequestSendToRework", "RequestHoldForTopOff", "RequestWaitingForNewRwData", "RequestReset", "RequestPCHoldForTopOff","RequestVolumeFinal" |
USER_NAME | VARCHAR2 | who triggered this action/transition |
PARAMS | VARCHAR2 | This is a free text where Tableau developers can store additional info re this transition which later can be used. Usually it's semicolon-separated-list-of-key=value-pairs |
TIMESTAMP | DATE | when this action was triggered |
IS_LATEST | NUMBER | Boolean (1 or 0), Is this the latest transition a given (PDO, PDO_SAMPLE, DATA_TYPE) has gone through. Every time new transition is recorded it comes with IS_LATEST=1 whereas previous latest transition is reset IS_LATEST=0 |
IP_ADDRESS | VARCHAR2 | where (which IP) this request came from |
...
- requestType is the text-representation of this request (RequestFinalAggregation, RequestTopOff, etc)
- each individual item is a quintuple representing a single sample like this pdo:pdo_sample:data_type:user_name:parameters (greens are optional, trailing colons are not required)
parameters represents any piece of data you might want to attach to a given sample/transition. It is a semicolon-separated-list-of-key-=value-pairs. Gets recorded in PARAMS column.
Authentication
There are 3 alternative ways to provide a "user_name"
- repeat it in the quintuple. Very wasteful.
- standalone upfront URL-parameter "&user_name=JohnDoe" (economical and solid). Unfortunately as of now, Tableau's urlAction can't inject the username like this.
- 2-step the cookie-dance (economical but might be tricky to use from within Tableau)
- record the user_name-cookie in the browser environment by calling this URL http://analytics:8090/api/setCookie?cookies=sampleAnalyticsWorkflow.user_name=JohnDoe (you only need to do this once for your entire session). Cookie/credentials are configured to expire after 10 minutes.
- and then make the main SAW call as many times as you need
URL-length limitations
...
Two sides players are involved in this issue
...
RequestFinalAggregation is the only request which also sends message to Pipeline JMS queue. This is done by SAW in 2 steps
...
- JOIN cognos.slxre_readgroup_metadata ON (pdo, pdo_sample, data_type) and lookup (research_project_id,collaborator_sample_id) for a given sample
...
- send JSON message to appropriate JMS queue
Environments
2 separate environments are provided
artifact | Production | Development/Test | |
---|---|---|---|
table | ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW | ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW_TEST | |
web-service | http://analytics:8090/api/sampleAnalyticsWorkflow | http://analytics:8090/api/sampleAnalyticsWorkflowTest | |
JMS queue | https://picard-jms.broadinstitute.org:8161/admin/browse.jsp?JMSDestination=broad.pushtocloud.enqueue Pipeline is picking up messages from this queue. | No listener is registered on this queue. | |
JMS browse | https://picard-jms.broadinstitute.org:8161/admin/browse.jsp | https://picard-jms-dev.broadinstitute.org:8161/admin/browse.jsp |
...
SAW is not very secure at this time. Everyone who knows the URL can call the service. IP-based restrictions (relatively easy to implement) can vastly reduce the number of machines allowed to access SAW.
EXTRACT_PARAM
...
utility
PARAMS column may become very crowded and extraction of specific field would be tricky. There is a special custom function in Oracle which streamline this task. It allows you to ask for specific fields by "key" (not by position which is fragile)
Code Block | ||||
---|---|---|---|---|
| ||||
SELECT extract_param(a.params, 'source') source, extract_param(a.params, 'contamination') contamination, a.* FROM analytics.sample_analytics_workflow_test a ORDER BY a.TIMESTAMP desc |
...
Tricks are all variations of "how to make The Item (sample description) as short as possible".
- take user_name out of items (see authentication above)
- take out take PDO-, SM- prefixes out of resp from . PDO, PDO_SAMPLE. SAW will automatically add put them back as needed.
- parameters which are pertinent to all samples (for example source=RQC) should be moved to standalone extra_info parameter. SAW will disperse it to PARAMS column of all samples involved.
- regarding parameters which are sample-specific (for example xShort=123;yLong=567)
- move all keys to standalone params_fields url-parameter as colon-separated-list-of-keys
- in the items/samples parameter leave only the values as semicolon-separated-list-of-values. SAW will pair back keys/values accordingly and insert it into PARAMS column.
- In item's Use single-letter encoding for data_type, encode Exome with E (resp.WGS with W): (E)xome, (W)GS.
Following example employs all the tricks (implies cookie-dance 1st step has been completed):
SAW extensions
Exome Autopush ETL agent (click for more details)
Manual Push for Final Aggregation using pre-compiled list of samples
In EtlShell: copy/paste comma-separated list of (pdo:pdo_samples:datatype) triplets (note that within each triplet elements are separated by colon)
Code Block language scala import analytics.tiger.agents.SAW CognosDB("analytics.tiger.agents.SAW.PushDev.Manual", SAW.manual("PDO-111:SM-111:E,PDO-222:SM-222:E") flatMap SAW.finalAggDev("nasko"))
Make sure you provide appropriate user_name and Dev/Prod labels.
Manual Push for Final Aggregation using SQL to get list of samples
In EtlShell: load (pdo:pdo_samples:datatype) triplets as a single column via SQL
Code Block language scala import analytics.tiger.agents.SAW val sql = """SELECT m.pdo || ':' || m.pdo_sample || ':' || m.data_type data FROM cognos.rapidqc_agg m WHERE m.pdo='PDO-17350' AND m.pdo_sample IN ('SM-I3797','SM-I2NWY','SM-I371P','SM-I2P3C')""" CognosDB("analytics.tiger.agents.SAW.PushDev.viaSQL", SAW.viaSQL(sql) flatMap SAW.finalAggDev("nasko"))