Motivation
There are situations where end users need to review a sample, check relevant metrics and trigger a specific action - for example, "for a given sample, IF RapidQC has completed AND LOD>0 THEN submit a "FinalAggregation" request to Pipeline's JMS queue". All of these actions should be initiated from within Tableau. Beyond visualization, however, Tableau cannot do much in terms of maintaining a "workflow" in the DB, talking to external web-services like JMS and so on. That is why this job is outsourced to our "SampleAnalyticsWorkflow" (SAW) web-service - it is written in Scala, so there is virtually no limit to what tasks it can do for you. Tableau uses its URL Action feature to make a URL call to the SAW service when appropriate.
How the sample workflow is maintained in the DB
ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW table (in SEQPROD db) is defined like this
Column name | Data type | Description |
---|---|---|
PDO | VARCHAR2 | PDO |
PDO_SAMPLE | VARCHAR2 | PDO SAMPLE |
DATA_TYPE | VARCHAR2 | Exome, WGS |
REQUEST_TYPE | NUMBER | "RequestFinalAggregation", "RequestTopOff", "RequestTopOffGroup", "RequestPoolCreated", "RequestSendToRework", "RequestHoldForTopOff", "RequestWaitingForNewRwData", "RequestReset", "RequestPCHoldForTopOff","RequestVolumeFinal" |
USER_NAME | VARCHAR2 | who triggered this action/transition |
PARAMS | VARCHAR2 | Free text where Tableau developers can store additional info regarding this transition, which can be used later. Usually it is a semicolon-separated list of key=value pairs |
TIMESTAMP | DATE | when this action was triggered |
IS_LATEST | NUMBER | Boolean (1 or 0): is this the latest transition a given (PDO, PDO_SAMPLE, DATA_TYPE) has gone through? Every time a new transition is recorded it comes with IS_LATEST=1, and the previous latest transition is reset to IS_LATEST=0 |
IP_ADDRESS | VARCHAR2 | where (which IP) this request came from |
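For example, the current state of a set of samples can be read back with a query along these lines (a sketch only; the PDO value is made up):
SELECT a.* FROM analytics.sample_analytics_workflow a WHERE a.is_latest = 1 AND a.pdo = 'PDO-12345' ORDER BY a.timestamp DESC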
The SAW web-service
In general, a SAW call looks like this:
http://analytics:8090/api/sampleAnalyticsWorkflow?request_type={requestType}&items={comma-separated-list-of-items}&EOURL=1
where
- requestType is the text-representation of this request (RequestFinalAggregation, RequestTopOff, etc)
- each individual item is a quintuple representing a single sample, like this: pdo:pdo_sample:data_type:user_name:parameters (user_name and parameters are optional; trailing colons are not required)
  - parameters represents any piece of data you might want to attach to a given sample/transition. It is a semicolon-separated list of key=value pairs and gets recorded in the PARAMS column.
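For example, a hypothetical call submitting a final-aggregation request for two samples (all IDs and parameter values below are made up) could look like this:
http://analytics:8090/api/sampleAnalyticsWorkflow?request_type=RequestFinalAggregation&items=PDO-111:SM-111:Exome:JohnDoe:source=RQC,PDO-222:SM-222:Exome:JohnDoe:source=RQC&EOURL=1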
Authentication
There are three alternative ways to provide a "user_name":
- repeat it in every quintuple. Very wasteful.
- a standalone upfront URL parameter "&user_name=JohnDoe" (economical and solid). Unfortunately, as of now, Tableau's URL Action cannot inject the username like this.
- 2-step cookie-dance (economical but might be tricky to use from within Tableau)
- record the user_name cookie in the browser environment by calling this URL: http://analytics:8090/api/setCookie?cookies=sampleAnalyticsWorkflow.user_name=JohnDoe (you only need to do this once for your entire session). The cookie/credentials are configured to expire after 10 minutes.
- and then make the main SAW call as many times as you need
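Putting the two steps together (user name and sample IDs are illustrative), the sequence from one browser session might be:
http://analytics:8090/api/setCookie?cookies=sampleAnalyticsWorkflow.user_name=JohnDoe
http://analytics:8090/api/sampleAnalyticsWorkflow?request_type=RequestTopOff&items=PDO-111:SM-111:Exome,PDO-222:SM-222:Exome&EOURL=1
Note that the items no longer carry a user_name; the cookie supplies it.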
URL-length limitations
Two players are involved in this issue
- Your environment (browser, OS, versions, etc.) implicitly imposes a limit on how long a URL can be.
- On top of that, our web-server provider (akka-http) also imposes a limit on what URL length is acceptable. (Currently the akka-http limit is configured to 8192 chars and can possibly be extended further, but this should be done rather conservatively since it might lead to higher memory consumption.)
So the best way to answer the question "What URL size is safe to pass?" is to try/test and see. Fortunately, if any truncation happens upstream, the trailing EOURL parameter would definitely not make it through, and the SAW service can safely detect this and report it accordingly.
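As an illustration (IDs made up), an intact request ends with the sentinel:
...&items=PDO-111:SM-111:Exome,PDO-222:SM-222:Exome&EOURL=1
whereas a truncated one might arrive as
...&items=PDO-111:SM-111:Exome,PDO-2
with no EOURL at the end, so SAW reports the problem instead of silently processing a partial list of samples.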
Special provisions for RequestFinalAggregation request
All SAW requests insert a record into the SAMPLE_ANALYTICS_WORKFLOW table.
RequestFinalAggregation is the only request which also sends a message to the Pipeline JMS queue. SAW does this in 2 steps:
- JOIN cognos.slxre_readgroup_metadata ON (pdo, pdo_sample, data_type) and look up (research_project_id, collaborator_sample_id) for the given sample
- send a JSON message to the appropriate JMS queue
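The first step boils down to a lookup along these lines (a sketch only; the sample values are made up and the metadata column names are assumed to match those used above):
SELECT m.research_project_id, m.collaborator_sample_id FROM cognos.slxre_readgroup_metadata m WHERE m.pdo = 'PDO-111' AND m.pdo_sample = 'SM-111' AND m.data_type = 'Exome'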
Environments
2 separate environments are provided
 | Production | Development/Test |
---|---|---|
table | ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW | ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW_TEST |
web-service | http://analytics:8090/api/sampleAnalyticsWorkflow | http://analytics:8090/api/sampleAnalyticsWorkflowTest |
JMS queue | https://picard-jms.broadinstitute.org:8161/admin/browse.jsp?JMSDestination=broad.pushtocloud.enqueue (Pipeline picks up messages from this queue) | No listener is registered on this queue. |
JMS browse | https://picard-jms.broadinstitute.org:8161/admin/browse.jsp | https://picard-jms-dev.broadinstitute.org:8161/admin/browse.jsp |
Security
SAW is not very secure at this time: anyone who knows the URL can call the service. IP-based restrictions (relatively easy to implement) could vastly reduce the number of machines allowed to access SAW.
EXTRACT_PARAM utility
The PARAMS column may become very crowded, and extracting a specific field from it can be tricky. There is a special custom Oracle function which streamlines this task: it allows you to ask for a specific field by "key" (not by position, which is fragile).
SELECT extract_param(a.params, 'source') source, extract_param(a.params, 'contamination') contamination, a.* FROM analytics.sample_analytics_workflow_test a ORDER BY a.TIMESTAMP desc
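For instance, assuming a PARAMS value of 'source=RQC;contamination=0.01' (made-up values), the call below returns '0.01':
SELECT extract_param('source=RQC;contamination=0.01', 'contamination') FROM dual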
Tricks to maximize the number of samples submitted
The tricks are all variations of "how to make the item (sample description) as short as possible".
- take user_name out of the items (see Authentication above)
- take the PDO- and SM- prefixes out of PDO and PDO_SAMPLE, respectively. SAW will automatically put them back as needed.
- parameters which are pertinent to all samples (for example source=RQC) should be moved to the standalone extra_info parameter. SAW will disperse it to the PARAMS column of all samples involved.
- for parameters which are sample-specific (for example xShort=123;yLong=567):
  - move all keys to the standalone params_fields URL parameter as a colon-separated list of keys
  - in the items parameter leave only the values as a semicolon-separated list of values. SAW will pair the keys and values back up and insert them into the PARAMS column.
- Use single-letter encoding for data_type: (E)xome, (W)GS.
The following example employs all the tricks (it assumes the cookie-dance first step has been completed):
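A hypothetical reconstruction with made-up IDs and values (the empty fourth slot in each item stands for the user_name, which the cookie supplies - an assumption about the exact item layout):
http://analytics:8090/api/sampleAnalyticsWorkflowTest?request_type=RequestFinalAggregation&extra_info=source=RQC&params_fields=xShort:yLong&items=111:111:E::123;567,222:222:E::124;568&EOURL=1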
Manual Push for Final Aggregation
In EtlShell: by copy/pasting the (pdo:pdo_sample:data_type) triplets
import analytics.tiger.agents.SAW
CognosDB("analytics.tiger.agents.SAW.PushDev.Manual", SAW.manual("PDO-111:SM-111:E,PDO-222:SM-222:E") flatMap SAW.finalAggDev("nasko"))
In EtlShell: by loading (pdo:pdo_sample:data_type) triplets via SQL
import analytics.tiger.agents.SAW
val sql = """SELECT m.pdo || ':' || m.pdo_sample || ':' || m.data_type data FROM cognos.rapidqc_agg m WHERE m.pdo='PDO-17350' AND m.pdo_sample IN ('SM-I3797','SM-I2NWY','SM-I371P','SM-I2P3C')"""
CognosDB("analytics.tiger.agents.SAW.PushDev.viaSQL", SAW.viaSQL(sql) flatMap SAW.finalAggDev("nasko"))
Make sure you provide the appropriate user_name and the correct Dev/Prod labels.