Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Motivation

...

ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW table (in SEQPROD db) is defined like this

Column nameData typeDescription
PDOVARCHAR2PDO
PDO_SAMPLEVARCHAR2PDO SAMPLE
DATA_TYPEVARCHAR2Exome, WGS
REQUEST_TYPENUMBER
"RequestFinalAggregation", "RequestTopOff", "RequestTopOffGroup", "RequestPoolCreated", "RequestSendToRework", "RequestHoldForTopOff", "RequestWaitingForNewRwData", "RequestReset", "RequestPCHoldForTopOff","RequestVolumeFinal"
1 encodes RequestFinalAggregation, 2 encodes RequestTopOff and so on
USER_NAMEVARCHAR2who triggered this action/transition
PARAMSVARCHAR2This is a free text where Tableau developers can store additional info re this transition which later can be used. Usually it's semicolon-separated-list-of-key=value-pairs
TIMESTAMPDATEwhen this action was triggered
IS_LATESTNUMBERBoolean (1 or 0), Is this the latest transition a given (PDO, PDO_SAMPLE, DATA_TYPE) has gone through. Every time new transition is recorded it comes with IS_LATEST=1 whereas previous latest transition is reset IS_LATEST=0
IP_ADDRESSVARCHAR2where (which IP) this request came from

...

  • requestType is the text-representation of this request (RequestFinalAggregation, RequestTopOff, etc)
  • each individual item is a quintuple representing a single sample like this pdo:pdo_sample:data_type:user_name:parameters (greens are optional, trailing colons are not required)

parameters represents any piece of data you might want to attach to a given sample/transition. It is a semicolon-separated-list-of-key=value-pairs. Gets recorded in PARAMS column.

...

  1. repeat it in the quintuple. Very wasteful.
  2. standalone upfront URL-parameter "&user_name=JohnDoe" (economical and solid). Unfortunately as of now, Tableau's urlAction can't inject the username like this.
  3. 2-step cookie-dance (economical but might be tricky to use from within Tableau)
    1. record the user_name-cookie in the browser environment by calling this URL http://analytics:8090/api/setCookie?cookies=sampleAnalyticsWorkflow.user_name=JohnDoe (you only need to do this once for your entire session). Cookie/credentials are configured to expire after 10 minutes.
    2. and then make the main SAW call as many times as you need

URL-length limitations

...

Two sides players are involved in this issue

...

  1. JOIN cognos.slxre_readgroup_metadata ON (pdo, pdo_sample, data_type) and lookup (research_project_id,collaborator_sample_id) for a given sample
  2. send JSON message to appropriate JMS queue

Environments

2 separate environments are provided


ProductionDevelopment/Test
tableANALYTICS.SAMPLE_ANALYTICS_WORKFLOW ANALYTICS.SAMPLE_ANALYTICS_WORKFLOW_TEST
web-servicehttp://analytics:8090/api/sampleAnalyticsWorkflowhttp://analytics:8090/api/sampleAnalyticsWorkflowTest
JMS queue

https://picard-jms.broadinstitute.org:8161/admin/browse.jsp?JMSDestination=broad.pushtocloud.enqueue

Pipeline is picking up messages from this queue.

https://picard-jms-dev.broadinstitute.org:8161/admin/browse.jsp?JMSDestination=broad.pushtocloud.analytics

No listener is registered on this queue.

JMS browsehttps://picard-jms.broadinstitute.org:8161/admin/browse.jsphttps://picard-jms-dev.broadinstitute.org:8161/admin/browse.jsp

...

Code Block
languagesql
titlePARAMS-extractEXTRACT_PARAM
SELECT 
	extract_param(a.params, 'source') source,
	extract_param(a.params, 'contamination') contamination,
	a.* 
FROM analytics.sample_analytics_workflow_test a 
ORDER BY a.TIMESTAMP desc

...

Tricks are all variations of "how to make The Item (sample description) as short as possible".

  1. take user_name out of items (see authentication above)
  2. take out take PDO-, SM- prefixes out of resp from . PDO, PDO_SAMPLE. SAW will automatically add put them back as needed.
  3. parameters which are pertinent to all samples (for example source=RQC) should be moved to standalone extra_info parameter. SAW will disperse it to PARAMS column of all samples involved.
  4. regarding parameters which are sample-specific (for example xShort=123;yLong=567)
    1. move all keys to standalone params_fields url-parameter as colon-separated-list-of-keys
    2. in the items/samples parameter leave only the values as semicolon-separated-list-of-values. SAW will pair back keys/values accordingly and insert it into PARAMS column.
  5. In item's Use single-letter encoding for data_type, encode Exome with E (resp.WGS with W): (E)xome, (W)GS.

Following example employs all the tricks (implies cookie-dance 1st step has been completed):

http://analytics:8090/api/sampleAnalyticsWorkflowTest?request_type=RequestFinalAggregation&extra_info=source=RQC&params_fields=zShort:xLong&items=16653:HP3LY:W::0.111;55,16653:HP3L9:W::0.222;66,16485:HE5F1:W::0.333;77,16485:HF32S:W::0.444;88&EOURL=1Image Removed

Image Added

SAW extensions

  1. Exome Autopush ETL agent (click for more details)

  2. Manual Push for Final Aggregation using pre-compiled list of samples

    1. In EtlShell: copy/paste comma-separated list of (pdo:pdo_samples:datatype) triplets (note that within each triplet elements are separated by colon)

      Code Block
      languagescala
      import analytics.tiger.agents.SAW
      CognosDB("analytics.tiger.agents.SAW.PushDev.Manual", SAW.manual("PDO-111:SM-111:E,PDO-222:SM-222:E") flatMap SAW.finalAggDev("nasko"))

      Make sure you provide appropriate user_name and Dev/Prod labels.

  3. Manual Push for Final Aggregation using SQL to get list of samples

    1. In EtlShell: load (pdo:pdo_samples:datatype) triplets as a single column via SQL

      Code Block
      languagescala
      import analytics.tiger.agents.SAW
      val sql = """SELECT m.pdo || ':' || m.pdo_sample || ':' || m.data_type data FROM cognos.rapidqc_agg m WHERE m.pdo='PDO-17350' AND m.pdo_sample IN ('SM-I3797','SM-I2NWY','SM-I371P','SM-I2P3C')"""
      CognosDB("analytics.tiger.agents.SAW.PushDev.viaSQL", SAW.viaSQL(sql) flatMap SAW.finalAggDev("nasko"))