The problem

While PacBio’s native SMRTLink tool allows looking at metrics for one specific run at a time, there is no easy way to query metrics over time across multiple runs and consume them as a simple tabular dataset. The latter is very important because it would enable powerful cross-run analytics. Moreover, other systems and tools (the datareview page, secondary re-analyses, etc.) can also benefit: once the data is properly structured in a “datamart”, anybody can use it.

Challenges

We know these metrics are scattered across XML/JSON files in a complicated folder structure on our on-prem Linux system. So-called “raw” metrics are relatively easy to link to a (run, cellWell) pair.

The “cromwell” metrics, however, are particularly painful, since the only way to link them back to a (run, cellWell) is to track down the “symbolic link” in the relevant “inputs” folder, with random UUIDs all along the path. This requires a fair amount of Linux voodoo magic, which significantly slows down new development.
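To make the symbolic-link tracking concrete, here is a minimal Python sketch. It is hypothetical, not the ETL's code (the ETL shells out to `find`, as its log further down shows): the function names are made up, and the layouts `cromwell-executions/<workflow>/<uuid>/.../inputs/...` and `data_root/<run>/<cellWell>/...` are assumptions read off the paths in that log.

```python
import os
from pathlib import PurePosixPath
from typing import Iterator, Optional, Tuple


def run_and_cell(target: str) -> Optional[Tuple[str, str]]:
    """(run, cellWell) from a path like .../data_root/<run>/<cellWell>/... (assumed layout)."""
    parts = PurePosixPath(target).parts
    if "data_root" not in parts:
        return None
    i = parts.index("data_root")
    if len(parts) < i + 4:  # need run, cellWell and at least a file name
        return None
    return parts[i + 1], parts[i + 2]


def workflow_dir(link_path: str) -> Optional[str]:
    """The .../cromwell-executions/<workflow>/<uuid> folder that owns an input link."""
    parts = PurePosixPath(link_path).parts
    if "cromwell-executions" not in parts:
        return None
    i = parts.index("cromwell-executions")
    if len(parts) < i + 3:
        return None
    return str(PurePosixPath(*parts[: i + 3]))


def scan_input_links(cromwell_root: str, run_name: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (workflow_dir, run, cellWell) for each *.consensusreadset.xml symlink
    inside an 'inputs' folder whose target lies under the given run in data_root."""
    # os.walk does not descend into symlinked directories by default
    for dirpath, _dirnames, filenames in os.walk(cromwell_root):
        if "inputs" not in PurePosixPath(dirpath).parts:
            continue
        for name in filenames:
            link = os.path.join(dirpath, name)
            if not (name.endswith(".consensusreadset.xml") and os.path.islink(link)):
                continue
            target = os.readlink(link)
            if not os.path.isabs(target):  # resolve relative links against their folder
                target = os.path.normpath(os.path.join(dirpath, target))
            rc = run_and_cell(target)
            wf = workflow_dir(link)
            if rc and wf and rc[0] == run_name:
                yield wf, rc[0], rc[1]
```

Each workflow folder found this way can then be searched for `*/execution/*.json` reports, which now carry a known (run, cellWell).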

Mission statement

It would be great if all teams (Analytics, Lab, DSP, Mercury, etc.) could query the metrics from our PACBIO datamart in a streamlined way. Software engineers would merely use SQL/JSON to extract the fields they need, in a very declarative way.

What will it take - the “mapping” process

For all this to work, we need to go through the “mapping” process: figuring out where all the interesting SMRTLink fields are stored in the file system. It usually goes like this: the Lab (our domain experts) says “hey, we are interested in SMRTLink field XYZ, screenshot attached, and we believe it’s stored in file …XYZ.json”. Then we (the software engineers) implement a tiny SQL/JSON extraction, and all teams can use it. This way, for example, fields that DSP has introduced become available to other teams, and vice versa.

It is very important that files digested by the “metrics-flattener ETL” can easily be compared side by side with SMRTLink screens. That’s why the “Flattened metrics viewer” was created.

PacBio metrics acceptable ranges - this is the document driving the mapping effort.

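What is this “domain” field all about?

“domain” is a synthetic field derived from the location of the original file being captured. It is essentially the file path with the root folder abbreviated to CROMWELL or DATAROOT and the variable parts (random UUIDs, run names, cell wells, movie names) masked out as “*”.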

CROMWELL/sl_collection_reports/*/call-pbreports_barcode/execution/barcode.report.json
CROMWELL/sl_collection_reports/*/call-pbreports_barcode/execution/per_barcode_reports.datastore.json
CROMWELL/sl_collection_reports/*/call-pbreports_barcode/execution/per_barcode_reports/*/dataset_stats.json*
CROMWELL/sl_collection_reports/*/call-pbreports_barcode/execution/task-report.json
CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/adapter.report.json
CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/ccs.report.json*
CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/control.report.json
CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/detect_cpg_methyl.report.json
CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/loading.report.json
CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/raw_data.report.json
CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/task-report.json*
DATAROOT/*/*/*.5mc_report.json
DATAROOT/*/*/*.ccs_reports.json
DATAROOT/*/*/*.consensusreadset.xml
DATAROOT/*/*/*.lima_guess.json
DATAROOT/*/*/*.metadata.xml
DATAROOT/*/*/*.run.metadata.xml
DATAROOT/*/*/*.sts.xml
DATAROOT/*/*/*.unbarcoded.consensusreadset.xml
DATAROOT/*/*/*/*.*.consensusreadset.xml*
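A minimal Python sketch of how such a domain could be derived from a full path. The masking rules are inferred from the list above and from the file paths in the log further down; they are assumptions, not the ETL's actual code, and the function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Canonical 8-4-4-4-12 hex UUID, as seen in the Cromwell folder names
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)


def cromwell_domain(path: str) -> str:
    """Everything after cromwell-executions, with UUID folders masked as '*'."""
    parts = PurePosixPath(path).parts
    i = parts.index("cromwell-executions")
    masked = ["*" if UUID_RE.match(p) else p for p in parts[i + 1:]]
    return "/".join(["CROMWELL", *masked])


def dataroot_domain(path: str) -> str:
    """Mask run, cellWell (and barcode folder) plus the movie/barcode prefix of the file name."""
    parts = PurePosixPath(path).parts
    i = parts.index("data_root")
    folders = parts[i + 1:-1]        # run, cellWell and optionally a barcode folder
    name = parts[-1].lstrip(".")     # hidden .metadata.xml files lose their leading dot
    pieces = name.split(".")
    n_masked = 2 if len(folders) > 2 else 1  # movie name, plus barcode name in barcode folders
    masked_name = ".".join(["*"] * n_masked + pieces[n_masked:])
    return "/".join(["DATAROOT", *(["*"] * len(folders)), masked_name])


def domain_of(path: str) -> str:
    parts = PurePosixPath(path).parts
    if "cromwell-executions" in parts:
        return cromwell_domain(path)
    if "data_root" in parts:
        return dataroot_domain(path)
    raise ValueError(f"unrecognized location: {path}")
```

The trailing “*” seen on some domains is not produced here; it marks files merged into a JSON array (see the per-barcode section below).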

As a result, all records for a given metrics-type can be easily filtered/grouped in SQL.
For example:

SELECT a.run_name, a.cell_well, c.*
FROM pacbio a,
json_table(DATA, '$[*]'
COLUMNS(
 "DNABarcode"           PATH '$.DNABarcode',
 "BioSample"            PATH '$.BioSample',
 "HiFi Reads"           PATH '$.attributes[*]?(@.id=="ccs2.number_of_ccs_reads").value',
 "HiFi Yield (bp)"      NUMBER   PATH '$.attributes[*]?(@.id=="ccs2.total_number_of_ccs_bases").value',
 "HiFi Read Length (mean, bp)"  NUMBER          PATH '$.attributes[*]?(@.id=="ccs2.mean_ccs_readlength").value',
 "HiFi Read Quality (median) accuracy"          PATH '$.attributes[*]?(@.id=="ccs2.median_accuracy").value',
 "HiFi Read Quality (median)"   NUMBER         PATH '$.attributes[*]?(@.id=="ccs2.median_qv").value'
)) AS c
WHERE site_id=3 AND a.domain='CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/ccs.report.json*'
AND a.run_name='r64386e_20220523_180557' AND a.cell_well='4_D01'

Metrics stored in “JSON-tables”

A bunch of interesting metrics (for example ccs2.hifi_length_summary.read_length) are stored in JSON “tables”. Unfortunately, these are organized in a column-based fashion, which makes it nearly impossible to extract the metrics from the DB later on. Therefore, new synthetic twin tables are created in which the metrics are organized in a row-based fashion (in other words, the tables are transposed).

As a result, straightforward JSON extraction from the DB becomes possible:

SELECT a.run_name, a.cell_well, "etl.dataset", c."rowid", 
    REPLACE(c."Read Length (bp)", CHR(191), '>=') "Read Length (bp)", -- '≥' (U+2265, UTF-8 e2 89 a5) comes back as CHR(191); show it as '>='
    "Reads", "Reads (%)" ,"Yield (bp)", "Yield (%)" 
FROM pacbio a,
json_table(DATA, '$[*]'
COLUMNS(
    "etl.dataset" path '$."etl.dataset"',
    NESTED PATH '$."etl.ccs2.hifi_length_summary"[*]' COLUMNS(
        "rowid"  PATH '$.rowid',
        "Read Length (bp)"         PATH '$."ccs2.hifi_length_summary.read_length"',
        "Reads"             NUMBER PATH '$."ccs2.hifi_length_summary.n_reads"',
        "Reads (%)"         NUMBER PATH '$."ccs2.hifi_length_summary.reads_pct"',
        "Yield (bp)"        NUMBER PATH '$."ccs2.hifi_length_summary.yield"',
        "Yield (%)"         NUMBER PATH '$."ccs2.hifi_length_summary.yield_pct"'
    )
)) AS c
WHERE site_id=3 AND a.domain='CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/ccs.report.json*' 
AND a.run_name='r64020e_20220519_191246' AND a.cell_well='1_B01'
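The transposition can be sketched in Python as follows, assuming the report layout in which each entry of "tables" has an "id" and a list of "columns", each column carrying an "id" and a "values" list. This is an illustration, not the ETL's code; the function name and the 1-based rowid are assumptions.

```python
from typing import Any, Dict, List


def add_row_based_tables(report: Dict[str, Any]) -> Dict[str, Any]:
    """For every column-based table in report["tables"], add a row-based twin
    under "etl.<table id>": a list of row objects keyed by column id."""
    for table in report.get("tables", []):
        columns = table.get("columns", [])
        n_rows = max((len(col.get("values", [])) for col in columns), default=0)
        rows: List[Dict[str, Any]] = []
        for r in range(n_rows):
            row: Dict[str, Any] = {"rowid": r + 1}  # 1-based numbering is an assumption
            for col in columns:
                values = col.get("values", [])
                row[col["id"]] = values[r] if r < len(values) else None  # pad ragged columns
            rows.append(row)
        report["etl." + table["id"]] = rows
    return report
```

Each element of the twin array is one row, which is what lets `NESTED PATH '$."etl.ccs2.hifi_length_summary"[*]'` in the query above return one SQL row per table row.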

Metrics stored in “attributes“ JSON-array

Other metrics are stored in the “attributes” JSON array. A new synthetic “etl.attributes” JSON object, keyed by attribute id, is added to allow more natural JSON extraction from the DB.

SELECT a.run_name, a.cell_well, "etl.dataset", "HiFi Reads", "HiFi Yield (bp)", "HiFi Read Length (mean, bp)"
FROM pacbio a,
json_table(DATA, '$[*]'
COLUMNS(
    "HiFi Reads"                    NUMBER PATH '$."etl.attributes"."ccs2.number_of_ccs_reads".value',
    "HiFi Yield (bp)"               NUMBER PATH '$."etl.attributes"."ccs2.total_number_of_ccs_bases".value',
    "HiFi Read Length (mean, bp)"   NUMBER PATH '$."etl.attributes"."ccs2.mean_ccs_readlength".value',
    "etl.dataset" path '$."etl.dataset"'
)) AS c
WHERE site_id=3 AND a.domain='CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/ccs.report.json*' 
AND a.run_name='r64020e_20220519_191246' AND a.cell_well='1_B01'
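A Python sketch of that indexing step, assuming each attribute is an object with an "id" and a "value" (as the queries above imply). It is illustrative only; the function name is hypothetical.

```python
from typing import Any, Dict


def add_attributes_object(report: Dict[str, Any]) -> Dict[str, Any]:
    """Index the "attributes" array by attribute id under "etl.attributes",
    so that $."etl.attributes"."<id>".value can replace an array filter."""
    report["etl.attributes"] = {
        attr["id"]: attr for attr in report.get("attributes", []) if "id" in attr
    }
    return report
```

The original "attributes" array is left untouched, so filter-expression paths such as `$.attributes[*]?(@.id=="ccs2.number_of_ccs_reads").value` keep working. If an id appeared twice, this sketch would keep the last occurrence.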

The “superJSON” tool

Imagine you have a SMRTLink screen in front of you saying “Longest Subread N50: 21250” for a given run/cell. How can you find out which metrics file this number comes from?
Open the “superJSON” tool (all files are merged in there), expand all nodes and search for this exact number: https://analytics.broadinstitute.org/pacbioMetrics/3/r64386e_20220523_180557/4_D01/superjson
Once the number is located (here in raw_data.report.json, attribute raw_data_report.insert_n50), it can be queried directly:

SELECT a.run_name, a.cell_well, a.movie, c."raw_data_report.insert_n50"
FROM pacbio a,
json_table(DATA, '$[*]'
COLUMNS(
 "raw_data_report.insert_n50"  NUMBER PATH '$."etl.attributes"."raw_data_report.insert_n50".value'
)) AS c
WHERE site_id=3 AND a.domain='CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/raw_data.report.json'
AND a.run_name='r64386e_20220523_180557' AND a.cell_well='4_D01'

Additionally, a couple of JSON documents are synthetically generated by the ETL at the “root” level. These might be useful for cross-reference purposes and can be seen via the “root” superJSON:
https://analytics.broadinstitute.org/pacbioMetrics/3/r64386e_20220523_180557/root/superjson

To search for a specific label or number you have in mind, use the “searchFor” parameter: a “searchResults” entry will be generated along with the JSON path and domain.
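How such a search might work can be sketched in Python. This is not the superJSON tool's implementation: it assumes the merged document is a mapping from domain to parsed JSON, matches numbers by value and labels by case-insensitive substring, and writes paths in the `$."key"[index]` style used by the queries on this page. All names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple


def _leaves(node: Any, path: str) -> Iterator[Tuple[str, Any]]:
    """Yield (JSON path, scalar value) for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _leaves(value, f'{path}."{key}"')
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from _leaves(value, f"{path}[{index}]")
    else:
        yield path, node


def _matches(value: Any, needle: str) -> bool:
    if isinstance(value, bool) or value is None:
        return False
    if isinstance(value, (int, float)):
        try:
            return float(needle) == float(value)  # so "21250" also matches 21250.0
        except ValueError:
            return False
    return needle.lower() in str(value).lower()  # labels: case-insensitive substring


def search_for(super_json: Dict[str, Any], needle: str) -> List[Dict[str, Any]]:
    """super_json maps domain -> parsed document (an assumed shape)."""
    return [
        {"domain": domain, "path": path, "value": value}
        for domain, doc in super_json.items()
        for path, value in _leaves(doc, "$")
        if _matches(value, needle)
    ]
```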

“per-barcode” support

“per-barcode” metrics are supported by converting multiple “consensusreadset.xml” files into JSON and then merging them into a single synthetic JSON array. These can be recognized by the trailing “*” at the end of the “domain” field.

For a given cell and domain, if the ETL comes across multiple files, it naturally merges them into a JSON array.
However, this logic is not sufficient if only one barcode is registered per cell; therefore a list of exempted file types (ccs.report.json) is kept to instruct the ETL to always merge these into a JSON array, regardless of the number of files.
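The merge rule can be sketched in Python as below. It is a hypothetical illustration: documents are assumed to be already parsed (XML converted to dicts upstream), `ALWAYS_ARRAY` holds only the one exempted file type named above, and the function name is made up.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# File types that are always wrapped in an array, even when only one file exists
ALWAYS_ARRAY = {"ccs.report.json"}


def merge_by_domain(docs: Iterable[Tuple[str, str, Any]]) -> Dict[Tuple[str, str], Any]:
    """docs: (cellWell, domain, parsed document) triples for one run.
    Returns {(cellWell, domain): document}; a domain gains a trailing '*'
    when its documents were merged into a JSON array."""
    groups: Dict[Tuple[str, str], List[Any]] = defaultdict(list)
    for cell_well, domain, doc in docs:
        groups[(cell_well, domain)].append(doc)

    merged: Dict[Tuple[str, str], Any] = {}
    for (cell_well, domain), items in groups.items():
        file_type = domain.rsplit("/", 1)[-1]
        if len(items) > 1 or file_type in ALWAYS_ARRAY:
            merged[(cell_well, domain + "*")] = items  # synthetic JSON array
        else:
            merged[(cell_well, domain)] = items[0]  # single document kept as-is
    return merged
```

Under this rule a cell with a single ccs.report.json still yields `ccs.report.json*` holding a one-element array, so queries can always start from `'$[*]'`.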

Metrics extracted through PacBio API

It turns out some information is not available in the JSON/XML files but can be extracted through the SMRTLink endpoints. A few new domains have been added: “API/runs” and “API/collections”.

How files are scraped from the file system - the linux voodoo magic

An elaborate chain of Linux “find” commands is launched by the ETL to track down both DATAROOT and CROMWELL files. The Cromwell ones are particularly painful, since pathnames are riddled with random UUIDs and following symbolic links is required. Below is the log of all the commands launched for one specific run.

scala> analytics.tiger.utils.AnalyticsDB("analytics.tiger.agents.PacBio.Sodium", analytics.tiger.agents.PacBio.Sodium.perRunETL("r64386e_20220523_180557",Map("override"->"true","verbose"->"true")), toCommit=true)
TIGERETL_RUNID: 4534558
find /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557 -regex ".*\.\(json\|xml\)" => 48 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/bc2012--bc2012/m64386e_220526_091216.bc2012--bc2012.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.ccs_reports.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.lima_guess.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.sts.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/.m64386e_220526_091216.run.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.5mc_report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/bc2095--bc2095/m64386e_220526_091216.bc2095--bc2095.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/bc2090--bc2090/m64386e_220526_091216.bc2090--bc2090.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/.m64386e_220526_091216.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.unbarcoded.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/bc2011--bc2011/m64386e_220526_091216.bc2011--bc2011.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/bc2012--bc2012/m64386e_220527_172851.bc2012--bc2012.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.ccs_reports.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/.m64386e_220527_172851.run.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.unbarcoded.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/.m64386e_220527_172851.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/bc2095--bc2095/m64386e_220527_172851.bc2095--bc2095.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/bc2090--bc2090/m64386e_220527_172851.bc2090--bc2090.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.lima_guess.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.sts.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/bc2011--bc2011/m64386e_220527_172851.bc2011--bc2011.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.5mc_report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.unbarcoded.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/bc2012--bc2012/m64386e_220525_014545.bc2012--bc2012.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.lima_guess.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/bc2095--bc2095/m64386e_220525_014545.bc2095--bc2095.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/bc2090--bc2090/m64386e_220525_014545.bc2090--bc2090.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/.m64386e_220525_014545.run.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.5mc_report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/bc2011--bc2011/m64386e_220525_014545.bc2011--bc2011.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/.m64386e_220525_014545.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.ccs_reports.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.sts.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/bc2012--bc2012/m64386e_220523_181627.bc2012--bc2012.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/.m64386e_220523_181627.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/bc2095--bc2095/m64386e_220523_181627.bc2095--bc2095.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/bc2090--bc2090/m64386e_220523_181627.bc2090--bc2090.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/.m64386e_220523_181627.run.metadata.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.unbarcoded.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.lima_guess.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.5mc_report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/bc2011--bc2011/m64386e_220523_181627.bc2011--bc2011.consensusreadset.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.sts.xml
   /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.ccs_reports.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/*/inputs/*.consensusreadset.xml" -type l -ls | grep /r64386e_20220523_180557/ | cat => 15 files returned
   9256292134   32 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 09:52 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/inputs/480159576/m64386e_220526_091216.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.consensusreadset.xml
   9387650676   32 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 10:02 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/inputs/478310612/m64386e_220523_181627.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.consensusreadset.xml
   9294923607   32 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 10:07 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/inputs/479235094/m64386e_220525_014545.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.consensusreadset.xml
   9167679957   32 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 09:59 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/inputs/481084058/m64386e_220527_172851.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.consensusreadset.xml
   9088193591   24 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 10:07 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/inputs/479235094/m64386e_220525_014545.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.consensusreadset.xml
   9163607270   32 lrwxrwxrwx   1 pbprod   gppacbio      131 Jun  9 09:58 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/cfda101e-bd68-4ae3-a0d0-9e9491e60dd2/call-import_dataset_reports/inputs/481084058/m64386e_220527_172851.unbarcoded.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.unbarcoded.consensusreadset.xml
   9088194092   32 lrwxrwxrwx   1 pbprod   gppacbio      131 Jun  9 10:56 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/dde0dfad-e14c-47b9-b2da-b37c02c3ab1b/call-import_dataset_reports/inputs/480159576/m64386e_220526_091216.unbarcoded.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.unbarcoded.consensusreadset.xml
   9387713582   24 lrwxrwxrwx   1 pbprod   gppacbio      150 Jun  9 09:58 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/55615992-df0e-40e8-b131-5b45f7981a3a/call-import_dataset_reports/inputs/1130665717/m64386e_220527_172851.bc2012--bc2012.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/bc2012--bc2012/m64386e_220527_172851.bc2012--bc2012.consensusreadset.xml
   9294923510   32 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 09:59 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/inputs/481084058/m64386e_220527_172851.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/m64386e_220527_172851.consensusreadset.xml
   9390327733   32 lrwxrwxrwx   1 pbprod   gppacbio      131 Jun  9 10:57 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/76c55248-f8a2-4f04-9868-fa3fbe09fe65/call-import_dataset_reports/inputs/479235094/m64386e_220525_014545.unbarcoded.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/m64386e_220525_014545.unbarcoded.consensusreadset.xml
   9390326404   32 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 09:52 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/inputs/480159576/m64386e_220526_091216.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/3_C01/m64386e_220526_091216.consensusreadset.xml
   9390326842   32 lrwxrwxrwx   1 pbprod   gppacbio      131 Jun  9 10:14 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/05d863a1-8a18-4693-8faa-0153892341b7/call-import_dataset_reports/inputs/478310612/m64386e_220523_181627.unbarcoded.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.unbarcoded.consensusreadset.xml
   9167679974   32 lrwxrwxrwx   1 pbprod   gppacbio      150 Jun  9 10:01 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/13f78080-40dd-4bb3-bf7a-8a3bb21ff542/call-import_dataset_reports/inputs/1242954991/m64386e_220525_014545.bc2095--bc2095.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/2_B01/bc2095--bc2095/m64386e_220525_014545.bc2095--bc2095.consensusreadset.xml
   9387713609   32 lrwxrwxrwx   1 pbprod   gppacbio      150 Jun  9 09:59 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/b5e71499-f314-408e-974e-0231e36b7098/call-import_dataset_reports/inputs/-1356847117/m64386e_220527_172851.bc2011--bc2011.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/4_D01/bc2011--bc2011/m64386e_220527_172851.bc2011--bc2011.consensusreadset.xml
   9294923539   24 lrwxrwxrwx   1 pbprod   gppacbio      120 Jun  9 10:02 /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/inputs/478310612/m64386e_220523_181627.consensusreadset.xml -> /seq/gp_pacbio_prod/smrtlink/userdata/data_root/r64386e_20220523_180557/1_A01/m64386e_220523_181627.consensusreadset.xml
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/execution/barcode.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/execution/per_barcode_reports.datastore.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/execution/per_barcode_reports/a4377b6f-5ed5-45b9-8c6e-da74f67b4719/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/execution/per_barcode_reports/df9c94a6-92d6-4c89-950c-5b34958b6bc0/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/execution/per_barcode_reports/42abca05-d793-43ce-b552-c18fe68ad0ef/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/execution/per_barcode_reports/055d5d05-40d4-4441-ac45-5e05dee0a85d/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/a0823154-4bbd-4b0a-9817-f78742054619/call-pbreports_barcode/execution/task-report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/execution/barcode.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/execution/per_barcode_reports.datastore.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/execution/per_barcode_reports/b7214b4a-f2c7-4a2b-883c-08a98585d239/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/execution/per_barcode_reports/86ff6655-a08b-4222-aa4c-7fd132a2d2ec/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/execution/per_barcode_reports/3739008b-2f93-4855-9ba1-f367c886034a/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/execution/per_barcode_reports/613096b2-0df0-4c7b-8021-aedfdadcfcef/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/c58dc438-f021-426c-89ce-e82ee4728d62/call-pbreports_barcode/execution/task-report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/execution/barcode.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/execution/per_barcode_reports.datastore.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/execution/per_barcode_reports/760e2d75-2397-4316-8a75-facd648bd127/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/execution/per_barcode_reports/198bbcd8-56aa-4628-8241-fcccbcf7e8b7/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/execution/per_barcode_reports/cb3377b8-7db7-430b-aa7e-27a8e7eae0dc/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/execution/per_barcode_reports/eb7a0619-a212-4a0b-9019-be2995cfa6b0/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/b70fbab3-8f00-44b4-a0df-f8e0e607389e/call-pbreports_barcode/execution/task-report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/execution/barcode.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/execution/per_barcode_reports.datastore.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/execution/per_barcode_reports/3fdb0ce5-4004-432c-9903-8ab90e067e35/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/execution/per_barcode_reports/2dc32207-607f-4150-8443-8a3434f6b283/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/execution/per_barcode_reports/f86226fa-e794-40d7-b8c1-476079643dfa/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/execution/per_barcode_reports/22bb414b-6ea7-4060-90bf-a3fc06d395e9/dataset_stats.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_collection_reports/5675af6e-2370-41f2-b4bd-8b41454ed14e/call-pbreports_barcode/execution/task-report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/execution/adapter.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/execution/raw_data.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/execution/control.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/execution/loading.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/execution/detect_cpg_methyl.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/8578251e-cf2b-4a64-bf60-934ea70bdf8c/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/cfda101e-bd68-4ae3-a0d0-9e9491e60dd2 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/cfda101e-bd68-4ae3-a0d0-9e9491e60dd2/*/execution/*.json" => 2 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/cfda101e-bd68-4ae3-a0d0-9e9491e60dd2/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/cfda101e-bd68-4ae3-a0d0-9e9491e60dd2/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/dde0dfad-e14c-47b9-b2da-b37c02c3ab1b -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/dde0dfad-e14c-47b9-b2da-b37c02c3ab1b/*/execution/*.json" => 2 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/dde0dfad-e14c-47b9-b2da-b37c02c3ab1b/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/dde0dfad-e14c-47b9-b2da-b37c02c3ab1b/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/55615992-df0e-40e8-b131-5b45f7981a3a -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/55615992-df0e-40e8-b131-5b45f7981a3a/*/execution/*.json" => 2 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/55615992-df0e-40e8-b131-5b45f7981a3a/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/55615992-df0e-40e8-b131-5b45f7981a3a/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/execution/adapter.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/execution/raw_data.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/execution/control.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/execution/loading.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/execution/detect_cpg_methyl.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/458fbc8c-f5d1-488c-982e-62dc87cfe4f2/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/76c55248-f8a2-4f04-9868-fa3fbe09fe65 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/76c55248-f8a2-4f04-9868-fa3fbe09fe65/*/execution/*.json" => 2 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/76c55248-f8a2-4f04-9868-fa3fbe09fe65/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/76c55248-f8a2-4f04-9868-fa3fbe09fe65/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/execution/adapter.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/execution/raw_data.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/execution/control.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/execution/loading.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/execution/detect_cpg_methyl.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/d59cd1d9-d7f0-4283-bcc2-f1f4ef02669c/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/05d863a1-8a18-4693-8faa-0153892341b7 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/05d863a1-8a18-4693-8faa-0153892341b7/*/execution/*.json" => 2 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/05d863a1-8a18-4693-8faa-0153892341b7/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/05d863a1-8a18-4693-8faa-0153892341b7/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/13f78080-40dd-4bb3-bf7a-8a3bb21ff542 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/13f78080-40dd-4bb3-bf7a-8a3bb21ff542/*/execution/*.json" => 2 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/13f78080-40dd-4bb3-bf7a-8a3bb21ff542/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/13f78080-40dd-4bb3-bf7a-8a3bb21ff542/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/b5e71499-f314-408e-974e-0231e36b7098 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/b5e71499-f314-408e-974e-0231e36b7098/*/execution/*.json" => 2 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/b5e71499-f314-408e-974e-0231e36b7098/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/b5e71499-f314-408e-974e-0231e36b7098/call-import_dataset_reports/execution/ccs.report.json
find /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34 -path "/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/*/execution/*.json" => 7 files returned
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/execution/adapter.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/execution/raw_data.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/execution/control.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/execution/loading.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/execution/task-report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/execution/detect_cpg_methyl.report.json
   /seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions/sl_dataset_reports/16cab23f-16d9-4784-b544-4d4b1ea41b34/call-import_dataset_reports/execution/ccs.report.json
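The listings above all follow one pattern: under each job UUID, the report files sit at `<uuid>/<call>/execution/*.json`. Below is a minimal Python sketch of the same search. It assumes only the directory layout shown above, and the function name `report_files` is hypothetical.

```python
from pathlib import Path

# Root taken from the listings above; adjust per installation (e.g. once "skywalker" is live).
CROMWELL_ROOT = Path("/seq/gp_pacbio_prod/smrtlink/userdata/jobs_root/cromwell-executions")


def report_files(workflow_dir: Path) -> dict[str, list[Path]]:
    """Map each job UUID directory to its <call>/execution/*.json files."""
    found: dict[str, list[Path]] = {}
    # Only directories are job UUIDs; stray files at this level are skipped.
    for job_dir in sorted(p for p in workflow_dir.iterdir() if p.is_dir()):
        # Same shape as: find <job_dir> -path "<job_dir>/*/execution/*.json"
        found[job_dir.name] = sorted(job_dir.glob("*/execution/*.json"))
    return found
```

Usage: `report_files(CROMWELL_ROOT / "sl_dataset_reports")` returns one entry per job, for example 7 files for the fuller jobs and 2 for the ccs-only ones above. One difference from `find`: a pathlib `*` matches a single path segment, whereas `find -path` lets `*` cross `/`. Deeper nesting would therefore be missed, although for the layout shown the results coincide.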

Technical caveats

  • This framework is tightly coupled to PacBio’s internal file structure (unfortunately, this is unavoidable). The next time PacBio changes its SMRTLink version, this solution may have to be adapted accordingly.

  • All metrics stored in the PACBIO datamart are in JSON format. Metrics originally stored in XML files are converted into JSON.

  • For each digested metrics file, a special “domain” field is generated. It allows similar metrics to be grouped and queried via SQL later on.

  • The examples shown are for the v11 installation on “sodium”. Once “skywalker” is operational, switching over should be relatively easy.

  • The ANALYTICS.PACBIO datamart (along with the relevant views) is located in this Oracle instance:

    db.analytics.url="jdbc:oracle:thin:@//seqprod.broadinstitute.org:1521/seqprod.broadinstitute.org"

    username: REPORTING

  • The "ANALYTICS.PACBIO_STAR" view demonstrates how to merge multiple files (ccs_report, loading, etc.) into a flat data source with one row per (run, cell_well). It is based on SMRTLink v10 data from hydrogen (site_id=1), but the techniques it uses are entirely sound.

  • Individual fields can be extracted surgically from the metrics JSON using Oracle's SQL/JSON functions (such as JSON_VALUE and JSON_TABLE).

  • The progress of the Sodium PacBio flattened-metrics ETL can be checked on the ETL dashboard.

  • Rollback protection is implemented: an ETL run is cancelled if previously seen files have been removed.
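The XML-to-JSON conversion and the “domain” tag described above can be sketched per file as follows. This is a minimal illustration, not the production ETL. Several details are assumptions: the naive element-to-dict mapping (attributes prefixed with `@`, repeated child elements collected into lists, text under `#text`), the rule that derives the domain from the file-name prefix, and all function names, which are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def _local(tag: str) -> str:
    # Strip an XML namespace: "{http://...}Foo" -> "Foo"
    return tag.rsplit("}", 1)[-1]


def xml_to_dict(elem: ET.Element) -> dict:
    """Naive XML -> JSON-compatible dict (attributes, child lists, text)."""
    # "@" prefix keeps attribute keys from colliding with child element names.
    node: dict = {f"@{_local(k)}": v for k, v in elem.attrib.items()}
    for child in elem:
        node.setdefault(_local(child.tag), []).append(xml_to_dict(child))
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def domain_of(path: Path) -> str:
    # Hypothetical rule: "ccs.report.json" -> "ccs", "raw_data.report.json" -> "raw_data"
    return path.name.split(".", 1)[0]


def digest(path: Path) -> dict:
    """Turn one metrics file (JSON or XML) into a JSON-ready record tagged with a domain."""
    if path.suffix.lower() == ".xml":
        root = ET.parse(path).getroot()
        metrics = {_local(root.tag): xml_to_dict(root)}
    else:
        metrics = json.loads(path.read_text())
    return {"domain": domain_of(path), "source_path": str(path), "metrics": metrics}
```

Keeping the whole file under `metrics` and adding `domain` alongside it means that every new field the Lab asks for is already stored. Only the SQL/JSON extraction has to be written later.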
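In the datamart, surgical extraction is done in SQL with Oracle's JSON functions. For quick checks against a single file, the same "pick one field by id" logic can be sketched in Python. This sketch assumes a pbreports-style layout in which a report holds an `attributes` array of `{"id": ..., "value": ...}` objects. That layout is an assumption to verify against the actual files, and the function names are hypothetical.

```python
from typing import Any


def attribute_value(report: dict, attr_id: str, default: Any = None) -> Any:
    """Return the value of the attribute whose "id" equals attr_id, else default."""
    for attr in report.get("attributes", []):
        if attr.get("id") == attr_id:
            return attr.get("value", default)
    return default


def flatten_attributes(report: dict) -> dict[str, Any]:
    """One flat {id: value} row per report, similar in spirit to a JSON_TABLE projection."""
    return {a["id"]: a.get("value") for a in report.get("attributes", []) if "id" in a}
```

`flatten_attributes` mirrors what a flat per-(run, cell_well) view does: each report becomes one row of named columns, ready to be joined with rows from other reports.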
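The rollback guard amounts to a set comparison between the files recorded by earlier runs and the files present now. Below is a minimal sketch. How the seen-before list is persisted is not described here, so the function simply takes both lists as arguments. The names are hypothetical.

```python
from collections.abc import Iterable


class RollbackDetected(RuntimeError):
    """Raised to cancel an ETL run when previously ingested files have disappeared."""


def check_no_rollback(seen_before: Iterable[str], current: Iterable[str]) -> set[str]:
    """Return files that are new since the last run; raise if any seen-before file is gone."""
    seen, now = set(seen_before), set(current)
    missing = seen - now
    if missing:
        # Cancel the run rather than silently lose metrics that were already loaded.
        raise RollbackDetected(
            f"{len(missing)} previously seen file(s) missing, e.g. {min(missing)}"
        )
    return now - seen
```

Returning only the new files lets the same check also drive incremental loading: each run digests just the files it has not seen before.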
