Table of Contents | ||||
---|---|---|---|---|
|
...
Code Block |
---|
SELECT a.run_name, a.cell_well, "etl.dataset", c."rowid", c."Read Length (bp)" "Read Length (bp) RAW", -- for exploration purposes only --REPLACE(c."Read Length (bp)", CHR(191), '>=') "Read Length (bp)", -- '>=' UTF8 e2 89 a5 DECODE(rawtohex(c."Read Length (bp)"), 'BF2030' , '>= 0', 'BF20352C303030' , '>= 5000', 'BF2031302C303030', '>= 10000', 'BF2031352C303030', '>= 15000', 'BF2032302C303030', '>= 20000', 'BF2032352C303030', '>= 25000', 'BF2033302C303030', '>= 30000', 'BF2033352C303030', '>= 35000', 'BF2034302C303030', '>= 40000', rawtohex(c."Read Length (bp)") -- catch everything else ) "Read Length (bp)", "Reads", "Reads (%)" ,"Yield (bp)", "Yield (%)" FROM pacbio a, json_table(DATA, '$[*]' COLUMNS( "etl.dataset" path '$."etl.dataset"', NESTED PATH '$."etl.ccs2.hifi_length_summary"[*]' COLUMNS( "rowid" PATH '$.rowid', "Read Length (bp)" PATH '$.read_length', "Reads" NUMBER PATH '$.n_reads', "Reads (%)" NUMBER PATH '$.reads_pct', "Yield (bp)" NUMBER PATH '$.yield', "Yield (%)" NUMBER PATH '$.yield_pct' ) )) AS c WHERE site_id=6 AND a.domain='CROMWELL/sl_dataset_reports/*/call-import_dataset_reports/execution/ccs.report.json*' --AND rawtohex(c."Read Length (bp)") = 'BF2033302C303030' -- filter bucket >= 30000 AND a.run_name='r64218e_20221021_195314' AND a.cell_well='2_B01' |
...
Keep in mind that UTF8 characters (like ‘>=’) - nicely rendered in Chrome - may have variable-length bytes representation and therefore Oracle’s rawtohex function is necessary.
https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128&names=-
REPLACE(c."Read Length (bp)", CHR(191), ‘>=') "Read Length (bp)", -- '>=' UTF8 e2 89 a5
seems to do the trick but DECODE expression gives you more control.
UPDATE: this is even less cryptic way to deal with non UTF-8 characters
Code Block |
---|
ASCIISTR("HiFi LenSum read_length") = '\00BF 10,000' |
Metrics stored in “attributes“ JSON-array
...
“per-barcode” metrics are supported by converting multiple “consensusreadset.xml“ files into JSONs and then merging these into a single “synthetic JSON-array“. These can be recognized by checking for trailing “*” at the end of “domain” field.
...