Request:SUPPORT-9123
My samples are missing BAM file paths in PDO Tracker report. This translates into analysis are missing in the data source (cognos.pdo_star5_aux). The BAM file path is contructed in the report based on the aggregation analysis.
Troubleshooting process.
Reproduce the issue in the report and then in SQL.
SELECT a.aggregation_project, a.analysis_id, a.data_type, a.runs_end, a.bsp_collaborator_sample_id, a.* FROM pdo_star5_aux a WHERE a.pdo_sample ='SM-JVCPT'
;
-- returns NULL for aggregation_project and analysis_id. This means the PDO Star ETL didn't pick the aggregation. This data comes from aggregation data mart (cognos.slxre2_pagg_sample)Check the aggregation data
SELECT * FROM slxre2_pagg_sample a WHERE a.SAMPLE ='microbial_pos_1159'; -- this is the bsp_collaborator_sample_id from Q1.
-- returns no records.Check whether there are aggregations in the raw data - where the pipeline records the metrics.
SELECT * FROM metrics.aggregation a WHERE a.SAMPLE ='microbial_pos_1159' -- this is the bsp_collaborator_sample_id from Q1.
-- returns no records. This means that the aggregation has not been run yet. It can be queued (but we don't have visibility to that ) or the sequenced read groups could have been blacklisted. Blacklisted read groups are not included in the aggregation.
Check for sequencing data. Set a run_date filter that should capture the runs.
SELECT SYSDATE, SYSDATE - 2/24, a.* FROM slxre_readgroup_metadata a WHERE a.collaborator_sample_id ='microbial_pos_1159' AND a.run_date >= SYSDATE - 20 --'15-nov-2022'
-- returns the read groups that the sample was sequenced on.
5. Check whether the read groups were blacklisted.
5.1. By looking up the blacklist table:
SELECT * FROM seq20.sequence_data_blacklist b WHERE -- b.flowcell_barcode = 'H3GKKDSX5' AND b.molecular_barcode_name = 'Illumina_P5-Nahok_P7-Loten' --AND b.lane IN (1,2) (b.flowcell_barcode, b.lane, b.molecular_barcode_name) IN ( ('H3GKKDSX5', 1, 'Illumina_P5-Nahok_P7-Loten'), ('H3GKKDSX5', 2, 'Illumina_P5-Nahok_P7-Loten') );
-- returns no records. Read groups are not blacklisted.
5.2. By joining the metadata and blacklisting tables
SELECT b.flowcell_barcode, a.* FROM slxre_readgroup_metadata a LEFT JOIN seq20.sequence_data_blacklist b ON b.flowcell_barcode = a.flowcell_barcode AND b.lane = a.lane AND b.molecular_barcode_name = a.molecular_indexing_scheme WHERE a.collaborator_sample_id ='microbial_pos_1159' AND a.run_date >= SYSDATE - 20 --'15-nov-2022'
-- returns all sequenced read groups but the flowcell from blacklist table (b.flowcell_barcode) is NULL. This means that these read groups are not present in the blacklist table.
Turns out this sample was not aggregated and its read groups were not blacklisted.
A similar checks should be done for all samples in the request and make sure that the same explanation is valid for all of them. ( replace "a.pdo_sample ='SM-JVCPT'" with "a.pdo_sample in ('SM-JVCPT', 'SM-...', ...)" and similarly in other queries look up all samples in a list.
In this case the request to run aggregation should be passed to Pipeline Ops.
Usually, when they complete, the data is automatically being picked up in order - in aggregation DM and then in the PDO Star DM.