Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

Request:SUPPORT-9123
My samples are missing BAM file paths in PDO Tracker report. This translates into analysis are missing in the data source (cognos.pdo_star5_aux). The BAM file path is contructed in the report based on the aggregation analysis.

Troubleshooting process.

  1. Reproduce the issue in the report and then in SQL.

    SELECT a.aggregation_project, a.analysis_id, a.data_type, a.runs_end, 
    a.bsp_collaborator_sample_id, a.*
    FROM pdo_star5_aux a
    WHERE a.pdo_sample ='SM-JVCPT'


    ;
    -- returns NULL for aggregation_project and analysis_id. This means the PDO Star ETL didn't pick the aggregation. This data comes from aggregation data mart (cognos.slxre2_pagg_sample)

  2. Check the aggregation data

    SELECT *
    FROM slxre2_pagg_sample a
    WHERE a.SAMPLE ='microbial_pos_1159';  -- this is the bsp_collaborator_sample_id from Q1.


    -- returns no records.

  3. Check whether there are aggregations in the raw data - where the pipeline records the metrics.

  4. SELECT *
    FROM metrics.aggregation a
    WHERE a.SAMPLE ='microbial_pos_1159' -- this is the bsp_collaborator_sample_id from Q1.

-- returns no records. This means that the aggregation has not been run yet. It can be queued (but we don't have visibility to that ) or the sequenced read groups could have been blacklisted. Blacklisted read groups are not included in the aggregation.

  1. Check for sequencing data. Set a run_date filter that should capture the runs.

SELECT SYSDATE, SYSDATE - 2/24,  a.*
FROM slxre_readgroup_metadata a

WHERE a.collaborator_sample_id ='microbial_pos_1159'
AND a.run_date >= SYSDATE - 20  --'15-nov-2022'

-- returns the read groups that the sample was sequenced on.

5. Check whether the read groups were blacklisted.
5.1. By looking up the blacklist table:

SELECT *
FROM seq20.sequence_data_blacklist b
WHERE -- b.flowcell_barcode = 'H3GKKDSX5' AND b.molecular_barcode_name = 'Illumina_P5-Nahok_P7-Loten'
--AND b.lane IN (1,2)
(b.flowcell_barcode, b.lane, b.molecular_barcode_name) IN (
('H3GKKDSX5',	1,	'Illumina_P5-Nahok_P7-Loten'),
('H3GKKDSX5',	2,	'Illumina_P5-Nahok_P7-Loten')
);

-- returns no records. Read groups are not blacklisted.

5.2. By joining the metadata and blacklisting tables

SELECT b.flowcell_barcode,  a.*
FROM slxre_readgroup_metadata a
LEFT JOIN seq20.sequence_data_blacklist b ON b.flowcell_barcode = a.flowcell_barcode AND b.lane = a.lane AND b.molecular_barcode_name = a.molecular_indexing_scheme
WHERE a.collaborator_sample_id ='microbial_pos_1159'
AND a.run_date >= SYSDATE - 20  --'15-nov-2022'

-- returns all sequenced read groups but the flowcell from blacklist table (b.flowcell_barcode) is NULL. This means that these read groups are not present in the blacklist table.

Turns out this sample was not aggregated and its read groups were not blacklisted.
A similar checks should be done for all samples in the request and make sure that the same explanation is valid for all of them. ( replace "a.pdo_sample ='SM-JVCPT'" with "a.pdo_sample in ('SM-JVCPT', 'SM-...', ...)" and similarly in other queries look up all samples in a list.

In this case the request to run aggregation should be passed to Pipeline Ops.
Usually, when they complete, the data is automatically being picked up in order - in aggregation DM and then in the PDO Star DM.

  • No labels