ETL troubleshooting recipes (To make changes permanent in DB don't forget to issue COMMIT at the end.)
Issue:This page contains a list of common ETL issues and its potential solutions.
Table of Contents |
---|
Missing BAM file path in PDO Tracker or WGS/WES Project Tracker report
Issue: Missing BAM file path in PDO Tracker or WGS/WES Project Tracker report (PDO_STAR5_Aux DM) (for on-prem only)
...
- None
- Check if Mercury DWH and Mercury Run API still do not agree about the plated sample. If they don't - seek help from Mercury team. If they match, a refresh of the flowcell should correct it. BSP exported samples might be associated with the wrong PDO in BSP_Plated_samples because the mapping in BSP is not accurate. If it is important to fix the data , BSP team could help and then request a refresh for the affected plated sample as instructed in the recipe on this page to update BSP Plating DM. Finally, make sure to refresh in PDO Star5 if it won't happen automatically.
- If there is data for Indexed or Unindexed on metrics.basecall_analysis, then the flowcell needs to be refreshed in the datamarts. Otherwise, make ticket for the Pipeline and refer to PO-13231 as being the same issue.
- PDOs created within past year will get refreshed on Friday 2:55 am or Analytics manually refresh for faster data delivery (put link to Checking Metadata Changes 5/5/19). PDOs older than 1 year MUST be manually refreshed. See instructions for PDOSTAR5 refresh in Checking metadata changes.
Readgroup Metadata refreshes with upstream data change
Issue: When upstream data change, some samples in ReadgroupMetadata might need to be refreshed. Instructions already on Checking Metadata Changes
Remedy: Request a refresh for RunMercuryAgent by issuing SQL like this the following query in SEQPROD.COGNOS:
Code Block | |
---|---|
|
BEGIN
...
| |
BEGIN cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>'XYZ1', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>'XYZ2', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>'XYZ3', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); ... cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>' |
...
XYZX', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); |
...
END; |
Note: No COMMIT needed in this particular call - it's implicitly done for you. Just replace the XYZ with a flowcell barcode. Leave all other parameters there and unchanged. (put link to Checking Metadata Changes 5/5/19)For more information, see instructions in Checking metadata changes.
PicardAggregator ETL fails with (COGNOS.SLXRE2_PAGG_SAMPLE_PK) violated error
Issue: PicardAggregator ETL fails with (COGNOS.SLXRE2_PAGG_SAMPLE_PK) violated error.
Cause:
- Most likely Is_latest flag is set for more than 1 aggregations.
- Another case is when samples are reprocessed on the cloud, sample renaming and research project renaming, where the old aggregation needs to be set to "not latest".
Remedy:
- There is an agent that looks for certain Is_latest flag violations (IsLatestCop) that specifies the list of metrics.aggregation records with the issues and automatically opens a PO ticket in the pipeline.
- Find the sample that causes it (run the query below) and make a blocker PO ticket + notify somebody from Pipeline team to address it ASAP because the ETL won't recover until this is fixed.
Code Block | ||
---|---|---|
| ||
SELECT DISTINCT SAMPLEs, agg_ids, proc_loc FROM ( SELECT rg.flowcell_barcode, rg.lane, rg.molecular_barcode_name, count(DISTINCT a.id ) n, concat_string_csv(DISTINCT a.SAMPLE ) samples, concat_string_csv(DISTINCT a.id) agg_ids , concat_string_csv(DISTINCT a.processing_location) proc_loc FROM metrics.aggregation a , metrics.aggregation_read_group rg WHERE a.id = rg.aggregation_id AND a.is_latest <> 0 AND a.SAMPLE IN ( SELECT DISTINCT SAMPLE FROM metrics.aggregation a WHERE a.SAMPLE <> 'NA12878' AND nvl(a.modified_at, a.workflow_end_date) >= <START DATE FROM ETL dashboard> AND nvl(a.modified_at, a.workflow_end_date) < <END DATE FROM ETL dashboard> ) GROUP BY rg.flowcell_barcode, rg.lane, rg.molecular_barcode_name HAVING count(DISTINCT a.id )>1 ) |
Designation ETL fails with "ORA-30926: unable to get a stable set of rows in the source tables" error
Issue: Designation ETL fails with "ORA-30926: unable to get a stable set of rows in the source tables" error
...