ETL troubleshooting recipes (To make changes permanent in DB don't forget to issue COMMIT at the end.)
Issue: When upstream data change, some BSP plating info in BSP_PLATED_SAMPLES might need to be refreshed.
Remedy: Request a refresh for BspPlatingRequestAgent by issuing SQL like this in GAP_PROD.ANALYTICS:
INSERT INTObsp_plating_special_refreshesVALUES('6E3K2', SYSDATE);
COMMIT;
Note: There is no "SM-" prefix,
COVERED HERE Tiger3 ETL Shell
Issue: When upstream data change, some samples in PDO_STAR5 might need to be refreshed.
Remedy: Follow the instructions for Manual refresh of PDO Star5 using a query. The query should have the format below:
SELECT DISTINCT a.pdo_name||','||a.pdo_sample_id FROM ... a WHERE ...
COVERED HERE: Tiger3 ETL Shell
Issue/Error: Once a week (1am on Fri) PdoStar5 goes through 1y-refresh. Sometime you might want to trigger it earlier.
Diagnosis:
Remedy: Connect to "analytics" server using your favorite SSH tool and run the following line:
/home/unix/analytics/TigerETL3/runEtlAgent.sh Task task=etlObject agentName=analytics.tiger.agents.PdoStar db=analyticsetl 'delta=MillisDelta.loadFromDb(365d,0d)'
COVERED HERE: Procedure to refresh Analytics DWH when Product Names change
Issue: How to make PDO_STAR5 ETL pick a new product ?
Remedy: Add a new "interesting" product by issuing SQL like this in SEQPROD.COGNOS. Provide part number.
INSERT INTO pdo_star_interesting_products values('P-123456');
Note: Also create new JIRA RPT ticket and assign to raymond to track the following: Assign PdM-approved minimum requirement(s) for coverage met. pdo_star5_aux will need product_goal assigned as well as sample_coverage_normalized and sample_coverage_unformatted set. DCFM_RULES must be updated. DCFM_Agent may need to be updated. PDO_SEQ_STAR needs "Sample Coverage" calculated field updated.
STILL RELEVANT??? should this get wrapped and added to Tiger3 ETL Shell page?
Issue:This page contains a list of common ETL issues and its potential solutions.
Table of Contents |
---|
Missing sample aggregation/BAM file path in PDO Tracker or WGS/WES Project Tracker report
Issue: Missing BAM file path in PDO Tracker or WGS/WES Project Tracker report (PDO_STAR5_Aux DM) (for on-prem only)
...
- Seq Only PDOs do not get BAM files in PDO_STAR5_Aux DM
- Plated sample in BSP_Plated_Samples is different than slxre_readgroup_metadata.sample_id
- Read group of the PDO Sample is missing in SLXRE2_RGHQS, most likely because there isn't a record in metrics.basecall_analysis for metrics_type Indexed or Unindexed.
- PM billed sample before PDOSTAR5 picked the most recent aggregation or before sample met full coverage
- Determine what is the root cause (using SUPPORT-9123 as an example)
Remedy:
- None
- Check if Mercury DWH and Mercury Run API still do not agree about the plated sample. If they don't - seek help from Mercury team. If they match, a refresh of the flowcell should correct it. BSP exported samples might be associated with the wrong PDO in BSP_Plated_samples because the mapping in BSP is not accurate. If it is important to fix the data , BSP team could help and then request a refresh for the affected plated sample as instructed in the recipe on this page to update BSP Plating DM. Finally, make sure to refresh in PDO Star5 if it won't happen automatically.
- Make If there is data for Indexed or Unindexed on metrics.basecall_analysis, then the flowcell needs to be refreshed in the datamarts. Otherwise, make ticket for the PIpeline Pipeline and refer to PO-13231 as being the same issue.
- PDOs created within past year will get refreshed on Friday 2:55 am or Analytics manually refresh for faster data delivery. PDOs older than 1 year MUST be manually refreshed.
STILL RELEVANT. should this get wrapped and added to Tiger3 ETL Shell page?
- See instructions for PDOSTAR5 refresh in Checking metadata changes.
- Steps to determine root cause of missing BAM file path
Info |
---|
To view Mercury Run API, add run_name to the end of this query (after 'runName='): https://mercury.broadinstitute.org:8443/Mercury/rest/IlluminaRun/query?runName=190725_SL-NVM_0088_AHKYJVDSXX |
Readgroup Metadata refreshes with upstream data change
Issue: When upstream data change, some samples in ReadgroupMetadata might need to be refreshed. For more information, see instructions in Checking metadata changes.
Remedy: Request a refresh for RunMercuryAgent by issuing SQL like this the following query in SEQPROD.COGNOS: RunETL tool
BEGIN
...
Code Block | ||
---|---|---|
| ||
BEGIN cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>'XYZ1', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>'XYZ2', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>'XYZ3', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); ... cognos.data_warehouse_api.request_refresh(P_FLOWCELL_BARCODE=>' |
...
XYZX', P_REQUESTED_BY=>'JohnDoe', P_COMMENTS=>'Test', P_DEFERRAL_ANALYSIS_DATE=>null); |
...
END; |
Note: No COMMIT needed in this particular call - it's implicitly done for you. Just replace the XYZ with a flowcell barcode. Leave all other parameters there and unchanged.
COVERED HERE Tiger3 ETL Shell
Issue: How to prevent an entire FC from getting into ReadgroupMetadata (Analytics Blacklisting).
Remedy: Follow the instructions for blacklist runfolder from RunMercury/RghqsLoader agents
Note: Blacklist_timestamp is not null for blacklisted flowcells.
COVERED HERE Tiger3 ETL Shell
Issue: How to whitelist a FC for processing by Run Mercury and RGHQS agents.
Remedy: Follow the instructions for whitelist runfolder from RunMercury/RghqsLoader agents
COVERED HERE Tiger3 ETL Shell
Issue: How to refresh Picard Aggregation DMs for a specific set of samples. Note, this will not refresh any metrics unless they are part of a new aggregation (new version). Only basecall metrics rolled up at library and sample level get updated
Remedy: Follow the instructions for Manual Refresh of Picard Aggregations for a list of samples.
For more information, see instructions in Checking metadata changes.
PicardAggregator ETL fails with (COGNOS.SLXRE2_PAGG_SAMPLE_PK) violated error
Issue: PicardAggregator ETL fails with (COGNOS.SLXRE2_PAGG_SAMPLE_PK) violated error.
Cause: Most
- Most likely Is_latest flag is set for more than 1 aggregations.
- Another case is when samples are reprocessed on the cloud, sample renaming and research project renaming, where the old aggregation needs to be set to "not latest".
Remedy:
- There is an agent that looks for certain Is_latest flag violations (IsLatestCop) that specifies the list of metrics.aggregation records with the issues and automatically opens a PO ticket in the pipeline.
- Find the sample that causes it (run the query below) and make a blocker PO ticket + notify somebody from Pipeline team to address it ASAP because the ETL won't recover until this is fixed.
Code Block |
---|
...
| ||
SELECT DISTINCT SAMPLEs, agg_ids, proc_loc FROM (
SELECT
rg.flowcell_barcode, rg.lane, rg.molecular_barcode_name,
count(DISTINCT a.id ) n,
concat_string_csv(DISTINCT a.SAMPLE ) samples,
concat_string_csv(DISTINCT a.id) agg_ids
, concat_string_csv(DISTINCT a.processing_location) proc_loc
FROM metrics.aggregation a ,
metrics.aggregation_read_group rg
WHERE
a.id = rg.aggregation_id
AND a.is_latest <> 0
AND a.SAMPLE IN (
SELECT DISTINCT SAMPLE
FROM metrics.aggregation a
WHERE
a.SAMPLE <> 'NA12878'
AND nvl(a.modified_at, a.workflow_end_date) >= <START DATE FROM ETL dashboard>
AND nvl(a.modified_at, a.workflow_end_date) < <END DATE FROM ETL dashboard>
)
GROUP BY rg.flowcell_barcode, rg.lane, rg.molecular_barcode_name
HAVING count(DISTINCT a.id )>1
) |
Designation ETL fails with "ORA-30926: unable to get a stable set of rows in the source tables" error
Issue: Designation ETL fails with "ORA-30926: unable to get a stable set of rows in the source tables" error
Diagnosis: This is usually caused by a flowcell with 2 runs valid run (uncanceled), one of which should have been cancelled. Run the query below to determine the trouble flowcell:
Code Block |
---|
...
| ||
SELECT r.flowcell_barcode, count(DISTINCT r.run_name)
FROM slxre2_organic_run r
WHERE
r.is_cancelled = 0
AND r.run_type <> 'UNDECIDED'
AND r.run_date >= <SOME DATE>
GROUP BY r.flowcell_barcode
HAVING count(DISTINCT r.run_name)>1 |
Remedy: There is an agent (OrganicRun) that reviews if a flowcell has 2 valid runs, and emails the Sequencing team to clear the bad run from their dashboard. Designation ETL will recover as soon as the run is cleared.
OBSOLETE? IPI-61224 hasn't been active in years.
Issue: Scala ETL ERROR: analytics.etl.DeckEventAgent$ (Most likely 'lcSet' tag is missing)
Diagnosis: Sometimes this Squid web-service returns no LcSet. Comment about this incident is automatically posted inIPI-61224and ticket reopened.
Remedy: Just follow up with LIMS team. OurDeckEventAgent will catch up as soon as LIMS service is fixed. No other intervention is needed.
Issue: analytics.etl.PkViolationWarning. unique constraint (COGNOS.RGHQS_METADATA_PK) violated
Diagnosis: This happens when 2+ readgroups having same (fc, lane, mol_index) try to get into datamart. Picard calls it "CLASH". RunMercuryAgent would ignore abusing readgroups and move on with the rest of 'good' ones. Email would give you detailed troubleshooting info.
Remedy: There isn't much for us to do. If upstream issues are resolved you can request a ReadgroupMetadata refresh for the whole run/fc (see above). You can search for the RG in GPINFOJIRA by searching the FC ID to see if it has been blacklisted and already handled upstream.
OBSOLETE?
Issue/Error: ETL: DB error, server: (thorium or vorarestore), Error: ORA-12008: error in materialized view refresh path, ORA-25153: Temporary Tablespace is Empty
Diagnosis: This happens when the DB team is doing some kind of backup or transfer.
Remedy: These errors can be ignored. Everything is actually fine.
Want to see only the latest run for each ETL agent ? Just pass the "latest_run_only=true" parameter.
...
Synchronizing data types between metrics tables and Agg DM
Remedy:
- Compare the data types in Sample and Library level aggregations:
Code Block language sql SELECT sa.data_type, a.* FROM metrics.aggregation a , slxre2_pagg_sample sa WHERE a.id = sa.analysis_id AND (a.project, a.SAMPLE) IN ( ('P', 'S'), ... ) AND a.library IS NULL AND a.is_latest <>0 ; SELECT sa.data_type, a.* FROM metrics.aggregation a , slxre2_pagg_library sa WHERE a.id = sa.analysis_id AND (a.project, a.SAMPLE) IN ( ('P', 'S'), ... ) AND a.library IS NOT NULL AND a.is_latest <>0 ;
Update Analytics DMs
Code Block language sql MERGE INTO slxre2_pagg_sample sa USING ( SELECT a.id, a.data_type FROM metrics.aggregation a WHERE (a.project, a.SAMPLE) IN ( ('P', 'S'), ... ) AND a.library IS NULL AND a.is_latest <>0 ) d ON (sa.analysis_id = d.id) WHEN MATCHED THEN UPDATE SET sa.data_type = d.data_type ; MERGE INTO slxre2_pagg_library sa USING ( SELECT a.id, a.data_type FROM metrics.aggregation a WHERE (a.project, a.SAMPLE) IN ( ('P', 'S'), ... ) AND a.library IS NOT NULL AND a.is_latest <>0 ) d ON (sa.analysis_id = d.id) WHEN MATCHED THEN UPDATE SET sa.data_type = d.data_type ; MERGE INTO slxre2_library_agg_qc sa USING ( SELECT a.id, a.data_type FROM metrics.aggregation a WHERE (a.project, a.SAMPLE) IN ( ('P', 'S'), ... ) AND a.library IS NOT NULL AND a.is_latest <>0 ) d ON (sa.analysis_id = d.id) WHEN MATCHED THEN UPDATE SET sa.data_type = d.data_type ; MERGE INTO slxre2_sample_agg_qc sa USING ( SELECT a.id, a.data_type FROM metrics.aggregation a WHERE (a.project, a.SAMPLE) IN ( ('P', 'S'), ... ) AND a.library IS NOT NULL AND a.is_latest <>0 ) d ON (sa.analysis_id = d.id) WHEN MATCHED THEN UPDATE SET sa.data_type = d.data_type ;
Review queries in step 1 to confirm data_type changes.