/
De-duplicate Covid samples in the Analytics Data Warehouse

De-duplicate Covid samples in the Analytics Data Warehouse

Objective: Sometimes Covid test samples get registered twice with different HL7_Message_ID. Only one of these orders progresses through the process and the other one is left hanging and captured in Sample Tie-out report.

First step is to de-duplicate in PostgreSQL DB, this is done by somebody in TAO team.

Second step is to de-duplicate in the Analytics DWH, which will clear the Pod Tie out report.

Steps to de-duplicate in Analytics DWH:

  1. Check if de-duplication is done in PostgreSQL

    select a.sample_id, a.time_delivered, a.hl7_message_id from samples a WHERE a.sample_id IN ( 'CR-03633985', 'CR-03751106', 'CR-03751120' )

This query should return as many samples as there are in the filter.

2. Check if duplication exists in the Analytics DWH

SELECT * FROM analytics.beacon_samples a WHERE a.sample_id IN ( -- list of samples from the ticket 'CR-03633985', 'CR-03751106', 'CR-03751120' ) ORDER BY a.sample_id ; SELECT * FROM analytics.covid_pod_samples a WHERE a.sample_id IN ( -- list of samples from the ticket 'CR-03633985', 'CR-03751106', 'CR-03751120' ) ORDER BY a.sample_id

These queries should return 2 records per sample, one with a result, one without a result.

3. Identify the HL7_Message_IDs of the orders without a result:

 

SELECT * FROM analytics.beacon_samples a WHERE a.sample_id IN ( -- list of samples from the ticket 'CR-03633985', 'CR-03751106', 'CR-03751120' ) AND a.RESULT IS NULL

4. Delete the records with these HL7_Message_ID from ANALYTICS.BEACON_SAMPLES and ANALYTICS.COVID_POD_SAMPLES

 

5. Run the first query from this workflow to check if the records with results are still there and only those without results were deleted. If all looks good, then commit the transaction.

 

6. Check if Pod Tie out report has cleared out.