/
Analytics DWH Infrastructure

Analytics DWH Infrastructure

The table below describes the Analytics data marts related to each step in the Lab.

Lab Process

Table Name and Primary key

Description

Lab Process

Table Name and Primary key

Description

Sample Receipt (BSP)(informatical)

analytics.bsp_sample

PK: sample_barcode

Metadata about the samples registered in BSP.

Sample Receipt (Mercury)(informatical)

No Analytics DM.

mercurydw.sample_fact

PK: sample_barcode

Metadata about samples registered in Mercury. Data may overlap with analytics.bsp_sample for some controls and clinical samples.

Sample Receipt (physical accessioning)

 

analytics.sample_accession

mercurydw.event_fact

where lab_event_type = ‘SampleAccession’

Accessioned samples with PDO if available.

Extraction

analytics.extraction

PK: (pdo, a.source_sample, a.extracted_sample)

Extracted samples related to certain events from mercurydw.event_fact.

Initial Quantification

analytics.sample_qc_initial_quant

mercurydw.lab_metric

quant types: INITIAL_PICO, INITIAL_RIBO, INITIAL_SIZE_PEAK, INITIAL_SIZE_SMEAR

Most recent quant of each type per (PDO, PDO sample).

  • Data is not captured if message associated with the metric doesn’t point to a PDO. May need to be revised.

Plating

cognos.bsp_plated_samples

PK: (PDO, PDO_sample_id, Plated_Sampel_id)

Includes plating information for Sequencing and Arrays from Mercury (work_request_item_id = -1, work_request_id = -1). The work_request_item_type is ‘Mercury Seq Import’ for Seq plating.

The work_request_item_type = ‘Mercury Array Import’ for Array plating.

This table includes legacy data for plating done in BSP (work_request_id points to a BSP work request for plating, and work_request_item_id - BSP work request item id for plating WR)

The work_request_item_type = ‘SEQ_PLATING_REQUEST’ for Seq plating

The work_request_item_type = ‘GAP_PLATING_REQUEST’ for Array plating

  • Plating might change when BSP is decommissioned. Not sure if Mercury would keep info about the actual aliquots Even if it does, whether this will be propagated to Run API.

LC

cognos.slxre_library_lcset

PK: (library_name, plated_sample_id)

Metadata of all libraries generated - type, plated sample (aliquot), PDO, PDO sample, LCSET, etc.

Library QC

cognos.sample_qc_metrics

PK: (content_name, Plated_sample_id, quant_type

QC metrics and metadata for libraries of all types (pond, pools) that went through some sort of QC - Pico, qPCR, etc.

content_name corresponds to cognos.slxre_library_lcset.library_name

quant_type is converted to a user-friendly label but matches the mercurydw.lab_metric.quant_type.

Library QC

cognos.slxre2_pico_quants

PK: (library_name, quant_type)

Only Pico QC metrics and metadata. Libraries that contain only 1 sample usually go through Pico.

Library QC

cognos.slxre2_library_qpcr

PK: library_name

qPCR QC metrics and metadata for the libraries that were qPCR’d. In the past only pools were qPCR’d but now days some processes run qPCR QC on single sample libraries.

Library ancestry

cognos.slxre2_library_ancestor

PK:

Every library associated with its direct and indirect parents

Library ancestry

cognos.slxre2_rg_ancestry

PK:

Every read group linked with the key library types. Used to link to library QC data.

Sequencing Read Group Metadata

cognos.slxre_readgroup_metadata

PK: (Flowcell_Barcode, Lane, Molecular_indexing_Scheme)

Metadata for each read group.

Populates from Mercury Run API (https://mercury.broadinstitute.org:8443/Mercury/rest/IlluminaRun/query?runName=230303_SL-NVU_0674_AHFKG2DSX5)

Populates as a step in Rghqs ETL agent , when the agent finds sequencing run analysis for a flowcell.

  • Molecular_indexing_Scheme aka molecular barcode

Run metadata

cognos.slxre2_organic_run

PK: Run_name

Metadata of each sequencing run. Populated from Illumina Run Dashboard data (slxasync.runs and slxasync.run_events)

Sequencing Run Metrics

cognos.slxre2_rghqs (read group level metrics)

PK: (Flowcell_Barcode, Lane, Molecular_indexing_scheme )

cognos.slxre2_lane_hqs (lane level metrics)

PK: (Flowcell_Barcode, Lane)

cognos.slxre2_run_hqs (run level metrics)

PK: Flowcell_Barcode

analytics.illumina_read (data parsed from run’s summary files)

PK: (Run_name, lane, read)

*** read refers to forward template, forward barcode, reverse barcode and reverse template

analytics.illumina_lane_aggregate (lane level base calling metrics aggregated from analytics.illumina_read)

PK: (Runa_name, lane)

Metrics generated by Picard and DRAGEN pipeline + metadata, or from run’s summary.csv files.

Run summary.csv file - Illumina runs generate a summary.csv file at run completion and it is stored in the run folder. Mercury team has setup the run folders to push summary files to the Analytics Metrics ETL, which processes them with the plugin called “Summary”.

Aggregation metrics (Picard pipeline)

cognos.slxre2_pagg_library (Library level aggregation)

PK: (Project, Sample, Library, Data_Type)

cognos.slxre2_pagg_sample (Sample level aggregation)

PK: (Project, Sample, Data_Type)

Aggregation metrics from Picard pipeline + metadata from the aggregated read groups.

Aggregation metrics (DRAGEN pipeline)

analytics.dragen_sample_agg (Sample level aggregation)

PK: PDO_Sample

Aggregation metrics from DRAGEN pipeline + metadata for the aggregated read groups.

Aggregation metrics linked to Library QC quants

cognos.slxre2_library_agg_qc

PK: (Project, Sample, Library, Data_Type)

cognos.slxre2_sample_agg_qc

PK: (Project, Sample, Data_Type)

Library / Sample level aggregation associated with key library QC quants

PDO Sample status (sequencing orders)

cognos.PDO_Star5 - staging table

cognos.PDO_STAR5_AUX

Shows where in the process is each PDO sample. Not all products are included here, only products listed in COGNOS.pdo_star_interesting_products table.
Statuses tracked here - whether samples was sequenced, analyzed and billed. Includes metrics from picard sample aggregations. Products requiring DRAGEN analysis have only deliverable DRAGEN metrics.

  • We should revisit what to include when more product move to DRAGEN analysis

PDO Sample status (Arrays)

cognos.PDO_STAR_ARRAYS

Same idea and structure as PDO_STAR5_AUX but for Arrays PDOs

Pacbio Sequencing metrics

analytics.pacbio - raw data from SmrtLink

PK: (run_name, cell,_well, domain)

analytics.pacbio2_run - pacbio run metadata from SmrtLink API

PK: run_name

analytics.pacbio2_barcoded_metrics - SmrtLink generated metrics per read group

PK: (Site_id, Run_name, Cell_well, molecular_barcode)

analytics.pacbio2_cell_metrics - SmrtLink generated metrics per cell

PK: (Site_id, Run_name, Cell_well)

Pacbio table contains the raw data from SmrtLink in JSON format. Data is collected via Pacbio APIs.

Site_id refers to various sources based in SmrtLink software version , data output, etc. Details about Site IDs can be found in analytics.pacbio_sites_table

A number of PACBIO2_* views are created for various entities.

NOTE: A number of pacbio views are created strictly for Mercuries data review - PACBIO_DTR_* and PACBIO_MRC_*

Pacbio LIMS Metadata

analytics.pacbio2_mercury_metadata

Read group metadata from LIMS provided by Mercury API:

https://mercury.broadinstitute.org:8443/Mercury/rest/pacbiorun/query?runName=r84063_20230620_200050

QC metrics generated by instruments or TAG analysis

analytics.metrics

PK: (site_id, filename)

analytics.metrics_sites - site description

Data from files is converted to JSPN format at stored in metrics table. Processes captured are described in metrics._sites table and include VVP, smartseq, qPCr standard curve.