Analytics DWH Infrastructure

The table below describes the Analytics data marts related to each step in the Lab.

Lab Process	Table Name and Primary key	Description

Lab Process	Table Name and Primary key	Description
Sample Receipt (BSP)(informatical)	analytics.bsp_sample PK: sample_barcode	Metadata about the samples registered in BSP.
Sample Receipt (Mercury)(informatical)	No Analytics DM. mercurydw.sample_fact PK: sample_barcode	Metadata about samples registered in Mercury. Data may overlap with analytics.bsp_sample for some controls and clinical samples.
Sample Receipt (physical accessioning)	analytics.sample_accession mercurydw.event_fact where lab_event_type = ‘SampleAccession’	Accessioned samples with PDO if available.
Extraction	analytics.extraction PK: (pdo, a.source_sample, a.extracted_sample)	Extracted samples related to certain events from mercurydw.event_fact.
Initial Quantification	analytics.sample_qc_initial_quant mercurydw.lab_metric quant types: INITIAL_PICO, INITIAL_RIBO, INITIAL_SIZE_PEAK, INITIAL_SIZE_SMEAR	Most recent quant of each type per (PDO, PDO sample). Data is not captured if message associated with the metric doesn’t point to a PDO. May need to be revised.
Plating	cognos.bsp_plated_samples PK: (PDO, PDO_sample_id, Plated_Sampel_id)	Includes plating information for Sequencing and Arrays from Mercury (work_request_item_id = -1, work_request_id = -1). The work_request_item_type is ‘Mercury Seq Import’ for Seq plating. The work_request_item_type = ‘Mercury Array Import’ for Array plating. This table includes legacy data for plating done in BSP (work_request_id points to a BSP work request for plating, and work_request_item_id - BSP work request item id for plating WR) The work_request_item_type = ‘SEQ_PLATING_REQUEST’ for Seq plating The work_request_item_type = ‘GAP_PLATING_REQUEST’ for Array plating Plating might change when BSP is decommissioned. Not sure if Mercury would keep info about the actual aliquots Even if it does, whether this will be propagated to Run API.
LC	cognos.slxre_library_lcset PK: (library_name, plated_sample_id)	Metadata of all libraries generated - type, plated sample (aliquot), PDO, PDO sample, LCSET, etc.
Library QC	cognos.sample_qc_metrics PK: (content_name, Plated_sample_id, quant_type	QC metrics and metadata for libraries of all types (pond, pools) that went through some sort of QC - Pico, qPCR, etc. content_name corresponds to cognos.slxre_library_lcset.library_name quant_type is converted to a user-friendly label but matches the mercurydw.lab_metric.quant_type.
Library QC	cognos.slxre2_pico_quants PK: (library_name, quant_type)	Only Pico QC metrics and metadata. Libraries that contain only 1 sample usually go through Pico.
Library QC	cognos.slxre2_library_qpcr PK: library_name	qPCR QC metrics and metadata for the libraries that were qPCR’d. In the past only pools were qPCR’d but now days some processes run qPCR QC on single sample libraries.
Library ancestry	cognos.slxre2_library_ancestor PK:	Every library associated with its direct and indirect parents
Library ancestry	cognos.slxre2_rg_ancestry PK:	Every read group linked with the key library types. Used to link to library QC data.
Sequencing Read Group Metadata	cognos.slxre_readgroup_metadata PK: (Flowcell_Barcode, Lane, Molecular_indexing_Scheme)	Metadata for each read group. Populates from Mercury Run API (https://mercury.broadinstitute.org:8443/Mercury/rest/IlluminaRun/query?runName=230303_SL-NVU_0674_AHFKG2DSX5) Populates as a step in Rghqs ETL agent , when the agent finds sequencing run analysis for a flowcell. Molecular_indexing_Scheme aka molecular barcode
Run metadata	cognos.slxre2_organic_run PK: Run_name	Metadata of each sequencing run. Populated from Illumina Run Dashboard data (slxasync.runs and slxasync.run_events)
Sequencing Run Metrics	cognos.slxre2_rghqs (read group level metrics) PK: (Flowcell_Barcode, Lane, Molecular_indexing_scheme ) cognos.slxre2_lane_hqs (lane level metrics) PK: (Flowcell_Barcode, Lane) cognos.slxre2_run_hqs (run level metrics) PK: Flowcell_Barcode analytics.illumina_read (data parsed from run’s summary files) PK: (Run_name, lane, read) *** read refers to forward template, forward barcode, reverse barcode and reverse template analytics.illumina_lane_aggregate (lane level base calling metrics aggregated from analytics.illumina_read) PK: (Runa_name, lane)	Metrics generated by Picard and DRAGEN pipeline + metadata, or from run’s summary.csv files. Run summary.csv file - Illumina runs generate a summary.csv file at run completion and it is stored in the run folder. Mercury team has setup the run folders to push summary files to the Analytics Metrics ETL, which processes them with the plugin called “Summary”.
Aggregation metrics (Picard pipeline)	cognos.slxre2_pagg_library (Library level aggregation) PK: (Project, Sample, Library, Data_Type) cognos.slxre2_pagg_sample (Sample level aggregation) PK: (Project, Sample, Data_Type)	Aggregation metrics from Picard pipeline + metadata from the aggregated read groups.
Aggregation metrics (DRAGEN pipeline)	analytics.dragen_sample_agg (Sample level aggregation) PK: PDO_Sample	Aggregation metrics from DRAGEN pipeline + metadata for the aggregated read groups.
Aggregation metrics linked to Library QC quants	cognos.slxre2_library_agg_qc PK: (Project, Sample, Library, Data_Type) cognos.slxre2_sample_agg_qc PK: (Project, Sample, Data_Type)	Library / Sample level aggregation associated with key library QC quants
PDO Sample status (sequencing orders)	cognos.PDO_Star5 - staging table cognos.PDO_STAR5_AUX	Shows where in the process is each PDO sample. Not all products are included here, only products listed in COGNOS.pdo_star_interesting_products table. Statuses tracked here - whether samples was sequenced, analyzed and billed. Includes metrics from picard sample aggregations. Products requiring DRAGEN analysis have only deliverable DRAGEN metrics. We should revisit what to include when more product move to DRAGEN analysis
PDO Sample status (Arrays)	cognos.PDO_STAR_ARRAYS	Same idea and structure as PDO_STAR5_AUX but for Arrays PDOs
Pacbio Sequencing metrics	analytics.pacbio - raw data from SmrtLink PK: (run_name, cell,_well, domain) analytics.pacbio2_run - pacbio run metadata from SmrtLink API PK: run_name analytics.pacbio2_barcoded_metrics - SmrtLink generated metrics per read group PK: (Site_id, Run_name, Cell_well, molecular_barcode) analytics.pacbio2_cell_metrics - SmrtLink generated metrics per cell PK: (Site_id, Run_name, Cell_well)	Pacbio table contains the raw data from SmrtLink in JSON format. Data is collected via Pacbio APIs. Site_id refers to various sources based in SmrtLink software version , data output, etc. Details about Site IDs can be found in analytics.pacbio_sites_table A number of PACBIO2_* views are created for various entities. NOTE: A number of pacbio views are created strictly for Mercuries data review - PACBIO_DTR_* and PACBIO_MRC_*
Pacbio LIMS Metadata	analytics.pacbio2_mercury_metadata	Read group metadata from LIMS provided by Mercury API: https://mercury.broadinstitute.org:8443/Mercury/rest/pacbiorun/query?runName=r84063_20230620_200050
QC metrics generated by instruments or TAG analysis	analytics.metrics PK: (site_id, filename) analytics.metrics_sites - site description	Data from files is converted to JSPN format at stored in metrics table. Processes captured are described in metrics._sites table and include VVP, smartseq, qPCr standard curve.

Analytics DWH Infrastructure

Related content