Analytics DWH Infrastructure
The table below describes the Analytics data marts related to each step in the Lab.
Lab Process | Table Name and Primary key | Description |
---|---|---|
Sample Receipt (BSP)(informatical) | analytics.bsp_sample PK: sample_barcode | Metadata about the samples registered in BSP. |
Sample Receipt (Mercury)(informatical) | No Analytics DM. mercurydw.sample_fact PK: sample_barcode | Metadata about samples registered in Mercury. Data may overlap with analytics.bsp_sample for some controls and clinical samples. |
Sample Receipt (physical accessioning)
| analytics.sample_accession mercurydw.event_fact where lab_event_type = ‘SampleAccession’ | Accessioned samples with PDO if available. |
Extraction | analytics.extraction PK: (pdo, a.source_sample, a.extracted_sample) | Extracted samples related to certain events from mercurydw.event_fact. |
Initial Quantification | analytics.sample_qc_initial_quant mercurydw.lab_metric quant types: INITIAL_PICO, INITIAL_RIBO, INITIAL_SIZE_PEAK, INITIAL_SIZE_SMEAR | Most recent quant of each type per (PDO, PDO sample).
|
Plating | cognos.bsp_plated_samples PK: (PDO, PDO_sample_id, Plated_Sampel_id) | Includes plating information for Sequencing and Arrays from Mercury (work_request_item_id = -1, work_request_id = -1). The work_request_item_type is ‘Mercury Seq Import’ for Seq plating. The work_request_item_type = ‘Mercury Array Import’ for Array plating. This table includes legacy data for plating done in BSP (work_request_id points to a BSP work request for plating, and work_request_item_id - BSP work request item id for plating WR) The work_request_item_type = ‘SEQ_PLATING_REQUEST’ for Seq plating The work_request_item_type = ‘GAP_PLATING_REQUEST’ for Array plating
|
LC | cognos.slxre_library_lcset PK: (library_name, plated_sample_id) | Metadata of all libraries generated - type, plated sample (aliquot), PDO, PDO sample, LCSET, etc. |
Library QC | cognos.sample_qc_metrics PK: (content_name, Plated_sample_id, quant_type | QC metrics and metadata for libraries of all types (pond, pools) that went through some sort of QC - Pico, qPCR, etc. content_name corresponds to cognos.slxre_library_lcset.library_name quant_type is converted to a user-friendly label but matches the mercurydw.lab_metric.quant_type. |
Library QC | cognos.slxre2_pico_quants PK: (library_name, quant_type) | Only Pico QC metrics and metadata. Libraries that contain only 1 sample usually go through Pico. |
Library QC | cognos.slxre2_library_qpcr PK: library_name | qPCR QC metrics and metadata for the libraries that were qPCR’d. In the past only pools were qPCR’d but now days some processes run qPCR QC on single sample libraries. |
Library ancestry | cognos.slxre2_library_ancestor PK: | Every library associated with its direct and indirect parents |
Library ancestry | cognos.slxre2_rg_ancestry PK: | Every read group linked with the key library types. Used to link to library QC data. |
Sequencing Read Group Metadata | cognos.slxre_readgroup_metadata PK: (Flowcell_Barcode, Lane, Molecular_indexing_Scheme) | Metadata for each read group. Populates from Mercury Run API (https://mercury.broadinstitute.org:8443/Mercury/rest/IlluminaRun/query?runName=230303_SL-NVU_0674_AHFKG2DSX5) Populates as a step in Rghqs ETL agent , when the agent finds sequencing run analysis for a flowcell.
|
Run metadata | cognos.slxre2_organic_run PK: Run_name | Metadata of each sequencing run. Populated from Illumina Run Dashboard data (slxasync.runs and slxasync.run_events) |
Sequencing Run Metrics | cognos.slxre2_rghqs (read group level metrics) PK: (Flowcell_Barcode, Lane, Molecular_indexing_scheme ) cognos.slxre2_lane_hqs (lane level metrics) PK: (Flowcell_Barcode, Lane) cognos.slxre2_run_hqs (run level metrics) PK: Flowcell_Barcode analytics.illumina_read (data parsed from run’s summary files) PK: (Run_name, lane, read) *** read refers to forward template, forward barcode, reverse barcode and reverse template analytics.illumina_lane_aggregate (lane level base calling metrics aggregated from analytics.illumina_read) PK: (Runa_name, lane) | Metrics generated by Picard and DRAGEN pipeline + metadata, or from run’s summary.csv files. Run summary.csv file - Illumina runs generate a summary.csv file at run completion and it is stored in the run folder. Mercury team has setup the run folders to push summary files to the Analytics Metrics ETL, which processes them with the plugin called “Summary”. |
Aggregation metrics (Picard pipeline) | cognos.slxre2_pagg_library (Library level aggregation) PK: (Project, Sample, Library, Data_Type) cognos.slxre2_pagg_sample (Sample level aggregation) PK: (Project, Sample, Data_Type) | Aggregation metrics from Picard pipeline + metadata from the aggregated read groups. |
Aggregation metrics (DRAGEN pipeline) | analytics.dragen_sample_agg (Sample level aggregation) PK: PDO_Sample | Aggregation metrics from DRAGEN pipeline + metadata for the aggregated read groups. |
Aggregation metrics linked to Library QC quants | cognos.slxre2_library_agg_qc PK: (Project, Sample, Library, Data_Type) cognos.slxre2_sample_agg_qc PK: (Project, Sample, Data_Type) | Library / Sample level aggregation associated with key library QC quants |
PDO Sample status (sequencing orders) | cognos.PDO_Star5 - staging table cognos.PDO_STAR5_AUX | Shows where in the process is each PDO sample. Not all products are included here, only products listed in COGNOS.pdo_star_interesting_products table.
|
PDO Sample status (Arrays) | cognos.PDO_STAR_ARRAYS | Same idea and structure as PDO_STAR5_AUX but for Arrays PDOs |
Pacbio Sequencing metrics | analytics.pacbio - raw data from SmrtLink PK: (run_name, cell,_well, domain) analytics.pacbio2_run - pacbio run metadata from SmrtLink API PK: run_name analytics.pacbio2_barcoded_metrics - SmrtLink generated metrics per read group PK: (Site_id, Run_name, Cell_well, molecular_barcode) analytics.pacbio2_cell_metrics - SmrtLink generated metrics per cell PK: (Site_id, Run_name, Cell_well) | Pacbio table contains the raw data from SmrtLink in JSON format. Data is collected via Pacbio APIs. Site_id refers to various sources based in SmrtLink software version , data output, etc. Details about Site IDs can be found in analytics.pacbio_sites_table A number of PACBIO2_* views are created for various entities. NOTE: A number of pacbio views are created strictly for Mercuries data review - PACBIO_DTR_* and PACBIO_MRC_* |
Pacbio LIMS Metadata | analytics.pacbio2_mercury_metadata | Read group metadata from LIMS provided by Mercury API: https://mercury.broadinstitute.org:8443/Mercury/rest/pacbiorun/query?runName=r84063_20230620_200050 |
QC metrics generated by instruments or TAG analysis | analytics.metrics PK: (site_id, filename) analytics.metrics_sites - site description | Data from files is converted to JSPN format at stored in metrics table. Processes captured are described in metrics._sites table and include VVP, smartseq, qPCr standard curve.
|