Patch: October 2016 (2016_01_28__patch)

The LIHC RPPA data submitted by MDACC were discovered to be mislabelled MESO samples. The analyses and standard data pipelines for LIHC have been re-run using the corrected samples submitted in March, and the nozzle reports now contain notices of the discrepancies.

 

The following table shows the changes in sample counts for RPPA data as a result of this patch:

HNSC   +145   (357 total)
LIHC   +121   (184 total)
THCA   +146   (368 total)

 

January 2016 data run (2016_01_28)
  1. Summary of sample changes (see the comprehensive samples report for more details):

    Clinical   +8      (11196 total)
    MAF        +122    (7099 total)
    mRNASeq    +214    (10267 total)
    rawMAF     +1576   (6322 total)
    RPPA       +626    (7429 total)

  2. Significant changes to clinical data processing, to accommodate new XSD 2.7 format adopted by TCGA:
    1. Per TCGA, this new format is not backwards compatible with the previous XML, so a parser customized to the previous XSD may not work correctly. The change is confined to the XML; no other TCGA data is affected.
    2. These changes are detailed at https://wiki.nci.nih.gov/display/TCGA/Release+1.39.1#Release1.39.1-XSD2.7Implementation
    3. In the new format, 'CQCF' parameters are removed; they are instead provided in a new XML format, SSF.
    4. Accordingly, 'cqcf' parameters are removed from the clinical merged data, and "patient.tumor_samples" and "patient.normal_controls" parameters are added.
    5. Added parameters from two new XML formats, OMF and SSF, to the clinical merged data
  3. Fixed a bug in Methylation_Preprocess that caused duplicate entries in methylation data
  4. Added outlinks to TCGA Encyclopedia to the About and Documentation menus of the Broad GDAC Firehose website
  5. FireBrowse v1.1.24:
    1. Loaded this data run
    2. Numerous internal updates to further simplify deployment, which will, e.g., help provision AWG-specific databases
    3. The HeartBeat API function now also reports the directory from which, and the time at which, the API server was launched
    4. Dropped beta clause from version

...

August 2015 stddata Run
  1. Summary of sample changes (see the comprehensive samples report for more details):

    BCR           +15    (11367 total)
    Clinical      +106   (11164 total)
    CN            -2     (10985 total)
    LowP          +120   (1211 total)
    MAF           +48    (6786 total)
    Methylation   +16    (10971 total)
    miRSeq        -4     (10154 total)
    mRNASeq       +8     (10103 total)
    RPPA          +650   (6802 total)

  2. Extensive improvements to clinical data: Clinical_Pick_Tier1 archives now bundle 2 forms of values:

    1. The entire set of TCGA CDEs, verbatim (in the new All_CDEs.txt file), adding over 700 additional clinical parameters

    2. The CDE subset normalized by Firehose for downstream analyses (in the <cohort>.clin.merged.picked.txt file)

    3. For example, to date the ACC picked file has contained fewer than 20 CDEs, while All_CDEs.txt now contains more than 100.
    4. Followup values are merged, when available, to yield the most up-to-date values per CDE
    5. Corrected problem wherein some True/False values for regimen_indication CDE were erroneously swapped
    6. Created an interactive table of CDEs, which shows on a single page exactly which CDEs are selected for analyses in Firehose for all disease cohorts. Updated the FireBrowse clinical samples API to reference this new CDE table
    7. Enhanced Merge_Clinical pipeline to leverage auxiliary CDEs when available (COAD, READ, ESCA): for all primary CDEs that also have a value in the aux CDE file (e.g. MSI), we now replace the primary value if it is NA and the aux value is not NA
  3. Extensive improvements to TCGA mutation data:
    1. New MAF for Diffuse Large B-cell Lymphoma (DLBC, 48 mutation samples)
    2. Oncotator now included in Firehose mutation pipelines, to standardize TCGA MAFs to a common format:
      1. This substantially improves the consistency and utility of TCGA MAFs
      2. hg18 MAFs lifted over to hg19
      3. All MAFs re-annotated against Gencode v19
      4. Oft-requested custom columns, such as amino acid change, now present in all MAFs
      5. Oncotated MAFs are available in 2 pipeline output archives
        1. Mutation_Packager_Oncotated_Calls
        2. Mutation_Packager_Oncotated_Raw_Calls
        reflecting the separation of MAFs into two sets (raw/automated and curated, per the Spring 2015 analysis run)
  4. Extensive improvements to RPPA data, fostering more robust automated processing and downstream analysis:
    1. Merge_protein_exp__mda_rppa_core__mdanderson_org__Level_3__protein_normalization__data
      1. Validation and animal source suffixes now stripped off of antibody name to account for several new batches that no longer include them
      2. Now returns a union of antibody names from all samples, rather than failing when the samples do not all have exactly the same antibodies
        1. This does not fix the RPPA issues listed below, in which different names are used for the same antibody (e.g. Acetyl-a-Tubulin-Lys40/Acetyl-a-Tubulin(Lys40), ARHI/DIRAS3); in these cases each antibody name will appear on a separate line of the merged files until a fix is made at MDACC.
        2. This enables RPPA analysis for aggregate cohorts such as STES, KIPAN, and GBMLGG
    2. RPPA_AnnotateWithGene: normalizes the antibody reference files provided by MDACC into a two-column TSV with a standardized header: gene name and antibody name (stripped of suffixes). This file is now provided in the archive.
  5. FireBrowse v1.1.17 beta:
    1. Ingest this August 2015 data run:
      1. API through which Firehose-picked and normalized clinical data are accessible has been renamed to Samples/Clinical_FH

      2. Verbatim TCGA clinical data may be accessed through the new Samples/Clinical API

      3. A CDE will be reflected in either API only when it has a value other than NA for at least 1 patient case in any disease cohort.

      4. For backward compatibility, the Samples/ClinicalTier1 remains available (as a synonym for Samples/Clinical_FH)
      5. fbget Python and UNIX CLI bindings have been suitably updated
      6. This makes it easy to determine for which patients and cohorts any given CDE is defined: e.g.
        fbget clinical cde=gleason_score
        will show each patient case with a valid gleason_score, across all cohorts
    2. viewGene: now enforces rendering of at most 1 gene per Submit; if multiple genes are given the first is selected
    3. iCoMut:
      1. Popup tooltips for mutation panels now show total # of mutations AND fractional % of a given type (e.g. missense)
      2. New search feature, enabling one to see into which clusters/panels/etc a given patient (or set of patients) falls
    4. After extensive testing, upgraded backend database from Mongo2.x to Mongo3.x: v3 dramatically reduces the memory footprint and data storage sizes, which yields greater performance and also clears considerable breathing room to add more data APIs (e.g. for methylation, RPPA, etc) as well as AWG-specific databases
  6. Corrected item #9 from Spring 2015 data run: missing RPKM aliquots in COAD
  7. Issues: 
    1. RPPA issues due to changes in new data file row names: these have been previously reported to MDACC.
      1. KIRP: ARHI-M-E -> DIRAS3-M-E (batch 2.0.0)
      2. LGG: Acetyl-a-Tubulin-Lys40-R-C -> Acetyl-a-Tubulin(Lys40)-R-C (batch 2.0.0)
      3. PAAD: DIRAS3-M-E -> ARHI-M-E (batch 1.2.0)
      4. STES: ARHI-M-E -> DIRAS3-M-E (batch 2.0.0)
    2. RPPA issue with KIRC antibody file:
      1. Gene names were missing for 15 antibodies. The gene names were located in an online supplement file, and manually added:

        Gene name   Antibody name
        CA9         CA9
        SDHB        Complex-II_subunit30
        GYG1        GYG-Glycogenin1
        GYS1        GYS
        GYS1        GYS_pS641
        HIF1A       HIF-1_alpha
        LDHA        LDHA
        LDHB        LDHB
        MTCO2       Mitochondria
        ATP5A1      Oxphos-complex-V_subunitb
        PKM2        PKM2
        PYGB        PYGB
        PYGB        PYGB-AB2
        PYGL        PYGL
        PYGM        PYGM
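
The RPPA merge behaviour described in item 4 of this panel (suffix stripping, then a union of antibody names across samples) can be illustrated with a minimal Python sketch. This is not the Firehose implementation: the function names, the input layout, and the suffix pattern (one animal-source letter followed by one validation letter, as in ARHI-M-E) are assumptions made for illustration only.

```python
import re

# Assumption: the suffix is "-<animal letter>-<validation letter>", e.g. "-M-E".
SUFFIX = re.compile(r"-[A-Z]-[A-Z]$")


def strip_suffix(antibody):
    """Remove a trailing animal-source/validation suffix, if one is present."""
    return SUFFIX.sub("", antibody)


def merge_samples(samples):
    """Merge per-sample antibody values over the union of antibody names.

    samples: dict of sample id -> dict of antibody name -> value
    (a hypothetical layout, not the MDACC file format).
    Returns the sorted antibody names and, per sample, a list of values
    aligned to those names, with None where a sample lacks an antibody.
    """
    cleaned = {
        sample_id: {strip_suffix(name): value for name, value in row.items()}
        for sample_id, row in samples.items()
    }
    # Union rather than intersection: no sample is rejected for missing rows.
    antibodies = sorted(set().union(*(row.keys() for row in cleaned.values())))
    table = {
        sample_id: [row.get(name) for name in antibodies]
        for sample_id, row in cleaned.items()
    }
    return antibodies, table


antibodies, table = merge_samples({
    "old_batch": {"ARHI-M-E": 0.1, "Acetyl-a-Tubulin-Lys40-R-C": 0.3},
    "new_batch": {"DIRAS3": 0.2, "Acetyl-a-Tubulin-Lys40": 0.4},
})
```

In this sketch ARHI and DIRAS3 still end up as separate rows, which mirrors the renaming issue listed under Issues: stripping suffixes and taking a union does not reconcile two different names for the same antibody.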

...

2014_10_17
  • Summary of sample changes (see the comprehensive samples report for more details):

    Clinical   +458    (9593 total)
    CN         +1152   (10986 total)
    MAF        -12     (6061 total)
    miRSeq     +1031   (10051 total)
    mRNASeq    +829    (9730 total)

  • FireBrowse Updates:
    • Updated to reflect 2014_10_17
    • Added beta Analyses/FeatureTables api
    • Properly terminate final row of returned TSV and CSV files, with newline (\n)

2014_09_02
  • Summary of sample changes (see the comprehensive samples report for more details):

    Clinical      +441   (9135 total)
    CN            +360   (9834 total)
    MAF           +155   (6073 total)
    Methylation   +513   (10422 total)
    miRSeq        +145   (9020 total)
    mRNASeq       +353   (8901 total)
    RPPA          +303   (4336 total)

  • mRNASeq data:
    • Z-score now computed for RSEM/RPKM mRNAseq, calculated as
      z = (expression in tumor sample - mean expression in all tumor samples) / standard deviation of expression in all tumor samples
  • miRSeq data:
    • miRseq_Mature_Preprocess pipeline updated to miRBase v21; it will now generate the same mature miR calls as BCGSC
  • Clinical data updates: 
    • corrected value of dccuploaddate CDE to be an actual date value instead of text showing how the date would be computed
    • normalized CDEs to contain underscores for readability: e.g. dccuploaddate became dcc_upload_date, etc
      • this is evident in Merge_Clinical and Clinical_Pick_Tier1 pipelines
      • as well as in the results returned by the FireBrowse clinical samples API
    • 2 formerly PHI clinical data elements, race and ethnicity, are now available in Clinical_Pick_Tier1 pipeline output
  • FireBrowse Updates:
    • Samples/mRNASeq API: 
      • new Z-scores (see above) added to results
      • geneID now reflected in results, because in some cases the gene name is absent
    • Samples/clinical samples API:
      • tier1_cde_name parameter added, to allow pinpoint retrieval of specific CDEs
      • date parameter removed, for consistency with other Samples APIs (only latest available data is served)
      • Added Implementation Notes documentation at top of clinical API section
    • New Metadata/ClinicalTier1 API: lists names of all tier 1 clinical data elements (CDEs), unioned over all disease cohorts
    • New comprehensive file download dialogue, per disease cohort:
      • See, for example, the dialogue for adrenocortical carcinoma (ACC)
      • This enables one to see all files for any disease cohort, in a single view
      • And will be enhanced in the near future to have checkboxes to include/exclude files for aggregate download
      • Also linked to the data column for each disease cohort on main GDAC site
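
The z-score defined in the mRNASeq item of this panel can be sketched in Python as follows. This is an illustration only, not the Firehose pipeline code: the function name and input layout are hypothetical, and since the notes do not say whether the population or the sample standard deviation is used, the sketch assumes the population form.

```python
from statistics import mean, pstdev


def expression_z_scores(tumor_expression):
    """Return one z-score per tumor sample for a single gene.

    tumor_expression: dict of sample id -> expression value
    (a hypothetical layout, not the Firehose file format).
    """
    values = list(tumor_expression.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Identical expression in every sample: the z-score is undefined.
        return {sample: None for sample in tumor_expression}
    return {
        sample: (value - mu) / sigma
        for sample, value in tumor_expression.items()
    }


scores = expression_z_scores({"S1": 2.0, "S2": 4.0, "S3": 6.0})
```

Here the mean is taken over all tumor samples passed in, so a sample at the cohort mean (S2) scores exactly 0 and the two outer samples score symmetrically about it.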


...

2013_02_03

  • Sample count changes:

    Clinical      +90    (6039 total)
    CN            +304   (6856 total)
    LowP          +35    (636 total)
    Methylation   +118   (6736 total)
    miRseq        +435   (6062 total)
    mRNAseq       +143   (4933 total)

  • Methylation data newly available for ESCA

  • These data are already reflected in IGV (File->Load From Server menu) 
  • Note that firehose_get v0.3.10 was released on Jan 31, to dynamically reflect the range of disease cohort names used in runs

...

2012_12_06

  • Sample Changes (Negative values reflect recent redactions - please see the Samples Summary Report)

    BCR           +111   (7234 total)
    Clinical      +111   (5909 total)
    CN            +206   (6418 total)
    MAF           -2     (3181 total)
    Methylation   +58    (6522 total)
    miRseq        -17    (5627 total)
    mRNAseq       -14    (4357 total)

  • New curated MAF from the GBM AWG, containing 291 samples (an increase of 15 samples).
  • Level 3 Clinical Parameters updated:
    1. Addition of radiation exposure (corresponds to patient.personlifetimeriskradiationexposureindicator in Level 2 Data)
    2. Deletion of neoadjuvanttherapy (corresponded to patient.drugs.drug.regimenindication in Level 2 Data)
  • Reflects the following Level 2 clinical parameters related to smoking to the extent available (for BLCA, CESC, HNSC, LUAD, LUSC, and PANCAN8 cohorts) 
    1. patient.numberpackyearssmoked

    2. patient.stoppedsmokingyear

    3. patient.tobaccosmokinghistoryindicator

    4. patient.yearoftobaccosmokingonset

...

2012_09_13

 

  • No significant sample changes 26 Aug - 06 Sept (see Spreadsheet).
  • To assist the melanoma AWG, we delayed the first stddata run of Sept 2012 to incorporate the pending submission of RPPA samples for SKCM, which appeared in our mirror as of 9/13. As this was essentially the midpoint of September, stddata__2012_09_13 WAS THE ONLY STDDATA RUN for SEPT 2012. This simplified our work somewhat, without appreciably reducing the sample flow, while also allowing us to re-sync to our desired target stddata run schedule of the 1st and 15th of each month.

  • Sample Changes:

    BCR           +71    (6952 total)
    Clinical      +8     (5679 total)
    CN            +4     (5814 total)
    Methylation   +118   (5589 total)
    miRseq        +684   (4793 total)
    mRNAseq       +36    (3563 total)
    RPPA          +442   (3173 total)

  • The past 5 months of Standardized Data have been loaded into IGV:
    • Partitioned by reference genome - When choosing "Load from Server...", only the data for the currently selected reference (Human hg18/Human hg19) will be available via the menu.
    • Copy Number data available for both hg18 and hg19, both with and without germline samples
    • Meth450 data now available
    • To access, open IGV, and with Human hg18/hg19 selected as the reference, navigate:
      File -> Load from Server... -> The Cancer Genome Atlas -> TCGA GDAC -> Firehose Standard Data

  • RPPA samples newly available for three tumor types:
    • BLCA
    • SKCM
    • THCA
  • Potential RPPA issues (waiting on confirmation from M.D. Anderson)
    • Results may be reporting KDR rather than XIAP for KIRC and UCEC when converting using the supplied antibody annotations
    • LKB1 antibody may be reported as LBK1 for BRCA and OV
  • PANCANCER aggregate (containing 20+ disease types) replaced with PANCAN8 aggregate (containing only 8: BRCA COAD READ KIRC GBM LUSC OV UCEC)
  • COAD and READ reintroduced as separate types by request of the PANCAN8 AWG (COADREAD aggregate continues to be available)



...

2012_08_04
  •  Sample Changes:

    BCR           +35    (6881 total)
    Clinical      +7     (5640 total)
    CN            +101   (5487 total)
    Methylation   -1     (5464 total)
    miR           -1     (1054 total)
    miRseq        +113   (4089 total)
    mRNA          -1     (2217 total)
    mRNAseq       +38    (3498 total)
    RPPA          +237   (2324 total)

    The -1's in this table were caused by the removal of several samples for two OV patients, due to an unintended clash between improved aliquot selection (as described in our FAQ) and a legacy internal blacklist from the TCGA pilot. The legacy blacklist has now been deprecated, in favor of relying solely on the transparency of redactions recorded in the DCC annotations database, and the samples will be present in future releases. The OV samples in question are:

    • TCGA-29-1704
      • CN
    • TCGA-23-1023
      • mRNA
      • miR
      • Methylation

    102 COADREAD WIGs missing from prior run should now be available. This issue is described in our email archive.

...

2012_07_25
  • New Samples:

    BCR           +65    (6846 total)
    Clinical      +6     (5633 total)
    Methylation   +140   (5465 total)
    mRNAseq       +170   (3460 total)
    miRseq        +195   (3976 total)
  • Pipelines now recognize WashU as submitting BCR where applicable, which e.g. adds LAML clinical samples to our data stream.
  • The WIGs we fabricate to simulate coverage for MutSig have been updated to hg19 for:
    • BLCA
    • BRCA
    • CESC
    • KIRC
    • LUSC
    • LUAD
    • PRAD
    • STAD
    • UCEC
  • 102 WIGs missing from COADREAD. Please see explanation in our email archive.
  • to address item (9) in our 2012_06_23 Analysis Run Release Notes

...