For each data run, a new panel should be added here describing the significant functional or data changes in that analysis run. Mike will cut them from here and paste them to the public page upon releasing that run.
If the notes are sparse, diff against the last analysis run to see which tasks/data were added (and optionally ask the team, via email, to verify). The /wiki/spaces/GDAC/pages/844334194
This is likely to be either the penultimate or perhaps even final standard Firehose analysis run of the TCGA project. Custom AWG runs will continue for TCGA as needed.
Summary of sample changes (see the comprehensive samples report for more details):
Data type | Change | Total
BCR | +1 | 11368
Clinical | +32 | 11196
CN | +2 | 10987
MAF | +313 | 7099
Methylation | +1 | 10972
miRSeq | +2 | 10156
mRNASeq | +164 | 10267
rawMAF | +2072 | 6322
RPPA | +627 | 7429
- APOBEC pipelines updated:
- used median filtering in primary APOBEC analysis
- in downstream clinical correlations, corrected names of categorical variables and descriptions of how they were utilized
- cNMF clustering improvement: new criteria are used to select the best number of clusters (k), identical to those described in the Summer 2014 run (see below) for consensus hierarchical clustering:
- The cophenetic correlation coefficients and average silhouette values are used to determine the k with the most robust clustering. From the plot of cophenetic correlation versus k, we select the modes and the point preceding the greatest decrease in cophenetic correlation coefficient, and from these choose the k with the highest average silhouette value.
- Survival analysis: for all clinical correlations
- Modified the p-value calculation for survival analysis with continuous data: it now uses quantile-interval categorical values instead of the continuous values.
Previously there was a single hazard ratio value per continuous variable; now there are multiple hazard ratio values for the quantile-interval curves (these are now reflected in the plot legends).
- FireBrowse:
- updated to reflect these run results
- iCoMut:
- loaded 4 additional disease cohorts: DLBC, ESCA, SARC, and THYM
- Completed most of the work for a major new release incorporating many graphical and data exploration enhancements; stay tuned for an announcement next week
- Migrated implementation of our clustering codes away from GenePattern into FH native jobs, consolidating and simplifying along the way (needs more description/tailoring)
- Spearman correlation was used in the Correlate_mRNAseq_vs_Mutation_APOBEC pipeline.
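The cNMF cluster-selection rule described above can be sketched roughly as follows. This is an illustrative sketch only, not the Firehose code: the function name `select_k` and the dictionary inputs are hypothetical, "modes" are assumed to mean interior local maxima of the cophenetic curve, and the fallback when no candidate is found is also an assumption.

```python
def select_k(cophenetic, silhouette):
    """Choose k from per-k cophenetic correlation and average silhouette.

    Both arguments are dicts mapping k -> value (hypothetical input format).
    """
    ks = sorted(cophenetic)
    candidates = set()

    # Assumption: a "mode" is an interior local maximum of cophenetic vs. k.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if cophenetic[k] > cophenetic[prev_k] and cophenetic[k] > cophenetic[next_k]:
            candidates.add(k)

    # The k immediately preceding the greatest decrease in cophenetic correlation.
    drops = [(cophenetic[a] - cophenetic[b], a) for a, b in zip(ks, ks[1:])]
    if drops:
        largest_drop, k_before = max(drops)
        if largest_drop > 0:
            candidates.add(k_before)

    # Assumption: if nothing qualifies, consider every k.
    if not candidates:
        candidates = set(ks)

    # Among the candidates, take the highest average silhouette
    # (ties resolve to the smallest k because candidates are sorted first).
    return max(sorted(candidates), key=lambda k: silhouette[k])


# Made-up values for illustration only.
cophenetic = {2: 0.90, 3: 0.96, 4: 0.93, 5: 0.95, 6: 0.80, 7: 0.78}
silhouette = {2: 0.30, 3: 0.41, 4: 0.38, 5: 0.47, 6: 0.25, 7: 0.22}
best_k = select_k(cophenetic, silhouette)
```

With these made-up inputs the candidates are k=3 and k=5 (both modes, with k=5 also preceding the largest drop), and the silhouette values decide between them.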
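For the survival-analysis change above, the step of turning a continuous variable into quantile-interval categories might look like the sketch below. The notes do not say how many intervals are used, so the default of four groups is an assumption, as are the function name `quantile_groups` and the choice of the "inclusive" quantile method. The downstream p-value and hazard ratio calculations are not shown.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_groups(values, n_groups=4):
    """Map each continuous value to a quantile-interval label 0..n_groups-1.

    n_groups=4 (quartiles) is an assumption; the run notes do not specify it.
    """
    # n_groups - 1 cut points computed from the observed values.
    cuts = quantiles(values, n=n_groups, method="inclusive")
    # bisect_left places a value equal to a cut point in the lower interval.
    return [bisect_left(cuts, v) for v in values]


# Made-up measurements for illustration only.
measurements = [1, 2, 3, 4, 5, 6, 7, 8]
groups = quantile_groups(measurements)
```

Each resulting group would then get its own survival curve, which is consistent with the note that the plot legends now list multiple quantile-interval curves.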
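As a reminder of what the Spearman correlation mentioned above computes, here is a small standard-library sketch: it ranks both variables (tied values share the average rank) and takes the Pearson correlation of the ranks. The function names and the example values are hypothetical, and this is not the Correlate_mRNAseq_vs_Mutation_APOBEC implementation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values: a monotone but non-linear relationship.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
apobec_score = [1.0, 4.0, 9.0, 16.0, 100.0]
rho = spearman(expression, apobec_score)
```

Because only the ordering matters, a monotone but non-linear relationship such as the one above still yields a correlation at the maximum of 1, which is the usual reason to prefer Spearman over Pearson for skewed data.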
Table from 2013_09_23 analysis run; keep until next run is posted which corrects the GAF 3.0 issues
Cohort | GAF 2.1 | GAF 3.0
THCA | 323 samples, 6806 mutations | 401 samples, 6736 mutations
SKCM | 228 samples, 189759 mutations | 228 samples, 189948 mutations
LGG | 217 samples, 25172 mutations | 220 samples, 23947 mutations
KIRP | 111 samples, 7907 mutations | 112 samples, 7367 mutations

Rank | THCA GAF 2.1 gene | THCA GAF 3.0 gene | 2.1 Rank | SKCM GAF 2.1 gene | SKCM GAF 3.0 gene | 2.1 Rank | LGG GAF 2.1 gene | LGG GAF 3.0 gene | 2.1 Rank | KIRP GAF 2.1 gene | KIRP GAF 3.0 gene | 2.1 Rank
1 | NRAS | NRAS | 1 | C15orf23 | C15orf23 | 1 | IL32 | TEAD3 | 5275 | IL32 | KCNK5 | 354
2 | BRAF | BRAF | 2 | CDKN2A | POLDIP2 | 6212 | IDH2 | IL32 | 1 | CDC27 | CDC27 | 2
3 | HRAS | HRAS | 3 | NRAS | NUDT11 | 16950 | IDH1 | ATRX | 5 | NF2 | IL32 | 1
4 | EMG1 | OTUD4 | 793 | BRAF | CDKN2A | 2 | TP53 | PRCP | 950 | PPARGC1B | NF2 | 3
5 | PTTG1IP | EIF1AX | 14 | OXA1L | NRAS | 3 | ATRX | IDH2 | 2 | SFRS2IP | PPARGC1B | 4
6 | RPTN | NUP93 | 500 | TP53 | BRAF | 4 | CIC | IDH1 | 3 | MET | PCDHGC5 | 12984
7 | TG | NLRP6 | 26 | STK19 | OXA1L | 5 | FUBP1 | TP53 | 4 | ELF3 | MET | 6
8 | TMCO2 | PPM1D | 13 | PTEN | TTN | 18 | NOTCH1 | HEATR3 | 239 | PCF11 | PLAC4 | 36
9 | R3HDM2 | MUC7 | 17 | DSG1 | UGT2B15 | 17 | PIK3R1 | CIC | 6 | LGI4 | PCF11 | 8
10 | PRB2 | OR56A1 | 43 | PPP6C | TP53 | 6 | PIK3CA | FUBP1 | 7 | RAB27B | LGI4 | 9