Archive Nomenclature

As of 2017, our archives follow the new nomenclature given below:



Description of Permissible Values


A string of the form


for example: TCGA-ACC-TP.

The <disease_specification> most often refers to a single disease study given by its disease abbreviation, such as GBM for Glioblastoma Multiforme, but may also refer to an aggregate of multiple diseases, such as PANCAN12 (a cohort of 12 diseases created to study pan-cancer trends) or COADREAD (which combines the single diseases COAD and READ into one cohort).

The optional <sample_type> suffix consists of a literal dash followed by a sample type code designating the tissue sample type; for example, the suffix "-TP" indicates that the given archive contains results based upon primary tumor data. As a final example, here is how sample type codes most commonly map to sample sets in Firehose for a single disease study:

Sample Set Name    Description
BLCA               all tumor and normal samples for Bladder Urothelial Carcinoma (union of everything below)
BLCA-TP            only primary tumor samples
BLCA-TM            only metastatic tumor samples (if any)
BLCA-TR            only tumor recurrence samples (if any)
                   only tissue normal samples (if any)
BLCA-NB            only blood normal samples (if any)
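The optional suffix rule above can be sketched in a few lines of Python. This is only an illustration, not part of Firehose: the function name split_sample_set and the KNOWN_SAMPLE_TYPES set are hypothetical, and the set contains only the sample type codes whose names appear in the table above, so it is probably incomplete.

```python
from typing import Optional, Tuple

# Assumption: only the sample type codes named in the table above.
# The real list of codes is likely longer.
KNOWN_SAMPLE_TYPES = {"TP", "TM", "TR", "NB"}


def split_sample_set(name: str) -> Tuple[str, Optional[str]]:
    """Split a sample set name into (disease part, sample type or None)."""
    # Look only at the text after the last dash, since the suffix is optional
    # and the leading part may itself contain dashes (e.g. TCGA-ACC-TP).
    head, dash, tail = name.rpartition("-")
    if dash and tail in KNOWN_SAMPLE_TYPES:
        return head, tail
    # No recognized suffix: the whole name is the disease part.
    return name, None


print(split_sample_set("BLCA"))
print(split_sample_set("BLCA-TP"))
print(split_sample_set("TCGA-ACC-TP"))
```

Checking the suffix against a set of known codes, rather than splitting on every dash, keeps a name such as TCGA-ACC from being misread as disease TCGA with sample type ACC.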


Tasks should be named as


For example: CopyNumber_Gistic2. The datatypes correspond to columns 2-12 in any of our sample data tables, with several types spelled out in longer form for clarity as follows:

Short Form    Long Form            Description
CN            CopyNumber           SNP6 copy number data
LowP          CopyNumberLowPass    Low pass DNASeqC copy number data
MAF           Mutation             mutation calls
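As a rough illustration, the short-to-long mapping in the table can be held in a small lookup. The names LONG_FORMS and long_form are hypothetical, and returning unlisted datatypes unchanged is an assumption rather than documented behavior.

```python
# Short-to-long datatype names, copied from the table above.
LONG_FORMS = {
    "CN": "CopyNumber",
    "LowP": "CopyNumberLowPass",
    "MAF": "Mutation",
}


def long_form(datatype: str) -> str:
    """Return the long form of a datatype name.

    Assumption: datatypes not listed in the table keep their short name.
    """
    return LONG_FORMS.get(datatype, datatype)


print(long_form("CN"))
print(long_form("LowP"))
```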


Eight numeric characters representing the date that the data was mirrored from GDC. For example, 20170807 indicates August 7, 2017.
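A minimal sketch of validating and decoding such a date stamp, assuming the eight digits are laid out as year, month, day (as the 20170807 example suggests). The function name parse_mirror_date is hypothetical.

```python
from datetime import date, datetime


def parse_mirror_date(stamp: str) -> date:
    """Convert an eight-digit date stamp such as '20170807' to a date."""
    # Reject anything that is not exactly eight ASCII digits before parsing.
    if len(stamp) != 8 or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError(f"expected eight digits, got {stamp!r}")
    # strptime raises ValueError for impossible dates such as 20171301.
    return datetime.strptime(stamp, "%Y%m%d").date()


print(parse_mirror_date("20170807").isoformat())
```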


A small integer (usually single digit) indicating how many times the given <TaskName> was successfully run in the given pass.