
This page is a central reference of information and notes for computational biologists and software engineers in the CGA group as we transition from the TCGA era of GDAC to the GDC/GDAN era. As of July 2016, the Genomic Data Commons has replaced the TCGA Data Coordinating Center as the repository not only for TCGA data but also for other existing genomics projects (such as TARGET) and for future genomics projects.


Reference Data

  1. For GDAN pipelines we will store on-premises reference data in /xchip/cga/reference/GDAN

  2. This is analogous to the TCGA reference directory /xchip/cga/reference/tcga, but it is not TCGA-centric. In addition to holding hg38 reference data, the GDAN reference tree will gather the bits and pieces of "hidden data" that, for expedience, have accidentally been squirreled away in less-than-ideal locations.

  3. For example, the first entry in the GDAN reference tree was taken from /cga/tcga-gdac/hailei/FH/miRSeqpreprocess and moved to ./GDAN/miR/miRSeqpreprocess/mature.21.fa.gz, because reference data should persist in locations free of individual user identities.
    This data is used by the miRSeq preprocessing pipeline to filter miRs.

  4. Ideally, the reference directory will be migrated to a cloud bucket and referenced from cloud-based analysis pipelines, but that will take time.
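The conventions above (one shared root, no usernames in reference paths, and an eventual move to a cloud bucket) can be illustrated with a minimal Python sketch. It assumes pipelines look up reference files by a path relative to the GDAN root. The helper names and the GDAN_REFERENCE_ROOT environment variable are hypothetical and not an existing CGA convention. Only the default root and the example file path come from this page.

```python
import os
from pathlib import PurePosixPath
from typing import Iterable, Optional

# On-premises GDAN reference root described on this page.
DEFAULT_ROOT = "/xchip/cga/reference/GDAN"

# Hypothetical environment variable that lets pipelines point at another root
# (for example a cloud bucket URL) without editing each pipeline.
ROOT_ENV_VAR = "GDAN_REFERENCE_ROOT"


def check_relpath(relative: str, usernames: Iterable[str] = ()) -> PurePosixPath:
    """Reject reference paths that are absolute, escape the root, or contain a username."""
    path = PurePosixPath(relative)  # also drops "." segments such as a leading "./"
    if path.is_absolute():
        raise ValueError(f"reference path must be relative to the root: {relative}")
    if ".." in path.parts:
        raise ValueError(f"reference path must not leave the root: {relative}")
    blocked = set(usernames)
    hits = [part for part in path.parts if part in blocked]
    if hits:
        raise ValueError(f"reference path contains user-specific segment(s) {hits}: {relative}")
    return path


def resolve_reference(relative: str, root: Optional[str] = None,
                      usernames: Iterable[str] = ()) -> str:
    """Join a validated relative path onto the reference root.

    Plain string joining (rather than pathlib) is used so that URL-style
    roots such as "gs://bucket/GDAN" keep their "//" intact.
    """
    path = check_relpath(relative, usernames)
    base = root or os.environ.get(ROOT_ENV_VAR, DEFAULT_ROOT)
    return base.rstrip("/") + "/" + path.as_posix()
```

For example, `resolve_reference("miR/miRSeqpreprocess/mature.21.fa.gz")` resolves under /xchip/cga/reference/GDAN when the variable is unset. Setting the variable to a bucket URL (the bucket name here would be hypothetical) would redirect the same lookup without changing the pipeline. Keeping the root in one place is one way to make the eventual cloud migration a configuration change rather than a code change.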