  1. For GDAN pipelines we will store on-premises reference data in /xchip/cga/reference/GDAN. 

  2. This is analogous to the TCGA reference directory /xchip/cga/reference/tcga but is not TCGA-centric. In addition to holding hg38 reference data, the GDAN reference tree will gather the bits and pieces of "hidden data" that have been squirreled away, for expedience or by accident, in less-than-ideal locations.

  3. For example, the first entry in the GDAN reference tree was taken from /cga/tcga-gdac/hailei/FH/miRSeqpreprocess and moved to ./GDAN/miR/miRSeqpreprocess/mature.21.fa.gz, because reference data should persist in locations free of individual user identities.
    This data is used by the miRSeq preprocessing pipeline to filter miRs.

  4. Ideally the reference directory will be migrated to a cloud bucket and referenced in cloud-based analysis pipelines, but that will take time.
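
One way to keep pipelines independent of where the reference tree lives is to have them resolve every reference file against a single configurable root. The following is a minimal sketch of that idea, not an existing GDAN utility: the function name `reference_path` and the environment variable `GDAN_REFERENCE_ROOT` are hypothetical, and the only values taken from this page are the on-premises root and the miRSeq file path.

```python
import os
import posixpath

# On-premises root named on this page.
DEFAULT_REFERENCE_ROOT = "/xchip/cga/reference/GDAN"
# Hypothetical environment variable for overriding the root; not an existing convention.
ROOT_ENV_VAR = "GDAN_REFERENCE_ROOT"


def reference_path(relative_path, root=None):
    """Return the full location of a reference file given its path relative to the root.

    The root is treated as an opaque string prefix, so it can be a local
    directory today or a bucket-style URI prefix later.
    """
    if root is None:
        root = os.environ.get(ROOT_ENV_VAR, DEFAULT_REFERENCE_ROOT)

    # Reject absolute paths and URIs: callers should pass root-relative paths only.
    if posixpath.isabs(relative_path) or "://" in relative_path:
        raise ValueError(f"expected a root-relative path, got {relative_path!r}")

    # Collapse "./" and redundant separators, then refuse paths that escape the root.
    cleaned = posixpath.normpath(relative_path)
    if cleaned == ".." or cleaned.startswith("../"):
        raise ValueError(f"path escapes the reference root: {relative_path!r}")

    return root.rstrip("/") + "/" + cleaned
```

A pipeline step would then ask for `reference_path("miR/miRSeqpreprocess/mature.21.fa.gz")` rather than hard-coding a directory, so a later move to a bucket would only require changing the root.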