
To help simplify retrieval of TCGA data and analysis results, we have introduced firehose_get. To use it, download the latest zip file from here, then perform these two steps from a Unix-compatible command line:

    unix% unzip firehose_get_latest.zip
    unix% ./firehose_get

and follow the instructions (documentation excerpt below). If wget is not installed on your system, please look here for links to pre-built versions, or search the web for a build for your platform. Finally, rather than keeping firehose_get in the directory where you downloaded and unzipped it, move it to a directory on your $PATH so that you can run it from any working directory.
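One way to do that is sketched below, assuming a personal ~/bin directory as the install location. install_on_path is a hypothetical helper name, not part of firehose_get; it also warns if wget is missing, since firehose_get relies on wget for its downloads.

```shell
# Hypothetical helper (not part of firehose_get): copy the script into a
# directory and make sure that directory is on $PATH for this shell session.
install_on_path() {
    src="$1"                     # path to the unzipped firehose_get script
    bin_dir="${2:-$HOME/bin}"    # assumed install location; change as needed

    # firehose_get needs wget, so warn early if it cannot be found
    if ! command -v wget >/dev/null 2>&1; then
        echo "warning: wget not found on \$PATH" >&2
    fi

    mkdir -p "$bin_dir" || return 1
    cp "$src" "$bin_dir/" || return 1
    chmod +x "$bin_dir/$(basename "$src")" || return 1

    # Prepend bin_dir to PATH unless it is already listed there
    case ":$PATH:" in
        *":$bin_dir:"*) ;;
        *) PATH="$bin_dir:$PATH"; export PATH ;;
    esac
}
```

For example, from the directory where you unzipped the download:

    unix% install_on_path ./firehose_get
    unix% command -v firehose_get

The PATH change lasts only for the current shell; to make it permanent, add a line such as export PATH="$HOME/bin:$PATH" to your shell's startup file (for example ~/.profile).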


Examples
  • firehose_get analyses latest
    Retrieves: every result, for every disease cohort, in the latest GDAC Firehose run
  • firehose_get -tasks mutsig gistic analyses latest brca ucec
    Retrieves: only GISTIC and MutSig results for breast and uterine cancer
  • firehose_get -tasks mut analyses latest prad
    Retrieves: all results which have "mut" in their name, such as MutSig, Mutation_Assessor, and any correlations to mutation data

  • firehose_get -tasks rna clinical stddata 2013_05_23
    Retrieves: any data package with (case-insensitive) "rna" or "clinical" in its name, from the May 23, 2013 data run


Documentation

%  firehose_get --help

firehose_get : retrieve open-access results of Broad Institute TCGA GDAC runs
Version: 0.4.1 (Author: Michael S. Noble)

Usage: firehose_get [flags]  RunType  Date  [disease_cohort, ... ]

Two arguments are required; the first must be one of
	analyses  awg_gbm  awg_hnsc  awg_lgg  
	awg_luad  awg_pancan8  awg_skcm  awg_stad  
	awg_test  awg_thca  stddata  

while the second must EITHER be a date (in YYYY_MM_DD form) of an
existing GDAC run of the given type OR 'latest'; use the -runs flag
to discern what RunType+Date combinations are available.  An optional
3rd, 4th etc argument may be specified to prune the retrieval, given
as a subset of these case-insensitive TCGA disease cohort names:

	ACC  BLCA  BRCA  CESC  COAD  COADREAD  DLBC  ESCA  
	GBM  HNSC  KICH  KIRC  KIRP  LAML  LGG  LIHC  
	LUAD  LUSC  OV  PAAD  PANCANCER  PANCAN8  PANCAN12  PRAD  
	READ  SARC  SKCM  STAD  THCA  UCEC  UCS  

(taken from https://tcga-data.nci.nih.gov/datareports/codeTablesReport.htm)
Note that as a convenience 'analysis' and 'data' are accepted as
synonyms for the 'analyses' and 'stddata' run types

Flags:
  -a | -auth [cred]   authorize the retrieval of password-protected
                      results; the optional cred[entials] parameter
                      must be one of
                              1) a username:password string
                              2) /a/path/to/a/wgetrc/file
                              3) the empty string
                      If no credentials are supplied (empty string),
                      then FHGETRC will be used if it is set in the
                      environment and points to a regular file (which
                      must be in WGETRC-conformant syntax); otherwise
                      a username:password prompt will be issued.  If
                      both $FHGETRC is set in the environment AND a
                      username:password parameter is specified here,
                      then $FHGETRC will be ignored
  -b | -batch         do not prompt: assume YES to all YES/NO queries
  -c | -cohorts       list available disease cohorts
  -e | -echo          show commands that would be run, but do nothing
  -h | -help | --help this message
  -l | -log           write output to log file, instead of stdout
  -o | -only <list>   further prune the set of archives retrieved, by
                      INCLUDING ONLY results of pipelines whose names
                      match any of the given space-delimited list
                      of patterns; matching is performed with glob-style
                      wildcards, and is case-insensitive; prepending
                      a tilde (i.e. ~) to a task name will cause it
                      to be EXCLUDED from download; when no pattern
                      list is given firehose_get will display all tasks in
                      the selected run.
                      NOTE: not all tasks will execute for all disease
                            cohorts; what tasks are run depends upon the
                            data available for that disease cohort
  -p | -platforms     list data platforms available in Firehose runs
                      (not implemented yet)
  -r | -runs          list available Firehose runs
  -t | -tasks <list>  same as -o|-only flag (kept for back-compatibility)
  -v                  display the version of firehose_get
  -x                  debugging: turn on bash set -x (warning: very verbose)
Broad GDAC website:   http://gdac.broadinstitute.org
Broad GDAC email  :   gdac@broadinstitute.org
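
The RunType and Date rules in the help text can be checked locally before contacting the server. The sketch below mirrors the 0.4.1 help text; fh_check_args is a hypothetical helper, its run-type list is copied from this version and may be out of date (use the -runs flag for the current set), and it cannot confirm that a run actually exists for a given date.

```shell
# Hypothetical helper: fh_check_args RUNTYPE DATE
# Mirrors the argument rules in the firehose_get 0.4.1 help text:
#   RUNTYPE is one of the listed run types (or the synonyms analysis/data);
#   DATE is 'latest' or a YYYY_MM_DD string.
# It does not contact the server, so it cannot tell whether a run exists.
fh_check_args() {
    runtype="$1"
    run_date="$2"
    case "$runtype" in
        analyses|analysis|stddata|data) ;;
        awg_gbm|awg_hnsc|awg_lgg|awg_luad|awg_pancan8) ;;
        awg_skcm|awg_stad|awg_test|awg_thca) ;;
        *) echo "unknown RunType: $runtype" >&2; return 1 ;;
    esac
    case "$run_date" in
        latest) ;;
        [0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]) ;;
        *) echo "Date must be 'latest' or YYYY_MM_DD: $run_date" >&2; return 1 ;;
    esac
    return 0
}
```

Combined with the -e flag, this lets you preview a retrieval without downloading anything:

    unix% fh_check_args stddata 2013_05_23 && firehose_get -e -tasks rna clinical stddata 2013_05_23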

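The precedence rules for -auth credentials can likewise be summarized in a small function. fh_cred_source is hypothetical and only reports which source the help text says would be used; it does not call firehose_get or wget. The help text does not say whether an explicit wgetrc path overrides $FHGETRC, so that branch is an assumption.

```shell
# Hypothetical helper: fh_cred_source CRED
# Prints which credential source 'firehose_get -auth CRED' would use,
# following the precedence described in the 0.4.1 help text.
fh_cred_source() {
    cred="$1"
    if [ -z "$cred" ]; then
        # Empty credentials: use $FHGETRC if it names a regular file
        if [ -n "$FHGETRC" ] && [ -f "$FHGETRC" ]; then
            echo "fhgetrc"
        else
            echo "prompt"            # a username:password prompt is issued
        fi
    elif [ -f "$cred" ]; then
        # Assumption: an explicit wgetrc path also takes precedence over $FHGETRC
        echo "wgetrc"
    else
        case "$cred" in
            *:*) echo "userpass" ;;  # $FHGETRC is ignored in this case
            *) echo "invalid"; return 1 ;;
        esac
    fi
}
```

For instance, with $FHGETRC unset, fh_cred_source "" prints prompt, matching the help text's statement that a username:password prompt will be issued.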
