firehose_get version 0.4.13 (released 2018_07_31)
Please note that downloading data from the Broad TCGA GDAC site constitutes agreement to this data usage policy.
To help simplify retrieval of TCGA data and analysis results, we've introduced firehose_get. To use it, download the latest zip file from here, then perform these 2 steps from a Unix-compatible command line:
unix% unzip
unix% ./firehose_get
and follow the instructions (documentation excerpt below). If you are missing wget, please look here for links to pre-built versions for your system, or just Google it. Finally, rather than leaving firehose_get in the directory where you downloaded and unzipped it, it's better to move it to a directory on your $PATH, so that you can run it again from whatever directory you happen to be working in.
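The two steps above, together with the advice about wget and $PATH, can be sketched as a pair of POSIX sh functions. This is a minimal sketch under stated assumptions: the function names install_on_path and check_wget and the default directory $HOME/bin are hypothetical, and because the zip file's name is not given here, the example assumes it has already been unzipped.

```shell
#!/bin/sh
# Sketch only: put an unzipped firehose_get on $PATH and confirm wget exists.
# The default install directory ($HOME/bin) is an assumption, not a requirement.

install_on_path() {
    src=$1                      # path to the unzipped firehose_get script
    dest=${2:-"$HOME/bin"}      # target directory (hypothetical default)
    mkdir -p "$dest" || return 1
    cp "$src" "$dest/firehose_get" || return 1
    chmod +x "$dest/firehose_get" || return 1
    # Prepend $dest to PATH for the current shell unless it is already listed;
    # add a similar line to your shell startup file to make it permanent.
    case ":$PATH:" in
        *":$dest:"*) ;;
        *) PATH="$dest:$PATH"; export PATH ;;
    esac
}

# Report whether wget can be found on $PATH.
check_wget() {
    if command -v wget >/dev/null 2>&1; then
        echo "wget found: $(command -v wget)"
    else
        echo "wget not found; install it before running firehose_get" >&2
        return 1
    fi
}

# Example, run in the directory where the zip file was unpacked:
#   install_on_path ./firehose_get && check_wget
```

Modifying PATH inside the function only affects the current shell session, which keeps the sketch side-effect free beyond the copied file.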
firehose_get analyses latest
Retrieves: every result, for every disease cohort, in the latest GDAC Firehose run
firehose_get -tasks mutsig gistic analyses latest brca ucec
Retrieves: only Gistic and MutSig results for breast and uterine cancer
firehose_get -tasks mut analyses latest prad
Retrieves: all results which have "mut" in their name, such as MutSig, Mutation_Assessor, and any correlations to mutation data
firehose_get -tasks rna clinical stddata 2013_05_23
Retrieves: any data package with (case-insensitive) "rna" or "clinical" in its name, from the May 23, 2013 data run
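The examples above suggest that a -tasks pattern such as mut or rna selects every task whose name contains it, ignoring case, and the -only description in the help text adds glob-style wildcards and ~ exclusions. The POSIX sh function below is a hypothetical model of those rules, not firehose_get's own code; treating each pattern as if wrapped in *...* and letting an exclusion always win are assumptions.

```shell
#!/bin/sh
# Hypothetical model of the -tasks / -only filter, NOT firehose_get's own code.
# Assumptions: matching ignores case, each pattern may match anywhere in a
# task name (as if wrapped in *...*), and a ~pattern exclusion always wins.

# Lower-case a string portably.
lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# task_selected NAME [PATTERN...]
# Exit status 0 if task NAME would be retrieved under the given patterns.
task_selected() {
    name=$(lower "$1"); shift
    have_include=0
    included=0
    for pat in "$@"; do
        pat=$(lower "$pat")
        case $pat in
            "~"*)
                # Tilde prefix: drop any task matching the rest of the pattern.
                case $name in
                    *${pat#\~}*) return 1 ;;
                esac
                ;;
            *)
                have_include=1
                # Unquoted $pat keeps glob characters such as * and ? active.
                case $name in
                    *$pat*) included=1 ;;
                esac
                ;;
        esac
    done
    # With only exclusions (or no patterns), every other task is kept.
    [ "$have_include" -eq 0 ] || [ "$included" -eq 1 ]
}

# Example: task_selected Mutation_Assessor mut && echo kept
```

Lower-casing both the task name and the pattern before matching is a simple, portable way to get case-insensitive globbing in plain sh, which has no built-in case-folding option.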
% firehose_get --help

firehose_get : retrieve open-access results of Broad Institute TCGA GDAC runs
Version: 0.4.1 (Author: Michael S. Noble)

Usage: firehose_get [flags] RunType Date [disease_cohort, ... ]

Two arguments are required; the first must be one of

    analyses awg_gbm awg_hnsc awg_lgg awg_luad awg_pancan8 awg_skcm
    awg_stad awg_test awg_thca stddata

while the second must EITHER be a date (in YYYY_MM_DD form) of an existing
GDAC run of the given type OR 'latest'; use the -runs flag to discern what
RunType+Date combinations are available.

An optional 3rd, 4th, etc. argument may be specified to prune the retrieval,
given as a subset of these case-insensitive TCGA disease cohort names:

    ACC BLCA BRCA CESC COAD COADREAD DLBC ESCA GBM HNSC KICH KIRC KIRP LAML
    LGG LIHC LUAD LUSC OV PAAD PANCANCER PANCAN8 PANCAN12 PRAD READ SARC
    SKCM STAD THCA UCEC UCS (taken from

Note that as a convenience 'analysis' and 'data' are accepted as synonyms
for the 'analyses' and 'stddata' run types.

Flags:

  -a | -auth [cred]    authorize the retrieval of password-protected results;
                       the optional cred[entials] parameter must be one of
                         1) a username:password string
                         2) /a/path/to/a/wgetrc/file
                         3) the empty string
                       If no credentials are supplied (empty string), then
                       $FHGETRC will be used if it is set in the environment
                       and points to a regular file (which must be in
                       WGETRC-conformant syntax); otherwise a
                       username:password prompt will be issued.
                       If both $FHGETRC is set in the environment AND a
                       username:password parameter is specified here, then
                       $FHGETRC will be ignored.
  -b | -batch          do not prompt: assume YES to all YES/NO queries
  -c | -cohorts        list available disease cohorts
  -e | -echo           show commands that would be run, but do nothing
  -h | -help | --help  this message
  -l | -log            write output to log file, instead of stdout
  -o | -only <list>    further prune the set of archives retrieved, by
                       INCLUDING ONLY results of pipelines whose names match
                       any of the given space-delimited list of patterns;
                       matching is performed with glob-style wildcards, and
                       is case-insensitive; prepending a tilde (i.e. ~) to a
                       task name will cause it to be EXCLUDED from download;
                       when no pattern list is given firehose_get will
                       display all tasks in the selected run.
                       NOTE: not all tasks will execute for all disease
                       cohorts; what tasks are run depends upon the data
                       available for that disease cohort.
  -p | -platforms      list data platforms available in Firehose runs
                       (not implemented yet)
  -r | -runs           list available Firehose runs
  -t | -tasks <list>   same as -o|-only flag (kept for back-compatibility)
  -v                   display the version of firehose_get
  -x                   debugging: turn on bash set -x (warning: very verbose)

Broad GDAC website:
Broad GDAC email:
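The positional rules in the usage text (a RunType from a fixed list, a YYYY_MM_DD date or 'latest', then optional case-insensitive cohort names) can be checked before a long download starts. The POSIX sh sketch below transcribes those lists from the help text; the function names are hypothetical, it assumes flags have already been stripped from the argument list, and it cannot confirm that a date matches an existing run (use -runs for that).

```shell
#!/bin/sh
# Hypothetical pre-flight check of firehose_get's positional arguments,
# transcribed from its --help text; the real tool does its own parsing.

RUN_TYPES="analyses awg_gbm awg_hnsc awg_lgg awg_luad awg_pancan8 awg_skcm awg_stad awg_test awg_thca stddata"
COHORTS="acc blca brca cesc coad coadread dlbc esca gbm hnsc kich kirc kirp laml lgg lihc luad lusc ov paad pancancer pancan8 pancan12 prad read sarc skcm stad thca ucec ucs"

# in_list WORD LIST -> success if WORD is one of the space-separated LIST items.
in_list() {
    for item in $2; do
        [ "$1" = "$item" ] && return 0
    done
    return 1
}

# check_args RUNTYPE DATE [COHORT...] -> prints an error and fails on bad input.
check_args() {
    [ $# -ge 2 ] || { echo "need RunType and Date" >&2; return 1; }
    run=$1
    # 'analysis' and 'data' are documented synonyms.
    case $run in
        analysis) run=analyses ;;
        data) run=stddata ;;
    esac
    in_list "$run" "$RUN_TYPES" || { echo "unknown RunType: $1" >&2; return 1; }
    # The date must look like YYYY_MM_DD, or be the word 'latest'.
    case $2 in
        latest) ;;
        [0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]) ;;
        *) echo "Date must be YYYY_MM_DD or 'latest': $2" >&2; return 1 ;;
    esac
    shift 2
    # Cohort names are compared case-insensitively.
    for c in "$@"; do
        lc=$(printf '%s' "$c" | tr '[:upper:]' '[:lower:]')
        in_list "$lc" "$COHORTS" || { echo "unknown cohort: $c" >&2; return 1; }
    done
}

# Example: check_args analyses latest brca ucec && echo "arguments look valid"
```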
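For the -auth flag, the help text says that $FHGETRC may point to a credentials file in WGETRC-conformant syntax. A minimal POSIX sh sketch of preparing such a file follows; the function name make_fhgetrc, the default path $HOME/.fhgetrc and the placeholder credentials are hypothetical, while user and password are standard wgetrc commands.

```shell
#!/bin/sh
# Sketch: keep GDAC credentials in a private wgetrc-syntax file and export
# FHGETRC so that firehose_get -auth can read them. The default file name
# ($HOME/.fhgetrc) and the example credentials are hypothetical.

# make_fhgetrc RCFILE USER PASSWORD
make_fhgetrc() {
    rcfile=${1:-"$HOME/.fhgetrc"}
    # Write the file with owner-only permissions; "user" and "password" are
    # standard wgetrc commands covering both HTTP and FTP logins.
    ( umask 077 && printf 'user = %s\npassword = %s\n' "$2" "$3" > "$rcfile" ) || return 1
    chmod 600 "$rcfile" || return 1     # also tighten a pre-existing file
    FHGETRC=$rcfile
    export FHGETRC
}

# Example (placeholder credentials; typed arguments may end up in shell history):
#   make_fhgetrc "$HOME/.fhgetrc" myname mypassword
```

With FHGETRC exported, the help text indicates that -auth with no credentials will use that file; alternatively, per option 2 of -auth, the file's path can be supplied directly as the cred argument.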