Download
firehose_get version 0.4.13 (released 2018_07_31)
Please note that downloading data from the Broad TCGA GDAC site constitutes agreement to this data usage policy.
To help simplify retrieval of TCGA data and analysis results we've introduced firehose_get. To use it, download the latest zip file from here, then run these two commands from a Unix-compatible command line:
unix% unzip firehose_get_latest.zip
unix% ./firehose_get
and follow the instructions (documentation excerpt below). If wget is missing from your system, please look here for links to pre-built versions, or search the web for one. Finally, rather than leaving firehose_get in the directory where you downloaded and unzipped it, move it to a directory on your $PATH so that it can be run from whatever directory you happen to be working in.
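The relocation suggested above can be sketched as a small shell function. This is only one way to do it: the function name install_firehose_get and the default target directory $HOME/bin are assumptions made for illustration, and any directory already on your $PATH works equally well.

```shell
#!/usr/bin/env bash
# Sketch: copy firehose_get into a personal bin directory and make sure
# that directory is on PATH. The default location ($HOME/bin) is an
# assumption; substitute any directory you prefer.

install_firehose_get() {
    local src="$1"                       # path to the unzipped firehose_get script
    local install_dir="${2:-$HOME/bin}"  # assumed default install location

    mkdir -p "$install_dir" || return 1
    cp "$src" "$install_dir/firehose_get" || return 1
    chmod +x "$install_dir/firehose_get" || return 1

    # Prepend the directory to PATH for the current shell, unless it is
    # already listed there.
    case ":$PATH:" in
        *":$install_dir:"*) ;;
        *) PATH="$install_dir:$PATH"; export PATH ;;
    esac
}

# Usage (from the directory where the zip file was unpacked):
#   install_firehose_get ./firehose_get
#   command -v firehose_get    # shows which copy the shell will run
```

The PATH change made by the function lasts only for the current shell session. To keep it, add the corresponding export line to your shell startup file.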
firehose_get analyses latest
Retrieves: every result, for every disease cohort, in the latest GDAC Firehose run
firehose_get -tasks mutsig gistic analyses latest brca ucec
Retrieves: only Gistic and MutSig results for breast and uterine cancer
firehose_get -tasks mut analyses latest prad
Retrieves: all results which have "mut" in their name, such as MutSig, Mutation_Assessor, and any correlations to mutation data
firehose_get -tasks rna clinical stddata 2013_05_23
Retrieves: any data package with (case-insensitive) "rna" or "clinical" in its name, from the May 23, 2013 data run
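To make the pattern behaviour in these examples concrete, here is a rough shell illustration of how a task name could be tested against a list of patterns. It is not code taken from firehose_get. The function name task_selected is hypothetical, and two details are assumptions: each pattern is treated as a substring match (as the "mut" example suggests), and a tilde-prefixed pattern (described under the -only flag in the help text) always overrides an inclusion.

```shell
#!/usr/bin/env bash
# Illustration only: mimics the documented behaviour of -tasks/-only
# patterns. The substring wrapping (*pattern*) and the rule that an
# exclusion always wins are assumptions, not firehose_get internals.

task_selected() {
    local task="$1"; shift
    local selected=1      # exit-status convention: 0 = selected, 1 = not
    local have_include=0
    local pat lc_pat lc_task

    # Lower-case both sides so that matching is case-insensitive.
    lc_task=$(printf '%s' "$task" | tr '[:upper:]' '[:lower:]')

    for pat in "$@"; do
        lc_pat=$(printf '%s' "$pat" | tr '[:upper:]' '[:lower:]')
        case "$lc_pat" in
            "~"*)
                # Tilde prefix: reject any task matching the rest of the pattern.
                case "$lc_task" in
                    *${lc_pat#"~"}*) return 1 ;;
                esac
                ;;
            *)
                have_include=1
                # Unquoted expansion keeps glob wildcards in the pattern active.
                case "$lc_task" in
                    *$lc_pat*) selected=0 ;;
                esac
                ;;
        esac
    done

    # With only exclusion patterns given, everything not excluded is kept.
    [ "$have_include" -eq 0 ] && selected=0
    return "$selected"
}

# Example: keep anything with "mut" in its name except Mutation_Assessor.
for task in MutSig Mutation_Assessor Gistic; do
    if task_selected "$task" mut "~assessor"; then
        echo "keep $task"
    else
        echo "skip $task"
    fi
done
# prints:
#   keep MutSig
#   skip Mutation_Assessor
#   skip Gistic
```

When passing a tilde-prefixed pattern on a real command line, quoting it (for example "~assessor") avoids any chance of the shell attempting tilde expansion on it first.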
% firehose_get --help
firehose_get : retrieve open-access results of Broad Institute TCGA GDAC runs
Version: 0.4.1 (Author: Michael S. Noble)
Usage: firehose_get [flags] RunType Date [disease_cohort, ... ]
Two arguments are required; the first must be one of
analyses awg_gbm awg_hnsc awg_lgg
awg_luad awg_pancan8 awg_skcm awg_stad
awg_test awg_thca stddata
while the second must EITHER be a date (in YYYY_MM_DD form) of an
existing GDAC run of the given type OR 'latest'; use the -runs flag
to discern what RunType+Date combinations are available. An optional
3rd, 4th etc argument may be specified to prune the retrieval, given
as a subset of these case-insensitive TCGA disease cohort names:
ACC BLCA BRCA CESC COAD COADREAD DLBC ESCA
GBM HNSC KICH KIRC KIRP LAML LGG LIHC
LUAD LUSC OV PAAD PANCANCER PANCAN8 PANCAN12 PRAD
READ SARC SKCM STAD THCA UCEC UCS
(taken from https://tcga-data.nci.nih.gov/datareports/codeTablesReport.htm)
Note that as a convenience 'analysis' and 'data' are accepted as
synonyms for the 'analyses' and 'stddata' run types
Flags:
-a | -auth [cred] authorize the retrieval of password-protected
                  results; the optional cred[entials] parameter
                  must be one of
                     1) a username:password string
                     2) /a/path/to/a/wgetrc/file
                     3) the empty string
                  If no credentials are supplied (empty string),
                  then FHGETRC will be used if it is set in the
                  environment and points to a regular file (which
                  must be in WGETRC-conformant syntax); otherwise
                  a username:password prompt will be issued. If
                  both $FHGETRC is set in the environment AND a
                  username:password parameter is specified here,
                  then $FHGETRC will be ignored
-b | -batch do not prompt: assume YES to all YES/NO queries
-c | -cohorts list available disease cohorts
-e | -echo show commands that would be run, but do nothing
-h | -help | --help this message
-l | -log write output to log file, instead of stdout
-o | -only <list> further prune the set of archives retrieved, by
                  INCLUDING ONLY results of pipelines whose
                  names match any of the given space-delimited list
                  of patterns; matching is performed with glob-style
                  wildcards, and is case-insensitive; prepending
                  a tilde (i.e. ~) to a task name will cause it
                  to be EXCLUDED from download; when no pattern
                  list is given firehose_get will display all tasks in
                  the selected run.
                  NOTE: not all tasks will execute for all disease
                  cohorts; what tasks are run depends upon the
                  data available for that disease cohort
-p | -platforms list data platforms available in Firehose runs
                (not implemented yet)
-r | -runs list available Firehose runs
-t | -tasks <list> same as -o|-only flag (kept for back-compatibility)
-v display the version of firehose_get
-x debugging: turn on bash set -x (warning: very verbose)
Broad GDAC website: http://gdac.broadinstitute.org
Broad GDAC email : gdac@broadinstitute.org
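For the -auth flag described in the help text, a credentials file can be prepared once and then referenced through FHGETRC. The sketch below is a hedged example, not the documented procedure: it assumes the standard wgetrc commands user and password are what the protected server expects, and the function name write_fhgetrc, the file location, and the credentials shown are all placeholders.

```shell
#!/usr/bin/env bash
# Sketch: write a wgetrc-style credentials file for use with -auth.
# ASSUMPTIONS: the `user` / `password` wgetrc commands are the ones
# needed here; the path and credentials below are placeholders.

write_fhgetrc() {
    local rc_file="$1" username="$2" password="$3"

    # Create (or truncate) the file with owner-only permissions before
    # any secret is written to it.
    ( umask 077; : > "$rc_file" ) || return 1
    chmod 600 "$rc_file" || return 1

    {
        printf 'user = %s\n' "$username"
        printf 'password = %s\n' "$password"
    } >> "$rc_file"
}

# Usage (placeholders throughout):
#   write_fhgetrc "$HOME/.fhgetrc" my_username my_password
#   export FHGETRC="$HOME/.fhgetrc"
#   firehose_get -auth "" analyses latest brca
# Per the help text, the wgetrc path can also be passed directly:
#   firehose_get -auth "$HOME/.fhgetrc" analyses latest brca
```

Keeping the password in a file readable only by its owner avoids typing it on the command line, where it could end up in shell history or process listings.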