fbget
Python and UNIX CLI wrappers for FireBrowse API (version 0.1.11)
fbget is a toolset for programmatically interacting with TCGA data and analyses in FireBrowse, through Python and the UNIX command line. It offers 3 sets of features:
1) Low-level object-oriented Python wrappers to the FireBrowse RESTful API. By default, methods in the low-level interface return one page of query results per call, in JSON form. TSV, CSV, and Python dicts may be returned instead by specifying a suitable format= parameter, and further pages may be selected with a suitable page= parameter. See the set_codec() and param_values() functions for more details on these and other parameters.
python> import firebrowse
python> print firebrowse.Samples().mRNASeq(gene="egfr", cohort="ucs")
{
  "mRNASeq": [
    {
      "cohort": "UCS",
      "expression_log2": 7.06162500904694,
      "gene": "EGFR",
      "geneID": 1956,
      "protocol": "RSEM",
      "sample_type": "TP",
      "tcga_participant_barcode": "TCGA-QN-A5NN",
      "z-score": -0.598993525060403
    },
...
2) A higher-level Python interface that provides simplified access for common bioinformatics use cases. For example, objects do not need to be explicitly instantiated, and functions by default return all pages of query results per call, in TSV format.
python> import fbget
python> print fbget.mrnaseq("egfr", cohort="ucs")
tcga_participant_barcode	gene	expression_log2	z-score	cohort	sample_type	protocol	geneID
TCGA-QN-A5NN	EGFR	7.06162500905	-0.59899352506	UCS	TP	RSEM	1956
TCGA-QM-A5NM	EGFR	8.16734387649	-0.298443593752	UCS	TP	RSEM	1956
TCGA-NG-A4VW	EGFR	8.93092623547	0.0932667888031	UCS	TP	RSEM	1956
3) The fbget tool, for accessing the high-level interface directly from the UNIX command line:
linux% fbget mrnaseq egfr cohort=ucs
tcga_participant_barcode	gene	expression_log2	z-score	cohort	sample_type	protocol	geneID
TCGA-QN-A5NN	EGFR	7.06162500905	-0.59899352506	UCS	TP	RSEM	1956
TCGA-QM-A5NM	EGFR	8.16734387649	-0.298443593752	UCS	TP	RSEM	1956
TCGA-NG-A4VW	EGFR	8.93092623547	0.0932667888031	UCS	TP	RSEM	1956
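Because the fbget tool writes plain TSV to stdout, its output can be piped straight into UNIX tools or a short script. As an illustration, here is a small filter that keeps only the rows whose z-score meets a threshold. It is a sketch, not part of fbget: the script name zfilter.py is hypothetical, and the "z-score" column name is taken from the header shown above.

```python
import csv
import sys


def filter_by_zscore(infile, outfile, threshold):
    """Copy the header row plus every data row whose z-score >= threshold.

    Expects tab-separated input with a "z-score" column, like the
    fbget mrnaseq output above. Returns the number of data rows kept.
    """
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to filter
        return 0
    column = header.index("z-score")
    writer.writerow(header)
    kept = 0
    for row in reader:
        if row and float(row[column]) >= threshold:  # skip blank lines
            writer.writerow(row)
            kept += 1
    return kept


def main():
    """Read TSV on stdin and write the filtered TSV to stdout."""
    threshold = float(sys.argv[1]) if len(sys.argv) > 1 else 0.0
    filter_by_zscore(sys.stdin, sys.stdout, threshold)
```

Saved as zfilter.py with a closing main() call, it could sit at the end of a pipe such as fbget mrnaseq egfr cohort=ucs | python zfilter.py 0; for the three rows shown above, that threshold keeps only TCGA-NG-A4VW.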
Most of the code in fbget is generated automatically, by discovery-based inspection of the FireBrowse RESTful API. In addition to the standard Python help() command, documentation for almost all class methods and functions can be obtained by invoking them with zero arguments. This is better than an inscrutable stack trace, don't you think?
python> fbget.mrnaseq()
mrnaseq() call has missing/None arg value(s), need at least one of: gene OR barcode
Help on function mrnaseq in module fbget:

mrnaseq(gene=None, barcode=None, **kwargs)
    High level wrapper for the FireBrowse Samples.mRNASeq method.
    By default it returns ALL pages of data, in TSV format.

    This service returns sample-level log2 mRNASeq expression values.
    Results may be filtered by gene, cohort, barcode, sample type or
    characterization protocol, but at least one gene OR barcode must be
    supplied.

    For more details consult the interactive documentation at
        http://firebrowse.org/api-docs/#!/Samples
    OR use help(param_values) to see the range of values accepted for
    each parameter, the defaults for each (if any), and the degrees of
    optionality/requiredness offered by the API.

    Parameters:
        format (str)
            Format of result.
        gene (str)
            Comma separated list of gene name(s).
        cohort (str)
            Narrow search to one or more TCGA disease cohorts.
        barcode (str)
            Comma separated list of TCGA participant barcodes (e.g. TCGA-GF-A4EO).
        sample_type (str)
            Narrow search to one or more TCGA sample types.
        protocol (str)
            Narrow search to one or more sample characterization protocols.
        page (int)
            Which page (slice) of entire results set should be returned.
        page_size (int)
            Number of records per page of results. Maximum is 2000.
        sort_by (str)
            Which column in the results should be used for sorting paginated results?
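The zero-argument help behavior shown above can be approximated with a decorator that checks for the required arguments before calling the wrapped function. This is a hypothetical sketch, not fbget's generated code: help_if_missing is an invented name, and the mrnaseq below is a stand-in that only echoes its query instead of calling FireBrowse.

```python
import functools


def help_if_missing(*names):
    """Decorator factory: refuse calls that supply none of `names`.

    When every listed argument is missing or None, print a short complaint
    plus the function's docstring and return None instead of calling it.
    Positional arguments are matched to `names` in order, mirroring a
    signature such as mrnaseq(gene=None, barcode=None, **kwargs).
    """
    def decorate(func):
        @functools.wraps(func)  # keep the wrapped name and docstring
        def wrapper(*args, **kwargs):
            supplied = dict(zip(names, args))
            supplied.update((n, kwargs[n]) for n in names if n in kwargs)
            if all(supplied.get(n) is None for n in names):
                print("%s() call has missing/None arg value(s), "
                      "need at least one of: %s"
                      % (func.__name__, " OR ".join(names)))
                print(func.__doc__)
                return None
            return func(*args, **kwargs)
        return wrapper
    return decorate


@help_if_missing("gene", "barcode")
def mrnaseq(gene=None, barcode=None, **kwargs):
    """Hypothetical stand-in for fbget.mrnaseq: echoes the query it would send."""
    query = dict(kwargs, gene=gene, barcode=barcode)
    return dict((k, v) for k, v in query.items() if v is not None)
```

Calling mrnaseq() prints the complaint and docstring and returns None, while mrnaseq("egfr", cohort="ucs") proceeds and returns the gene and cohort it was given.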
The fbget --examples option provides a wealth of usage ideas:
# Every line of these examples can be cut and directly pasted to your
# UNIX-like command line. Comments will be ignored, while everything
# not beginning with the # comment character will be executed, as long
# as fbget is in your $PATH

# Get the RNASeq expression level of the POLE gene, for all TCGA samples
# (both tumors and normals, in RSEM form, saved to file)
fbget --outfile=fbget-test-pole.tsv mrnaseq pole

# Similar query, but constrained to just the DLBC disease cohort
fbget mrnaseq pole cohort=dlbc

# Now constrained to a single patient, and showing case insensitivity
fbget mrnaseq pOlE baRcOdE=TCGA-RQ-A6JB

# What is the DLBC cohort, anyway?
fbget cohort dlbc
# DLBC Lymphoid Neoplasm Diffuse Large B-cell Lymphoma

# List all the disease cohorts offered by FireBrowse (note that aggregate
# cohorts like COADREAD,KIPAN,GBMLGG,STES are not available at the TCGA DCC)
fbget cohorts

# Display help (docstring) for the function which retrieves clinical data
fbget help clinical

# Calling functions with no arguments also displays help (docstring)
fbget clinical

# Now get some actual clinical data, but only for the thyroid (THCA) cohort
fbget clinical tHcA

# Display the complete list of clinical data element names (CDEs)
fbget clinical_names

# List the functions that may be called through fbget (-l does the same thing)
fbget --list

# Get the 10 most significantly mutated ovarian cancer genes (per MutSig2CV)
fbget smg OV rank=10

# Union of the names of parameters admitted by any FireBrowse function
fbget param_names

# Show the kinds of values that may be supplied via the cohort parameter
# (applies to any function which admits the cohort parameter)
fbget param_values cohort

# Ditto, for the barcode and clinical data element (CDE) names
fbget param_values barcode
fbget param_values fh_cde

# The documentation for param_values is helpful in its own right, too
fbget help param_values

# Levels of copy number alteration in TERT gene for 3 disease cohorts,
# as computed by GISTIC2, redirected to file
fbget cn_levels tert cohort=acc,KICH,LaMl,UCS > fbget-test-tert-cn.tsv

# Which genes had significant copy number deletion (per GISTIC2 q values)
# in BRCA & UCEC cohorts? (alternate method of saving to file, CLI option)
fbget --outfile=fbget-test-cn-del.tsv cn_genes_del cohort=BRCA,ucec

# Retrieve mature strand microRNASeq from UVM, as comma-separated values
fbget mirseq hsa-let-7b-5p cohort=uvm format=csv

# Repeat the same call to show that bogus parameters induce failure
fbget mirseq has-let-7b-5p cohort=uvm format=Dum_De_Dum_Dumb
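Each example above follows the same pattern: a function name, then positional values, then name=value parameters. A minimal sketch of how such tokens could be mapped onto a Python call is shown below. split_cli_args and dispatch are hypothetical names, not fbget's internals, and lower-casing only the parameter names (leaving values such as pOlE unchanged, on the assumption that the server matches them case-insensitively) is likewise an assumption.

```python
def split_cli_args(tokens):
    """Split CLI tokens into positional values and name=value parameters.

    e.g. ["pOlE", "baRcOdE=TCGA-RQ-A6JB"] -> (["pOlE"], {"barcode": "TCGA-RQ-A6JB"})
    Parameter names are lower-cased; values are passed through unchanged.
    """
    args, kwargs = [], {}
    for token in tokens:
        name, sep, value = token.partition("=")
        if sep and name:
            kwargs[name.lower()] = value
        else:
            args.append(token)
    return args, kwargs


def dispatch(registry, tokens):
    """Call registry[function_name](*args, **kwargs) for tokens such as
    ["mrnaseq", "pole", "cohort=dlbc"].

    `registry` is a hypothetical mapping from CLI function names to
    Python callables.
    """
    func_name, rest = tokens[0], tokens[1:]
    args, kwargs = split_cli_args(rest)
    return registry[func_name](*args, **kwargs)
```

With a registry such as {"mrnaseq": fbget.mrnaseq}, the tokens from fbget mrnaseq pole cohort=dlbc would become the call fbget.mrnaseq("pole", cohort="dlbc"), matching the high-level Python usage shown earlier.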
Each of these maps very naturally onto the low- and high-level wrappers; in addition, actual excerpts of code using the low-level and high-level wrappers are available. The fbget --list option shows the entire set of high-level functions that may be called:
barcode2type centers clinical clinical_fh clinical_names clinical_names_fh cn_genes_all cn_genes_amp cn_genes_del cn_genes_focal cn_levels cohort cohorts counts dates featuretable heartbeat help maf mirseq mrnaseq mrnaseq_quartiles param_names param_values patients platforms reports samplecode2type sampletype2code sampletypes smg stddata tssites
Their documentation may be obtained by invoking them with zero arguments (as shown above) or by explicitly invoking help:
linux% fbget help smg
Help on function smg in fbget:

fbget.smg = smg(*args, **kwargs)
    High level wrapper for the FireBrowse Analyses.MutationSMG method.
    By default it returns ALL pages of data, in TSV format.

    This service provides a list of significantly mutated genes, as
    scored by MutSig. It may be filtered by cohort, rank, gene, tool
    and/or Q-value threshold, but at least one cohort must be supplied.

    For more details consult the interactive documentation at
        http://firebrowse.org/api-docs/#!/Analyses
    OR use help(param_values) to see the range of values accepted for
    each parameter, the defaults for each (if any), and the degrees of
    optionality/requiredness offered by the API.

    Parameters:
        format (str)
            Format of result.
        cohort (str)
            Narrow search to one or more TCGA disease cohorts.
        tool (str)
            Narrow search to include only data/results produced by the selected Firehose tool.
        rank (int)
            Number of significant genes to return.
        gene (str)
            Comma separated list of gene name(s).
        q (float)
            Only return results with Q-value <= given threshold.
        page (int)
            Which page (slice) of entire results set should be returned.
        page_size (int)
            Number of records per page of results. Maximum is 2000.
        sort_by (str)
            Which column in the results should be used for sorting paginated results?
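Both help texts describe the same paging parameters: page selects one slice of the results and page_size (at most 2000) sets its length, while the high-level wrappers fetch ALL pages by default. A minimal sketch of such all-pages retrieval follows. It is not fbget's implementation: fetch_page is a hypothetical stand-in for a single-page low-level call, and numbering pages from 1 and stopping at the first short page are assumptions rather than documented FireBrowse behavior.

```python
MAX_PAGE_SIZE = 2000  # documented maximum for page_size


def fetch_all_pages(fetch_page, page_size=MAX_PAGE_SIZE, **params):
    """Concatenate the records from successive pages of one query.

    fetch_page is a hypothetical callable accepting page=, page_size= and
    any query parameters, and returning that page's records as a list.
    Assumes pages are numbered from 1 and that a page shorter than
    page_size (possibly empty) is the last one.
    """
    page_size = min(page_size, MAX_PAGE_SIZE)  # never ask for more than allowed
    records = []
    page = 1
    while True:
        batch = fetch_page(page=page, page_size=page_size, **params)
        records.extend(batch)
        if len(batch) < page_size:
            return records
        page += 1


if __name__ == "__main__":
    # Fake fetcher serving 5 numbered records, sliced per page.
    data = list(range(5))

    def fake_fetch(page, page_size, **params):
        start = (page - 1) * page_size
        return data[start:start + page_size]

    print(fetch_all_pages(fake_fetch, page_size=2, cohort="ov"))  # prints [0, 1, 2, 3, 4]
```

Stopping on a short page costs one extra request when the total is an exact multiple of page_size (the final, empty page), which keeps the loop free of any assumption about a total-count field in the response.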
The fbget --help option shows additional options that may be applied at runtime:
usage: fbget.py [-h] [-d] [-e] [-l] [-o OUTFILE] [-s SERVER] [-V] [-v]
                function [arg [arg ...]]

Python & UNIX CLI wrappers for the FireBrowse RESTful API

fbget simplifies use and extends the power of FireBrowse, by providing:
low- and high-level Python wrappers to its RESTful API; an interface
through which the high level functions may be called directly from the
UNIX command line, without writing any Python code; and enabling the
results of such to be immediately streamed to UNIX tools for further
processing or analysis. In addition, both the fbget CLI tool and the
high level wrappers will by default retrieve all pages of data returned
by the FireBrowse RESTful API, in TSV form that is most commonly used
for bioinformatics analysis. For more information visit
http://firebrowse.org

positional arguments:
  function              name of the function to be called
  arg                   arguments to pass to function

optional arguments:
  -h, --help            show this help message and exit
  -d, --docs            emit documentation for entire api
  -e, --examples        show usage examples
  -l, --list            list all callable functions
  -o OUTFILE, --outfile OUTFILE
                        Specify output file (will be overwritten if already
                        exists) [sys.stdout]
  -s SERVER, --server SERVER
                        the server hosting the FireBrowse instance
                        [firebrowse.org]
  -V, --verbose         Verbose: emit to stderr RESTful calls made, etc
                        [False]
  -v, --version         show program's version number and exit
  -x, --debug           debugging: trace RESTful calls as they are issued
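The help text above has the shape of output from Python's argparse module. The sketch below rebuilds an equivalent parser from that text alone; it is not fbget's source. Making function optional (nargs="?") is an assumption, chosen so that informational flags such as --list can run without a function name; as a consequence, the generated usage line would show [function] rather than function.

```python
import argparse

VERSION = "0.1.11"


def build_parser():
    """Reconstruct a parser matching the fbget --help text (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="fbget.py",
        description="Python & UNIX CLI wrappers for the FireBrowse RESTful API")
    parser.add_argument("-d", "--docs", action="store_true",
                        help="emit documentation for entire api")
    parser.add_argument("-e", "--examples", action="store_true",
                        help="show usage examples")
    parser.add_argument("-l", "--list", action="store_true",
                        help="list all callable functions")
    parser.add_argument("-o", "--outfile", default=None,
                        help="output file, overwritten if it exists [sys.stdout]")
    parser.add_argument("-s", "--server", default="firebrowse.org",
                        help="the server hosting the FireBrowse instance")
    parser.add_argument("-V", "--verbose", action="store_true",
                        help="emit to stderr RESTful calls made, etc")
    parser.add_argument("-v", "--version", action="version", version=VERSION)
    parser.add_argument("-x", "--debug", action="store_true",
                        help="trace RESTful calls as they are issued")
    # Assumption: optional, so that e.g. "fbget --list" parses on its own.
    parser.add_argument("function", nargs="?",
                        help="name of the function to be called")
    parser.add_argument("arg", nargs="*",
                        help="arguments to pass to function")
    return parser
```

For instance, build_parser().parse_args(["--outfile=fbget-test-pole.tsv", "mrnaseq", "pole"]) yields outfile="fbget-test-pole.tsv", function="mrnaseq" and arg=["pole"], leaving the name=value tokens in arg for later processing.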
Public Use (v0.1.11 released Oct 31 2017)
- Obtain Python 2.7 and the Requests package, both of which are likely to already be on your system. The internal versions used at the Broad Institute are Python 2.7.9 and Requests 2.5.1. We also recommend installing into a Python virtual environment.
- Download and install the PyPI package automatically with
linux% pip install firebrowse
OR download the fbget zipfile from this page, then follow these steps:
linux% unzip fbget-<version>.zip
linux% cd fbget-<version>
linux% sh install.sh
linux% python smoketest.py
The smoketest is optional, but if the installation worked it should return something like:
{
  "HeartBeat": [
    "FireBrowse API at firebrowse.org:8000 is alive\n"
  ]
}
{
  "Reports": [
    {
      "cohort": "ACC",
      "data_type": [
        "CopyNumber"
      ],
      "date": "Fri, 17 Oct 2014 00:00:00 GMT",
      "file_type": "report",
      "report_name": "CopyNumber_Gistic2",
      "report_type": "CopyNumber",
      "report_uri": "http://gdac.broadinstitute.org/runs/analyses__2014_10_17/reports/cancer/ACC-TP/CopyNumber_Gistic2/nozzle.html",
      "sample_type": "TP"
    }
  ]
}
...
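Because the smoketest output is ordinary JSON, the heartbeat check can also be scripted. The sketch below decodes a copy of the HeartBeat document shown above (it does not contact the server). server_is_alive is a hypothetical name, and the {"HeartBeat": [message, ...]} shape is taken only from that example output.

```python
import json

# Copy of the first JSON document printed by smoketest.py above.
HEARTBEAT_TEXT = '{"HeartBeat": ["FireBrowse API at firebrowse.org:8000 is alive\\n"]}'


def server_is_alive(heartbeat_text):
    """Return True if the decoded HeartBeat list is non-empty and every
    message in it says "is alive".

    Assumes the {"HeartBeat": [message, ...]} shape shown above.
    """
    messages = json.loads(heartbeat_text).get("HeartBeat", [])
    return bool(messages) and all("is alive" in m for m in messages)


if __name__ == "__main__":
    print(server_is_alive(HEARTBEAT_TEXT))  # prints True
```

Requiring a non-empty list means a response with no HeartBeat entries is treated as a failure rather than a vacuous success.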
Internal to Broad Institute
- Ensure that /xchip/tcga/Tools/gdac/bin is in your $PATH. This is sufficient for using the fbget UNIX CLI tool.
- If you also want to use the Python bindings:
- Ensure you are configured to use Python 2.7, then run either, for (t)csh,
linux% eval setenv `gdac_pythonify`
or, for (ba)sh,
linux% export `gdac_pythonify`
- Then run python
linux% python
- and import EITHER the low-level wrappers
python> import firebrowse
- OR the high-level wrappers
python> import fbget