fbget

Python and UNIX CLI wrappers for the FireBrowse API (version 0.1.11)

fbget is a toolset for programmatically interacting with TCGA data and analyses in FireBrowse through Python and the UNIX command line. It offers three sets of features:

1) Low-level, object-oriented Python wrappers to the FireBrowse RESTful API. By default, each method in the low-level interface returns one page of query results per call, in JSON form. TSV, CSV, and Python dicts may be returned instead by specifying a suitable format= parameter, and other pages may be selected with a suitable page= parameter. See the set_codec() and param_values() functions for more details on these and other parameters.

python>  import firebrowse
python>  print firebrowse.Samples().mRNASeq(gene="egfr", cohort="ucs")
{
  "mRNASeq": [
    {
      "cohort": "UCS",
      "expression_log2": 7.06162500904694,
      "gene": "EGFR",
      "geneID": 1956,
      "protocol": "RSEM",
      "sample_type": "TP",
      "tcga_participant_barcode": "TCGA-QN-A5NN",
      "z-score": -0.598993525060403
    },
    ...

2) A higher-level Python interface that provides simplified access for common bioinformatic use cases. For example, objects do not need to be explicitly instantiated, and by default functions return all pages of query results per call, in TSV format.

python>  import fbget
python>  print fbget.mrnaseq("egfr", cohort="ucs")
tcga_participant_barcode	gene	expression_log2	z-score	cohort	sample_type	protocol	geneID
TCGA-QN-A5NN	EGFR	7.06162500905	-0.59899352506	UCS	TP	RSEM	1956
TCGA-QM-A5NM	EGFR	8.16734387649	-0.298443593752	UCS	TP	RSEM	1956
TCGA-NG-A4VW	EGFR	8.93092623547	0.0932667888031	UCS	TP	RSEM	1956

3) The fbget tool, for accessing the high-level interface directly from the UNIX command line:

linux% fbget mrnaseq egfr cohort=ucs

tcga_participant_barcode	gene	expression_log2	z-score	cohort	sample_type	protocol	geneID
TCGA-QN-A5NN	EGFR	7.06162500905	-0.59899352506	UCS	TP	RSEM	1956
TCGA-QM-A5NM	EGFR	8.16734387649	-0.298443593752	UCS	TP	RSEM	1956
TCGA-NG-A4VW	EGFR	8.93092623547	0.0932667888031	UCS	TP	RSEM	1956
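Because both the high-level functions and the CLI emit plain TSV with a header row, their output can be loaded with Python's standard csv module. A minimal sketch, assuming the TSV has already been captured as a string; here the header and three rows shown above are pasted in literally rather than fetched, and the names tsv_text, expression and top are illustrative only:

```python
import csv

# Header and rows copied from the mrnaseq("egfr", cohort="ucs") output above.
tsv_text = (
    "tcga_participant_barcode\tgene\texpression_log2\tz-score\t"
    "cohort\tsample_type\tprotocol\tgeneID\n"
    "TCGA-QN-A5NN\tEGFR\t7.06162500905\t-0.59899352506\tUCS\tTP\tRSEM\t1956\n"
    "TCGA-QM-A5NM\tEGFR\t8.16734387649\t-0.298443593752\tUCS\tTP\tRSEM\t1956\n"
    "TCGA-NG-A4VW\tEGFR\t8.93092623547\t0.0932667888031\tUCS\tTP\tRSEM\t1956\n"
)

# DictReader takes column names from the first line; any iterable of lines works.
reader = csv.DictReader(tsv_text.splitlines(), delimiter="\t")

# Map each participant barcode to its log2 expression value.
expression = dict(
    (row["tcga_participant_barcode"], float(row["expression_log2"])) for row in reader
)

# Participant with the highest EGFR expression among these rows.
top = max(expression, key=expression.get)
print(top)  # TCGA-NG-A4VW
```

In a live session, tsv_text would be the string returned by fbget.mrnaseq("egfr", cohort="ucs"), or the contents of a file written with fbget --outfile.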

Most of the code in fbget is automatically generated by discovery-based inspection of the FireBrowse RESTful API. In addition to the standard Python help() command, documentation for almost all class methods and functions can be obtained by invoking the function with zero arguments. This is better than an inscrutable stack trace, don't you think?

python>  fbget.mrnaseq()

mrnaseq() call has missing/None arg value(s), need at least one of: gene OR barcode
Help on function mrnaseq in module fbget:

mrnaseq(gene=None, barcode=None, **kwargs)

    High level wrapper for the FireBrowse Samples.mRNASeq method.
    By default it returns ALL pages of data, in TSV format.

    This service returns sample-level log2 mRNASeq expression
    values. Results may be filtered by gene, cohort, barcode,
    sample type or characterization protocol, but at least one
    gene OR barcode must be supplied.

    For more details consult the interactive documentation at
        http://firebrowse.org/api-docs/#!/Samples
    OR use help(param_values) to see the range of values accepted
    for each parameter, the defaults for each (if any), and the
    degrees of optionality/requiredness offered by the API.

    Parameters: 
        format      (str)  Format of result.
        gene        (str)  Comma separated list of gene name(s).
        cohort      (str)  Narrow search to one or more TCGA disease cohorts.
        barcode     (str)  Comma separated list of TCGA participant barcodes (e.g. TCGA-GF-A4EO).
        sample_type (str)  Narrow search to one or more TCGA sample types.
        protocol    (str)  Narrow search to one or more sample characterization protocols.
        page        (int)  Which page (slice) of entire results set should be returned. 
        page_size   (int)  Number of records per page of results.  Maximum is 2000.
        sort_by     (str)  Which column in the results should be used for sorting paginated results?
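The zero-argument behaviour can be pictured as a thin wrapper around each function. The decorator below is only a sketch of the idea, not fbget's generated code: show_help_if_no_args and toy_mrnaseq are hypothetical names, and where fbget prints a message followed by the help text, this version simply returns the docstring when called with no arguments at all, so the behaviour is easy to inspect:

```python
import functools


def show_help_if_no_args(func):
    """Return func's docstring instead of calling func when no arguments are given."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        if not args and not kwargs:
            return func.__doc__
        return func(*args, **kwargs)
    return wrapper


@show_help_if_no_args
def toy_mrnaseq(gene=None, barcode=None, **kwargs):
    """Toy stand-in for mrnaseq: needs at least one of gene OR barcode."""
    return "gene=%s barcode=%s" % (gene, barcode)


print(toy_mrnaseq())        # prints the docstring
print(toy_mrnaseq("egfr"))  # gene=egfr barcode=None
```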

Examples

The fbget --examples option provides a wealth of usage ideas:

    # Every line of these examples can be cut and directly pasted to your
    # UNIX-like command line.  Comments will be ignored, while everything
    # not beginning with the # comment character will be executed, as long
    # as fbget is in your $PATH

    # Get the RNASeq expression level of the POLE gene, for all TCGA samples
    # (both tumors and normals, in RSEM form, saved to file)
    fbget --outfile=fbget-test-pole.tsv mrnaseq pole

    # Similar query, but constrained to just the DLBC disease cohort
    fbget mrnaseq pole cohort=dlbc

    # Now constrained to single patient, and showing case insensitivity
    fbget mrnaseq pOlE baRcOdE=TCGA-RQ-A6JB

    # What is the DLBC cohort, anyway?
    fbget cohort dlbc
    # DLBC    Lymphoid Neoplasm Diffuse Large B-cell Lymphoma

    # List all the disease cohorts offered by FireBrowse (note that aggregate
    # cohorts like COADREAD,KIPAN,GBMLGG,STES are not available at the TCGA DCC)
    fbget cohorts

    # Display help (docstring) for the function which retrieves clinical data
    fbget help clinical

    # Calling functions with no arguments also displays help (docstring)
    fbget clinical

    # Now get some actual clinical data, but only for thyroid (THCA) cohort
    fbget clinical tHcA

    # Display the complete list of clinical data element names (CDEs)
    fbget clinical_names

    # List the functions that may be called through fbget (-l does the same thing)
    fbget --list

    # Get the 10 most significantly mutated ovarian cancer genes (per MutSig2CV)
    fbget smg OV rank=10

    # Union of the names of parameters admitted by any FireBrowse function
    fbget param_names

    # Show the kinds of values that may be supplied via the cohort parameter
    # (applies to any function which admits the cohort parameter)
    fbget param_values cohort

    # Ditto, for the barcode and clinical data element (CDE) names
    fbget param_values barcode
    fbget param_values fh_cde
 
    # The documentation for param_values is helpful in its own right, too
    fbget help param_values

    # Levels of copy number alteration in TERT gene for 3 disease cohorts,
    # as computed by GISTIC2, redirected to file
    fbget cn_levels tert cohort=acc,KICH,LaMl,UCS > fbget-test-tert-cn.tsv

    # Which genes had significant copy number deletion (per GISTIC2 q values)
    # in BRCA & UCEC cohorts? (alternate method of saving to file, CLI option)
    fbget --outfile=fbget-test-cn-del.tsv cn_genes_del cohort=BRCA,ucec

    # Retrieve mature strand microRNASeq from UVM, as comma-separated values
    fbget mirseq hsa-let-7b-5p cohort=uvm format=csv

    # Repeat the same call to show that bogus parameters induce failure
    fbget mirseq hsa-let-7b-5p cohort=uvm format=Dum_De_Dum_Dumb

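Each CLI example follows the same shape: the function name, then any positional arguments, then named parameters written as key=value, with multiple values joined by commas (as in cohort=acc,KICH,LaMl,UCS). To drive the CLI from a Python script, a hypothetical helper such as fbget_argv below (not part of fbget) can assemble that argument list; it assumes only that fbget is on your $PATH when the list is actually run:

```python
def _as_word(value):
    """Render a parameter value; lists and tuples become comma-separated words."""
    if isinstance(value, (list, tuple)):
        return ",".join(str(v) for v in value)
    return str(value)


def fbget_argv(function, *args, **params):
    """Build an fbget command line as a list (hypothetical helper, not part of fbget).

    Keyword parameters are emitted as key=value, sorted by name for repeatability.
    """
    argv = ["fbget", function]
    argv.extend(_as_word(a) for a in args)
    argv.extend("%s=%s" % (key, _as_word(params[key])) for key in sorted(params))
    return argv


print(fbget_argv("cn_levels", "tert", cohort=["acc", "KICH", "LaMl", "UCS"]))
# ['fbget', 'cn_levels', 'tert', 'cohort=acc,KICH,LaMl,UCS']
```

The resulting list can be passed to subprocess.check_output(argv), which captures whatever fbget writes to standard output.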
Each of these maps naturally onto a call to the corresponding high-level Python function, which in turn wraps the low-level interface. The fbget --list option shows the entire set of high-level functions that may be called:

     barcode2type
     centers
     clinical
     clinical_fh
     clinical_names
     clinical_names_fh
     cn_genes_all
     cn_genes_amp
     cn_genes_del
     cn_genes_focal
     cn_levels
     cohort
     cohorts
     counts
     dates
     featuretable
     heartbeat
     help
     maf
     mirseq
     mrnaseq
     mrnaseq_quartiles
     param_names
     param_values
     patients
     platforms
     reports
     samplecode2type
     sampletype2code
     sampletypes
     smg
     stddata
     tssites

Their documentation may be obtained by invoking them with zero arguments (as shown above) or by explicitly invoking help:

linux% fbget help smg

Help on function smg in fbget:
fbget.smg = smg(*args, **kwargs)

    High level wrapper for the FireBrowse Analyses.MutationSMG method.
    By default it returns ALL pages of data, in TSV format.

    This service provides a list of significantly mutated genes,
    as scored by MutSig.  It may be filtered by cohort, rank,
    gene, tool and/or Q-value threshold, but at least one cohort
    must be supplied.

    For more details consult the interactive documentation at
        http://firebrowse.org/api-docs/#!/Analyses
    OR use help(param_values) to see the range of values accepted
    for each parameter, the defaults for each (if any), and the
    degrees of optionality/requiredness offered by the API.

    Parameters: 
        format  (str)  Format of result.
        cohort  (str)  Narrow search to one or more TCGA disease cohorts.
        tool  (str)  Narrow search to include only data/results produced by the selected Firehose tool.
        rank  (int)  Number of significant genes to return.
        gene  (str)  Comma separated list of gene name(s).
        q  (float)  Only return results with Q-value <= given threshold.
        page  (int)  Which page (slice) of entire results set should be returned. 
        page_size  (int)  Number of records per page of results.  Maximum is 2000.
        sort_by  (str)  Which column in the results should be used for sorting paginated results?
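Because page_size is capped at 2000 records, a large result set spans several pages, and retrieving all of it (as the high-level wrappers do by default) means walking through pages until the data run out. The loop below sketches that idea under stated assumptions: fetch_page is a hypothetical callable standing in for a low-level call made with page= and page_size=, pages are numbered from 1, and a page shorter than page_size is taken to be the last one. It is not fbget's actual implementation.

```python
MAX_PAGE_SIZE = 2000  # documented maximum number of records per page


def fetch_all_pages(fetch_page, page_size=MAX_PAGE_SIZE):
    """Concatenate successive pages returned by fetch_page(page, page_size).

    fetch_page is a hypothetical stand-in for a low-level API call that returns
    one page of records as a list; pages are assumed to be numbered from 1.
    """
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and %d" % MAX_PAGE_SIZE)
    records = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        records.extend(batch)
        if len(batch) < page_size:  # a short (or empty) page ends the result set
            return records
        page += 1


# Usage with an in-memory list in place of the API (no network needed).
data = list(range(5))


def fake_fetch(page, size):
    """Return the requested slice of data, mimicking one page of results."""
    return data[(page - 1) * size:page * size]


print(fetch_all_pages(fake_fetch, page_size=2))  # [0, 1, 2, 3, 4]
```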

The fbget --help option shows additional options that may be applied at runtime:

usage: fbget.py [-h] [-d] [-e] [-l] [-o OUTFILE] [-s SERVER] [-V] [-v] function [arg [arg ...]]

Python & UNIX CLI wrappers for the FireBrowse RESTful API

fbget simplifies use and extends the power of FireBrowse, by
providing: low- and high-level Python wrappers to its RESTful
API; an interface through which the high level functions may be
called directly from the UNIX command line, without writing any
Python code; and enabling the results of such to be immediately
streamed to UNIX tools for further processing or analysis.  In
addition, both the fbget CLI tool and the high level wrappers
will by default retrieve all pages of data returned by the
FireBrowse RESTful API, in TSV form that is most commonly used
for bioinformatics analysis.

For more information visit http://firebrowse.org

positional arguments:
  function              name of the function to be called
  arg                   arguments to pass to function

optional arguments:
  -h, --help            show this help message and exit
  -d, --docs            emit documentation for entire api
  -e, --examples        show usage examples
  -l, --list            list all callable functions
  -o OUTFILE, --outfile OUTFILE
                        Specify output file (will be overwritten if already exists) [sys.stdout]
  -s SERVER, --server SERVER
                        the server hosting the FireBrowse instance                  [firebrowse.org]
  -V, --verbose         Verbose: emit to stderr RESTful calls made, etc             [False]
  -v, --version         show program's version number and exit
  -x, --debug           debugging: trace RESTful calls as they are issued
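The option set above maps directly onto Python's standard argparse module. As an illustration only (this is not fbget's own source), a parser with the same flags and defaults might be declared as below. Two details are assumptions of the sketch: the version string is taken from the release number on this page, and function is made optional so that a bare fbget --list parses without one.

```python
import argparse


def build_parser():
    """Argument parser mirroring the fbget --help text (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        prog="fbget.py",
        description="Python & UNIX CLI wrappers for the FireBrowse RESTful API")
    parser.add_argument("-d", "--docs", action="store_true",
                        help="emit documentation for entire api")
    parser.add_argument("-e", "--examples", action="store_true",
                        help="show usage examples")
    parser.add_argument("-l", "--list", action="store_true",
                        help="list all callable functions")
    parser.add_argument("-o", "--outfile", default=None,
                        help="output file, overwritten if it exists [sys.stdout]")
    parser.add_argument("-s", "--server", default="firebrowse.org",
                        help="the server hosting the FireBrowse instance")
    parser.add_argument("-V", "--verbose", action="store_true",
                        help="emit to stderr RESTful calls made, etc")
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.1.11")
    parser.add_argument("-x", "--debug", action="store_true",
                        help="trace RESTful calls as they are issued")
    # Optional here (an assumption) so that options like --list need no function.
    parser.add_argument("function", nargs="?",
                        help="name of the function to be called")
    parser.add_argument("arg", nargs="*",
                        help="arguments to pass to function")
    return parser


args = build_parser().parse_args(["-o", "out.tsv", "mrnaseq", "egfr", "cohort=ucs"])
# args.function == "mrnaseq", args.arg == ["egfr", "cohort=ucs"], args.outfile == "out.tsv"
```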

Download and Configuration

Public Use (v0.1.11, released Oct 31, 2017)

  1. Obtain Python 2.7 and the Requests package, both of which are likely already installed on your system. The versions used internally at the Broad Institute are Python 2.7.9 and Requests 2.5.1. We also recommend installing into a Python virtual environment.
  2. Download and install the PyPI package automatically with

linux% pip install firebrowse

  OR download the fbget zipfile from this page and follow these steps:

linux% unzip fbget-<version>.zip
linux% cd fbget-<version>
linux% sh install.sh
linux% python smoketest.py

  The smoketest is optional, but if the installation worked it should return something like:

{
  "HeartBeat": [
    "FireBrowse API at firebrowse.org:8000 is alive\n"
  ]
}
{
  "Reports": [
    {
      "cohort": "ACC",
      "data_type": [
        "CopyNumber"
      ],
      "date": "Fri, 17 Oct 2014 00:00:00 GMT",
      "file_type": "report",
      "report_name": "CopyNumber_Gistic2",
      "report_type": "CopyNumber",
      "report_uri": "http://gdac.broadinstitute.org/runs/analyses__2014_10_17/reports/cancer/ACC-TP/CopyNumber_Gistic2/nozzle.html",
      "sample_type": "TP"
    }
  ]
}
...
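To check the smoketest's heartbeat programmatically rather than by eye, the first JSON document above can be parsed with the standard json module. A minimal sketch, assuming the response text has already been captured as a string (api_is_alive is a hypothetical helper name, not part of fbget):

```python
import json

# First JSON document from the smoketest output above, as a Python string.
heartbeat_text = '{"HeartBeat": ["FireBrowse API at firebrowse.org:8000 is alive\\n"]}'


def api_is_alive(response_text):
    """Return True if a HeartBeat response contains an 'is alive' message."""
    messages = json.loads(response_text).get("HeartBeat", [])
    return any("is alive" in message for message in messages)


print(api_is_alive(heartbeat_text))  # True
```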

Internal to Broad Institute

  1. Ensure that /xchip/tcga/Tools/gdac/bin is in your $PATH. This is sufficient for using the fbget UNIX CLI tool. 

  2. If you also want to use the Python bindings:

    1. Ensure you are configured to use Python 2.7, then run either (for csh/tcsh)

      linux% eval setenv `gdac_pythonify`
      or (for sh/bash)
      linux% export `gdac_pythonify`

    2. Then run python

      linux% python

    3. and import EITHER the low level wrappers

      python> import firebrowse

    4. OR the high level wrappers

      python> import fbget