Multi-Center Mutation Calls & MAFs

Below are notes from the 2013_11_26 telecon between H. Sofia and M. Noble regarding multi-center mutation calling benchmarks, the file products they generate, what to do with those files, and how.  The telecon was called to obtain clarity on these points;  these notes are not intended as definitive policy on any of them.


Summary

  • Because multi-center MAFs are a richer source of data, leading to better science, the ultimate goal would ideally be for them to supplant single-center MAFs
  • As of this writing, KICH is among the earliest disease studies (along with, e.g., COADREAD) to have a multi-center MAF submitted to the DCC (through BCM)
  • THCA is an example where the multi-center mutation calling benchmark results have not yet been merged back into a single MAF for DCC submission
  • There seems to be an implicit understanding that an analyst or programmer at the primary sequencing center is responsible for creating the merged multi-center MAF, by incorporating the calls made by secondary centers in the benchmark exercise

Actions

  • Mike/Broad will immediately ingest the KICH multi-center MAF;  this MAF will be reflected in the Dec stddata run and the next Analysis run
  • Mike will contact the appropriate parties at Broad to generate a THCA multi-center MAF
  • Mike will gauge (or raise) awareness at Broad of the need for multi-center merging of MAFs for other disease studies sequenced at Broad
  • Heidi will advocate for:
    • clearer nomenclature across TCGA to identify multi-center MAFs, beginning by contacting BCM so that the name of a multi-center MAF does not exactly match that of the single-center MAF (even though MD5 and file size can disambiguate, using the same names is VERY unfriendly)
    • standardization of multi-center VCFs/MAFs across the AWGs;  getting the format nailed down is most important, but tools can be shared, too, if possible, starting with the BCM tool that was likely written to submit the KICH multi-center MAF
      QUESTION 2013_12_02:  how will filtering happen during the merge process (e.g. which tool, if any, did Baylor use for this in KICH, or was it manual?)
    • broader understanding at primary sequencing centers of the need/responsibility to merge/aggregate multi-center calling results into a single VCF/MAF
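
Illustration (not from the telecon)

To make the merge/aggregate step above concrete, here is a minimal Python sketch of how calls from several centers might be collapsed into one multi-center record set.  Everything in it is an assumption made for illustration:  the function name merge_center_calls is hypothetical, the choice of key columns is one plausible way to decide that two centers called the same mutation, and joining center names with a semicolon in the Center column is a convention assumed here, not a format agreed on in these notes.  It performs no filtering at all, which is exactly the open QUESTION above, and it is not the BCM tool.

```python
# Columns assumed (for this sketch) to identify one mutation call in one sample.
KEY_COLUMNS = (
    "Tumor_Sample_Barcode",
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
)


def merge_center_calls(rows):
    """Collapse per-center MAF rows into one row per distinct call.

    rows: iterable of dicts, one per MAF line, each with the KEY_COLUMNS
    above plus a "Center" column.  Returns a list of dicts in first-seen
    order; "Center" becomes a semicolon-joined list of every center that
    reported the call (assumed convention).  No filtering is applied.
    """
    merged = {}   # key -> copy of the first row seen for that call
    centers = {}  # key -> centers that reported it, in first-seen order
    for row in rows:
        key = tuple(row[col] for col in KEY_COLUMNS)
        if key not in merged:
            merged[key] = dict(row)
            centers[key] = []
        if row["Center"] not in centers[key]:
            centers[key].append(row["Center"])

    result = []
    for key, row in merged.items():
        row["Center"] = ";".join(centers[key])
        result.append(row)
    return result


# Hypothetical example: two centers agree on one call, one center adds another.
example_rows = [
    {"Tumor_Sample_Barcode": "SAMPLE-01", "Chromosome": "1",
     "Start_Position": "100", "End_Position": "100",
     "Reference_Allele": "A", "Tumor_Seq_Allele2": "T", "Center": "center_A"},
    {"Tumor_Sample_Barcode": "SAMPLE-01", "Chromosome": "1",
     "Start_Position": "100", "End_Position": "100",
     "Reference_Allele": "A", "Tumor_Seq_Allele2": "T", "Center": "center_B"},
    {"Tumor_Sample_Barcode": "SAMPLE-01", "Chromosome": "2",
     "Start_Position": "200", "End_Position": "200",
     "Reference_Allele": "G", "Tumor_Seq_Allele2": "C", "Center": "center_B"},
]
merged_rows = merge_center_calls(example_rows)
```

In this sketch the three example rows collapse to two:  the call reported by both centers keeps a single row whose Center column lists both, and the call reported by one center passes through unchanged.  A real merge would also need to decide how to reconcile columns on which the centers disagree, which this sketch sidesteps by keeping the first row seen.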