Multi-Center Mutation Calls & MAFs
Below are notes from the 2013_11_26 telecon between H. Sofia and M. Noble regarding multi-center mutation calling benchmarks, the file products they generate, and what to do with those products and how. The telecon was called to obtain clarity on these topics; these notes are not intended as definitive policy on any of them.
Summary
- Since multi-center MAFs are a richer source of data, leading to better science, the ultimate goal would ideally be for them to supplant single-center MAFs
- As of this writing KICH is among the earliest, along with e.g. COADREAD, to submit a multi-center MAF to the DCC (through BCM)
- THCA is an example where the multi-center mutation calling benchmark results have not yet been merged back into a single MAF for DCC submission
- There is a seemingly implicit understanding that an analyst or programmer from the primary sequencing center has the responsibility of creating the merged multi-center MAF, by incorporating the calls from secondary centers in the benchmark exercise
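As a rough illustration of the merge described in the last point, here is a minimal Python sketch. It assumes tab-delimited MAFs with the standard column names (Tumor_Sample_Barcode, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2, Center), treats two rows as the same call when those key columns match, and records agreement by joining center names with semicolons. The function names read_maf and merge_mafs are hypothetical, and this is not the tool any center actually used; it also does no filtering.

```python
import csv

# Columns assumed to identify "the same mutation" across centers.
# This key is an illustrative assumption; a real merge may key differently.
KEY_COLUMNS = (
    "Tumor_Sample_Barcode", "Chromosome", "Start_Position",
    "End_Position", "Reference_Allele", "Tumor_Seq_Allele2",
)


def read_maf(handle):
    """Return MAF rows as a list of dicts, skipping '#' comment lines."""
    lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(lines, delimiter="\t"))


def merge_mafs(primary_rows, secondary_rows_list):
    """Union the primary center's calls with calls from secondary centers.

    Rows sharing the same key are collapsed into one row (the first one
    seen, so the primary center's annotation wins), and the Center field
    lists every center that reported the call, separated by semicolons.
    """
    merged = {}
    for rows in [primary_rows, *secondary_rows_list]:
        for row in rows:
            key = tuple(row[col] for col in KEY_COLUMNS)
            if key not in merged:
                merged[key] = dict(row)
                continue
            centers = merged[key]["Center"].split(";")
            for center in row["Center"].split(";"):
                if center not in centers:
                    centers.append(center)
            merged[key]["Center"] = ";".join(centers)
    return list(merged.values())
```

Keeping the primary center's row on a match is only one possible design choice; how conflicting annotations and filtering should be handled is exactly the kind of detail that standardization would need to settle.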
Actions
- Mike/Broad will immediately ingest KICH multi-center MAF; this MAF will be reflected in the Dec stddata run and next Analysis run
- Mike will contact appropriate parties at Broad to generate THCA multi-center MAF
- Mike will gauge (or raise) awareness at Broad of the need for multi-center merge of MAFs for other disease studies sequenced at Broad
- Heidi will advocate for:
  - clearer nomenclature across the TCGA to identify multi-center MAFs, beginning with contacting BCM, so that the name of a multi-center MAF does not exactly match that of the single-center MAF (even though MD5 and file size can disambiguate, using the same names is VERY unfriendly)
  - standardization of multi-center VCF/MAFs across the AWGs; getting the format nailed down is most important, but tools can be shared too, if possible starting with the BCM tool that was likely written to submit the KICH multi-center MAF
  QUESTION 2013_12_02: how is filtering going to happen during the merge process (e.g. what tool, if any, did Baylor use for this in KICH, or was it manual)?
  - broader understanding in primary sequencing centers of need/responsibility to merge/aggregate multi-center calling results into single VCF/MAF
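Until file names distinguish the two kinds of MAF, the MD5 and file-size check mentioned above can be scripted. The following is a minimal Python sketch using only the standard library; the function names file_fingerprint and same_file are hypothetical, chosen for illustration.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (md5 hex digest, size in bytes) for the file at path."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large MAFs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest(), os.path.getsize(path)


def same_file(path_a, path_b):
    """True when two identically named MAFs have the same MD5 and size."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)
```

This only tells you whether two same-named files differ, not which one is the multi-center MAF, which is why distinct names remain the friendlier fix.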