Merging Multi-Center MAFs

Summary

Because a multi-center MAF is a richer source of data, and so supports better science, the ultimate goal should be for multi-center MAFs to supplant single-center MAFs
As of this writing, KICH is among the earliest disease studies, if not the first, to submit a multi-center MAF to the DCC (through BCM)
THCA is an example of a study whose multi-center mutation-calling benchmark results have not yet been merged back into a single MAF
There appears to be an implicit understanding that an analyst or programmer at the primary sequencing center is responsible for creating the merged multi-center MAF, by incorporating the calls from the secondary centers in the benchmark exercise
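As a rough illustration of what that merge involves, the Python sketch below takes one parsed MAF per center and unions the calls, keeping one row per variant and listing every center that reported it in the Center column. It is a hypothetical sketch, not the BCM tool or any Broad code: the function names are invented, the choice of key columns (standard MAF column names) is an assumption, and joining center names with ';' in the Center field is an assumed convention rather than one taken from the TCGA MAF specification.

```python
import csv
import io

# Columns assumed (for this sketch) to identify a single somatic call.
KEY_COLUMNS = ("Tumor_Sample_Barcode", "Chromosome", "Start_Position",
               "End_Position", "Reference_Allele", "Tumor_Seq_Allele2")


def read_maf(text):
    """Parse tab-separated MAF text, skipping '#' comment lines such as '#version'."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def merge_mafs(mafs_by_center):
    """Union calls across centers (hypothetical helper).

    mafs_by_center maps a center name to its parsed MAF rows. The first row seen
    for a call is kept as the representative row; its Center field becomes a
    ';'-joined list of every center that reported the call (assumed convention).
    """
    merged = {}
    for center, rows in mafs_by_center.items():
        for row in rows:
            key = tuple(row[c] for c in KEY_COLUMNS)
            if key not in merged:
                merged[key] = dict(row, Center=center)
            else:
                centers = merged[key]["Center"].split(";")
                if center not in centers:
                    merged[key]["Center"] = ";".join(centers + [center])
    return list(merged.values())


def write_maf(rows, fieldnames):
    """Serialize merged rows back to tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Toy, made-up inputs: one call shared by both centers, one unique to centerB.
    header = "\t".join(("Hugo_Symbol", "Center") + KEY_COLUMNS)
    primary = header + "\nGENE1\tcenterA\tS1\t3\t100\t100\tC\tT\n"
    secondary = (header + "\nGENE1\tcenterB\tS1\t3\t100\t100\tC\tT"
                 "\nGENE2\tcenterB\tS1\t5\t200\t200\tG\tA\n")
    merged = merge_mafs({"centerA": read_maf(primary),
                         "centerB": read_maf(secondary)})
    print(write_maf(merged, ("Hugo_Symbol", "Center") + KEY_COLUMNS))
```

Keying on sample, position and alleles means that two centers reporting the same call collapse into one row while center-specific calls are preserved; a real merge would also have to decide how to reconcile differing annotations between centers, which this sketch ignores by keeping the first row seen.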

Actions

Mike/Broad will immediately ingest the KICH multi-center MAF; it will be reflected in the Dec stddata run and the next Analysis run
Mike will contact the appropriate parties at Broad to generate the THCA multi-center MAF
Mike will gauge (or raise) awareness at Broad of the need to merge multi-center MAFs for other disease studies sequenced at Broad
Heidi will advocate for:

clearer nomenclature across the TCGA to identify multi-center MAFs, beginning with contacting BCM, so that the name of a multi-center MAF does not exactly match that of the single-center MAF (even though MD5 and file size can disambiguate them, reusing the same name is VERY unfriendly)
standardization of multi-center VCFs/MAFs across the AWGs; nailing down the format is most important, but tools could also be shared, if possible starting with the BCM tool that was likely written to submit the KICH multi-center MAF
broader understanding, in primary sequencing centers, of their need and responsibility to merge/aggregate multi-center calling results into a single VCF/MAF