PVCA Batch Effects Discussion 09-16-2011

 

Meeting Minutes 09-16-2011

Telecon attendees: Sheila Reynolds & Adam Norberg (ISB), Dan Dicara & Mike Noble (Broad)

  1. Discussed the appropriate placement of the PVCA Pipeline in Firehose
    1. Three schemes were discussed, as described in BatchEffects.pdf
    2. Scheme 3 was judged the best option
      1. Place PVCA in the Normalizer Workflow after pertinent Merge Pipelines
      2. One pipeline per technology (i.e. array or sequencing platform)
      3. Merge individual reports for individual technologies into a single report
        1. This report could be added as an annotation in Firehose that can be referred to in downstream analysis pipelines
        2. One report per data type (i.e. expression and methylation)
    3. Perhaps also run PVCA at other points (e.g. after aggregation/centering pipelines such as mRNA_Preprocess_Median)
  2. Discussed adding information to the data based on the PVCA results
    1. Create a new metadata file
      1. Sample list for each tumor type with a column indicating whether batch effects were discovered
      2. Downstream pipelines could add columns of information to this file
      3. Redactions could be entered at the end
      4. Adam mentioned he may have a Python script for doing this - follow up with him
    2. Using a separate file avoids adding columns to the preexisting data files, which could break downstream parsers
    3. Package this with the normalized results
  3. Talk to Nils about allowing linking between reports (this is being tracked as GDAC-80)
  4. Discussed how to correct for batch effects
    1. Correction would be difficult to automate, so the decision should be left to downstream pipelines
  5. Finally, we discussed maintaining a Batch Effects page on our TCGA-GDAC website (this is the first entry)
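
The metadata file proposed in item 2 could look roughly like the sketch below. This is only an illustration, not Adam's script: the tab-delimited layout, the column names (`sample_id`, `batch_effect`, `redacted`), the sample identifiers, and both function names are assumptions made for the example.

```python
import csv
import io


def write_sample_metadata(handle, samples, flagged):
    """Write one row per sample with a yes/no batch-effect column.

    Column names are hypothetical; nothing here is a Firehose format.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", "batch_effect"])
    for sample in samples:
        writer.writerow([sample, "yes" if sample in flagged else "no"])


def add_column(in_handle, out_handle, name, values, default="NA"):
    """Copy the metadata file, appending one new column keyed by sample_id.

    Existing columns are left untouched, so parsers of the original
    data files are never affected.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    fields = list(reader.fieldnames) + [name]
    writer = csv.DictWriter(
        out_handle, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[name] = values.get(row["sample_id"], default)
        writer.writerow(row)


# Made-up sample identifiers, for illustration only.
samples = ["SAMPLE-01", "SAMPLE-02", "SAMPLE-03"]

# Step 1: the PVCA stage records which samples showed batch effects.
base = io.StringIO()
write_sample_metadata(base, samples, flagged={"SAMPLE-02"})

# Step 2: a later stage appends its own column (here, a redaction flag).
base.seek(0)
extended = io.StringIO()
add_column(base, extended, "redacted", {"SAMPLE-03": "yes"}, default="no")
result = extended.getvalue()
print(result)
```

Keeping the annotations in their own file, keyed by sample identifier, lets each downstream pipeline append a column without touching the normalized data itself.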