How clinical parameters of XML data are processed in the GDAC clinical picker pipeline
: In the XML data, each clinical parameter name is concatenated with its parent node names, joined by '.'.
For all CDEs,
- split each name on '.' and took the last element as the parameter name.
- defined a parameter list to process.
- took all CDEs starting with 'patient.*' and filtered out those starting with 'admin.*', 'patient.samples.*', 'patient.clinical_cqcf.*' and 'patient.biospecimen_cqcf.*'.
- for each parameter name, generated a matrix containing all clinical data with that parameter name.
- These matrices are saved under /each_param/ and are useful for locating related parameters that share the same name.
- for each parameter name under /each_param/, processed the data and saved it in All_CDEs_*.txt
- if the parameter has multiple follow-up entries, they are merged into one parameter holding the latest values.
- if the parameter has multiple entries that are not follow-up data, its additional event data are saved under /EXTRA/.
- The All_CDEs_*.txt is used as an input for generating a *.clin.merged.picked.txt by the selectionFileGenerator.
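
The name handling in the steps above can be sketched in Python as follows. This is only an illustration, not the pipeline's actual code: the function names (leaf_name, is_selected, group_by_leaf) and the example CDE names are hypothetical, and only the prefixes come from the list above.

```python
from collections import defaultdict

# Prefixes excluded from processing, as listed in the steps above.
EXCLUDED_PREFIXES = (
    "admin.",
    "patient.samples.",
    "patient.clinical_cqcf.",
    "patient.biospecimen_cqcf.",
)


def leaf_name(cde_name):
    """Return the last '.'-separated element of a full CDE name."""
    return cde_name.split(".")[-1]


def is_selected(cde_name):
    """Keep CDEs under 'patient.' that are not under an excluded prefix."""
    return cde_name.startswith("patient.") and not cde_name.startswith(
        EXCLUDED_PREFIXES
    )


def group_by_leaf(cde_names):
    """Map each parameter name to the full CDE names that share it."""
    groups = defaultdict(list)
    for name in cde_names:
        if is_selected(name):
            groups[leaf_name(name)].append(name)
    return dict(groups)


# Hypothetical CDE names, for illustration only.
example = [
    "admin.batch_number",
    "patient.gender",
    "patient.vital_status",
    "patient.follow_ups.follow_up.vital_status",
    "patient.samples.sample.sample_type",
]
groups = group_by_leaf(example)
```

Each key of groups corresponds to one matrix under /each_param/; a key with more than one full name is a parameter that appears at several places in the XML tree.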
*.clin.merged.picked.txt still contains only a small set of parameters suggested by a pathologist for clinical correlation analysis. However, more parameters that are not in *.clin.merged.picked.txt are available in All_CDEs_*.txt, and you can add them to *.clin.merged.picked.txt for your clinical correlation test.
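
One way to add extra parameters is sketched below in Python. It rests on assumptions that are not confirmed by this note: both files are taken to be tab-delimited, with a header row of patient IDs followed by one row per parameter, and "NA" is used as a placeholder for patients missing from All_CDEs_*.txt. The function names are hypothetical; check the layout of your own files before adapting it.

```python
import csv
import io


def read_rows(text):
    """Parse tab-delimited text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def add_parameters(picked_text, all_cdes_text, wanted):
    """Append the rows named in `wanted` from All_CDEs to the picked table."""
    picked = read_rows(picked_text)
    all_cdes = read_rows(all_cdes_text)
    # Assumed layout: header row of patient IDs, then one row per parameter.
    picked_ids = picked[0][1:]
    column_of = {pid: i for i, pid in enumerate(all_cdes[0][1:], start=1)}
    present = {row[0] for row in picked}
    for row in all_cdes[1:]:
        if row[0] in wanted and row[0] not in present:
            # Reorder values to match the patient order of the picked table;
            # "NA" is an assumed placeholder for patients absent from All_CDEs.
            values = [
                row[column_of[pid]] if pid in column_of else "NA"
                for pid in picked_ids
            ]
            picked.append([row[0]] + values)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(picked)
    return out.getvalue()


# Made-up tables, for illustration only.
picked = "id\tp1\tp2\ngender\tmale\tfemale\n"
all_cdes = "id\tp2\tp1\ngender\tfemale\tmale\ntumor_grade\tg2\tg1\n"
result = add_parameters(picked, all_cdes, {"tumor_grade"})
```

Matching columns by patient ID rather than by position avoids silently misaligning values when the two files list patients in a different order.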