"Integration of various ‘omics data for breast cancer subtyping and personalized treatment"


A comprehensive study of the active molecular landscape of human cells can integrate two different but complementary perspectives: transcriptomics and proteomics. Since the genome era, transcriptomics has emerged as a powerful tool for simultaneously identifying and characterizing the thousands of different genes active in a cell. Nevertheless, the actual functional landscape of the cellular system is defined at the protein level. Although the number of large-scale proteomics studies has increased over the years, the cost and complexity of proteomic profiling prevent these approaches from being used widely in the clinic. Transcriptomics-based studies, on the other hand, are currently much more straightforward and affordable for medical use. Comprehensive models that link transcriptomics-level and proteomics-level data could bridge this gap, but building them is still a challenging task.


Because breast tumor cells vary widely even within the same cancer type, breast cancer is one of the most difficult diseases for which to predict outcome and treatment response. This is especially true for the estrogen receptor negative (ERN) breast cancer types (both HER2 positive and HER2 negative), which are the most variable and hence the most lethal. Traditional methods of transcriptomic data analysis are therefore not sufficient in this particular case. Breast cancer is also the second most common cancer in the US after skin cancer and the second leading cause of cancer death in women after lung cancer. Hence there is strong demand for a new generation of highly robust methods for breast cancer subtyping and treatment response prediction based on integrated ‘omics data approaches.

Actual Challenge

  1. Using publicly available transcriptomics and proteomics datasets for breast cancer, develop an integrated model that derives the proteomic state of a given breast cancer sample from its transcriptomic profile, with a focus on several breast cancer related signalling networks, including p38, TGF-beta and ErbB signalling.
  2. Validate the approach on several transcriptomics-only datasets by predicting breast cancer type and treatment response endpoints from the model-derived proteomics data.
  3. Extrapolate the transcriptomics-proteomics link model for use within a broad set of signalling networks.
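
One possible starting point for step 1 is a per-gene linear baseline that maps mRNA abundance to protein abundance and then summarizes the predictions per signalling network. The sketch below is only an illustration under stated assumptions: the function names, the synthetic data generator and the short gene lists are hypothetical and are not part of the challenge, and real pathway membership would need to come from a curated resource. A competitive solution would likely need a richer model than one straight line per gene.

```python
import random
from statistics import mean

# Illustrative gene sets only (assumption): real pathway membership should
# be taken from a curated pathway database.
PATHWAYS = {
    "ErbB": ["ERBB2", "EGFR", "GRB2"],
    "TGF-beta": ["TGFB1", "SMAD2", "SMAD3"],
}


def fit_gene(mrna, protein):
    """Least-squares fit of protein = intercept + slope * mrna for one gene."""
    mx, my = mean(mrna), mean(protein)
    sxx = sum((x - mx) ** 2 for x in mrna)
    if sxx == 0:
        # Constant mRNA level: fall back to predicting the mean protein level.
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(mrna, protein))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_model(mrna_by_gene, protein_by_gene):
    """Fit one line per gene that has both mRNA and protein measurements."""
    return {
        gene: fit_gene(mrna_by_gene[gene], protein_by_gene[gene])
        for gene in mrna_by_gene
        if gene in protein_by_gene
    }


def predict_proteins(model, sample):
    """Map one sample (gene -> mRNA level) to predicted protein levels."""
    return {
        gene: intercept + slope * sample[gene]
        for gene, (intercept, slope) in model.items()
        if gene in sample
    }


def pathway_scores(predicted, pathways=PATHWAYS):
    """Average the predicted protein levels over each pathway's genes."""
    scores = {}
    for name, genes in pathways.items():
        values = [predicted[g] for g in genes if g in predicted]
        if values:
            scores[name] = mean(values)
    return scores


def make_synthetic(n_samples=50, seed=0):
    """Generate toy paired data; it stands in for real matched datasets."""
    rng = random.Random(seed)
    genes = [g for members in PATHWAYS.values() for g in members]
    mrna, protein = {}, {}
    for i, gene in enumerate(genes):
        slope = 0.5 + 0.1 * i
        xs = [rng.gauss(5, 1) for _ in range(n_samples)]
        mrna[gene] = xs
        protein[gene] = [1.0 + slope * x + rng.gauss(0, 0.1) for x in xs]
    return mrna, protein


if __name__ == "__main__":
    mrna, protein = make_synthetic()
    model = fit_model(mrna, protein)
    new_sample = {gene: values[0] for gene, values in mrna.items()}
    print(pathway_scores(predict_proteins(model, new_sample)))
```

For step 2, the per-pathway scores returned by `pathway_scores` could serve as input features for a subtype or treatment response classifier trained on transcriptomics-only cohorts; that part is left out here.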

Skills recommended for team members (not obligatory):

  1. Strong biological, medical, programming or mathematics background.
  2. Interest in applying big data analysis techniques in biology.
  3. Python or R programming experience is highly recommended.
  4. Experience in machine learning or kinetic modeling.

Can you find a solution to this challenge?
Challenge owner: Ivan Ozerov, PhD, Insilico Medicine Inc