Methods

Datasets

358 cancer fusions involving 337 genes from the study of Mitelman et al., used as a training set, and the Breakpoints Collection of the ChiTaRS-3.1 database, used as a test set.
COSMIC datasets of all point mutations in human genes found in cancers.
16,102 genes/proteins and 218,979 molecular interactions from the BioGRID protein-protein interaction (PPI) database.
cBioPortal, for studying cancer-associated mutated genes.

Pathway over-representation is assessed by comparing the number of proteins from the original list that are involved in a pathway with the total number of proteins in that pathway, the size of the original list, and the total number of proteins.
This comparison determines which pathways are over-represented with a significant p-value.
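The text does not name the statistical test. A one-sided hypergeometric test is a common way to combine exactly these four counts, so the Python sketch below assumes it. The function names `hypergeom_pvalue` and `overrepresented`, the 0.05 cut-off and the toy pathways are hypothetical and are not taken from the original pipeline.

```python
from math import comb


def hypergeom_pvalue(k: int, pathway_size: int, list_size: int, total: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(total, pathway_size, list_size).

    k            -- proteins from the input list that fall in the pathway
    pathway_size -- total number of proteins annotated to the pathway
    list_size    -- size of the input protein list
    total        -- total number of proteins in the background
    """
    upper = min(pathway_size, list_size)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible outcomes contribute nothing to the tail sum.
    tail = sum(
        comb(pathway_size, i) * comb(total - pathway_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(total, list_size)


def overrepresented(pathways: dict[str, set[str]], input_list: set[str],
                    total: int, alpha: float = 0.05) -> dict[str, float]:
    """Return {pathway: p-value} for pathways with p < alpha (hypothetical cut-off)."""
    hits = set(input_list)
    results = {}
    for name, members in pathways.items():
        k = len(hits & members)
        p = hypergeom_pvalue(k, len(members), len(hits), total)
        if p < alpha:
            results[name] = p
    return results


# Toy example with hypothetical pathways and a background of 20 proteins.
pathways = {"P1": {"A", "B", "C", "D", "E"}, "P2": {"X", "Y", "Z", "A"}}
print(overrepresented(pathways, {"A", "B", "C", "D"}, total=20))  # only P1 is reported
```

Exact integer arithmetic via `math.comb` avoids an external dependency. A library routine such as a hypergeometric survival function would give the same tail probability.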
A proteome-wide database of interactions between discrete protein domains is created.
For all human proteins, protein domains are identified using Pfam, ELM, UniProt, CDD, SMART, Negatome, PROSITE, InterPro, PRINTS, SUPERFAMILY and PIRSF.
A domain-domain co-occurrence score is calculated from the observed frequencies of domain pairs across all PPI networks.
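The exact formula for the co-occurrence score is not given. The Python sketch below assumes a log-odds score: the observed frequency of a domain pair across interacting protein pairs, divided by the frequency expected if domains paired independently. The function name and the domain IDs in the example are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def domain_cooccurrence_scores(interactions, domains):
    """Log-odds score log2(observed / expected) for every co-occurring domain pair.

    interactions -- iterable of (protein_a, protein_b) PPI pairs
    domains      -- dict mapping protein -> set of domain IDs
    The expected frequency assumes domains pair independently (an assumption).
    """
    pair_counts = Counter()
    for a, b in interactions:
        for da, db in product(domains.get(a, ()), domains.get(b, ())):
            # Count both orientations so the score is symmetric in (da, db).
            pair_counts[(da, db)] += 1
            pair_counts[(db, da)] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return {}
    # Marginal frequency of each domain across all ordered pair observations.
    marginal = Counter()
    for (da, _), n in pair_counts.items():
        marginal[da] += n
    scores = {}
    for (da, db), n in pair_counts.items():
        observed = n / total
        expected = (marginal[da] / total) * (marginal[db] / total)
        scores[(da, db)] = log2(observed / expected)
    return scores


# Hypothetical proteins and domains: DOM_A always meets DOM_B in the interactions.
interactions = [("P1", "P2"), ("P3", "P4")]
domains = {"P1": {"DOM_A"}, "P2": {"DOM_B"}, "P3": {"DOM_A"}, "P4": {"DOM_B"}}
print(domain_cooccurrence_scores(interactions, domains))
```

A positive score marks a domain pair that co-occurs in interacting proteins more often than chance. A score near zero marks a pair that co-occurs at about the chance rate.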
The PPI events for each domain of the parental proteins are analyzed to build PPI networks for the parental and fusion proteins.
For each PPI event, it is determined whether the interaction is maintained or lost upon fusion of the two parental proteins.
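The maintained/lost decision can be sketched in Python under one assumption. Each parental interaction is attributed to the parental domains that mediate it, and it is kept if at least one of those domains survives on the retained side of the fusion breakpoint. The data layout, function name and gene/domain IDs below are hypothetical placeholders.

```python
def classify_interactions(interaction_domains, retained_domains):
    """Label each parental PPI as 'maintained' or 'lost' in the fusion protein.

    interaction_domains -- dict mapping (parent, partner) -> set of the parent's
                           domain IDs assumed to mediate that interaction
    retained_domains    -- dict mapping parent -> set of its domain IDs that
                           survive in the fusion (the kept side of the breakpoint)
    """
    labels = {}
    for (parent, partner), mediating in interaction_domains.items():
        kept = retained_domains.get(parent, set())
        # Maintained if at least one mediating domain is kept in the fusion.
        labels[(parent, partner)] = "maintained" if mediating & kept else "lost"
    return labels


# Hypothetical parental proteins, partners and domains.
interaction_domains = {
    ("GENE_A", "PARTNER_1"): {"DOM1"},
    ("GENE_A", "PARTNER_2"): {"DOM2"},
    ("GENE_B", "PARTNER_3"): {"DOM3", "DOM4"},
}
retained = {"GENE_A": {"DOM1"}, "GENE_B": {"DOM4"}}
print(classify_interactions(interaction_domains, retained))
```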

PPI-pathway associations were ordered according to a significance threshold and the false discovery rate (FDR).
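The text does not say how the FDR is controlled. The Python sketch below assumes the Benjamini-Hochberg procedure, which converts p-values to q-values. It then keeps and orders the associations whose q-value falls under a threshold. The 0.05 threshold, the function names and the association keys are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def rank_associations(assoc_pvalues, fdr_threshold=0.05):
    """Keep associations with q <= threshold, ordered by q-value, then p-value."""
    keys = list(assoc_pvalues)
    qs = benjamini_hochberg([assoc_pvalues[k] for k in keys])
    kept = [(k, assoc_pvalues[k], q) for k, q in zip(keys, qs) if q <= fdr_threshold]
    return sorted(kept, key=lambda t: (t[2], t[1]))


# Hypothetical PPI-pathway associations and their raw p-values.
print(rank_associations({"a": 0.01, "b": 0.04, "c": 0.03, "d": 0.5}))
```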
Random pairs of real PPI-pathway interactors and non-interactors are generated.
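The sampling scheme is not specified. One plausible reading is a balanced set: known interactor pairs, plus an equal number of random (protein, pathway) pairs that are not known to interact. The Python sketch below assumes that reading. The function name, the fixed seed and the toy identifiers are hypothetical.

```python
import random


def sample_pairs(interactions, proteins, pathways, n, seed=0):
    """Draw n known interactor pairs and n random non-interactor pairs.

    interactions -- set of (protein, pathway) pairs treated as true interactors
    proteins     -- candidate proteins for random pairs
    pathways     -- candidate pathways for random pairs
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    prots, paths = sorted(set(proteins)), sorted(set(pathways))
    # Lower bound on how many distinct non-interactor pairs exist in the grid.
    available = len(prots) * len(paths) - len(interactions)
    if n > len(interactions) or n > available:
        raise ValueError("not enough pairs to sample from")
    positives = rng.sample(sorted(interactions), n)
    negatives = set()
    # Rejection sampling: redraw any pair that is a known interactor.
    while len(negatives) < n:
        pair = (rng.choice(prots), rng.choice(paths))
        if pair not in interactions:
            negatives.add(pair)
    return positives, sorted(negatives)


known = {("P1", "W1"), ("P2", "W2"), ("P3", "W1")}
pos, neg = sample_pairs(known, ["P1", "P2", "P3"], ["W1", "W2"], n=2)
print(pos, neg)
```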
A network that best predicts the protein interactions of a fusion protein is built by taking the union of all interactors of its parental proteins.
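The union step can be sketched in Python, assuming the PPI data are held as a mapping from each protein to the set of its interactors. The function name and the protein IDs are hypothetical.

```python
def predicted_fusion_interactors(ppi, parent_a, parent_b):
    """Predicted interactors of a fusion: the union of both parents' interactors.

    ppi -- dict mapping protein -> set of interacting proteins
    """
    return set(ppi.get(parent_a, set())) | set(ppi.get(parent_b, set()))


# Hypothetical parental proteins A and B sharing the interactor Y.
ppi = {"A": {"X", "Y"}, "B": {"Y", "Z"}}
print(sorted(predicted_fusion_interactors(ppi, "A", "B")))
```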
For every fusion network, the interaction graph is constructed and compared with the refined network, which contains both “missing” and “gained” interactions.
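The comparison can be sketched as set operations on undirected edges. This assumes that “missing” means edges present in the union-based prediction but absent from the refined network, and “gained” means the reverse. The text does not define the two terms more precisely, and the function names and node IDs are hypothetical.

```python
def as_edges(pairs):
    """Normalise undirected (u, v) pairs so (u, v) and (v, u) compare equal."""
    return {frozenset(p) for p in pairs}


def compare_networks(predicted_pairs, refined_pairs):
    """Split edges into shared, missing (predicted only) and gained (refined only)."""
    predicted, refined = as_edges(predicted_pairs), as_edges(refined_pairs)
    return {
        "shared": predicted & refined,
        "missing": predicted - refined,
        "gained": refined - predicted,
    }


# Hypothetical fusion F with predicted and refined interaction edges.
diff = compare_networks([("F", "X"), ("F", "Y")], [("Y", "F"), ("F", "Z")])
print({k: sorted(tuple(sorted(e)) for e in v) for k, v in diff.items()})
```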
Structural information is included in the fusion protein-protein interaction maps.