Methods
Datasets
-
358 cancer fusions involving 337 genes from the study of Mitelman et al. as a training set, and the Breakpoints Collection
of the ChiTaRS-3.1 database as a test set.
-
COSMIC datasets of all point mutations in human genes in cancers.
-
16,102 genes/proteins and 218,979 molecular interactions from the BioGRID database of protein-protein interactions (PPI).
-
cBioPortal for studying cancer-associated mutated genes.
Domain-Domain Co-occurrence
-
The number of proteins from the input list that are involved in a pathway is compared with the total number of proteins
in that pathway, the size of the original list, and the total number of all proteins.
-
Determining which pathways are over-represented with a significant p-value.
-
A proteome-wide database of interactions between discrete protein domains is created.
-
For all human proteins, their protein domains are identified using PFAM, ELM, UniProt, CDD, SMART, Negatome, PROSITE, Interpro, PRINTS, Superfamily and PIRSF.
-
A domain-domain co-occurrence score is calculated from the observed frequencies of domain pairs across all PPI networks.
-
The PPI events for each domain of the parental proteins are analyzed to build PPI networks for the parental and fusion proteins.
-
For each PPI event, it is determined whether the interaction is maintained or lost upon fusion of the two parental proteins.
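The over-representation step listed above compares four counts: list proteins in a pathway, pathway size, list size, and the total number of proteins. These are the inputs of a hypergeometric test, so a minimal sketch of such a test is given here. The poster does not name the statistical test actually used, so the choice of the hypergeometric upper tail, the function name `enrichment_p_value`, and the example counts are all assumptions for illustration only.

```python
from math import comb


def enrichment_p_value(hits: int, pathway_size: int, list_size: int, total: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    hits         -- proteins from the input list that fall in the pathway
    pathway_size -- total number of proteins in the pathway
    list_size    -- size of the original input list
    total        -- total number of all proteins considered
    (Assumed test; the source does not specify which one was used.)
    """
    max_hits = min(pathway_size, list_size)
    # Number of ways to draw the whole list from the background.
    denom = comb(total, list_size)
    # Sum the probabilities of seeing `hits` or more pathway members by chance.
    tail = sum(
        comb(pathway_size, k) * comb(total - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / denom


# Hypothetical counts: 8 of 120 list proteins fall in a 40-protein pathway,
# with the 16,102 BioGRID genes/proteins taken as the background (an assumption).
p = enrichment_p_value(hits=8, pathway_size=40, list_size=120, total=16102)
print(f"p = {p:.3g}")
```

A pathway would then be called over-represented when this p-value falls below the chosen significance threshold, after correction for testing many pathways.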
Constructing Fusion Network
-
PPI-pathway associations are ranked according to the threshold and the false discovery rate (FDR).
-
Random pairs of real PPI-pathway interactors and non-interactors are generated.
-
A network is built that best predicts the protein interactions of a fusion protein by uniting all interactors for parental proteins.
-
For every fusion network, the interaction graph is constructed and compared with the refined network, which contains “missing”
as well as “gained” interactions.
-
Structural information for the fusion protein-protein interaction maps is included.
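The network-construction steps above can be sketched with plain set operations. This is only an illustration, not the authors' implementation: the function names, the toy protein identifiers, and the reading of “missing” (predicted by the union but absent from the refined network) and “gained” (present in the refined network but not in the union) are assumptions.

```python
from typing import Dict, Set


def predicted_fusion_interactors(parent_a: str, parent_b: str,
                                 ppi: Dict[str, Set[str]]) -> Set[str]:
    """Unite all known interactors of the two parental proteins."""
    return ppi.get(parent_a, set()) | ppi.get(parent_b, set())


def compare_networks(predicted: Set[str], refined: Set[str]) -> Dict[str, Set[str]]:
    """Classify interactions by comparing the union network with a refined one."""
    return {
        "maintained": predicted & refined,  # present in both networks
        "missing": predicted - refined,     # assumed meaning: lost upon fusion
        "gained": refined - predicted,      # assumed meaning: new in the fusion
    }


# Toy data with hypothetical identifiers (not taken from any database).
ppi = {"ParentA": {"X", "Y"}, "ParentB": {"Y", "Z"}}
predicted = predicted_fusion_interactors("ParentA", "ParentB", ppi)
result = compare_networks(predicted, refined={"Y", "Z", "W"})
for label, partners in result.items():
    print(label, sorted(partners))
```

Keeping the interactors as sets makes the maintained, missing and gained groups direct set intersections and differences, which is easy to check by hand on a small fusion.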