About

Mechanistic Inference of Node-Edge Relationships (MINER) pipeline

Figure 1. MINER applies a gene expression clustering algorithm and gold-standard gene interaction databases to infer sets of co-regulated genes called regulons. Each regulon has an associated regulator and edge direction (i.e., activation or repression), as well as an activity in each sample (e.g., over-expressed or under-expressed). When the coordinated activity of a regulon changes in the context of a mutation, a causal relationship is inferred through the associated regulator. A causal and mechanistic transcriptional regulatory network is generated by evaluating the influence of all potential causes (e.g., mutations, translocations, and copy-number abnormalities) on all regulons. Phenotypes such as responsiveness to therapy or risk of disease progression are predicted by training machine-learning algorithms on regulon activities. The predictive signatures are then placed into a meaningful biological context by identifying their associated mechanisms and putative causes within the network.

We developed the MINER pipeline to infer transcriptional regulatory networks from gene expression data and apply them to the characterization and prediction of phenotypes. MINER builds upon our previous work with the SYstems Genetics Network AnaLysis (SYGNAL) pipeline: it provides the same core functionalities of mechanistic and causal inference, but does so with a new suite of algorithms that also supports network-based prediction of clinical outcomes (Fig. 1).

Inference of the transcriptional regulatory network (TRN) begins by clustering gene expression data into coherent sets of genes that share a binding site for a transcription factor or miRNA, according to gold-standard binding-site databases (e.g., the Transcription Factor Binding Site Database, TFBSDB). By default, the expression of the gene cluster and that of the corresponding regulator must also be correlated (or anticorrelated) with one another. This restriction can be lifted if it proves too stringent, for example in single-cell analysis. Each combination of a coherently expressed set of genes and the associated regulator whose binding site they share is a discrete unit, called a regulon, and the TRN is assembled from these units. Once the regulons have been discovered, a novel causal inference algorithm (see Methods) identifies statistically significant links between putative causal events (e.g., somatic mutations and chromosomal translocations) and the activity levels of regulators and co-regulated gene sets. Once a MINER TRN has been inferred from the data of a patient cohort, new samples can be analyzed to uncover the disease-relevant modules that are over- or under-active in an individual patient.
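The regulon definition above can be illustrated with a minimal Python sketch. It is not MINER's implementation or API: the function name infer_regulons, its parameters, the thresholds (min_genes, min_abs_corr), and the toy data are all hypothetical, and a Pearson correlation between the mean cluster profile and the regulator stands in for whatever statistic the pipeline actually applies. The sketch only shows the logic of intersecting an expression cluster with a regulator's binding-site targets and, optionally, requiring correlation with the regulator.

```python
import numpy as np


def infer_regulons(expr, genes, clusters, regulator_targets,
                   min_genes=3, min_abs_corr=0.2, require_correlation=True):
    """Illustrative regulon assembly (hypothetical helper, not MINER's API).

    expr              -- genes x samples expression array
    genes             -- gene names, in the row order of expr
    clusters          -- list of co-expressed gene-name lists
    regulator_targets -- dict: regulator -> set of genes carrying its binding site
    """
    row = {g: i for i, g in enumerate(genes)}
    regulons = []
    for cluster in clusters:
        for regulator, targets in regulator_targets.items():
            # Keep only cluster genes that share the regulator's binding site.
            members = [g for g in cluster if g in targets and g in row]
            if len(members) < min_genes:
                continue
            if regulator in row:
                # Summarize the gene set by its mean profile across samples.
                profile = expr[[row[g] for g in members]].mean(axis=0)
                r = float(np.corrcoef(profile, expr[row[regulator]])[0, 1])
                direction = "activation" if r > 0 else "repression"
            else:
                # Regulator not measured (e.g., a miRNA): direction unknown.
                r, direction = None, None
            # Default behavior: require (anti)correlation with the regulator.
            if require_correlation and (r is None or abs(r) < min_abs_corr):
                continue
            regulons.append({"regulator": regulator, "genes": members,
                             "direction": direction, "correlation": r})
    return regulons


# Toy data (made up for illustration): six samples, one expression cluster.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
genes = ["TF1", "TF2", "G1", "G2", "G3"]
expr = np.vstack([
    base,                                                        # TF1 tracks the cluster
    7.0 - base,                                                  # TF2 opposes the cluster
    2.0 * base,                                                  # G1
    base + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1]),          # G2
    0.5 * base + 1.0,                                            # G3
])
clusters = [["G1", "G2", "G3"]]
regulator_targets = {
    "TF1": {"G1", "G2", "G3"},
    "TF2": {"G1", "G2", "G3"},
    "TF3": {"G1"},                  # too few shared targets to form a regulon
    "miR-X": {"G1", "G2", "G3"},    # regulator absent from the expression data
}

regulons = infer_regulons(expr, genes, clusters, regulator_targets)
for reg in regulons:
    print(reg["regulator"], reg["direction"], reg["genes"])
```

Passing require_correlation=False mirrors the relaxed mode described above: regulators that cannot be correlated with the cluster, such as the unmeasured miR-X in the toy data, are then kept with an undetermined direction instead of being discarded.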

Source Code

Source code is available at https://github.com/baliga-lab/miner2/

Documentation

Documentation is available at https://baliga-lab.github.io/miner2/

Dependencies