Bayesian latent class analysis of heterogeneous epidemiological cohorts, with multiple risks and informative censoring, and of data from clinical trials, with possibly inhomogeneous treatment associations. Clinical outcomes may take the form of time-to-event variables (e.g. overall or disease-free survival), or ordinal class variables (e.g. treatment response).
Detailed statistical cohort analysis of covariates and outcomes
The Bayes-optimal latent substructure of the cohort: the most probable number and sizes of sub-classes, and their covariate associations and base hazard rates, for each active risk
Risk-specific and covariate-conditioned survival curves, decontaminated for the effects of informative censoring
Comparisons with outcomes of standard regression methods
Associations between covariates and sub-class membership
Retrospective sub-class membership probabilities for all samples in the cohort (for biomarker discovery)
Prospective outcome prediction for unseen data
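As a purely illustrative sketch of what a retrospective sub-class membership probability is, the snippet below applies Bayes' rule to a toy mixture in which each sub-class has one constant hazard rate. The function name, the constant-hazard assumption, and the omission of covariates, multiple risks and informative censoring are all simplifications introduced here. It is not the model used by the tool.

```python
import math


def membership_probabilities(time, event, weights, hazards):
    """Posterior class probabilities for one sample in a toy mixture model.

    Hypothetical simplification: class k has prior weight weights[k] and a
    constant hazard hazards[k]; censoring is treated as non-informative.
    """
    log_post = []
    for w, lam in zip(weights, hazards):
        # Exponential survival: density lam*exp(-lam*t) for an observed event,
        # survival probability exp(-lam*t) for a censored observation.
        log_lik = -lam * time + (math.log(lam) if event else 0.0)
        log_post.append(math.log(w) + log_lik)
    # Normalise in log space for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

Under these assumptions, an early observed event shifts probability towards the high-hazard class, while a long censored follow-up shifts it towards the low-hazard class.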
A multi-core implementation of a pipeline designed primarily for predictive multivariate medical data analytics in the regime of high-dimensional covariates and/or undersampling.
The pipeline supports several outcome data types, including time-to-event outcomes, ordinal class outcomes, and arbitrary real-valued ordinal outcomes, and includes functionality for data visualisation and for the generation of controlled synthetic data. It uses nested multiple iterative multivariate regressions, with bootstrapping, Bayesian probabilistic protocols, and adaptive parameter priors.
Identifying the optimal covariate selection for predictive regression (without overfitting)
Ranking the covariates in the optimal set
Quantifying the outcome prediction performance of the optimal inferred model on unseen data via cross-validations
Constructing optimal predictive multivariate models
Generating formulas for optimised personal risk scores and treatment response scores
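The idea of ranking covariates by their performance on unseen data can be sketched with a much simpler stand-in: k-fold cross-validation of one-covariate least-squares fits. The function names, the squared-error score, and the univariate screening itself are assumptions made for illustration. The pipeline's nested multivariate Bayesian regressions are not reproduced here.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def cv_error(xs, ys, folds=5, seed=0):
    """Mean squared prediction error on held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(folds):
        held = set(idx[f::folds])  # every folds-th shuffled index forms one fold
        train = [i for i in idx if i not in held]
        slope, icpt = fit_line([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - (icpt + slope * xs[i])) ** 2 for i in held)
    return total / len(xs)


def rank_covariates(columns, ys, folds=5):
    """Covariate names sorted from lowest to highest cross-validated error."""
    scores = {name: cv_error(xs, ys, folds) for name, xs in columns.items()}
    return sorted(scores, key=scores.get)
```

Scoring on held-out folds rather than on the training data is what penalises covariates that only appear useful through overfitting.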
Most tools for discriminant analysis use Maximum Likelihood (ML) or Maximum a Posteriori (MAP) approximations, and therefore require the number of samples to be much larger than the number of covariates. They fail for very high-dimensional data, such as imaging or genomic data. SaddlePoint Discriminant is based on exact analytical evaluation of Bayesian parameter integrals, as opposed to point estimates, and hence has no such limitation.
LOOCV-based estimates of training and validation errors
Estimates of training and validation errors derived from separated training and validation sets
Full classification confusion tables, for training and validation sets
Fast probabilistic class membership predictions for unseen data
Statistical description of the cohort, in terms of class-conditioned covariate characteristics and Bayesian hyper-parameters
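To make the LOOCV error estimate and the confusion table concrete, here is a minimal sketch that uses a nearest-centroid rule as a placeholder classifier. The classifier and both function names are assumptions chosen for brevity. SaddlePoint Discriminant's Bayesian integration is not shown.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose training mean is closest (placeholder rule)."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1

    def dist2(label):
        return sum((x[j] - sums[label][j] / counts[label]) ** 2
                   for j in range(len(x)))

    return min(sorted(sums), key=dist2)


def loocv_confusion(xs, ys):
    """Leave-one-out confusion table (true -> predicted counts) and error rate."""
    labels = sorted(set(ys))
    table = {t: {p: 0 for p in labels} for t in labels}
    for i in range(len(xs)):
        # Train on all samples except i, then predict the held-out sample.
        pred = nearest_centroid_predict(xs[:i] + xs[i + 1:],
                                        ys[:i] + ys[i + 1:], xs[i])
        table[ys[i]][pred] += 1
    errors = sum(table[t][p] for t in labels for p in labels if t != p)
    return table, errors / len(xs)
```

Each sample is predicted by a model that never saw it, so the off-diagonal counts of the table estimate the validation error rather than the training error.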
Generating rigorously unbiased null models for directed and non-directed cellular signalling networks is vital for the correct interpretation of experimental observations and for the reliable identification of clinically significant modules. Many researchers apply ad hoc randomisation algorithms, which have been shown to induce mobility-related biases that invalidate the resulting statistical tests.
Node-specific quantifiers, e.g. degree distribution and degree-degree correlations
Distance-related quantifiers, e.g. path length statistics
Modularity measures and closed path (loop) statistics
Eigenvalue spectra of adjacency and Laplacian matrices
Hierarchically constrained null models are generated by MCMC randomisation, with canonical move acceptance probabilities that target tailored maximum entropy network ensembles.
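The role of the move acceptance probability can be illustrated in the simplest setting: non-directed simple graphs with a prescribed degree sequence, sampled uniformly. In the sketch below a degree-preserving edge swap is drawn uniformly from the currently valid swaps and accepted with probability min(1, n(G)/n(G')), where n is the number of valid swaps (the mobility) of a graph. This is the standard Metropolis-Hastings correction for a state-dependent number of moves. The function names and the brute-force mobility count are assumptions suited to small graphs only, and richer constraints such as degree-degree correlations are not covered.

```python
import itertools
import random


def degree_sequence(edges):
    """Sorted list of node degrees for a set of (u, v) edges."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values())


def valid_swaps(edges):
    """All degree-preserving swaps (old1, old2, new1, new2) allowed in this graph."""
    moves = []
    for e1, e2 in itertools.combinations(sorted(edges), 2):
        (a, b), (c, d) = e1, e2
        if len({a, b, c, d}) < 4:
            continue  # edges sharing a node would create a self-loop
        for n1, n2 in (((a, c), (b, d)), ((a, d), (b, c))):
            n1, n2 = tuple(sorted(n1)), tuple(sorted(n2))
            if n1 not in edges and n2 not in edges:  # no multi-edges
                moves.append((e1, e2, n1, n2))
    return moves


def randomise(edges, steps, seed=0):
    """Edge-swap MCMC targeting the uniform measure at fixed degrees."""
    rng = random.Random(seed)
    current = set(edges)
    moves = valid_swaps(current)
    for _ in range(steps):
        if not moves:
            break  # no swap is possible for this degree sequence
        e1, e2, n1, n2 = rng.choice(moves)
        proposal = (current - {e1, e2}) | {n1, n2}
        new_moves = valid_swaps(proposal)
        # Mobility correction: without it, graphs with few valid swaps are oversampled.
        if rng.random() < min(1.0, len(moves) / len(new_moves)):
            current, moves = proposal, new_moves
    return current
```

Edges are assumed to be stored as sorted tuples of comparable node labels, e.g. `(0, 1)`. Accepting every proposed swap instead would bias the sampler towards low-mobility graphs, which is the kind of bias the acceptance rule is there to remove.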