Hemant Ishwaran

Professor, Graduate Program Director,
Director of Statistical Methodology,
Division of Biostatistics, University of Miami

My New Book

Research Interests

Machine Learning | Random Forests and Trees | Boosting | Survival | Cancer Staging | Causal Inference | Missing data | Nonparametric Bayes | Variable Selection

Google Scholar Profile
Link to Papers

Short Bio

I am a Professor of Public Health Sciences at the University of Miami whose research develops machine learning methods for complex biomedical and time-to-event data and turns them into practical open-source tools for investigators. I created Random Survival Forests and the R package randomForestSRC, which are widely used for survival, regression, and classification. My work has been applied in cardiovascular disease, heart transplantation, cancer, and genomics, and as a member of the AJCC expert panel I helped develop data-driven methods that informed modern cancer staging.

Education

  • Harvard University,
    Postdoctoral Fellow, 1995
  • Yale University,
    PhD Statistics, 1993
  • Oxford University,
    MSc Applied Statistics, 1988
  • University of Toronto,
    BSc Mathematical Statistics, 1987

randomForestSRC

Unified treatment of random forests.

R package (CRAN build)
Github (beta builds)
randomForestSRC vignettes

Unified random forests for regression, classification, survival analysis, competing risks, multivariate, unsupervised, quantile regression, and class-imbalanced q-classification. Handles missing data, including missForest, multivariate missForest, and test-time imputation. Also provides fast subsampling random forests, confidence intervals for variable importance, and minimal depth variable selection.

varPro

Model-independent variable selection using rule-based variable priority for regression, classification, and survival.

R package (CRAN)
Github (beta builds)

Permutation importance evaluates a variable's contribution by shuffling its values and measuring the effect on prediction error. However, this approach can introduce bias because the permuted data are artificial. Variable Priority (VarPro), a rule-based method, addresses this by comparing estimates within a rule's region to those from a release region, where the constraints on the variable are removed. By relying only on observed data, VarPro offers a robust and flexible alternative that effectively filters out noise variables and avoids the problems of permutation and other artificial data techniques. [pdf]
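To make the permutation step concrete, here is a minimal Python sketch of permutation importance as described above: shuffle one column, re-score the model, and report the average increase in error. It illustrates only the permutation baseline, not VarPro or any package code. The function names, the toy data, and the stand-in `model` are all assumptions made for illustration.

```python
import random

def mse(model, X, y):
    """Mean squared error of the model's predictions on the rows of X."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, col, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling column `col` (hypothetical helper)."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        # Shuffle the values of one column across rows, leaving the others intact.
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats

# Toy data: the response depends on column 0 only; column 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]

# Stand-in for a fitted model (an assumption; any callable on a row works).
model = lambda row: 3.0 * row[0]

imp_signal = permutation_importance(model, X, y, col=0)
imp_noise = permutation_importance(model, X, y, col=1)
print(imp_signal > imp_noise)
```

Note that each shuffled row pairs a value of one variable with values of the others that were never observed together, which is the artificial data the paragraph above refers to.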

randomForestSGT

Super greedy trees and forests for regression.

Github (beta builds)

Implements Super Greedy Trees (SGTs), a decision tree method that generalizes CART by using lasso-penalized parametric models to define multivariate geometric splits, including hyperplanes, ellipsoids, and hyperboloids. Each node split is selected using best split first (BSF) to prioritize empirical risk reduction. Fast coordinate descent is used for lasso fitting, with regularization tuned via cross-validation.
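As a rough illustration of the lasso fitting step mentioned above, the following Python sketch implements textbook cyclic coordinate descent with soft-thresholding for the objective (1/(2n))·||y − Xb||² + λ·||b||₁. It is not the randomForestSGT implementation and does not build splits or tune λ; the function names and the tiny design matrix are assumptions for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual
            # (the residual with column j's current contribution added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta

# Tiny orthogonal design: column 0 carries a strong signal, column 1 a weak one.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]

beta_lasso = lasso_cd(X, y, lam=0.1)  # the weak coefficient is shrunk to zero
beta_ols = lasso_cd(X, y, lam=0.0)    # no penalty: least-squares solution
print(beta_lasso, beta_ols)
```

With the penalty switched on, the weak coefficient is set exactly to zero while the strong one is only shrunk, which is the sparsity property that makes the lasso useful for choosing which variables enter a split.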

randomForestRHF

Random hazard forests for survival analysis with time-varying covariates.

Github (beta builds)

Random Hazard Forest (RHF) is a tree ensemble method that targets the continuous-time hazard directly. Trees use likelihood-based splitting; longitudinal records are represented in counting process format, ensuring that splits use only information available at risk time. It returns case-specific hazard and cumulative hazard estimates, out-of-bag risk, and time-varying variable importance using varPro.
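The counting process format mentioned above can be sketched in a few lines of Python. Each subject's longitudinal record is split into (start, stop, event, covariate) intervals, and every interval carries only the covariate value measured at its start, so nothing from the future leaks in. This shows the general data layout only, not the input format randomForestRHF expects; the function name, argument names, and example values are assumptions.

```python
def to_counting_process(visit_times, covariates, end_time, event):
    """Convert one subject's record to (start, stop, event, covariate) rows.

    visit_times: increasing measurement times; the first is the entry time
    covariates:  covariate value measured at each visit
    end_time:    time of the event or of censoring
    event:       1 if the event occurred at end_time, 0 if censored
    """
    rows = []
    for k, (start, value) in enumerate(zip(visit_times, covariates)):
        if start >= end_time:
            break  # visits at or after the end of follow-up contribute nothing
        next_visit = visit_times[k + 1] if k + 1 < len(visit_times) else end_time
        stop = min(next_visit, end_time)
        # Only the interval that ends at end_time can carry the event indicator.
        flag = 1 if (event and stop == end_time) else 0
        rows.append((start, stop, flag, value))
    return rows

# Hypothetical subject: measured at times 0, 2, and 5; event at time 7.
rows_event = to_counting_process([0, 2, 5], [1.0, 1.4, 0.9], end_time=7, event=1)

# Same measurements, but censored at time 4: the visit at time 5 is never used.
rows_censored = to_counting_process([0, 2, 5], [1.0, 1.4, 0.9], end_time=4, event=0)

for row in rows_event:
    print(row)
```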