Hemant Ishwaran

Professor, Graduate Program Director, Director of Statistical Methodology, Division of Biostatistics, University of Miami

My New Book

Research Interests

Machine Learning | Random Forests and Trees | Boosting | Survival | Cancer Staging | Causal Inference | Missing data | Nonparametric Bayes | Variable Selection

Google Scholar Profile
Link to Papers

Short Bio

For the past 15 years, I have applied machine learning to public health, medical, and informatics settings, focusing on CVD, heart transplantation, cancer staging, and gene therapy resistance. I developed open-source software, including the popular random survival forest method. As an Expert Panel Member for the American Joint Committee on Cancer (AJCC), I created a data-driven machine learning procedure for cancer staging, now featured in the AJCC Cancer Staging Manuals.

Education

  • Harvard University,
    Postdoctoral Fellow, 1995
  • Yale University,
    PhD Statistics, 1993
  • Oxford University,
    MSc Applied Statistics, 1988
  • University of Toronto,
    BSc Mathematical Statistics, 1987

randomForestSRC

Unified treatment of random forests.

R package (CRAN build)
Github (beta builds)
randomForestSRC vignettes

Unified random forests for regression, classification, survival analysis, competing risks, multivariate and unsupervised problems, quantile regression, and class-imbalanced q-classification. Missing data imputation, including missForest and multivariate missForest. Fast subsampling-based random forests. Confidence intervals for variable importance. Minimal depth variable selection.
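
Minimal depth ranks a variable by how close to the root its first split tends to occur across the forest. The idea can be sketched outside the package; the following is an illustrative example using scikit-learn, not the randomForestSRC implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: y depends on features 0 and 1; features 2-4 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=500)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def minimal_depth(tree, n_features):
    """Depth of the first split on each feature in one tree."""
    depth = np.full(n_features, np.inf)
    stack = [(0, 0)]  # (node id, node depth)
    while stack:
        node, d = stack.pop()
        f = tree.feature[node]
        if f >= 0:  # internal node (leaves are coded -2)
            depth[f] = min(depth[f], d)
            stack.append((tree.children_left[node], d + 1))
            stack.append((tree.children_right[node], d + 1))
    # Features never split on get the maximal possible depth.
    depth[~np.isfinite(depth)] = tree.max_depth + 1
    return depth

# Average minimal depth over the forest; informative variables sit near the root.
md = np.mean([minimal_depth(est.tree_, X.shape[1])
              for est in forest.estimators_], axis=0)
ranking = np.argsort(md)
```

Here the signal variables (features 0 and 1) rank ahead of the noise variables because trees split on them earlier.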

varPro

Model-independent variable selection using rule-based variable priority for regression, classification, and survival.

R package (CRAN)
Github (beta builds)

Permutation importance is widely used for variable selection in methods such as random forests, but artificial permutations can introduce bias. Variable Priority (VarPro) avoids this by comparing estimates from a rule's region to its release region, where variable constraints are removed. Because it relies only on observed data, VarPro provides a robust and flexible alternative that consistently filters noise variables and mitigates permutation bias. [pdf]
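
The region-versus-release-region comparison can be illustrated with a toy calculation. This is a simplified sketch of the idea, not the varPro package's algorithm or API: a "rule" is a conjunction of variable constraints, and a variable's priority is how much the estimate changes when its constraint is released.

```python
import numpy as np

# Toy data: y depends on x0 only; x1 is noise.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0] > 0).astype(float) + 0.1 * rng.normal(size=2000)

# A "rule" is a conjunction of constraints, here {x0 > 0, x1 > 0}.
rule = {0: X[:, 0] > 0, 1: X[:, 1] > 0}

def priority(var):
    """Compare the rule's region with its release region (the same rule
    with the constraint on `var` removed). Both use only observed data,
    so no artificial permutation is involved."""
    region = np.all(list(rule.values()), axis=0)
    released = np.all([m for v, m in rule.items() if v != var], axis=0)
    return abs(y[region].mean() - y[released].mean())

# Releasing the x0 constraint shifts the estimate; releasing x1's does not.
imp = {v: priority(v) for v in rule}
```

Because the noise variable's constraint does not change the regional estimate when released, it receives near-zero priority, while the signal variable does not.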

randomForestSGT

Super greedy trees and forests for regression.

Github (beta builds)

Implements Super Greedy Trees (SGTs), a decision tree method that generalizes CART by using lasso-penalized parametric models to define multivariate geometric splits, including hyperplanes, ellipsoids, and hyperboloids. Each node split is selected using a best-split-first (BSF) strategy to prioritize empirical risk reduction. Fast coordinate descent is used for lasso fitting, with regularization tuned via cross-validation.
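
The basic ingredient, a lasso-penalized model whose fitted values induce an oblique (hyperplane) split, can be sketched as follows. This is a conceptual illustration only, assuming a simple median-cut split rule; it is not the randomForestSGT implementation:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy node data: the response changes along the oblique direction x0 + x1;
# x2 is noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = np.where(X[:, 0] + X[:, 1] > 0, 2.0, -2.0) + 0.1 * rng.normal(size=400)

# Fit a lasso-penalized linear model at the node; its fitted values define
# a hyperplane direction that is sparse in the noise coordinate.
model = Lasso(alpha=0.05).fit(X, y)
score = model.predict(X)

# Split the node at the median fitted value: an oblique hyperplane split,
# unlike CART's axis-aligned cuts.
left, right = y[score <= np.median(score)], y[score > np.median(score)]

def sse(v):  # within-node empirical risk (sum of squared errors)
    return ((v - v.mean()) ** 2).sum()

risk_drop = sse(y) - (sse(left) + sse(right))
```

A single axis-aligned cut cannot separate these two groups cleanly, whereas the fitted hyperplane does, which is the motivation for geometric splits.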

randomForestRHF

Random hazard forests for survival prediction with time-varying covariates.

Github (beta builds)

Random Hazard Forest (RHF) is a tree ensemble method that targets the continuous-time hazard directly. Trees use likelihood-based splitting; longitudinal records are represented in counting process format, ensuring splits use only information available while a subject is at risk. RHF returns case-specific hazard and cumulative hazard estimates, out-of-bag risk, and time-varying variable importance computed with VarPro.
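
The counting process layout mentioned above represents each subject as one row per interval (start, stop] during which the covariate value is in force, with the event indicator set only on the final row. A minimal sketch of this data layout (an illustration of the format, not the package's input routines):

```python
import pandas as pd

# One subject followed to time 10 (event observed), with a covariate
# re-measured at times 3 and 7.
visits = pd.DataFrame({"id": [1, 1, 1],
                       "time": [0, 3, 7],
                       "bp": [120, 135, 150]})
event_time, status = 10.0, 1

# Counting process format: each row covers (start, stop] with the covariate
# value held over that interval; the event flag goes on the last row only.
rows = visits.copy()
rows["start"] = rows["time"]
rows["stop"] = rows["time"].shift(-1).fillna(event_time)
rows["event"] = [0] * (len(rows) - 1) + [status]
cp = rows[["id", "start", "stop", "bp", "event"]]
```

Because a split at time t only ever sees rows whose interval covers t, the tree cannot use covariate values measured after the time at which risk is evaluated.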