Hemant Ishwaran

Professor, Graduate Program Director, Director of Statistical Methodology, Division of Biostatistics, University of Miami

Google Scholar Profile
Papers

My New Book


Research Interests

Machine Learning | Random Forests and Trees | Boosting | Survival | Cancer Staging | Causal Inference | Missing Data | Nonparametric Bayes | Variable Selection

Me

For the past 15 years, I have applied machine learning to public health, medical, and informatics settings, focusing on cardiovascular disease (CVD), heart transplantation, cancer staging, and gene therapy resistance. I have developed open-source software, including the widely used random survival forests method. As an Expert Panel Member for the American Joint Committee on Cancer (AJCC), I created a data-driven machine learning procedure for cancer staging, now featured in the AJCC Cancer Staging Manuals.

Education

Harvard University, Postdoctoral Fellow, 1995
Yale University, PhD Statistics, 1993
Oxford University, MSc Applied Statistics, 1988
University of Toronto, BSc Mathematical Statistics, 1987

Selected Papers

Lu M. and Ishwaran H. (2024). Model-independent variable selection via the rule-based variable priority. arXiv:2409.09003. [pdf]

Lee D.K., Chen N. and Ishwaran H. (2021). Boosted nonparametric hazards with time-dependent covariates. Ann. Statist., 49(4), 2101-2128. [pdf]

O'Brien R. and Ishwaran H. (2019). A random forests quantile classifier for class imbalanced data. Pattern Recognit., 90, 232-249. [pdf] [html]

Tang F. and Ishwaran H. (2017). Random forest missing data algorithms. Stat. Anal. Data Mining, 10, 363-377. [pdf]

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Stat. Assoc., 105, 205-217. [pdf]

Ishwaran H., Blackstone E.H., Hansen C.A. and Rice T.W. (2009). A novel approach to cancer staging: application to esophageal cancer. Biostatistics, 10, 603-620. [pdf]

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist., 2, 841-860. [pdf]

Ishwaran H. and James L.F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Stat. Assoc., 96, 161-173. [pdf]


randomForestSRC

R software for random forests: regression, classification, survival analysis, competing risks, multivariate, unsupervised, quantile regression, and class-imbalanced q-classification. Includes missing data imputation (missForest and multivariate missForest), fast subsampling random forests, confidence intervals for variable importance, minimal depth variable selection, tree visualization in Safari or Google Chrome, and anonymous random forests for data privacy. A short usage example appears below.

R package (CRAN build) | GitHub (beta builds)
randomForestSRC vignettes
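
A minimal usage sketch: a random survival forest fit to the veteran lung cancer data shipped with the package (ntree and the printed summaries are illustrative choices, not required settings):

    library(randomForestSRC)

    ## veteran lung cancer data included with the package
    data(veteran, package = "randomForestSRC")

    ## grow a random survival forest; Surv(time, status) is the survival outcome
    o <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 500)

    ## out-of-bag error rate and permutation variable importance
    print(o)
    print(vimp(o)$importance)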


varPro

Model-independent variable selection using rule-based variable priority for regression, classification, and survival.
R package (CRAN) | GitHub (beta builds)


Permutation importance is widely used for variable selection with methods like random forests. It measures a variable's importance by permuting its values and observing the change in prediction error, but the artificial permuted data can introduce bias. Variable Priority (VarPro), a tree rule-based method, avoids this by comparing the estimate from a rule's region to that of its release region, in which the constraints on the variable are removed. Because it relies solely on observed data, VarPro provides a robust, flexible alternative that consistently filters noise variables and avoids permutation-based biases. A minimal usage sketch is given below. [pdf]
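
A minimal sketch of the intended workflow, assuming the varpro() entry point and importance() extractor described in the package documentation (the synthetic data and all settings are illustrative):

    library(varPro)

    ## synthetic regression data: x1 and x2 carry signal, the rest is noise
    set.seed(1)
    dta <- data.frame(matrix(rnorm(500 * 10), 500, 10))
    colnames(dta) <- paste0("x", 1:10)
    dta$y <- dta$x1 + 2 * dta$x2 + rnorm(500)

    ## rule-based variable priority
    o <- varpro(y ~ ., data = dta)

    ## variable priority values; noise variables should be filtered toward zero
    importance(o)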

randomForestSGT

R package for super greedy trees and forests.



Implements Super Greedy Trees (SGTs), a flexible decision tree method for regression that generalizes CART by using lasso-penalized parametric models to define multivariate geometric splits, including hyperplanes, ellipsoids, and hyperboloids. Each node split is selected via best split first (BSF), prioritizing cuts with the greatest empirical risk reduction. Fast coordinate descent is used for lasso fitting, with regularization tuned via cross-validation.
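
Since the package's function names are not listed here, the following is a conceptual R sketch, using glmnet rather than the randomForestSGT API, of how a cross-validated lasso fit can define a single hyperplane split and how a candidate cut would be scored by its empirical risk reduction under best split first:

    ## NOT the randomForestSGT API: a conceptual one-node illustration
    library(glmnet)

    set.seed(1)
    x <- matrix(rnorm(200 * 5), 200, 5)
    y <- x[, 1] - x[, 2] + rnorm(200)

    ## cross-validated lasso; the nonzero coefficients define the hyperplane
    cv <- cv.glmnet(x, y)
    score <- as.numeric(predict(cv, newx = x, s = "lambda.min"))

    ## split the node at the median of the linear score
    go.left <- score <= median(score)

    ## empirical (sum-of-squares) risk reduction for this candidate cut;
    ## best split first prioritizes the cut with the largest reduction
    risk <- function(yy) sum((yy - mean(yy))^2)
    reduction <- risk(y) - (risk(y[go.left]) + risk(y[!go.left]))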