Hemant Ishwaran

Professor, Graduate Program Director, Director of Statistical Methodology, Division of Biostatistics, University of Miami

Google Scholar Profile
Complete List of Papers

My New Book (available mid 2025)

Research Interests

Machine Learning | Random Forests and Trees | Boosting | Survival | Cancer Staging | Causal Inference | Missing data | Nonparametric Bayes | Variable Selection

About Me

For the past 15 years, I have applied machine learning to public health, medical, and informatics settings, focusing on cardiovascular disease (CVD), heart transplantation, cancer staging, and gene therapy resistance. I have developed open-source software, including the popular random survival forest method. As an Expert Panel Member for the American Joint Committee on Cancer (AJCC), I created a data-driven machine learning procedure for cancer staging, now featured in the AJCC Cancer Staging Manuals.

Education

Harvard University, Postdoctoral Fellow, 1995
Yale University, PhD Statistics, 1993
Oxford University, MSc Applied Statistics, 1988
U of Toronto, BSc Mathematical Statistics, 1987

Selected Papers

Lu M. and Ishwaran H. (2024). Model-independent variable selection via the rule-based variable priority. arXiv:2409.09003. https://arxiv.org/abs/2409.09003 [pdf]

Lee D.K., Chen N. and Ishwaran H. (2021). Boosted nonparametric hazards with time-dependent covariates. Ann. Statist., 49(4), 2101-2128. [pdf]

O'Brien R. and Ishwaran H. (2019). A random forests quantile classifier for class imbalanced data. Pattern Recognit., 90, 232-249. [pdf] [html]

Tang F. and Ishwaran H. (2017). Random forest missing data algorithms. Stat. Anal. Data Mining, 10, 363-377. [pdf] arXiv:1701.05305

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Stat. Assoc., 105, 205-217. [pdf]

Ishwaran H., Blackstone E.H., Hansen C.A. and Rice T.W. (2009). A novel approach to cancer staging: application to esophageal cancer. Biostatistics, 10, 603-620. [pdf]

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist., 2, 841-860. [pdf]

Ishwaran H. and James L.F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Stat. Assoc., 96, 161-173. [pdf]


randomForestSRC

R software for random forest regression, classification, survival analysis, competing risks, multivariate, unsupervised, quantile regression, and class imbalanced q-classification. Missing data imputation, including missForest and multivariate missForest. Fast subsampling random forests. Confidence intervals for variable importance. Minimal depth variable selection. Visualize trees using Safari or Google Chrome. Anonymous random forests for data privacy.

R package (CRAN build) Github (beta builds)
randomForestSRC vignettes

varPro

Model-independent variable selection using rule-based variable priority for regression, classification and survival. Github R package.



Permutation importance is widely used in variable selection for methods like random forests. It measures a variable’s importance by permuting data and observing changes in prediction error but can introduce bias due to artificial permutations. Variable Priority (VarPro), a tree rule-based method, avoids this by comparing estimates from a rule’s region to its release region, where variable constraints are removed. By relying solely on observed data, VarPro provides a robust, flexible alternative that consistently filters noise variables and mitigates permutation-based biases. [pdf]
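To make the permutation step concrete, here is a minimal sketch of generic permutation importance in Python, using only the standard library. It is not VarPro and it is not code from randomForestSRC or varPro: the toy data, the stand-in predictor `toy_model`, and the helper names `mse` and `permutation_importance` are all assumptions made for illustration.

```python
import random

def mse(model, X, y):
    """Mean squared prediction error of `model` on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, rng):
    """Increase in MSE after shuffling each column of X, one column at a time."""
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between variable j and y
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances

# Toy data (assumed): y depends on x0 only; x1 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

def toy_model(row):
    """Hypothetical stand-in for a fitted predictor; it ignores x1 entirely."""
    return 2.0 * row[0]

importances = permutation_importance(toy_model, X, y, rng)
print(importances)
```

In this sketch, shuffling `x0` sharply increases the error while shuffling `x1` leaves it unchanged, because the stand-in predictor never reads `x1`. The shuffled rows are artificial combinations that need not resemble any observed data point, which is the kind of permutation the paragraph above identifies as a source of bias.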

boostmtree

Boosted multivariate trees for longitudinal data [pdf]

boostmtree

R package implementing Friedman's gradient descent boosting algorithm for longitudinal data using multivariate tree base learners. A time-covariate interaction effect is modeled using penalized B-splines (P-splines) with estimated adaptive smoothing parameter.