Hemant Ishwaran

Professor, Graduate Program Director, Director of Statistical Methodology, Division of Biostatistics, University of Miami

Spotlight

Boosted nonparametric hazards with time-dependent covariates, Annals of Statistics, 2021. [pdf]

Research Interests

Machine Learning | Random Forests and Trees | Boosting | Survival | Cancer Staging | Causal Inference | Missing data | Nonparametric Bayes | Variable Selection

About Me

For the last 15 years I have studied and applied machine learning methods in public health, medical and informatics settings, especially CVD, heart transplantation, cancer staging and gene cancer therapy resistance. I have also developed open source software used by data analysts all across the world, including the random survival forest method I pioneered. One real-world application of my work is my role as an Expert Panel Member for the AJCC (American Joint Committee on Cancer), where I developed a data-driven machine learning procedure for cancer staging. This method is now used by the AJCC and is published in the AJCC Cancer Staging Manuals.

Education

Harvard University, Postdoctoral Fellow, 1995
Yale University, PhD Statistics, 1993
Oxford University, MSc Applied Statistics, 1988
U of Toronto, BSc Mathematical Statistics, 1987

Google Scholar Profile

Complete List of Papers

Selected Papers

O'Brien R. and Ishwaran H. (2019). A random forests quantile classifier for class imbalanced data. Pattern Recognit., 90, 232-249. [pdf] [html]

Tang F. and Ishwaran H. (2017). Random forest missing data algorithms. Stat. Anal. Data Mining, 10, 363–377. [pdf] arXiv:1701.05305

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Stat. Assoc., 105, 205-217. [pdf]

Ishwaran H., Blackstone E.H., Hansen C.A. and Rice T.W. (2009). A novel approach to cancer staging: application to esophageal cancer. Biostatistics, 10, 603-620. [pdf]

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist., 2, 841-860. [pdf]

Ishwaran H. and James L.F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Stat. Assoc., 96, 161-173. [pdf]


randomForestSRC

R software for random forests regression, classification, survival analysis, competing risks, multivariate, unsupervised, quantile regression, and class imbalanced q-classification. Missing data imputation, including missForest and multivariate missForest. Fast subsampling random forests. Confidence intervals for variable importance. Minimal depth variable selection. Visualize trees in your Safari or Google Chrome browser. Anonymous random forests for data privacy.

NEW! Mahalanobis splitting for correlated outcomes in multivariate regression.

R package (CRAN build) Github (beta builds)

randomForestSRC vignettes

spikeslab

Spike and slab R package for high-dimensional linear regression models. Uses a generalized elastic net for variable selection. Parallel processing enabled. [pdf]


BAMarray (3.0)

Java software for microarray data using Bayesian Analysis of Variance for Microarrays (BAM) [pdf]

boostmtree

Boosted multivariate trees for longitudinal data [pdf]

boostmtree

R package implementing Friedman's gradient descent boosting algorithm for longitudinal data using multivariate tree base learners. A time-covariate interaction effect is modeled using penalized B-splines (P-splines) with an estimated adaptive smoothing parameter.


l2boost

Componentwise boosting for linear regression [pdf]

l2boost

R package implementing Friedman's gradient boosting algorithm with an L2 loss function, using componentwise boosting with a linear learner. Includes the elasticNet data augmentation of Ehrlinger and Ishwaran (2012), which adds an L2 penalization (lambda) similar to the elastic net.
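
For readers unfamiliar with componentwise L2 boosting, the idea can be sketched in a few lines. The sketch below is written in Python for illustration only and is not the l2boost package or its API: the function name `l2boost_sketch` and its parameters `steps`, `nu` and `lam` are hypothetical. It assumes centered data (no intercept) and assumes the penalty is applied through the usual elastic-net-style augmentation, appending sqrt(lambda) times the identity to the design matrix and zeros to the response; the package's actual construction may differ.

```python
import math


def l2boost_sketch(X, y, steps=200, nu=0.1, lam=0.0):
    """Componentwise L2 boosting with linear base learners (illustrative only).

    X is a list of rows and y a list of responses, both assumed centered so
    that no intercept is needed. When lam > 0, sqrt(lam) * identity rows are
    appended to X and zeros to y (an assumed elastic-net-style augmentation).
    """
    p = len(X[0])
    rows = [list(r) for r in X]
    resp = list(y)
    if lam > 0:
        s = math.sqrt(lam)
        for j in range(p):
            rows.append([s if k == j else 0.0 for k in range(p)])
            resp.append(0.0)

    cols = [[r[j] for r in rows] for j in range(p)]
    norms = [sum(v * v for v in c) for c in cols]
    beta = [0.0] * p
    resid = resp[:]  # under L2 loss the negative gradient is the residual

    for _ in range(steps):
        best_j, best_gain, best_coef = None, 0.0, 0.0
        for j in range(p):
            if norms[j] == 0.0:
                continue
            dot = sum(c * r for c, r in zip(cols[j], resid))
            # drop in residual sum of squares from a full least-squares step
            gain = dot * dot / norms[j]
            if gain > best_gain:
                best_j, best_gain, best_coef = j, gain, dot / norms[j]
        if best_j is None:
            break  # residual is orthogonal to every column
        # update only the best single coordinate, shrunk by the step size nu
        beta[best_j] += nu * best_coef
        resid = [r - nu * best_coef * c for r, c in zip(resid, cols[best_j])]
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    y = [2.0, -2.0, 0.5, -0.5]
    print(l2boost_sketch(X, y))           # unpenalized fit
    print(l2boost_sketch(X, y, lam=2.0))  # coefficients shrink toward zero
```

Each iteration fits the current residual on every single predictor, keeps the one that reduces the residual sum of squares the most, and moves its coefficient by a small fraction `nu` of the least-squares step. With `lam > 0` the same loop runs on the augmented data, which shrinks the fitted coefficients toward zero.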