I like how you talk about Conditional Inference. My thesis is supposed to overcome the brute-force exhaustive search for best splits that random forest does (I use Dr. Loh's GUIDE trees) by using statistical methods instead.
> Many implementations of random forest default to 1/3 of your predictor variables.
This is interesting. I had heard it was sqrt(total number of predictors).
> Ensemble methods combine many individual trees to create one better, more stable model.
I think "stable" could be clarified as having good training accuracy and low generalization error (the error rate on unseen data) compared to an individual tree. This is what Dr. Ho talks about with forests.
But other than that I think it's an awesome tutorial.
One thing I've seen other tree and forest methods do for better generalization on unseen data is pruning with cross-validation, choosing a cutoff of 0.5 to 1.0 standard errors from the best score. That may be worth covering if you're interested.
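The pruning idea above can be sketched with scikit-learn's cost-complexity pruning (an assumption on my part; the comment doesn't name a library). The sketch cross-validates over candidate pruning strengths and then applies a 1-standard-error rule: keep the simplest tree whose CV score is within one standard error of the best.

```python
# Hedged sketch, not any particular author's method: cost-complexity
# pruning with cross-validation and a 1-SE cutoff.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate pruning strengths come from the cost-complexity path of a
# fully grown tree; drop the last alpha, which prunes down to the root.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]

# Cross-validated accuracy (mean and standard error) for each alpha.
means, sems = [], []
for a in alphas:
    scores = cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5)
    means.append(scores.mean())
    sems.append(scores.std(ddof=1) / np.sqrt(len(scores)))

means, sems = np.array(means), np.array(sems)
best = means.argmax()

# 1-SE rule: the largest alpha (i.e., the simplest tree) whose score is
# still within one standard error of the best cross-validated score.
within_1se = means >= means[best] - sems[best]
alpha_1se = alphas[within_1se].max()
print(f"best alpha={alphas[best]:.5f}, 1-SE alpha={alpha_1se:.5f}")
```

A 0.5-SE cutoff, as mentioned above, would just replace `sems[best]` with `0.5 * sems[best]`; the tradeoff is a slightly less simple tree in exchange for a score closer to the CV optimum.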
Thank you for the useful feedback! I'll have to look up GUIDE trees.
> This is interesting. I had heard it was sqrt(total number of predictors).
I was probably looking at the randomForest R package documentation [1], which says:
> mtry Number of variables randomly sampled as candidates at each split. Note that the default values are different for classification (sqrt(p) where p is number of variables in x) and regression (p/3)
I checked the H2O implementation of random forest [2] and they use the same defaults.
I'll add a note about the one third default being specific to regression since that seems like an important distinction.
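For reference, the same split-candidate defaults can be set explicitly in scikit-learn (a sketch of mine, not something from the thread; scikit-learn's own defaults differ from the R package's), mirroring randomForest's documented sqrt(p) for classification and p/3 for regression:

```python
# Mirror the randomForest R defaults via max_features in scikit-learn.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

Xc, yc = make_classification(n_samples=200, n_features=9, random_state=0)
# Classification: sqrt(p) candidate predictors per split.
clf = RandomForestClassifier(max_features="sqrt", random_state=0).fit(Xc, yc)

Xr, yr = make_regression(n_samples=200, n_features=9, random_state=0)
# Regression: max_features accepts a fraction, so 1/3 of the predictors.
reg = RandomForestRegressor(max_features=1 / 3, random_state=0).fit(Xr, yr)

# With p = 9 predictors, both end up considering 3 candidates per split.
print(clf.max_features, reg.max_features)
```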