Background

The Random Forest algorithm splits a tree node at the point along a feature where the labeled data are best separated with respect to that feature. However, the fact that entropy computed from label counts favors a particular division into two groups does not mean the groups are statistically significantly different, and relying on such splits can lead to serious errors.
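
For reference, the sketch below (Python with NumPy; the function names and data layout are illustrative assumptions, not code from this project) shows the kind of entropy-based split selection a standard decision tree performs: the threshold is simply the one that maximizes information gain over the labeled counts.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_entropy_split(feature, labels):
    """Return the threshold on `feature` that maximizes information gain."""
    order = np.argsort(feature)
    feature, labels = feature[order], labels[order]
    parent = entropy(labels)
    best_gain, best_threshold = -np.inf, None
    for i in range(1, len(feature)):
        if feature[i] == feature[i - 1]:
            continue  # identical values cannot be separated by a threshold
        left, right = labels[:i], labels[i:]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent - child
        if gain > best_gain:
            best_gain = gain
            best_threshold = (feature[i] + feature[i - 1]) / 2
    return best_threshold, best_gain
```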

Abstract

This project proposes a method for finding a statistically significant splitting criterion in the Random Forest algorithm.
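
The project's actual criterion is not spelled out in this section. Purely as an illustration of the general idea, one possible way to gate splits on significance is to accept an entropy-chosen threshold only when a chi-square test of independence between split side and class label rejects at a chosen level; the sketch below assumes Python with NumPy/SciPy and should not be read as the project's implementation.

```python
import numpy as np
from scipy.stats import chi2_contingency

def significance_gated_split(feature, labels, threshold, alpha=0.05):
    """Keep an entropy-chosen `threshold` only if the left/right groups differ
    significantly in label composition (illustrative sketch, not the project's
    actual method)."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return None  # degenerate split
    classes = np.unique(labels)
    table = [[np.sum(side == c) for c in classes] for side in (left, right)]
    _, p_value, _, _ = chi2_contingency(table)
    # Reject the split (i.e. make the node a leaf) when the two groups are
    # not statistically significantly different.
    return threshold if p_value < alpha else None
```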

Key Achievements

Publication status

- None

Overview

The traditional Random Forest method constructs trees by dividing groups based on the entropy of the labeled data.

This is an example in which splitting the tree into two groups by entropy produces groups that follow an exponential distribution and are not statistically significantly different.

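This failure mode can be reproduced with simulated data. The sketch below (Python with NumPy/SciPy; the exponential rates, sample sizes, and chi-square test are illustrative assumptions, not the project's data or method) draws two classes whose feature values follow nearly identical exponential distributions, picks the entropy-maximizing threshold, and then reports the chi-square p-value for the resulting left/right groups.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Illustrative data only: two classes whose feature values follow exponential
# distributions with nearly identical rates, so any split separates them only
# weakly.
n = 200
feature = np.concatenate([rng.exponential(1.0, n), rng.exponential(1.1, n)])
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Entropy-chosen threshold: the one that maximizes information gain.
thresholds = np.unique(feature)[:-1]
gains = [entropy(labels)
         - (np.mean(feature <= t) * entropy(labels[feature <= t])
            + np.mean(feature > t) * entropy(labels[feature > t]))
         for t in thresholds]
t_best = thresholds[int(np.argmax(gains))]

# Chi-square test of independence between split side and class label.
left, right = labels[feature <= t_best], labels[feature > t_best]
table = [[np.sum(left == 0), np.sum(left == 1)],
         [np.sum(right == 0), np.sum(right == 1)]]
_, p_value, _, _ = chi2_contingency(table)
print(f"entropy-chosen threshold = {t_best:.3f}, chi-square p-value = {p_value:.3f}")
# A large p-value means the entropy-chosen split does not separate the two
# groups in a statistically significant way.
```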