@rasbt Great 🧵! Another big issue is RF's tendency to overestimate the importance of correlated features. Either we do good feature engineering first, or we need to use "conditional" permutation importance. There is a great series of papers from Strobl et al. on this.
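A minimal sketch of the conditional approach in R with the party package, which implements the conditional permutation importance of Strobl et al.; the simulated data, seed, and settings here are illustrative assumptions, not anything from the thread:

```r
## Conditional vs. marginal permutation importance on correlated features.
library(party)

set.seed(42)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$x3 <- d$x1 + rnorm(200, sd = 0.3)  # x3 is a strongly correlated proxy for x1
d$y  <- d$x1 + rnorm(200)            # only x1 truly drives the response

cf <- cforest(y ~ ., data = d,
              controls = cforest_unbiased(ntree = 500, mtry = 2))

varimp(cf)                      # marginal permutation importance: inflates x3
varimp(cf, conditional = TRUE)  # conditional version: deflates the correlated proxy
```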
@flourn0 I use the Garson importance metric described in the appendix of this paper somewhat frequently, author describes it very clearly: https://t.co/YDMrrdu6OJ A must-read if you want to get into random forest importances: https://t.co/eZVqwmCk2D
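For reference, Garson's metric can be computed directly from the weight matrices of a single-hidden-layer network. A minimal R sketch; the matrix names `W` and `v` and their shapes are assumptions for illustration, not taken from the paper linked above:

```r
## Garson's algorithm: each input's share of each hidden unit's absolute
## input weights, weighted by that unit's absolute output weight, then
## normalized so the importances sum to 1.
garson <- function(W, v) {
  # W: inputs x hidden weight matrix; v: hidden-to-output weight vector
  contrib <- sweep(abs(W), 2, colSums(abs(W)), "/") %*% abs(v)
  drop(contrib / sum(contrib))
}

# Toy example: 3 inputs, 4 hidden units
set.seed(7)
W <- matrix(rnorm(12), nrow = 3, ncol = 4)
v <- rnorm(4)
garson(W, v)
```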
I like to use tree-based models to determine feature importance; I have even written about it here on Twitter. Imagine my surprise when I discovered that this importance is biased in some cases! Here's the paper in case you're interested: https://t.co
I wish I had read this before https://t.co/h6dvXZoUsP
WRONG AGAIN. It’s X2. Again, this is simulated data to ensure it is X2. But why?! Why has the random forest failed us? It can’t possibly get worse, can it? (The previous examples come from Strobl et al: https://t.co/wlWOmyIFVf)
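The failure is easy to reproduce. A rough simulation in the spirit of Strobl et al., assuming the randomForest package; the variable names, level counts, and effect sizes are made up for illustration:

```r
## The response depends only on x2, yet Gini-based importance credits the
## uninformative many-level factor x1 and the continuous noise x3, because
## variables offering more candidate split points are favoured.
library(randomForest)

set.seed(1)
n <- 500
d <- data.frame(
  x1 = factor(sample(letters[1:20], n, replace = TRUE)),  # 20-level noise factor
  x2 = factor(sample(0:1, n, replace = TRUE)),            # binary, informative
  x3 = rnorm(n)                                           # continuous noise
)
d$y <- factor(ifelse(runif(n) < ifelse(d$x2 == 1, 0.8, 0.2), "A", "B"))

rf <- randomForest(y ~ ., data = d, importance = TRUE, ntree = 500)
importance(rf)  # note how much MeanDecreaseGini the pure-noise x1 and x3
                # pick up; compare with the permutation-based
                # MeanDecreaseAccuracy column
```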
@GRich_Cinci What about the following paper, in which the authors artificially generate unrelated variables and then induce significant relationships between them through bootstrapping? See Figure 11: https://t.co/aMxBnOtTOD
@dr_greg_landrum No worries, it's good to have you keeping me honest and making me find the original article. https://t.co/1LYXg3lUdL See also the nice chapter here, with a "disadvantages" section, by @ChristophMolnar: https://t.co/g3fHdKUUeJ
Bias in random forest variable importance measures: Illustrations, sources and a solution | BMC Bioinformatics https://t.co/aQ0FrUrPHQ
A useful paper with #R code dealing with the same topic is https://t.co/wowt29i8FF
I've been looking for you -> Bias in random forest variable importance measures: Illustrations, sources and a solution https://t.co/jCzaz1NPl2 #bmcbioinformatics
party: A Laboratory for Recursive Partytioning, my go-to random forest implementation for reducing variable importance bias via conditional inference trees. Important if you mix categorical and continuous predictors in your model. #randomforest #machinelearning #R
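A minimal usage sketch on the same kind of mixed-type data as the simulation above; cforest_unbiased is party's unbiased default setup, and everything else here is an illustrative assumption:

```r
## Conditional inference forests select splits via permutation tests, so
## split selection is not biased toward many-level factors or continuous
## variables the way CART-style impurity splitting is.
library(party)

set.seed(1)
n <- 500
d <- data.frame(
  x1 = factor(sample(letters[1:20], n, replace = TRUE)),  # many-level noise
  x2 = factor(sample(0:1, n, replace = TRUE)),            # informative
  x3 = rnorm(n)                                           # continuous noise
)
d$y <- factor(ifelse(runif(n) < ifelse(d$x2 == 1, 0.8, 0.2), "A", "B"))

cf <- cforest(y ~ ., data = d,
              controls = cforest_unbiased(ntree = 500, mtry = 2))
varimp(cf)  # permutation importance on an unbiased forest: x2 comes out on top
```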
@ak11 Feature importance evaluated by the change in Gini impurity apparently has a bias that overestimates the importance of continuous variables and categorical variables with many categories. https://t.co/zTKlYrUpLE
@rquintino @HJDLopes @tiagotvv 1) biased towards preferring variables with more categories https://t.co/cxqTbQLYuH
Bias in random forest variable importance measures: Illustrations, sources and a solution | BMC Bioinformatics https://t.co/iQrX8EVwjt