@rasbt Great 🧵! Another big issue is RF's tendency to overestimate the importance of correlated features. Either we do good feature engineering first, or we need to use "conditional" permutation importance. There is a great series of papers from Strobl et al. on this.
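A minimal sketch of the conditional approach in R with the party package, which implements the conditional permutation importance of Strobl et al.; the simulated data, seed, and settings here are illustrative assumptions, not anything from the thread:

```r
## Conditional vs. marginal permutation importance on correlated features.
library(party)

set.seed(42)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$x3 <- d$x1 + rnorm(200, sd = 0.3)  # x3 is a strongly correlated proxy for x1
d$y  <- d$x1 + rnorm(200)            # only x1 truly drives the response

cf <- cforest(y ~ ., data = d,
              controls = cforest_unbiased(ntree = 500, mtry = 2))

varimp(cf)                      # marginal permutation importance: inflates x3
varimp(cf, conditional = TRUE)  # conditional version: deflates the correlated proxy
```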
@flourn0 I use the Garson importance metric described in the appendix of this paper somewhat frequently, author describes it very clearly: https://t.co/YDMrrdu6OJ A must-read if you want to get into random forest importances: https://t.co/eZVqwmCk2D
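For reference, Garson's metric can be computed directly from the weight matrices of a single-hidden-layer network. A minimal R sketch; the matrix names `W` and `v` and their shapes are assumptions for illustration, not taken from the paper linked above:

```r
## Garson's algorithm: each input's share of each hidden unit's absolute
## input weights, weighted by that unit's absolute output weight, then
## normalized so the importances sum to 1.
garson <- function(W, v) {
  # W: inputs x hidden weight matrix; v: hidden-to-output weight vector
  contrib <- sweep(abs(W), 2, colSums(abs(W)), "/") %*% abs(v)
  drop(contrib / sum(contrib))
}

# Toy example: 3 inputs, 4 hidden units
set.seed(7)
W <- matrix(rnorm(12), nrow = 3, ncol = 4)
v <- rnorm(4)
garson(W, v)
```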
I like to use tree-based models to determine feature importance; I have even written about it here on Twitter. Imagine my surprise when I discovered that this importance is biased in some cases! Here's the paper in case you're interested: https://t.co
I wish I had read this before https://t.co/h6dvXZoUsP
WRONG AGAIN. It’s X2. Again, this is simulated data to ensure it is X2. But why?! Why has the random forest failed us? It can’t possibly get worse, can it? (The previous examples come from Strobl et al: https://t.co/wlWOmyIFVf)
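The failure is easy to reproduce. A rough simulation in the spirit of Strobl et al., assuming the randomForest package; the variable names, level counts, and effect sizes are made up for illustration:

```r
## The response depends only on x2, yet Gini-based importance credits the
## uninformative many-level factor x1 and the continuous noise x3, because
## variables offering more candidate split points are favoured.
library(randomForest)

set.seed(1)
n <- 500
d <- data.frame(
  x1 = factor(sample(letters[1:20], n, replace = TRUE)),  # 20-level noise factor
  x2 = factor(sample(0:1, n, replace = TRUE)),            # binary, informative
  x3 = rnorm(n)                                           # continuous noise
)
d$y <- factor(ifelse(runif(n) < ifelse(d$x2 == 1, 0.8, 0.2), "A", "B"))

rf <- randomForest(y ~ ., data = d, importance = TRUE, ntree = 500)
importance(rf)  # note how much MeanDecreaseGini the pure-noise x1 and x3
                # pick up; compare with the permutation-based
                # MeanDecreaseAccuracy column
```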
@GRich_Cinci What about the following paper, in which the authors artificially generate unrelated variables and then induce significant relationships between them through bootstrapping? See Figure 11: https://t.co/aMxBnOtTOD
@dr_greg_landrum No worries, it's good to have you keeping me honest and making me find the original article. https://t.co/1LYXg3lUdL See also the nice chapter here, with a "disadvantages" section, by @ChristophMolnar: https://t.co/g3fHdKUUeJ
Bias in random forest variable importance measures: Illustrations, sources and a solution | BMC Bioinformatics https://t.co/aQ0FrUrPHQ
A useful paper with #R code dealing with the same topic is https://t.co/wowt29i8FF
I've been looking for you -> Bias in random forest variable importance measures: Illustrations, sources and a solution https://t.co/jCzaz1NPl2 #bmcbioinformatics
party: A Laboratory for Recursive Partytioning, my go-to random forest implementation for reducing variable importance bias via conditional inference trees. Important if you mix categorical and continuous predictors in your model. #randomforest #machinelearning #R
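A minimal usage sketch on the same kind of mixed-type data as the simulation above; cforest_unbiased is party's unbiased default setup, and everything else here is an illustrative assumption:

```r
## Conditional inference forests select splits via permutation tests, so
## split selection is not biased toward many-level factors or continuous
## variables the way CART-style impurity splitting is.
library(party)

set.seed(1)
n <- 500
d <- data.frame(
  x1 = factor(sample(letters[1:20], n, replace = TRUE)),  # many-level noise
  x2 = factor(sample(0:1, n, replace = TRUE)),            # informative
  x3 = rnorm(n)                                           # continuous noise
)
d$y <- factor(ifelse(runif(n) < ifelse(d$x2 == 1, 0.8, 0.2), "A", "B"))

cf <- cforest(y ~ ., data = d,
              controls = cforest_unbiased(ntree = 500, mtry = 2))
varimp(cf)  # permutation importance on an unbiased forest: x2 comes out on top
```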
@ak11 Feature importance evaluated by the change in Gini impurity apparently has a bias that overestimates the importance of continuous variables and categorical variables with many categories. https://t.co/zTKlYrUpLE
@rquintino @HJDLopes @tiagotvv 1) biased towards preferring variables with more categories https://t.co/cxqTbQLYuH
Bias in random forest variable importance measures: Illustrations, sources and a solution | BMC Bioinformatics https://t.co/iQrX8EVwjt