Random Forests …

Random Forests (RF) is a data mining method widely used in bioinformatics [12], speech recognition [13], and drug design and development [14]. RF has recently been gaining popularity in terrestrial ecology [15], [16], [17]; so far, however, only a handful of studies have applied it to marine ecosystems [18], [19]. In short, RF, as the name suggests, is an ensemble of many decision trees built from binary splits. Each tree is grown from a bootstrap sample of the observations, and at each node a predictor and split value are chosen to maximize the difference between the two offspring branches. The fit of each tree is assessed with the observations left out of its bootstrap sample (the out-of-bag data), which gives a built-in estimate of prediction error; hence, a separate cross-validation with external data is usually not necessary. Predictive accuracy requires low bias and low correlation between the individual trees [11]. RF achieves this by growing a large number of trees and averaging their predictions, while choosing the split at each node from a random subset of predictors so that the trees are as different from one another as possible. RF does not assume any particular data distribution and does not require formal selection of predictors. It is also robust to outliers and unbalanced data, which often makes it a better choice than traditional statistical methods [12].
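To make the mechanics concrete, here is a minimal sketch in plain Python of the three ingredients described above: bootstrap sampling, a random subset of predictors at each node, and an error estimate from the out-of-bag observations. It assumes a classification task, so averaging the trees' predictions becomes a majority vote. The function names, the parameters (`n_trees`, `mtry`, `max_depth`), and the toy two-cluster data are all illustrative assumptions, not taken from the cited studies, and a real analysis would use an established library instead.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, rows, features):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini([y[i] for i in rows])
    for f in features:
        # Candidate thresholds: every observed value except the largest,
        # so both branches are guaranteed to be non-empty.
        for t in sorted({X[i][f] for i in rows})[:-1]:
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, rows, mtry, rng, depth=0, max_depth=5):
    """Grow one tree; internal nodes are tuples, leaves are class labels."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Each node only considers a random subset of the predictors.
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, rows, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t,
            grow_tree(X, y, left, mtry, rng, depth + 1, max_depth),
            grow_tree(X, y, right, mtry, rng, depth + 1, max_depth))

def predict_tree(node, x):
    """Follow the binary splits down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def random_forest(X, y, n_trees=50, mtry=1, seed=0):
    """Return (list of trees, out-of-bag error rate)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        tree = grow_tree(X, y, rows, mtry, rng)
        forest.append(tree)
        # Observations not drawn into this bootstrap sample are out-of-bag.
        for i in set(range(n)) - set(rows):
            oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(oob_votes) if v]
    oob_error = sum(pred != truth for pred, truth in scored) / len(scored)
    return forest, oob_error

def predict_forest(forest, x):
    """Aggregate the trees by majority vote."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]

# Toy data (hypothetical): two Gaussian clusters with two predictors each.
data_rng = random.Random(42)
X = [[data_rng.gauss(mu, 0.5), data_rng.gauss(mu, 0.5)]
     for mu in (0.0, 3.0) for _ in range(40)]
y = [0] * 40 + [1] * 40

forest, oob_error = random_forest(X, y, n_trees=50, mtry=1, seed=0)
print("out-of-bag error:", oob_error)
print("prediction at (3, 3):", predict_forest(forest, [3.0, 3.0]))
```

Note that the out-of-bag error is computed only from trees that never saw the observation in question, which is why it can stand in for a held-out test set.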


