Random Forest Regression

10/22/2018

If you have read the previous post, you could have guessed the idea behind this new model: the word "forest" gives it away. Just as a forest in nature is made up of many trees, random forest regression is a combination of many decision tree regressions.

The figure illustrates how the random forest model works: every tree makes its own prediction, and the predictions of all the trees are averaged to give the final result.
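Written as a formula, and using symbols of my own choosing (T for the number of trees, which scikit-learn calls n_estimators, and h_t(x) for the prediction of tree t on input x), the averaging step is:

```latex
\hat{y}(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x)
```

Every tree gets the same weight, so no single tree can dominate the final prediction.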

This way of building a model is called ensemble learning: a model is made by combining several different models, or several versions of the same model. Apart from that, the processing is the same as in decision tree regression, since a random forest is essentially many trees doing the same job on different parts of the dataset.

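The word "random" in the name refers to the data each tree is trained on. Each tree gets its own random sample of the dataset, drawn with replacement and of the same size as the original (a so-called bootstrap sample), so the same tree is not built every time. Many implementations also consider only a random subset of the features at each split.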
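To make the idea concrete, here is a minimal from-scratch sketch in plain Python (no scikit-learn) of bagged regression trees on a single feature: each tree is grown on a bootstrap sample, and the forest averages the trees. The helper names (fit_tree, predict_tree, fit_forest, predict_forest), the depth limit of 3 and the toy data are all hypothetical choices for illustration, and scikit-learn's real implementation differs in many details. With only one feature, the random feature subsampling step would do nothing, so it is left out.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (the error a leaf would make)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(xs, ys, depth=0, max_depth=3):
    """Grow a small regression tree on one numeric feature.

    Returns ("leaf", value) or ("split", threshold, left_subtree, right_subtree).
    """
    # Stop at the depth limit, or when all x values are equal and no split exists
    if depth == max_depth or len(set(xs)) < 2:
        return ("leaf", mean(ys))
    distinct = sorted(set(xs))
    best_threshold, best_error = None, None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sse(left) + sse(right)  # each side predicted by its own mean
        if best_error is None or error < best_error:
            best_threshold, best_error = threshold, error
    left_rows = [i for i, x in enumerate(xs) if x <= best_threshold]
    right_rows = [i for i, x in enumerate(xs) if x > best_threshold]
    return ("split", best_threshold,
            fit_tree([xs[i] for i in left_rows], [ys[i] for i in left_rows], depth + 1, max_depth),
            fit_tree([xs[i] for i in right_rows], [ys[i] for i in right_rows], depth + 1, max_depth))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's value."""
    while node[0] == "split":
        _, threshold, left, right = node
        node = left if x <= threshold else right
    return node[1]

def fit_forest(xs, ys, n_estimators=10, seed=0):
    """Bagging: grow each tree on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_estimators):
        rows = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        trees.append(fit_tree([xs[i] for i in rows], [ys[i] for i in rows]))
    return trees

def predict_forest(trees, x):
    """The forest's prediction is the plain average of the trees' predictions."""
    return mean(predict_tree(tree, x) for tree in trees)

# Hypothetical toy data: levels 1..10 with a steeply rising target
xs = list(range(1, 11))
ys = [1000 * x ** 2 for x in xs]

trees = fit_forest(xs, ys, n_estimators=10, seed=0)
print(predict_forest(trees, 6.5))
```

Because every tree sees a slightly different sample, the trees end up with different thresholds and leaf values, and averaging them smooths out the quirks of any single tree.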

The code is also very similar to the decision tree regression code.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

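# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
# Column 1 is the position level; the 1:2 slice keeps X two-dimensional (one column),
# which is the shape scikit-learn expects for features
X = dataset.iloc[:, 1:2].values
# Column 2 is the salary, used as the target
y = dataset.iloc[:, 2].values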

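# Fitting Random Forest Regression to the dataset
from sklearn.ensemble import RandomForestRegressor
# n_estimators = number of trees in the forest; random_state fixes the random
# sampling so that the same forest is built on every run
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(X, y)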

# Predicting a new result
# predict() expects a 2D input (one row per sample), so 6.5 is wrapped as [[6.5]]
y_pred = regressor.predict([[6.5]])

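# Visualising the Random Forest Regression results (higher resolution)
# Predicting on a fine grid of levels shows the step-shaped output of the averaged trees
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')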
plt.title('Truth or Bluff (Random Forest Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

In this code, n_estimators = 10 builds 10 different trees, and the average of their predictions is the final prediction of the model. The result of the above code is displayed below, and it is very similar to the decision tree model. The difference would be more visible with a larger dataset.
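To check that the forest's prediction really is the average of its trees, one can compare regressor.predict with the mean of the individual fitted trees, which scikit-learn stores in the estimators_ attribute. This sketch assumes scikit-learn and NumPy are installed and uses made-up toy data instead of Position_Salaries.csv so that it runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data (not the post's CSV): levels 1..10 with a steeply rising target
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 1000.0 * X.ravel() ** 2

# Same settings as in the post: 10 trees, fixed seed for reproducibility
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X, y)

X_new = np.array([[6.5]])
forest_pred = regressor.predict(X_new)[0]

# Each fitted tree lives in regressor.estimators_; average their predictions by hand
tree_preds = [tree.predict(X_new)[0] for tree in regressor.estimators_]
manual_avg = sum(tree_preds) / len(tree_preds)

print(len(tree_preds), forest_pred, manual_avg)
```

The two numbers should agree up to floating-point rounding, which confirms that the forest is nothing more than an equally weighted average of its trees.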

With ensemble learning you can also build a model by combining different kinds of models, which sometimes works better than using a single model many times.
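A minimal sketch of combining different model types by simple averaging, in plain Python: a least-squares line and a k-nearest-neighbour regressor are fitted on the same data, and the ensemble returns the mean of their predictions. The helper names, k = 2 and the toy data are hypothetical. In scikit-learn, VotingRegressor in sklearn.ensemble performs this kind of averaging over several fitted estimators.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b * x for a single feature."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=2):
    """k-nearest-neighbour regression: average the targets of the k closest xs."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def fit_average_ensemble(xs, ys):
    """Combine two different model types by averaging their predictions."""
    models = [fit_line(xs, ys), fit_knn(xs, ys)]
    return lambda x: mean(m(x) for m in models)

# Hypothetical toy data: levels 1..10 with a steeply rising target
xs = list(range(1, 11))
ys = [1000 * x ** 2 for x in xs]

line = fit_line(xs, ys)
knn = fit_knn(xs, ys)
ensemble = fit_average_ensemble(xs, ys)
print(line(6.5), knn(6.5), ensemble(6.5))
```

The line captures the overall trend while the neighbour model follows local detail; averaging them is the simplest way to let each compensate for the other's weaknesses.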

But how can one tell which model, with which weights and coefficients, works better? One way is to use evaluation functions (metrics) that score the model's predictions.
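Two common evaluation functions for regression are the mean squared error (lower is better) and the coefficient of determination R² (1.0 is a perfect fit, 0 is no better than always predicting the mean). A plain-Python sketch follows; the function names mse and r_squared and the sample numbers are hypothetical, while scikit-learn provides equivalents as sklearn.metrics.mean_squared_error and sklearn.metrics.r2_score. For a fair comparison, these scores are usually computed on data the models were not trained on.

```python
from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical held-out targets and predictions from two models
y_true = [1, 2, 3, 4]
pred_a = [1.1, 1.9, 3.2, 3.8]
pred_b = [2.5, 2.5, 2.5, 2.5]  # always predicts the mean of y_true

for name, pred in [("model A", pred_a), ("model B", pred_b)]:
    print(name, "MSE:", mse(y_true, pred), "R^2:", r_squared(y_true, pred))
```

Model B scores an R² of 0 because it just predicts the average, which is a useful baseline: a model worth keeping should beat it.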

Hope you liked this post, and do tell me if you find it useful. Everybody stay Awesome!
