# Decision Tree Regression

10/21/2018

Decision trees are used for many purposes, from sorting and searching to classification. Using one for regression seems harder, because the target can take infinitely many values rather than a handful of class labels. The math behind it is still simple, and a fitted tree amounts to a set of nested 'if' statements. The model is available in the sklearn.tree module.

Take for example this data. The model splits the data into segments, and each segment is represented by the average of the y-values of the points that fall in it; X1 and X2 are the decision variables in this case. These splits are then used to build a decision tree, something like this. The tree can become really complex as the number of independent variables increases. Carrying out this model in code is actually simple, as we do not have to deal with the math ourselves.
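To make the idea of segments and averages concrete, here is a minimal sketch of a tree with a single split on one variable. It assumes the split is chosen by minimising the total squared error, and the helper names `fit_stump` and `predict_stump` and the data are made up for illustration; this is not how sklearn is implemented internally, just the same idea in miniature.

```python
import numpy as np

def fit_stump(x, y):
    """Find the single threshold on x that minimises the total squared error
    when each side is predicted by the mean of its y-values."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical x-values
        left, right = y[:i], y[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            threshold = (x[i - 1] + x[i]) / 2  # midpoint between neighbours
            best = (sse, threshold, left.mean(), right.mean())
    if best is None:
        raise ValueError("need at least two distinct x-values to split")
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict_stump(stump, x_new):
    threshold, left_mean, right_mean = stump
    # The fitted "tree" is just an if/else on the threshold.
    if x_new <= threshold:
        return left_mean
    return right_mean

# Made-up data: two roughly flat groups of points
x = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
y = np.array([10.0, 12.0, 11.0, 30.0, 32.0, 31.0])

stump = fit_stump(x, y)
print("threshold:", stump[0])
print("prediction at 2.5:", predict_stump(stump, 2.5))
print("prediction at 8.2:", predict_stump(stump, 8.2))
```

A full regression tree would repeat this search inside each resulting segment, and over every feature, which is where the nested 'if' statements come from.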

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

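# Importing the dataset
# NOTE: the file name is a placeholder; the original post does not name the file
dataset = pd.read_csv('your_dataset.csv')
X = dataset.iloc[:, 1:2].values  # feature column, kept 2-D as scikit-learn expects
y = dataset.iloc[:, 2].values    # target column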

# Fitting Decision Tree Regression to the dataset
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)

# Predicting a new result
# predict expects a 2-D array of shape (n_samples, n_features)
y_pred = regressor.predict([[6.5]])

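# Visualising the Decision Tree Regression results on a fine grid,
# so the step-shaped (piecewise constant) predictions are visible
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')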
plt.title('Truth or Bluff (Decision Tree Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

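The fit that we get follows the data much more closely than any of the models that we have tried till now. Keep in mind that a tree grown without a depth limit can reproduce every training point, so a close fit on the training data alone does not show that the model will predict new points well. With a single variable, predictions can be read straight off the graph by locating the point on it. With more than one or two variables, which is usually the case in real-world data, a single tree is often avoided: the splits are chosen greedily, one at a time, so a poor split made in the early stages affects everything below it and the whole model can perform badly.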
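A quick way to see how closely an unrestricted tree hugs its training data is to compare it with a depth-limited one. The sketch below uses synthetic data (a noisy sine curve, which is my assumption and not the dataset from this post) and the `max_depth` parameter of `DecisionTreeRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 40).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)

# No depth limit: the tree keeps splitting until each leaf holds one point
full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# max_depth=2 allows at most 2**2 = 4 leaves, so at most 4 distinct outputs
shallow_tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

full_train_r2 = full_tree.score(X, y)            # R^2 on the training data
shallow_levels = np.unique(shallow_tree.predict(X))

print("training R^2, unrestricted tree:", full_train_r2)
print("distinct predictions, max_depth=2:", len(shallow_levels))
```

The unrestricted tree scores perfectly on the data it was trained on because it memorises the noise as well, so it is worth checking any tree on data it has not seen before trusting it.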

Hope you like this post and do tell if you find it useful. Everybody stay Awesome!