Multiple Linear Regression


This works like Simple Linear Regression, but it considers multiple features instead of just one. The model serves the same purpose: fitting a straight-line (linear) relationship to the training data.

The above figure is in three dimensions, where the fit is a flat plane, but if we were to view it edge-on in two dimensions, it would appear to be a straight line. Even the equation is almost the same as that of simple linear regression, except that it has one term for each independent variable plus a constant term.
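For reference, the equation can be written out as below. The symbols are my own notation, assumed for illustration, and may differ from the original figure: b_0 is the constant term, b_1 to b_n are the coefficients, and x_1 to x_n are the independent variables.

```latex
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
```

With n = 1 this reduces to the simple linear regression equation, y = b_0 + b_1 x_1.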

As you can see, Multiple Linear Regression takes all the features into account when calculating the dependent variable. However, each feature should be linearly related to the target variable; otherwise the accuracy of the model degrades.

The main purpose of the model is to calculate the coefficients of the independent variables. Each feature is multiplied by its coefficient, and these products, added to the constant term, make up the dependent variable.
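To make this concrete, here is a minimal sketch of how such coefficients can be recovered with ordinary least squares in NumPy. The data is synthetic, and the coefficients (3.0, -2.0) and intercept (5.0) are made up for illustration. scikit-learn's LinearRegression solves the same kind of problem, though its internals may differ from this sketch.

```python
import numpy as np

# Synthetic data (assumption for illustration): two features, no noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_coef = np.array([3.0, -2.0])
true_intercept = 5.0
y = X @ true_coef + true_intercept

# Add a column of ones so the constant term is estimated as well
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||X_design @ b - y||^2
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coef = b[0], b[1:]
print("intercept:", intercept)
print("coefficients:", coef)
```

Because the synthetic target has no noise, the estimated values should match the chosen ones up to floating-point error.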

Its implementation is also a lot like that of simple linear regression. The following code uses a dataset of 50 startups.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

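# Encoding categorical data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X[:, 3] = labelencoder.fit_transform(X[:, 3])
# OneHotEncoder no longer accepts categorical_features, so ColumnTransformer selects column 3 instead.
# The one-hot columns come first in the output, followed by the remaining columns.
columntransformer = ColumnTransformer([('state', OneHotEncoder(), [3])], remainder = 'passthrough', sparse_threshold = 0)
X = np.array(columntransformer.fit_transform(X), dtype = float)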

# Avoiding the Dummy Variable Trap
X = X[:, 1:]

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

In this example, we used a label encoder and a one-hot encoder to process the State feature, which has only three categories. Removing one column after one-hot encoding is also important: it avoids the dummy variable trap, in which the dummy columns are perfectly correlated because any one of them can be derived from the others.
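The dummy variable trap can be illustrated with a small NumPy sketch. The labels below are hypothetical, not taken from the startups dataset. The point is only that keeping every dummy column alongside a constant term adds a column without adding any information.

```python
import numpy as np

# Hypothetical category column with three states, as integer labels
states = np.array([0, 1, 2, 0, 1, 2, 0, 1])

# One-hot encode: one column per category
dummies = np.eye(3)[states]

# Design matrix with a constant column and ALL three dummy columns
ones = np.ones((len(states), 1))
X_full = np.hstack([ones, dummies])

# Same matrix after dropping the first dummy column
X_dropped = np.hstack([ones, dummies[:, 1:]])

# The three dummies sum to the constant column, so X_full is rank deficient
rank_full = np.linalg.matrix_rank(X_full)
rank_dropped = np.linalg.matrix_rank(X_dropped)
print(X_full.shape[1], rank_full)        # 4 columns, rank 3
print(X_dropped.shape[1], rank_dropped)  # 3 columns, rank 3
```

After dropping one dummy column, the number of columns equals the rank, so every remaining column carries independent information.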

After splitting, the model is trained on the training set and then used to predict results for the test set. The coefficients and the intercept (the constant term) are available once the model has been fitted: regressor.coef_ and regressor.intercept_ hold them respectively.
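As a quick illustration of these two attributes, here is a small sketch on synthetic data, since the startups file is not included here. The names X_demo, y_demo and demo_regressor, as well as the chosen coefficients, are assumptions made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y = 1.5*x1 + 4.0*x2 + 10
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 1.5 * X_demo[:, 0] + 4.0 * X_demo[:, 1] + 10.0

demo_regressor = LinearRegression()
demo_regressor.fit(X_demo, y_demo)

print(demo_regressor.coef_)       # one coefficient per feature
print(demo_regressor.intercept_)  # the constant term
```

On real data such as the startups set, the values will not be this clean, but they are read in the same way.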

But as this is just a spinoff of Simple Linear Regression, it can only solve problems where the features are linearly related to the target. So we need a model that can work with polynomial relations, as real data is rarely linearly correlated.

Hope you like this post and do tell if you find it useful. Everybody stay Awesome!
