Data Preprocessing :: Jaynankani

Feature Scaling

10/17/2018

This is usually the last step of data preprocessing because, after it, the data is no longer really legible and a human can hardly spot any pattern in it. The basic goal of this step is to bring all features onto a comparable scale, typically roughly between -1 and 1.
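As a rough illustration of what scaling does, here is a minimal sketch using scikit-learn's StandardScaler and MinMaxScaler. The toy array is made up for the example. Note that standardization only centers each column at 0 with unit variance, so values can still fall outside -1 and 1, whereas min-max scaling with feature_range=(-1, 1) maps each column exactly onto that interval.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data (made up for illustration): two features on very different scales.
X = np.array([
    [22.0, 7.25],
    [38.0, 71.28],
    [26.0, 7.92],
    [35.0, 53.10],
])

# Standardization: each column ends up with mean 0 and unit variance.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column is mapped exactly onto the interval [-1, 1].
X_mm = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

print(X_std)
print(X_mm)
```

In a real project the scaler would normally be fitted on the training set only and then applied to the test set with transform, so that no information leaks from the test data.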

Splitting the dataset

10/16/2018

train_test_split was originally a function in the sklearn.cross_validation module, but that module was deprecated in version 0.18 and removed in version 0.20; the function is now available in sklearn.model_selection. It is used to split the dataset into two parts, one for training and one for testing.
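A minimal sketch of the split, assuming scikit-learn 0.18 or later so that the import comes from sklearn.model_selection. The arrays, the 80/20 ratio, and the random_state value are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (made up for illustration): 10 samples, 2 features, binary target.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```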

Handling Categorical Data

10/15/2018

Some columns hold data that is divided into categories, such as Male-Female, Red-Blue-Green, 0-1, etc. In our Titanic dataset, we have four columns that are categorical, including the target variable 'Survived'.
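One common way to turn such columns into numbers is sketched below with pandas. The small frame and its column names ("Sex", "Embarked") are assumptions made for the example, not necessarily the columns handled in the post. A two-valued column is mapped to 0/1, and a multi-valued column is one-hot encoded.

```python
import pandas as pd

# Toy frame (column names are illustrative assumptions).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Two categories: a simple 0/1 mapping is enough.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# More than two categories: one indicator column per category.
df = pd.get_dummies(df, columns=["Embarked"])

print(df)
```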

Handling Missing Numerical Values

10/14/2018

There are times when the dataset contains empty fields, and this occurs much more often than we think. It mostly happens because, at the time of collecting the data, some fields that are not mandatory are left out. Take, for example, the values of cabin in the Titanic dataset. You will find a lot of empty values in...
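For a numerical column, one simple option is to count the gaps and then fill them with the column mean. The sketch below uses pandas on a made-up "Age" column; both the column name and the choice of the mean as the fill value are assumptions for the example, and other strategies (median, a constant, dropping rows) may suit a given dataset better.

```python
import numpy as np
import pandas as pd

# Toy frame (made up for illustration) with one missing numerical value.
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 36.0]})

# Count missing entries per column before touching anything.
missing_per_column = df.isnull().sum()

# Replace the missing entries with the mean of the observed values.
df["Age"] = df["Age"].fillna(df["Age"].mean())

print(missing_per_column)
print(df)
```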

Importing Dataset

10/13/2018

To get any predictions or find any patterns, we need some data to feed to the models. Luckily, in today's world there is a lot of data available online that can be manipulated to get some interesting results. But the question is how to get the data into your code.
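A minimal sketch of loading tabular data with pandas.read_csv. To keep the example self-contained, the CSV content is an in-memory string with made-up rows; with a real file you would pass its path instead. The column names are assumptions for the example.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (rows are made up for illustration).
csv_text = "PassengerId,Survived,Age\n1,0,22\n2,1,38\n3,1,26\n"

# read_csv accepts a file path or any file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

# Separate the features from the target column.
X = dataset.drop(columns=["Survived"])
y = dataset["Survived"]

print(dataset.head())
```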

Importing Libraries

10/12/2018