DATA PREPROCESSING

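Feature scaling is usually the last step of data preprocessing, because after it the values are much harder for a human to read and interpret directly. The basic goal of this step is to bring all features onto a comparable scale, centered around 0 and roughly between -1 and 1, so that no feature dominates the others simply because its raw values are larger.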
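A minimal sketch of two common scaling methods in NumPy, on made-up age and fare values. Standardization gives each column mean 0 and standard deviation 1, so most values land near the -1 to 1 range but some can fall outside it. Max-abs scaling divides by each column's largest absolute value, which keeps every value strictly within [-1, 1]. scikit-learn's StandardScaler and MaxAbsScaler perform these same transformations; the data here is illustrative only.

```python
import numpy as np

# Two features on very different scales (illustrative values): age and fare.
X = np.array([[22.0, 7.25],
              [38.0, 71.28],
              [26.0, 7.92],
              [35.0, 53.10]])

# Standardization: subtract each column's mean, divide by its standard deviation.
# np.std uses the population standard deviation (ddof=0), like StandardScaler.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Max-abs scaling: divide each column by its largest absolute value,
# which guarantees every value lies in [-1, 1].
X_maxabs = X / np.abs(X).max(axis=0)

print(X_std.round(2))
print(X_maxabs.round(2))
```

In a real project, the scaling statistics (mean, standard deviation, maximum) should be computed on the training set only and then reused to transform the test set, so that no information from the test data leaks into training.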

The train_test_split function used to live in the sklearn.cross_validation module, but that module was deprecated in version 0.18 and removed in version 0.20; the function is now available in the sklearn.model_selection module. It is used to split the dataset into two parts: one for training and one for testing.
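A minimal sketch of the import from its current location, sklearn.model_selection, applied to a toy array. The data, test_size=0.2 (20% of rows held out for testing) and random_state=0 (a fixed seed so the split is reproducible) are illustrative choices, not values from the original text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 rows, 2 features; row i is [2*i, 2*i + 1] and its label is i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; rows and labels are shuffled together,
# so each X row stays paired with its y label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

With 10 rows and test_size=0.2, the split keeps 8 rows for training and 2 for testing.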

Some columns contain categorical data, where every value belongs to one of a fixed set of categories such as Male/Female, Red/Blue/Green or 0/1. In our Titanic dataset, four columns are categorical, including the target variable 'Survived'.
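Models generally need numbers rather than text labels, so categorical columns are usually encoded. A minimal sketch with pandas, using a tiny made-up frame with Titanic-style column names ("Sex", "Embarked", "Survived"; the values are illustrative). The binary column is mapped to 0/1, and the column with three categories is one-hot encoded with pd.get_dummies so that no artificial ordering is implied between the categories.

```python
import pandas as pd

# A tiny, made-up frame with Titanic-style column names (illustrative values only).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
    "Survived": [0, 1, 1, 0],
})

# Binary category: map each label to an integer.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# More than two categories: one-hot encode into one indicator column per category.
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked")

print(df.columns.tolist())
```

The 'Survived' target is already stored as 0/1, so it needs no further encoding.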

Datasets often contain empty fields, and this happens more often than we might think. It is usually because optional fields were left blank when the data was collected. Take, for example, the Cabin column in the Titanic dataset. You will find a lot of empty values in...
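A minimal sketch of two common ways to deal with missing values in pandas, on a made-up frame with Titanic-style "Age" and "Cabin" columns. Filling a numeric column with its mean is one standard option. Replacing a mostly empty column such as Cabin with a hypothetical "HasCabin" flag is an assumption chosen for illustration, not the only reasonable approach; dropping the column entirely is another.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps (NaN marks a missing value).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123"],
})

# Count missing values per column.
print(df.isnull().sum())

# Numeric column: replace missing values with the mean of the known values.
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Mostly empty column: keep only whether a value was present
# ("HasCabin" is a hypothetical feature name), then drop the original column.
df["HasCabin"] = df["Cabin"].notna().astype(int)
df = df.drop(columns=["Cabin"])

print(df)
```

scikit-learn offers the same mean strategy through SimpleImputer in the sklearn.impute module.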

To get predictions or find patterns, we need data to feed to our models. Fortunately, a lot of data is available online today, and it can be analyzed to get some interesting results. The question is how to load that data into your code.
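A minimal sketch of loading a CSV file with pandas and separating the features from the target. To keep it self-contained, the CSV text is embedded and read through io.StringIO; in practice you would pass a file path or URL to pd.read_csv instead. The column names mimic the Titanic dataset, and the rows are made up.

```python
import io
import pandas as pd

# Stand-in for a downloaded CSV file (made-up rows, Titanic-style columns).
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
"""

# In practice: dataset = pd.read_csv("path/or/url/to/file.csv")
dataset = pd.read_csv(io.StringIO(csv_text))

# Features (every column except the target) and target, as NumPy arrays.
X = dataset.drop(columns=["Survived"]).values
y = dataset["Survived"].values

print(dataset.shape)
print(dataset.head())
```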

Many developers today use Python for their machine learning projects, and there is a reason for it: Python has a large ecosystem of libraries that are ready to use once they are installed and imported. So the topic of this part is importing the libraries you are most likely to use in your programs.
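A minimal sketch of the imports that typically open a data preprocessing script. NumPy and pandas are third-party packages (installed, for example, with pip), not part of Python's standard library, and the short aliases np and pd are community conventions rather than requirements. matplotlib.pyplot, usually imported as plt, is another common addition when plotting is needed; it is left out here to keep the sketch minimal.

```python
# NumPy: fast numerical arrays and math.
import numpy as np
# pandas: tabular data (DataFrames) and reading files such as CSV.
import pandas as pd

# Quick check that both libraries are importable and working together.
ages = np.array([22, 38, 26])
frame = pd.DataFrame({"Age": ages})

print("numpy", np.__version__, "| pandas", pd.__version__)
print(frame["Age"].mean())
```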