Exploratory Data Analysis of the Titanic Dataset

I participated in the DataDNA October Challenge, which uses the Titanic Kaggle dataset. The goal of the analysis is to identify which class was most likely to survive the Titanic disaster.

Dataset

The provided data is a themed dataset and may not accurately reflect the actual events of the Titanic disaster. The Titanic sample dataset contains records for only 418 passengers.

Task

Identify which class was most likely to survive.

My Process

To understand how well each characteristic correlates with survival, I approached the problem one characteristic at a time, using the characteristics available in the dataset.

I first checked the dataset for errors or possibly inaccurate values within each characteristic, and either corrected those values or excluded the samples containing them.

Age data in this dataset is incomplete: it contains 86 null values, which are excluded when measuring how age correlates with survival rate. The cabin and ticket features are dropped from the analysis because they have too few data points.

Classifying the available characteristics as categorical or numerical makes the further analysis easier.
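The cleaning step described above could look roughly like the following pandas sketch. The column names (Age, Cabin, Ticket, Survived) are assumed to follow the usual Kaggle naming, and the small DataFrame is a hypothetical stand-in for the challenge file, not the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the challenge data; all values are illustrative only.
df = pd.DataFrame({
    "Survived": [0, 1, 0, 1, 1],
    "Age": [34.5, np.nan, 62.0, 27.0, np.nan],
    "Ticket": ["A1", "A2", "A3", "A4", "A5"],
    "Cabin": [np.nan, np.nan, "B45", np.nan, np.nan],
})

# Count missing values per column to see which features are incomplete.
missing = df.isna().sum()

# Drop the features that are excluded from the analysis.
df = df.drop(columns=["Cabin", "Ticket"])

# Keep only rows with a known age for the age-related comparison;
# the full frame is still used for every other characteristic.
age_known = df.dropna(subset=["Age"])

print(missing)
print(len(age_known), "rows with a known age")
```

Keeping `age_known` separate from `df` means the missing ages only affect the age comparison and do not shrink the sample for the other characteristics.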

Categorical:

Sex – Gender of passenger: male or female

Class – 1 = First Class, 2 = Second Class, 3 = Third Class

Point of embarkation – C = Cherbourg, Q = Queenstown, S = Southampton

Survived – 0 = Dead, 1 = Alive

Numerical:

Age – Age of passenger

Fare – Fare paid for a ticket

SibSp – Number of siblings or spouses traveling with the passenger

Parch – Number of parents or children traveling with the passenger

Findings

Passenger’s Gender

Out of 418 total passengers, only 152 (36%) survived and 266 (64%) died. In this dataset, 100% of females were rescued, whereas 100% of males died. Most females (72) traveled in 3rd class. This suggests that gender influenced the number of survivors, because women from all classes were rescued with priority over men.
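A per-gender breakdown like the one above can be computed with a grouped aggregation. This is a minimal sketch on hypothetical toy rows, assuming Kaggle-style column names Sex and Survived; it does not reproduce the figures quoted in the text.

```python
import pandas as pd

# Toy rows for illustration only; not the challenge data.
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Survived": [0, 1, 0, 1, 0, 1],
})

# Because Survived is coded 0/1, its mean is the survival rate
# and its sum is the number of survivors.
by_sex = df.groupby("Sex")["Survived"].agg(
    passengers="count",
    survivors="sum",
    survival_rate="mean",
)

print(by_sex)
```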

Passenger’s Age

Children had the highest survival rate, at 46%, presumably because they were accompanied by their mothers. Young adults aged around 20 to 30 had the lowest survival rate, at 37%; most of them traveled in 3rd class.

Note – Age data in this dataset is incomplete. The age characteristic contains 86 null values, which are not included in the above chart.
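One way to compare age groups is to bin the known ages into bands and take the survival rate per band. The sketch below uses `pd.cut` on toy data; the band edges and labels are assumptions chosen for illustration, since the write-up does not state the exact boundaries used.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; not the challenge data.
df = pd.DataFrame({
    "Age": [4, 9, 22, 25, 28, 41, 60, np.nan],
    "Survived": [1, 0, 0, 1, 0, 1, 0, 1],
})

# Rows with a missing age are left out of the age comparison.
known = df.dropna(subset=["Age"]).copy()

# Hypothetical band edges; right=False makes each band [low, high).
bins = [0, 12, 20, 30, 50, 120]
labels = ["child", "teen", "20-30", "30-50", "50+"]
known["AgeBand"] = pd.cut(known["Age"], bins=bins, labels=labels, right=False)

# observed=True skips bands that contain no passengers.
rate_by_band = known.groupby("AgeBand", observed=True)["Survived"].mean()

print(rate_by_band)
```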

Traveling with Siblings or Spouse

Passengers traveling with siblings or spouses had higher survival rates than those traveling alone.

Traveling with Parents or Children

Passengers traveling alone suffered the greatest losses compared with passengers traveling with parents or children. Most of those families traveled in 1st class, while most solo passengers traveled in 3rd class.
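The two family-related comparisons can be combined by adding SibSp and Parch and flagging passengers with no relatives aboard. This is a sketch under assumptions: `FamilySize` and `Group` are hypothetical helper columns, not part of the dataset, and the rows are toy values.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; not the challenge data.
df = pd.DataFrame({
    "SibSp": [0, 1, 0, 2, 0, 0],
    "Parch": [0, 0, 2, 1, 0, 0],
    "Survived": [0, 1, 1, 0, 0, 1],
})

# Number of relatives aboard (siblings/spouses plus parents/children).
df["FamilySize"] = df["SibSp"] + df["Parch"]

# Label each passenger as traveling alone or with family.
df["Group"] = np.where(df["FamilySize"] == 0, "alone", "with family")

rate_by_group = df.groupby("Group")["Survived"].mean()

print(rate_by_group)
```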

Port of Embarkation

65% of passengers embarked from Southampton, 24% from Cherbourg, and 11% from Queenstown. Passengers who boarded at Queenstown had the highest survival rate (52%), compared with Cherbourg (39%) and Southampton (33%).
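Both figures reported per port, the share of passengers and the survival rate, can be placed side by side. The sketch below assumes a Kaggle-style Embarked column with codes C, Q and S, and uses toy rows rather than the challenge data.

```python
import pandas as pd

# Toy rows for illustration only; not the challenge data.
df = pd.DataFrame({
    "Embarked": ["S", "S", "S", "C", "C", "Q", "S", "Q"],
    "Survived": [0, 0, 1, 1, 0, 1, 0, 0],
})

# Share of all passengers boarding at each port.
share = df["Embarked"].value_counts(normalize=True)

# Survival rate among the passengers from each port.
rate = df.groupby("Embarked")["Survived"].mean()

# Both Series are indexed by port code, so they align automatically.
summary = pd.DataFrame({"share": share, "survival_rate": rate})

print(summary)
```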

Class of Travel

The survival rate was highest in first class (47%), followed by third class (33%) and second class (32%). This reflects that the passengers who paid the highest fares had the highest survival rate.
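A row-normalized cross-tabulation gives the survival rate per class, and a median fare per class shows how fare relates to class. This is a minimal sketch assuming Kaggle-style columns Pclass, Fare and Survived, on toy rows that do not reproduce the percentages above.

```python
import pandas as pd

# Toy rows for illustration only; not the challenge data.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "Fare": [80.0, 60.0, 20.0, 26.0, 8.0, 7.0, 9.0, 8.0],
    "Survived": [1, 0, 0, 1, 0, 0, 1, 0],
})

# normalize="index" turns each row into proportions that sum to 1,
# so column 1 holds the survival rate for that class.
class_table = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")

# Typical fare paid in each class.
median_fare = df.groupby("Pclass")["Fare"].median()

print(class_table)
print(median_fare)
```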

Summary

Overall, several characteristics were found to correlate with the rate of survival in the Titanic disaster.
Gender: Female passengers from all classes were rescued with priority.
Family: Passengers who traveled with family members had the highest survival rate. Most of them traveled in 1st class, while the unaccompanied passengers, who suffered the greatest losses, were mostly in 3rd class.
Age: Young adults aged around 20 to 30 had the lowest survival rate, since most of them traveled in 3rd class.
Class: 1st class passengers were given priority, as they paid the highest fares compared to other classes.
Embarkation: The largest group of passengers embarked from Southampton and traveled in 3rd class, and had the lowest rate of survival.

Data Visualization