Step 1: Understanding the Business Context

The first goal is to analyze and investigate the Titanic dataset and summarize the main information that we can retrieve from it.

The second goal is to predict which passengers survived the Titanic shipwreck and to describe the survival rate of the passengers.

Most of the data about the Titanic originates from Encyclopedia Titanica (https://www.encyclopedia-titanica.org/). We obtained the dataset from the Kaggle website, which also provides a description of it (https://www.kaggle.com/c/titanic/data).

Step 2: Understanding the Technical Context

The datasets were compiled by a variety of researchers from primary sources produced at the time of the event, such as newspapers and photographs.

These sources include the official inquiries in Britain and the USA and newspaper articles about the sinking of the Titanic.

The sinking happened in 1912, when record-keeping technology was far less sophisticated than it is today, which made it hard to collect complete and reliable data to analyze.

A couple of columns in the dataset had missing values or fields that were invalid for the analysis.
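One way to see which columns are affected is to count the missing values per column. The snippet below is a minimal sketch using pandas, not the exact procedure used in this analysis. The helper name missing_summary and the tiny sample table are made up for illustration; the column names follow the Kaggle data description, and the values are not real passenger records.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column.

    Only columns with at least one missing value are kept,
    sorted so the column with the most gaps comes first.
    (Hypothetical helper, introduced here for illustration.)
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up sample rows; with the real data you would load the Kaggle CSV
# instead, e.g. pd.read_csv("train.csv") (file name is an assumption).
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, None, 26.0, None],
    "Cabin": [None, "C85", None, None],
})

print(missing_summary(sample))
```

Columns that show a high percentage of missing values are candidates for being dropped or imputed, while columns with only a few gaps can often be filled or have the affected rows removed. Which option is appropriate depends on the analysis.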

Step 3: Understanding the Tables and Fields

We worked with a single table, the “passengers” table.