Welcome to the Titanic ML competition, an ideal first challenge for exploring machine learning contests and learning how the Kaggle platform works.

If you want to connect with other participants about this contest, we invite you to join us on Discord! There you will find dedicated channels for the contest, job postings, and career discussions, as well as places to share resources and socialize with fellow data scientists.

This competition boils down to a simple task: Use machine learning techniques to build a model that predicts which passengers survived the sinking of the Titanic.

Read on for more details. Once you are ready to start competing, simply click the “Join Competition” button to create your account and access the competition data.

Next, head over to Alexis Cook's Titanic Contest Tutorial, which walks you step by step through submitting your first entry!

The Challenge

The sinking of the Titanic is one of the most infamous shipwrecks in history.

The RMS Titanic sank on April 15, 1912, during its maiden voyage, after striking an iceberg. Although the ship was widely considered “unsinkable,” there were not enough lifeboats for everyone on board, and 1502 of the 2224 passengers and crew lost their lives.

Although many factors influenced who survived, some groups of people appear to have had a greater chance of survival than others.

In this challenge, we invite you to build a predictive model that answers the question “Which groups of people had a greater chance of survival?” using passenger data such as name, age, gender, social class, and economic status.
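As a rough illustration of what "which groups had a greater chance of survival" means in practice, the sketch below computes a survival rate per group. The rows are made up for the example and are not real Titanic records. The column names `Sex`, `Pclass`, and `Survived` (1 = survived, 0 = did not) and the helper `survival_rate_by` are assumptions introduced here.

```python
from collections import defaultdict

# Made-up illustrative rows, NOT real Titanic records.
# Column names ("Sex", "Pclass", "Survived") are assumptions for this sketch.
passengers = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]


def survival_rate_by(rows, key):
    """Return {group value: fraction of that group with Survived == 1}."""
    survived = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[key]] += 1
        survived[row[key]] += row["Survived"]
    return {group: survived[group] / total[group] for group in total}


rates_by_sex = survival_rate_by(passengers, "Sex")
rates_by_class = survival_rate_by(passengers, "Pclass")
print(rates_by_sex)
print(rates_by_class)
```

Comparing the rates across groups is a simple first look at the data. A real model would combine several such features rather than examine them one at a time.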

Overview of how Kaggle contests work:

  1. Join the contest:
    Start by reading the challenge description carefully and make sure you understand the contest rules. Once you accept the challenge, you will get access to the contest dataset.
  2. Get started:
    Download the contest data and start creating prediction models using this data. You can create models on your computer using local environments, or you can use “Kaggle Notebooks,” a dedicated Jupyter Notebooks environment that provides free GPUs to make the process easier.
  3. Submit your predictions:
    Once you're happy with your model, upload your predictions as a submission file to the Kaggle platform. The platform will evaluate the performance of your model and provide its accuracy score.
  4. Check the leaderboard:
    You can see how your model ranks against other participants' models on the leaderboard available on the platform.
  5. Improve your score:
    To improve your model's performance and deepen your understanding, head to the competition's dedicated discussion forum. There you will find tutorials and ideas from other participants that may help you develop your model.

By following these steps, you will be able to participate in Kaggle competitions and improve your skills in data analysis and in building predictive models.
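The accuracy score mentioned in step 3 can be read as the fraction of passengers whose outcome was predicted correctly. The sketch below shows that calculation on a handful of made-up labels. The function name `accuracy` is introduced here for illustration, and the platform's own scoring code is not shown in this document.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty prediction list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 1 = survived, 0 = did not survive.
true_labels = [1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0]
score = accuracy(true_labels, predicted)
print(score)
```

Here four of the five predictions match, so the score is 4/5. Scoring part of your own training data this way gives a rough estimate before you submit.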

This competition uses two datasets:

  1. Train.csv dataset:
    This set contains details about a subset of the passengers on board the ship (891 passengers in total). Most importantly, it reveals whether each passenger survived or not (the “ground truth”).
  2. Test.csv dataset:
    This set contains information similar to the Train.csv set, but does not reveal the ground truth for each passenger. You use the patterns discovered in the Train.csv set to predict whether the passengers in the Test.csv set survived.
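To make the relationship between the two files concrete, here is a minimal sketch that learns a very simple pattern from training rows and applies it to test rows. The inline CSV text is a made-up stand-in for the real files. The column names `PassengerId`, `Sex`, and `Survived` are assumptions, and the "majority outcome per group" rule is only an illustrative baseline, not a recommended model.

```python
import csv
import io
from collections import Counter, defaultdict

# Tiny made-up stand-ins for Train.csv and Test.csv (not real records).
# Column names PassengerId, Sex and Survived are assumptions for this sketch.
TRAIN_CSV = """PassengerId,Sex,Survived
1,male,0
2,female,1
3,female,1
4,male,0
5,male,1
"""
TEST_CSV = """PassengerId,Sex
6,female
7,male
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fit_majority_rule(train_rows, feature):
    """Learn the most common outcome for each value of `feature`."""
    outcomes = defaultdict(Counter)
    for row in train_rows:
        outcomes[row[feature]][int(row["Survived"])] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in outcomes.items()}


def predict(rule, rows, feature, default=0):
    """Apply the learned rule; unseen feature values fall back to `default`."""
    return [rule.get(row[feature], default) for row in rows]


train_rows = read_rows(TRAIN_CSV)
test_rows = read_rows(TEST_CSV)  # note: no Survived column here
rule = fit_majority_rule(train_rows, "Sex")
predictions = predict(rule, test_rows, "Sex")
print(rule)
print(predictions)
```

The test rows carry no `Survived` column, so the only way to fill it in is with a rule or model fitted on the training rows.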

To submit your predictions to Kaggle:

  • Download the Train.csv and Test.csv data files.
  • Build a predictive model using the Train.csv data.
  • Generate predictions for the passengers in Test.csv using the model.
  • Go to the Kaggle platform and click on the “Submit Predictions” button.
  • Upload your prediction file and follow the steps to submit it.
  • Your predictions will be evaluated and your accuracy score will be displayed on the leaderboard.

Note that the challenge requires predictions of survival or non-survival, so your model must be able to predict this value for every passenger in the Test.csv dataset.
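A prediction file could be produced along the following lines. This is a sketch under assumptions: the two-column layout (`PassengerId`, `Survived`), the 0/1 encoding, and the passenger IDs used are illustrative, so check the competition's data page for the exact format it expects. The helper `write_submission` is a hypothetical name.

```python
import csv
import io


def write_submission(passenger_ids, predictions, out):
    """Write one (PassengerId, Survived) row per test passenger to `out`."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("every test passenger needs exactly one prediction")
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("predictions must be 0 or 1")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])  # assumed header names
    writer.writerows(zip(passenger_ids, predictions))


# Hypothetical IDs and predictions, written to memory instead of disk.
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To create an actual file, pass a file object instead of the in-memory buffer, for example `open("submission.csv", "w", newline="")`. The length check reflects the requirement that every passenger in the test set receives a prediction.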