
Kick-off with Kaggle competitions to learn data science skills

Data Science Dojo
Nathan Piccini

January 23

What are Kaggle competitions? I didn’t know, so I looked it up. Get started by reading what I learned, and find a list of active Kaggle competitions.

First of all, what’s Kaggle?

Until a few months ago, I didn’t know the answer to that question. If you don’t either, that’s okay; we’re going to answer it together. But first, you need a little background on this data science network.

Kaggle was founded in 2010 with the idea that data scientists need a place to come together and collaborate on projects. It has since grown into a network with more than 1,000,000 registered users and has become a safe place for data science learning, sharing, and competition.

Tapping into the human competitive spirit, Kaggle created a platform where organizations host data science competitions. These competitions have fueled new methodologies and techniques in data science and have given organizations new insights from the data they provide.

Being the competitive person I am, I was first drawn in by the competition aspect, which made me want to learn the intricacies of a Kaggle competition.

How Kaggle works

While combing through the Kaggle website and other informative articles, I found there are three basic steps in Kaggle Competitions.

  1. Preparation: Each Kaggle competition has a host, and each host has to prepare and provide the data. Along with the data, the host can give additional information such as a description, evaluation method, timeline, and prize for winning.
  2. Experimentation: By now, you’ve had your morning coffee, you’ve read all the information in the overview 500 times, and you’re ready to win 1st place. Now is the time to experiment, submit, and learn. There are three ways to upload your work:
    • Kaggle Kernels
    • Manual Uploads
    • Kaggle API

    If you don’t want anyone to know what you’re doing, upload your experiments manually or through the Kaggle API. Kaggle Kernels are a way for competitors to share what they’ve accomplished and get feedback from their peers. Kernels will give you ideas on how to conquer the data, and I suggest you go through some of the popular ones.

  3. Results: In every Kaggle competition, there are public and private leaderboards. Be warned, the leaderboards are VERY different. The public leaderboard is based on a small percentage of the test data, decided by the host. Although it gives you a good idea of where you stand, it does not always reflect who will win and lose.

The private leaderboard is what really matters. Not calculated until the end of the competition, this leaderboard is based on a larger proportion of data and, ultimately, decides the winners and losers.
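
To see why the two leaderboards can disagree, here is a minimal sketch in Python. Everything in it is made up for illustration: the labels, the two submissions, the 20/80 split, and the use of accuracy as the metric are assumptions, not how any particular competition is scored.

```python
def accuracy(predictions, truth, rows):
    """Fraction of the given row indices where the prediction matches the truth."""
    correct = sum(1 for i in rows if predictions[i] == truth[i])
    return correct / len(rows)


# Hypothetical ground-truth labels for a 10-row test set.
truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

# Assumed split: 20% of the rows feed the public leaderboard,
# the remaining 80% feed the private one.
public_rows = [0, 1]
private_rows = [2, 3, 4, 5, 6, 7, 8, 9]

# Two hypothetical submissions.
submissions = {
    "A": [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],  # right on both public rows, shaky elsewhere
    "B": [1, 1, 1, 1, 0, 1, 0, 0, 1, 0],  # misses one public row, right elsewhere
}

# Score every submission on both subsets.
scores = {
    name: {
        "public": accuracy(preds, truth, public_rows),
        "private": accuracy(preds, truth, private_rows),
    }
    for name, preds in submissions.items()
}

for name, result in scores.items():
    print(name, result)
```

In this toy case, submission A leads the public leaderboard while submission B wins on the private one, which is the situation the warning above describes.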


If you would like to dive deeper into the different competition types, formats, and datasets offered by Kaggle, take a look at Kaggle’s Help and Documentation.
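
Of the three upload routes listed under Experimentation, the Kaggle API is the one you can script. The sketch below only builds the command for Kaggle’s command-line tool and prints it. The helper name `build_submit_command`, the competition name, the file name, and the message are all hypothetical placeholders, so check the documentation for the exact options your competition needs.

```python
import shlex


def build_submit_command(competition, csv_path, message):
    """Build (but do not run) a Kaggle CLI submission command as an argument list."""
    return [
        "kaggle", "competitions", "submit",
        "-c", competition,  # competition name as it appears in its URL
        "-f", csv_path,     # path to the file with your predictions
        "-m", message,      # short note describing this submission
    ]


# Placeholder values, not a real competition or file.
command = build_submit_command("example-competition", "submission.csv", "First attempt")

# Print the command in a form you could paste into a shell.
print(shlex.join(command))
```

Actually submitting would mean running the command, for example with `subprocess.run(command, check=True)`, on a machine where the `kaggle` package is installed and your API token is configured.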

Active Kaggle competitions

[Updated May 6, 2019]

Each Kaggle competition accepts entries only for a limited amount of time. This list does not show how much time is left to enter or the level of difficulty associated with the posted datasets. One way to gauge the level of difficulty is to look at the prize.

Typically, the larger the prize, the more difficult/advanced the problem is. You can also look at the type of competition. You can find the four categories and Kaggle’s description of them below.

  1. Featured: “These are full-scale machine learning challenges which pose difficult, generally commercially-purposed prediction problems.”
  2. Research: “Research competitions feature problems which are more experimental than featured competition problems.”
  3. Getting Started: “These are semi-permanent competitions that are meant to be used by new users just getting their foot in the door in the field of machine learning.”
  4. Playground: “These are competitions which often provide relatively simple machine learning tasks, and are similarly targeted at newcomers or Kagglers interested in practicing a new type of problem in a lower-stakes setting.”

I will try my best to keep this list as up-to-date as possible. Unfortunately, I’m not spending all my time on Kaggle’s website. So if you see something has ended, or a new competition has been added, please leave a comment below. Thanks and have fun!

Learn more about Kaggle

Written by Nathan Piccini