Guide to Data Science Competitions

Summer is finally here, and so are the long-form virtual hackathons. Unlike a traditional hackathon, which focuses on what you can build in one place within a limited time span, virtual hackathons typically give you a month or more to work from wherever you like.

Those of us who love data are not left behind. There are a number of data science competitions to choose from this summer. Whether it’s a new Kaggle challenge (these are posted year-round) or the data science component of Challenge Post’s Summer Jam Series, there are plenty of opportunities to spend the summer either sharpening or showing off your skills.

The Landscape: Which Competitions are Which?

  • Kaggle
    Kaggle competitions have corporate sponsors who want specific business questions answered using their sample data. In return, winners are rewarded handsomely, but you have to win first.
  • Summer Jam
    Challenge Post’s Summer Jam Open Data Mashup runs in June and focuses on mashing up multiple open data sets (use the Data Search Engine to find some great options). Competitors are not asked to answer a specific question, so this competition is well suited for beautiful experiments in visualizing data.
  • DrivenData
    Like Kaggle, DrivenData competitions have a sponsor with a specific research question and specific sample data. DrivenData sponsors, however, tend to be more social impact minded.

Over the months, we’ve posted many great links on winning data science competitions through our mailing list. If you missed them, here’s a list of the best resources, advice, and tutorials:

Choosing Your Weapons

3 Must-Ask Questions Before Choosing That Machine Learning Algorithm!

Dictionary of Algorithms and Data Structures

Fast Non-Standard Data Structures for Python

A list of assorted tools and such mentioned and used during DSSG 2014

Data Science Resources

12 Best Free Ebooks for Machine Learning

Top 10 data mining algorithms in plain English

Python Shortcuts

The Top Mistakes Developers Make When Using Python for Big Data Analytics

11 Python Libraries You Might Not Know

IPython Notebook Gallery (includes a pandas cheat sheet)

D3.js Step by Step

For inspiration, check this index of visualization types for visualizing text

Gestalt Principles for Data Visualization

Advice From Past Competitors

Machine learning best practices we’ve learned from hundreds of competitions – Ben Hamner of Kaggle

What I Learned From The Kaggle Criteo Data Science Odyssey

6 Tricks I Learned From The OTTO Kaggle Challenge

How to use R, H2O, and Domino for a Kaggle competition

Competing in a data science contest without reading the data