Summer is finally here, and so are the long-form virtual hackathons. A traditional hackathon focuses on what you can build in one place over a limited time span. Virtual hackathons, by contrast, typically give you a month or more to work from wherever you like.
And those of us who love data are not left behind. There are a number of data science competitions to choose from this summer. Whether it’s a new Kaggle challenge (Kaggle posts them year-round) or the data science component of Challenge Post’s Summer Jam Series, there are plenty of opportunities to spend the summer sharpening or showing off your skills.
The Landscape: Which Competitions are Which?
- Kaggle
Kaggle competitions have corporate sponsors looking for answers to specific business questions using their sample data. Winners are rewarded handsomely, but you have to win first.
- Summer Jam
Challenge Post’s Summer Jam Open Data Mashup runs in June and focuses on combining multiple open data sets (the Data Search Engine is a good place to find options). Competitors are not asked to answer a specific question, so this competition is well suited to beautiful experiments in data visualization.
- DrivenData
Like Kaggle, DrivenData competitions have a sponsor with a specific research question and specific sample data. DrivenData sponsors, however, tend to be more focused on social impact.
Over the past months we’ve shared many great links on winning data science competitions through our mailing list. In case you missed them, here’s a list of the best resources, advice, and tutorials:
Choosing Your Weapons
Data Science Wars: R vs. Python
http://101.datascience.community/2015/05/12/data-science-wars-r-vs-python/
3 Must-Ask Questions Before Choosing That Machine Learning Algorithm!
http://www.analyticbridge.com/profiles/blogs/wait-why-are-you-using-that-algorithm
Dictionary of Algorithms and Data Structures
http://xlinux.nist.gov/dads/
Fast Non-Standard Data Structures for Python
http://kmike.ru/python-data-structures/
A list of assorted tools and such mentioned and used During DSSG 2014
https://hackpad.com/A-list-of-assorted-tools-and-such-mentioned-and-used-During-DSSG-2014-wl5QgF3LsSU
Data Science Resources
https://github.com/jonathan-bower/DataScienceResources
12 Best Free Ebooks for Machine Learning
http://designimag.com/best-free-machine-learning-ebooks/
Top 10 data mining algorithms in plain English
http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/
Python Shortcuts
The Top Mistakes Developers Make When Using Python for Big Data Analytics
https://www.airpair.com/python/posts/top-mistakes-python-big-data-analytics
11 Python Libraries You Might Not Know
http://blog.yhathq.com/posts/11-python-libraries-you-might-not-know.html
IPython Notebook Gallery (includes a pandas cheat sheet)
http://nb.bianp.net/sort/views/
Visualizations
D3.js Step by Step
http://zeroviscosity.com/category/d3-js-step-by-step
For inspiration, check out this index of visualization techniques for text
http://textvis.lnu.se/
Gestalt Principles for Data Visualization
http://emeeks.github.io/gestaltdataviz/section1.html
Advice From Past Competitors
Machine learning best practices we’ve learned from hundreds of competitions – Ben Hamner of Kaggle
https://www.youtube.com/watch?v=9Zag7uhjdYo
Lessons Learned from the Hunt for Prohibited Content on Kaggle
http://mlwave.com/lessons-from-avito-prohibited-content-kaggle/
What I Learned From The Kaggle Criteo Data Science Odyssey
https://medium.com/@chris_bour/what-i-learned-from-the-kaggle-criteo-data-science-odyssey-b7d1ba980e6
6 Tricks I Learned From The OTTO Kaggle Challenge
https://medium.com/@chris_bour/6-tricks-i-learned-from-the-otto-kaggle-challenge-a9299378cd61
How to use R, H2O, and Domino for a Kaggle competition
http://blog.dominodatalab.com/using-r-h2o-and-domino-for-a-kaggle-competition/
Competing in a data science contest without reading the data
http://blog.mrtz.org/2015/03/09/competition.html
Kaggle Ensembling Guide
http://mlwave.com/kaggle-ensembling-guide/
Also check out our interactive Introduction to R course: https://www.datacamp.com/courses/free-introduction-to-r