[Kaggle] Kaggle Curriculum (by Yuhan Lee, 이유한)
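This is a curriculum for Kaggle study.
The key to working through it is to type out each notebook from beginning to end three times and make sure you understand it.
Along the way, it is just as important to understand what the data is, what the project is about, and what was learned in the process.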
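I heard that Yuhan Lee put together a curriculum specifically for Kaggle study, so I am sharing the original site as well:
https://kaggle-kr.tistory.com/32
I have made a few small changes. Let's copy out every notebook diligently and follow the curriculum all the way through.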
1. Tabular data
1-1. Binary classification with tabular data
1st level. Titanic: Machine Learning from Disaster
Start here! Predict survival on the Titanic and get familiar with ML basics
- Week 1: Titanic Tutorial 1 - Exploratory data analysis, visualization, machine learning
- Week 2: EDA To Prediction(DieTanic)
- Week 3: Titanic Top 4% with ensemble modeling
- Week 4: Introduction to Ensembling/Stacking in Python
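The Week 4 notebook title mentions ensembling and stacking. The sketch below shows only the simplest ensembling idea, hard majority voting, and not stacking itself. It assumes each already-trained model has produced 0/1 predictions for the same samples. `majority_vote` is a hypothetical helper, and the prediction values are made up.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models by hard majority voting.

    predictions_per_model: list of equal-length lists, one list per model.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i
        votes = Counter(p[i] for p in predictions_per_model)
        # most_common(1) -> [(label, count)]; ties keep the first label seen
        combined.append(votes.most_common(1)[0][0])
    return combined


# Made-up predictions from three hypothetical models for four passengers
print(majority_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]))  # → [1, 0, 1, 1]
```

With an odd number of binary classifiers, a tie is impossible. Stacking goes one step further and trains a second-level model on these base predictions instead of simply counting votes.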
2nd level. Porto Seguro’s Safe Driver Prediction
Predict if a driver will file an insurance claim next year.
- Week 5: Data Preparation & Exploration
- Week 6: Interactive Porto Insights - A Plot.ly Tutorial
- Week 7: XGBoost CV (LB .284)
- Week 8: Porto Seguro Exploratory Analysis and Prediction
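The Week 7 notebook title refers to cross-validation (CV). Below is a library-independent sketch of shuffled K-fold splitting. `kfold_indices` is a hypothetical helper. In practice, scikit-learn's `KFold` and `StratifiedKFold` (in `sklearn.model_selection`) play this role.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Return a list of (train_idx, valid_idx) pairs for shuffled K-fold CV."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible folds
    # Spread the remainder over the first folds so sizes differ by at most one
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]
    folds, start = [], 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, valid))
        start += size
    return folds


for train_idx, valid_idx in kfold_indices(10, n_splits=3):
    print(len(train_idx), len(valid_idx))  # → 6 4, then 7 3, then 7 3
```

Each sample lands in exactly one validation fold. A model is trained on `train_idx` and scored on `valid_idx` once per fold, and the scores are averaged.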
3rd level. Home Credit Default Risk
Can you predict how capable each applicant is of repaying a loan?
- Week 9: Introduction: Home Credit Default Risk Competition
- Week 10: Introduction to Manual Feature Engineering
- Week 11: Stacking Test-Sklearn, XGBoost, CatBoost, LightGBM
- Week 12: LightGBM 7th place solution
1-2. Multi-class classification with tabular data
1st level. Costa Rican Household Poverty Level Prediction
Can you identify which households have the highest need for social welfare assistance?
- Week 13: A Complete Introduction and Walkthrough
- Week 14: 3250feats->532 feats using shap[LB: 0.436]
- Week 15: XGBoost
1-3. Regression with tabular data
1st level. New York City Taxi Trip Duration
Share code and data to improve ride time predictions
- Week 16: Dynamics of New York city - Animation
- Week 17: EDA + Baseline Model
- Week 18: Beat the benchmark!
2nd level. Zillow Prize: Zillow’s Home Value Prediction (Zestimate)
Can you improve the algorithm that changed the world of real estate?
- Week 19: Simple Exploration Notebook - Zillow Prize
- Week 20: Simple XGBoost Starter (~0.0655)
- Week 21: Zillow EDA On Missing Values & Multicollinearity
- Week 22: XGBoost, LightGBM, and OLS and NN
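Week 21 focuses on missing values. A typical first EDA step of that kind is measuring the fraction of missing entries per column. The sketch assumes rows are plain dicts and treats `None` or an absent key as missing. Those conventions, the helper `missing_ratio`, and the column names are all hypothetical. With pandas, the usual equivalent is `df.isna().mean()`.

```python
def missing_ratio(rows, columns):
    """Return {column: fraction of rows whose value is None or absent}."""
    n = len(rows)
    ratios = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        ratios[col] = missing / n if n else 0.0
    # Sort so the most-missing columns come first
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Made-up rows with hypothetical column names
rows = [{"area": 120, "pool": None}, {"area": None}, {"area": 95, "pool": 1}]
print(missing_ratio(rows, ["area", "pool"]))
# → {'pool': 0.6666666666666666, 'area': 0.3333333333333333}
```

Columns with very high ratios are candidates for dropping or for a dedicated imputation strategy. Which threshold to use is a judgment call per dataset.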
2. Image data
2-1. Binary classification with images
1st level. Statoil/C-CORE Iceberg Classifier Challenge
Ship or iceberg, can you decide from space?
- Week 23: Keras Model for Beginners (0.210 on LB)+EDA+R&D
- Week 24: Transfer Learning with VGG-16 CNN+AUG LB 0.1712
- Week 25: Submarineering.EVEN BETTER PUBLIC SCORE until now.
- Week 26: Keras+TF LB 0.18
2-2. Multi-class classification with images
1st level. TensorFlow Speech Recognition Challenge
Can you build an algorithm that understands simple speech commands?
- Week 27: Speech representation and data exploration
- Week 28: Light-Weight CNN LB 0.74
- Week 29: WavCeption V1: a 1-D Inception approach (LB 0.76)
3. Natural language processing
1st level. Spooky Author Identification
Share code and discuss insights to identify horror authors from their writings
- Week 30: Spooky NLP and Topic Modelling tutorial
- Week 31: Approaching (Almost) Any NLP Problem on Kaggle
- Week 32: Simple Feature Engg Notebook - Spooky Author
2nd level. Mercari Price Suggestion Challenge
Can you automatically suggest product prices to online sellers?
- Week 33: Mercari Interactive EDA + Topic Modelling
- Week 34: A simple nn solution with Keras (~0.48611 PL)
- Week 35: Ridge (LB 0.41943)
- Week 36: LGB and FM [18th Place - 0.40604]
3rd level. Toxic Comment Classification Challenge
Identify and classify toxic online comments
- Week 37: [For Beginners] Tackling Toxic Using Keras
- Week 38: Stop the S@#$ - Toxic Comments EDA
- Week 39: Logistic regression with words and char n-grams
- Week 40: Classifying multi-label comments (0.9741 lb)
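Week 39 combines word and character n-grams as features. The stdlib sketch below shows what character n-gram extraction produces. `char_ngrams` is a hypothetical helper. scikit-learn's `TfidfVectorizer` with `analyzer="char"` and an `ngram_range` builds TF-IDF-weighted features of this kind, with extra normalization such as lowercasing.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the string
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


print(char_ngrams("abab", 2, 2))  # → Counter({'ab': 2, 'ba': 1})
```

Because character n-grams work below the word level, they can still share features with misspelled or partially masked words that a word-level vocabulary would miss.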
4. Object segmentation with deep learning
1st level. 2018 Data Science Bowl
Find the nuclei in divergent images to advance medical discovery
- Week 41: Teaching notebook for total imaging newbies
- Week 42: Keras U-Net starter - LB 0.277
- Week 43: Nuclei Overview to Submission
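Overlap between a predicted segmentation mask and a ground-truth mask is commonly measured with intersection over union (IoU). As far as I know, this competition's metric was built on IoU thresholds. Below is a minimal sketch for binary masks stored as lists of rows. `iou` is a hypothetical helper, and scoring two empty masks as 1.0 is an assumed convention.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two same-shaped binary masks (lists of rows)."""
    if len(mask_a) != len(mask_b) or any(len(a) != len(b) for a, b in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += bool(a) and bool(b)  # pixel set in both masks
            union += bool(a) or bool(b)          # pixel set in either mask
    # Assumption: two empty masks count as a perfect match
    return intersection / union if union else 1.0


predicted = [[1, 1], [0, 0]]
truth = [[1, 0], [1, 0]]
print(iou(predicted, truth))  # → 0.3333333333333333
```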
5. Other: anomaly detection, visualization
1st level. Credit Card Fraud Detection
Anonymized credit card transactions labeled as fraudulent or genuine
- Week 44: In depth skewed data classif. (93% recall acc now)
- Week 45: Anomaly Detection - Credit Card Fraud Analysis
- Week 46: Semi-Supervised Anomaly Detection Survey
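The Week 44 notebook title reports recall rather than plain accuracy. With heavily skewed labels, accuracy alone can look excellent for a useless model. The made-up example below uses 2 positives in 100 samples and a model that always predicts 0. The helper names are hypothetical. scikit-learn offers `accuracy_score`, `precision_score`, and `recall_score` in `sklearn.metrics` for the same purpose.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up skewed labels: 98 genuine, 2 fraudulent; the "model" never flags fraud
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(accuracy(y_true, y_pred))          # → 0.98
print(precision_recall(y_true, y_pred))  # → (0.0, 0.0)
```

The model scores 98% accuracy while catching none of the fraud cases, which is why recall is the more telling number on data this imbalanced.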
2nd level. Kaggle Machine Learning & Data Science Survey 2017
A big picture view of the state of data science and machine learning.
- Week 47: Novice to Grandmaster
- Week 48: What do Kagglers say about Data Science ?
- Week 49: PLOTLY TUTORIAL - 1