GOOD's DATA LAB

[Kaggle] Kaggle Curriculum (by 이유한)

구떼이 2019. 12. 30. 22:05

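This is a curriculum for Kaggle study.
The key to working through it is to "type out each notebook from beginning to end three times and understand it."
Along the way, it is just as important to understand what the data is, what the project is about, and what you actually learned.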

이유한 put together a curriculum specifically for Kaggle study, so I am sharing the original page as well.

https://kaggle-kr.tistory.com/32

I have made a few changes. Let's follow the curriculum by copying out every notebook with desperate dedication.

1. Tabular data

1-1. Binary classification on tabular data

1st level. Titanic: Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics

Week 1: Titanic Tutorial 1 - Exploratory data analysis, visualization, machine learning
Week 2: EDA To Prediction(DieTanic)
Week 3: Titanic Top 4% with ensemble modeling
Week 4: Introduction to Ensembling/Stacking in Python

2nd level. Porto Seguro’s Safe Driver Prediction

Predict if a driver will file an insurance claim next year.

Week 5: Data Preparation & Exploration
Week 6: Interactive Porto Insights - A Plot.ly Tutorial
Week 7: XGBoost CV (LB .284)
Week 8: Porto Seguro Exploratory Analysis and Prediction
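Week 7's "XGBoost CV" notebook scores models with cross-validation. Only a small share of drivers in this dataset file a claim, so folds are usually stratified to keep a similar positive rate in every fold. The sketch below is a plain-Python stand-in for what scikit-learn's `StratifiedKFold` provides; the function name `stratified_kfold_indices` is hypothetical, and this is not the notebook's own code.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5, seed=42):
    """Yield (train_idx, valid_idx) pairs that keep class ratios similar in every fold.

    Hypothetical helper for illustration; scikit-learn's StratifiedKFold is the
    usual tool in practice.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    # Group row indices by their class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's rows round-robin so every fold gets a fair share
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits

    for fold in range(n_splits):
        valid_idx = [i for i, f in enumerate(fold_of) if f == fold]
        train_idx = [i for i, f in enumerate(fold_of) if f != fold]
        yield train_idx, valid_idx


# Toy labels: 10 claims among 100 drivers (made-up values)
labels = [1] * 10 + [0] * 90
for train_idx, valid_idx in stratified_kfold_indices(labels):
    positives = sum(labels[i] for i in valid_idx)
    print(len(train_idx), len(valid_idx), positives)
```

With these toy labels each validation fold holds 20 rows, 2 of them positive, so the 10% positive rate is preserved in every fold.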

3rd level. Home Credit Default Risk

Can you predict how capable each applicant is of repaying a loan?

Week 9: Introduction: Home Credit Default Risk Competition
Week 10: Introduction to Manual Feature Engineering
Week 11: Stacking Test-Sklearn, XGBoost, CatBoost, LightGBM
Week 12: LightGBM 7th place solution

1-2. Multi-class classification on tabular data

1st level. Costa Rican Household Poverty Level Prediction

Can you identify which households have the highest need for social welfare assistance?

Week 13: A Complete Introduction and Walkthrough
Week 14: 3250feats->532 feats using shap[LB: 0.436]
Week 15: XGBoost

1-3. Regression on tabular data

1st level. New York City Taxi Trip Duration

Share code and data to improve ride time predictions

Week 16: Dynamics of New York city - Animation
Week 17: EDA + Baseline Model
Week 18: Beat the benchmark!
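Weeks 16 to 18 are about predicting trip duration, and this competition is scored with root mean squared logarithmic error (RMSLE). The sketch below is a minimal plain-Python version of the metric for checking a local validation split. It is an illustration rather than code from the listed notebooks, and the function name `rmsle` is my own.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error.

    Uses log1p(x) = log(1 + x) so zero values are allowed.
    Inputs must be non-negative and of equal, non-zero length.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    squared = 0.0
    for true_val, pred_val in zip(y_true, y_pred):
        diff = math.log1p(pred_val) - math.log1p(true_val)
        squared += diff * diff
    return math.sqrt(squared / len(y_true))


# Toy trip durations in seconds (made-up values for illustration)
print(rmsle([600, 1200, 300], [650, 1100, 320]))
```

Because RMSLE is just the RMSE of `log1p`-transformed values, a common approach is to train on `log1p(trip_duration)` and convert predictions back with `math.expm1` (or `numpy.expm1`).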

2nd level. Zillow Prize: Zillow’s Home Value Prediction (Zestimate)

Can you improve the algorithm that changed the world of real estate?

Week 19: Simple Exploration Notebook - Zillow Prize
Week 20: Simple XGBoost Starter (~0.0655)
Week 21: Zillow EDA On Missing Values & Multicollinearity
Week 22: XGBoost, LightGBM, and OLS and NN

2. Image data (image classification)

2-1. Binary classification on images

1st level. Statoil/C-CORE Iceberg Classifier Challenge

Ship or iceberg, can you decide from space?

Week 23: Keras Model for Beginners (0.210 on LB)+EDA+R&D
Week 24: Transfer Learning with VGG-16 CNN+AUG LB 0.1712
Week 25: Submarineering.EVEN BETTER PUBLIC SCORE until now.
Week 26: Keras+TF LB 0.18

2-2. Multi-class classification on images (the challenge below actually uses audio clips, which are often converted to spectrogram images)

1st level. TensorFlow Speech Recognition Challenge

Can you build an algorithm that understands simple speech commands?

Week 27: Speech representation and data exploration
Week 28: Light-Weight CNN LB 0.74
Week 29: WavCeption V1: a 1-D Inception approach (LB 0.76)

3. Natural language processing

1st level. Spooky Author Identification

Share code and discuss insights to identify horror authors from their writings

Week 30: Spooky NLP and Topic Modelling tutorial
Week 31: Approaching (Almost) Any NLP Problem on Kaggle
Week 32: Simple Feature Engg Notebook - Spooky Author

2nd level. Mercari Price Suggestion Challenge

Can you automatically suggest product prices to online sellers?

Week 33: Mercari Interactive EDA + Topic Modelling
Week 34: A simple nn solution with Keras (~0.48611 PL)
Week 35: Ridge (LB 0.41943)
Week 36: LGB and FM [18th Place - 0.40604]

3rd level. Toxic Comment Classification Challenge

Identify and classify toxic online comments

Week 37: [For Beginners] Tackling Toxic Using Keras
Week 38: Stop the S@#$ - Toxic Comments EDA
Week 39: Logistic regression with words and char n-grams
Week 40: Classifying multi-label comments (0.9741 lb)
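Week 39's notebook title refers to logistic regression on word and character n-grams. The plain-Python sketch below only shows what those two feature types count. The helper names `word_ngrams` and `char_ngrams` and the simple regex tokenizer are assumptions for illustration; notebooks of this kind typically build the features with scikit-learn's `TfidfVectorizer` (`analyzer="word"` or `analyzer="char"` together with an `ngram_range`) and feed them to `LogisticRegression`.

```python
import re
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Count word n-grams of length n_min..n_max (simple regex tokenizer, an assumption)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max, including spaces and symbols."""
    s = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts


comment = "You are so so rude"
print(word_ngrams(comment).most_common(3))
print(char_ngrams("st@pid", 3, 3))
```

Character n-grams still share pieces such as `"pid"` with the clean spelling when a word is obfuscated with symbols, which is one likely reason they help on toxic-comment text.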

4. Object segmentation with deep learning

1st level. 2018 Data Science Bowl

Find the nuclei in divergent images to advance medical discovery

Week 41: Teaching notebook for total imaging newbies
Week 42: Keras U-Net starter - LB 0.277
Week 43: Nuclei Overview to Submission
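The 2018 Data Science Bowl matched each predicted nucleus mask to the ground truth by intersection over union (IoU) and averaged precision over a range of IoU thresholds. The sketch below computes only the IoU of two binary masks in plain Python. `mask_iou` is a hypothetical helper name, and returning 1.0 when both masks are empty is a convention chosen here, not part of the competition rules.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-shaped binary masks (lists of 0/1 rows)."""
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Convention (assumption): two empty masks count as a perfect match
    return intersection / union if union else 1.0


predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 0],
                [0, 0, 0]]
print(mask_iou(predicted, ground_truth))  # 0.75
```

Here 3 pixels overlap out of 4 covered by either mask. Under a threshold-based metric, a prediction counts as a hit at threshold t only when its IoU with a true nucleus exceeds t.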

5. Other: anomaly detection and visualization

1st level. Credit Card Fraud Detection

Anonymized credit card transactions labeled as fraudulent or genuine

Week 44: In depth skewed data classif. (93% recall acc now)
Week 45: Anomaly Detection - Credit Card Fraud Analysis
Week 46: Semi-Supervised Anomaly Detection Survey
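Week 44's notebook title reports recall rather than plain accuracy. On heavily skewed data such as fraud labels, accuracy can look excellent while the model catches no fraud at all. The toy example below uses made-up labels, not the real dataset, and computes all three numbers in plain Python; the function name `accuracy_precision_recall` is my own.

```python
def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels; `positive` is the fraud class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # fraud correctly flagged
        elif p == positive:
            fp += 1  # genuine transaction flagged as fraud
        elif t == positive:
            fn += 1  # fraud missed
        else:
            tn += 1  # genuine transaction passed
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy data: 10 frauds among 1,000 transactions
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts genuine
always_genuine = [0] * 1000
print(accuracy_precision_recall(y_true, always_genuine))  # (0.99, 0.0, 0.0)

# A model that flags 8 of the frauds plus 20 genuine transactions
flagged = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970
print(accuracy_precision_recall(y_true, flagged))
```

The second model's accuracy drops slightly to 0.978, but its recall rises from 0.0 to 0.8, which is why skewed-data notebooks focus on recall and precision instead of accuracy.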

2nd level. Kaggle Machine Learning & Data Science Survey 2017

A big picture view of the state of data science and machine learning.

Week 47: Novice to Grandmaster
Week 48: What do Kagglers say about Data Science ?
Week 49: PLOTLY TUTORIAL - 1
