ISL

An Introduction to Statistical Learning

Symbols

OUTLINE A

  1. Introduction
  2. Statistical Learning
  3. Linear Regression
  4. Classification
  5. Resampling Methods
  6. Linear Model Selection and Regularization
  7. Moving Beyond Linearity
  8. Tree-Based Methods
  9. Support Vector Machines
  10. Unsupervised Learning

OUTLINE B

1. Introduction

2. Statistical Learning

  1. What is Statistical Learning?
  2. Assessing Model Accuracy
  3. LAB: Intro to R
  4. EXERCISES

3. Linear Regression

  1. Simple Linear Regression
  2. Multiple Linear Regression
  3. Other Considerations in the Regression Model
  4. The Marketing Plan
  5. Comparison of Linear Regression with K-Nearest Neighbors
  6. LAB: Linear Regression
  7. EXERCISES

4. Classification

  1. An Overview of Classification
  2. Why Not Linear Regression?
  3. Logistic Regression
  4. Linear Discriminant Analysis
  5. A Comparison of Classification Methods
  6. LAB: Logistic Regression, LDA, QDA and KNN
  7. EXERCISES

5. Resampling Methods

  1. Cross-Validation
  2. The Bootstrap
  3. LAB: Cross-Validation and the Bootstrap
  4. EXERCISES

6. Linear Model Selection and Regularization

  1. Subset Selection
  2. Shrinkage Methods
  3. Dimension Reduction Methods
  4. Considerations in High Dimensions
  5. LAB 1: Subset Selection Methods
  6. LAB 2: Ridge Regression and the Lasso
  7. LAB 3: PCR and PLS Regression
  8. EXERCISES

7. Moving Beyond Linearity

  1. Polynomial Regression
  2. Step Functions
  3. Basis Functions
  4. Regression Splines
  5. Smoothing Splines
  6. Local Regression
  7. Generalized Additive Models
  8. LAB: Non-Linear Modeling
  9. EXERCISES

8. Tree-Based Methods

  1. The Basics of Decision Trees
  2. Bagging, Random Forests, Boosting
  3. LAB: Decision Trees
  4. EXERCISES

9. Support Vector Machines

  1. Maximal Margin Classifier
  2. Support Vector Classifiers
  3. Support Vector Machines
  4. SVMs with More than Two Classes
  5. Relationship to Logistic Regression
  6. LAB: Support Vector Machines
  7. EXERCISES

10. Unsupervised Learning

  1. The Challenge of Unsupervised Learning
  2. Principal Components Analysis
  3. Clustering Methods
  4. LAB 1: Principal Components Analysis
  5. LAB 2: Clustering
  6. LAB 3: NCI60 Data Example
  7. EXERCISES

CHAPTER OUTLINES

Chapter 1: Introduction

Chapter 2: Statistical Learning

  1. What is Statistical Learning?
    1. Why Estimate f?
    2. How Do We Estimate f?
    3. The Trade-Off Between Prediction Accuracy and Model Interpretability
    4. Supervised Versus Unsupervised Learning
    5. Regression Versus Classification Problems
  2. Assessing Model Accuracy
    1. Measuring the Quality of Fit
    2. The Bias-Variance Trade-Off
    3. The Classification Setting
  3. LAB: Intro to R
  4. EXERCISES

2.1: What is Statistical Learning?

TL;DR: A set of approaches for estimating f, the fixed but unknown function relating our predictors/input variables X to our output variable Y, as in Y = f(X) + ε (where ε is a random error term)

  1. Why Estimate f?
    1. Prediction
      1. f can be a black box
      2. EX: Will this patient’s blood sample tell us whether the patient is at high risk of a severe adverse reaction? Here f can stay a black box; we only need the prediction to be accurate
        • Who will respond positively to a mailing?
    2. Inference
      1. When we want to understand the relationship between X and Y
      2. How Y changes as a function of X
      3. f cannot be a black box
        • Which predictors are associated with the response?
        • What is the relationship between the response and each predictor?
        • Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?
        • What effect will changing the price of a product have on sales?
        • How much extra will a house be worth if it has a view of the river?
    TL;DR: Linear models are good for inference; non-linear models can be better for prediction (and worse for interpretability)
  2. How Do We Estimate f? (a code sketch contrasting the two approaches follows this list)
    1. Parametric Methods
      1. Two-step approach: assume a functional form for f (e.g., linear), then fit/train that model to the data (e.g., by least squares)
      2. DEF: reduce the problem of estimating f down to one of estimating a set of PARAMETERS
    2. Non-parametric Methods
      1. DEF: Seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly
      2. A very large number of observations is needed, since the problem is not reduced to a small set of parameters
  3. The Trade-Off Between Prediction Accuracy and Model Interpretability
    1. Inflexible == more linear == often less accurate, but easier to interpret
    2. Flexible == non-linear (think SVMs) == potentially more accurate, but harder to interpret
  4. Supervised versus Unsupervised Learning
    1. SUPERVISED: For each observation of the predictor measurements (xi), there is an associated response measurement (yi)
      1. PREDICTION: Accurately predict the response for future observations
      2. INFERENCE: Better understand the relationship between the response and predictors
    2. UNSUPERVISED: “Flying blind” – for each observation we see measurements xi but no associated response yi, so it’s not possible to fit, e.g., a linear regression because there is no response variable to “supervise” the analysis
  5. Regression Versus Classification Problems
    1. Quantitative response → regression; qualitative/categorical response → classification (the line is blurry: logistic regression predicts a qualitative response but estimates class probabilities, so it straddles both)
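
To make the parametric vs. non-parametric contrast concrete, here is a minimal sketch in Python with NumPy and scikit-learn (standing in for the book’s R labs). The simulated data, the choice of k = 9, and all names are illustrative assumptions, not from ISL.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Simulated data: Y = f(X) + noise, with a mildly non-linear true f.
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 + 0.5 * X[:, 0] + np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)

# Parametric: assume f is linear, so estimating f reduces to estimating
# two parameters (intercept and slope). Easy to interpret; biased if the
# linearity assumption is wrong.
lin = LinearRegression().fit(X, y)
print("intercept, slope:", lin.intercept_, lin.coef_[0])  # inference: read the parameters

# Non-parametric: KNN regression assumes no functional form; it predicts
# by averaging the y-values of the k nearest training points, so it needs
# many observations to pin f down.
knn = KNeighborsRegressor(n_neighbors=9).fit(X, y)

x_new = np.array([[5.0]])
print("linear prediction:", lin.predict(x_new)[0])
print("knn prediction:   ", knn.predict(x_new)[0])
```

The trade-off in item 3 shows up directly: the linear fit hands you two readable parameters (good for inference), while KNN tracks a non-linear f more closely (good for prediction) but has nothing to interpret.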

2.2: Assessing Model Accuracy

  1. Measuring the Quality of Fit (test MSE; see the sketch after this list)
  2. The Bias-Variance Trade-Off
  3. The Classification Setting
    1. The Bayes Classifier
    2. K-Nearest Neighbors
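
Two anchors from the chapter to flesh out the headings above: expected test MSE at a point x0 decomposes as Var(f_hat(x0)) + [Bias(f_hat(x0))]^2 + Var(ε), and in the classification setting the Bayes classifier (predict the most probable class given X) attains the lowest achievable error rate, the Bayes error. Below is a hedged sketch illustrating both with KNN: simulated two-class data where the true P(Y=1|X) is known, a Monte Carlo estimate of the Bayes error, and train/test error for a very flexible (k=1) versus a smoother (k=25) fit. Python/scikit-learn again stand in for the book’s R labs; the data-generating model and both k values are made-up illustrations.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

# Two-class problem: class 1 is more likely when x1 + x2 > 1.
X = rng.normal(size=(1000, 2))
p = 1 / (1 + np.exp(-3 * (X[:, 0] + X[:, 1] - 1)))  # true P(Y=1 | X)
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The Bayes classifier predicts the more probable class given X; since we
# simulated the data, we know p and can estimate its error rate directly.
bayes_pred = (p > 0.5).astype(int)
print("Bayes error:", np.mean(bayes_pred != y))

# KNN approximates the Bayes rule by majority vote among the k nearest
# training points; k controls flexibility.
for k in (1, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"k={k:2d}  train error: {1 - knn.score(X_tr, y_tr):.3f}"
          f"  test error: {1 - knn.score(X_te, y_te):.3f}")
```

With this setup, k=1 drives training error to (near) zero while its test error generally sits above both the Bayes error and the smoother k=25 fit: low bias, high variance, exactly the bias-variance trade-off in item 2.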