Regression
Simple Linear Regression
- bridges statistics and machine learning
- dependent variable (prediction)
- independent variable (explanatory)
- salary vs experience
- height vs weight
- types of Regression
- Linear
- Polynomial
- Decision Tree
- Random Forest
- Support Vector
- Ridge
- Lasso
- ElasticNet
- Bayesian
- example
- y - continuous response (dependent) variable
- x - explanatory (independent) variable
- equation: y = β0 + β1x + ε
- β0 - intercept
- β1 - slope
- ε - random error term
- scatter plot
- best fit line (a single line, as opposed to an overfit curve through every point)
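A minimal sketch, assuming numpy, of estimating the slope and intercept above from toy salary-vs-experience data (all numbers made up for illustration):

```python
import numpy as np

# Toy salary-vs-experience data (made-up numbers, purely illustrative)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # years of experience
y = np.array([40.0, 48.0, 55.0, 63.0, 71.0])  # salary in $1000s

# Closed-form least-squares estimates:
#   beta1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
#   beta0 = y_bar - beta1 * x_bar
x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

print(f"fitted line: y_hat = {beta0:.2f} + {beta1:.2f} * x")
```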
Least Squares Method for finding Parameters
- least squares estimate (found via linear algebra)
- error
- residual error
- sum of residuals (offset problem: positive and negative residuals cancel)
- sum of squared residuals
- least squares method
- minimize the sum of squared errors
- why square errors
- measures the magnitude of errors, regardless of +ve or -ve sign
- area of the residual box: each squared residual is the area of a square whose side is the residual, so least squares minimizes the total area of these boxes
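A quick numeric illustration (toy residuals, assuming numpy) of why plain sums of residuals fail and squares work:

```python
import numpy as np

residuals = np.array([2.0, -2.0, 3.0, -3.0])  # offsets above/below the line

print(residuals.sum())           # 0.0  -> offset problem: +ve and -ve cancel
print(np.abs(residuals).sum())   # 10.0 -> magnitudes, but |.| is not smooth
print(np.sum(residuals ** 2))    # 26.0 -> sum of squared residuals: each term
                                 #         is the area of a square ("residual
                                 #         box") whose side is the residual
```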
R2 understanding
- Coefficient of Determination (R²)
- SST = SSE + SSR
- SST - total sum of squares
- SSE - sum of squared errors
- SSR - regression sum of squares
- R² = proportion of the variation in the data explained by the model
- R² = SSR/SST
- equivalently, R² = 1 - (SSE/SST)
- ranges between 0 and 1
- 0 means the model explains none of the variation (bad)
- 1 means the model explains all of the variation (good)
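A sketch, reusing the toy data style above, that checks the SST = SSE + SSR decomposition and both forms of R² (the identity holds exactly for an OLS fit with an intercept):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([40.0, 48.0, 55.0, 62.0, 72.0])

# Fit the least-squares line first so the decomposition holds exactly
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y - y_hat) ** 2)         # sum of squared errors (residuals)
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares

print(sst, sse + ssr)                  # SST = SSE + SSR
print(ssr / sst, 1 - sse / sst)        # the two equivalent forms of R^2
```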
Multiple Linear Regression
- use multiple inputs to predict one output
- predict one continuous variable
- given two or more independent variables
- hyperplane (the fitted surface, instead of a line)
- linear relationship
- normality of errors
    - errors are normally distributed
    - needed only for inference
    - diagnose with a normal Q-Q plot
- uncorrelated errors
    - errors should be uncorrelated (i.i.d.)
    - one obvious violation would be time series data, where the error today predicts the error tomorrow
    - diagnose with an index plot
- homoscedasticity
    - constant variance of errors
    - violation (heteroskedasticity): unequal scatter of the residuals compared to the fitted values
    - diagnose with a plot of fitted values versus residuals (see the sketch after this list)
- no or little multicollinearity
    - independent variables should not be correlated with each other
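A minimal sketch, assuming numpy, scipy, and matplotlib, of fitting a two-feature model on synthetic data and drawing the three diagnostics above (normal Q-Q plot, index plot, residuals vs fitted):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                    # two independent variables
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Least-squares fit of y = b0 + b1*x1 + b2*x2 (a hyperplane)
A = np.column_stack([np.ones(n), X])           # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
resid = y - fitted

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
stats.probplot(resid, dist="norm", plot=axes[0])      # normality of errors
axes[1].plot(resid, marker=".", linestyle="")         # uncorrelated errors
axes[1].set(title="Index plot", xlabel="observation", ylabel="residual")
axes[2].scatter(fitted, resid)                        # homoscedasticity
axes[2].set(title="Residuals vs fitted", xlabel="fitted", ylabel="residual")
plt.tight_layout()
plt.show()
```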
Matrix Inversion
- matrix algebra
- matrix inversion method: solve the normal equations β̂ = (XᵀX)⁻¹Xᵀy
- design matrix X (a column of ones for the intercept, then the feature columns)
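A sketch, assuming numpy, of the matrix-inversion route via the normal equations; np.linalg.solve is used instead of forming (XᵀX)⁻¹ explicitly, which is the numerically safer habit:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                 # raw feature columns
y = 4.0 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

A = np.column_stack([np.ones(len(X)), X])    # design matrix: [1, x1, x2, x3]

# Normal equations: (A^T A) beta = A^T y  <=>  beta = (A^T A)^-1 A^T y
beta = np.linalg.solve(A.T @ A, A.T @ y)
print(beta)                                  # ~ [4.0, 1.5, -2.0, 0.5]
```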
Gradient Descent
- numerical optimization
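Where inverting XᵀX is costly or ill-conditioned, gradient descent minimizes the same squared-error cost iteratively, stepping opposite the gradient: β ← β − α∇J(β). A minimal sketch, assuming numpy (learning rate and iteration count are arbitrary tuning choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 0.5 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)

A = np.column_stack([np.ones(len(X)), X])       # design matrix with intercept
beta = np.zeros(A.shape[1])                     # start from the zero vector
alpha = 0.05                                    # learning rate (tuning choice)

for _ in range(2000):
    grad = (2 / len(y)) * A.T @ (A @ beta - y)  # gradient of the MSE cost
    beta -= alpha * grad

print(beta)                                     # ~ [0.5, 2.0, -1.0]
```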
Adjusted R2 understanding
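Adjusted R² penalizes R² for the number of predictors p, so adding a useless feature no longer inflates the score: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A tiny sketch (the helper name is mine):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.90, n=50, p=3))   # ~0.8935, slightly below the raw R^2
```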
Polynomial Regression
- polynomial vs simple linear regression
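Polynomial regression is still linear regression: the model stays linear in the coefficients, only the features become powers of x. A sketch, assuming numpy, contrasting it with a straight-line fit on curved data (degree 2 is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 50)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=0.3, size=50)

coeffs_lin = np.polyfit(x, y, deg=1)   # simple linear fit: misses the curvature
coeffs_poly = np.polyfit(x, y, deg=2)  # quadratic fit, linear in the coefficients

print(coeffs_lin)    # slope/intercept of a poor straight-line fit
print(coeffs_poly)   # ~ [0.5, -2.0, 1.0] (highest power first)
```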
Assumptions of Linear Regression
Feature Selection
- what is causing the multicollinearity issue?
- overfitting
- correlation coefficient
- sequential forward feature selection (greedy algorithm; see the sketch after this list)
- elbow plot
- parsimonious model - a simpler model is preferred
- backward stepwise selection (recursive feature elimination)
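A minimal sketch, assuming numpy, of greedy sequential forward selection: at each step add whichever remaining feature raises R² the most (the stopping point, e.g. via an elbow plot, is left to the analyst; function names are mine):

```python
import numpy as np

def r2_of(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, k):
    """Greedily pick k feature indices, each step maximizing R^2."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: r2_of(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
        print(f"added feature {best}, R^2 = {r2_of(X[:, chosen], y):.3f}")
    return chosen

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=100)
forward_select(X, y, k=3)   # should pick features 1 and 3 first
```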
Multicollinearity
- VIF (variance inflation factor; see the sketch after this list)
- Principal Component Analysis
- orthogonal components are linearly independent
- covariance matrix
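A sketch, assuming numpy, of computing VIF by hand: regress each feature on the others and take VIF_j = 1/(1 − R_j²); values above roughly 5-10 are a common rule-of-thumb flag for multicollinearity:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])  # regress x_j on the rest
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        r2 = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(5)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)     # nearly collinear with x1
x3 = rng.normal(size=100)
print(vif(np.column_stack([x1, x2, x3])))     # VIF large for x1, x2; ~1 for x3
```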