Regression
Notes

Regression

Simple Linear Regression

  • bridges statistics and machine learning
  • dependent variable (prediction)
  • independent variable (explanatory)
  • salary vs experience
  • height vs weight
  • types of Regression
    • Linear
    • Polynomial
    • decision tree
    • random forest
    • support vector
    • ridge
    • lasso
    • elasticnet
    • bayesian
  • example
    • y - continuous (dependent) variable
    • x - independent variable
  • equation
    • y
    • slope
    • random error term
  • scatter plot
  • best fit line (overfitting)
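
The equation bullets above can be written out in one line. The symbols are the conventional ones and are assumed here, since the notes do not name them: \beta_0 is the intercept, \beta_1 the slope, and \varepsilon the random error term.

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```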

Least Squares Method for finding Parameters

  • least squares estimate (goal of linear regression)

  • error

    • residual error
    • sum of residuals (offset problem: positive and negative residuals cancel out)
    • sum of squared residuals
  • least squares Method

    • minimize the errors
    • why square errors
      • measuring the magnitude of errors, regardless of sign (+ve or -ve)
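    • area of the residual box - a squared residual is the area of a square whose side is that residual; least squares minimizes the total area of these squares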
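
A minimal sketch of the least squares idea for one input, assuming the usual closed-form estimates (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)). The experience and salary numbers are made up for illustration, and `fit_simple_ols` is a hypothetical helper name.

```python
# Closed-form least squares for simple linear regression (pure Python).

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

years = [1, 2, 3, 4, 5]          # hypothetical years of experience
salary = [30, 35, 45, 50, 60]    # hypothetical salary, in thousands

intercept, slope = fit_simple_ols(years, salary)
residuals = [y - (intercept + slope * x) for x, y in zip(years, salary)]

# The plain sum cancels out (offset problem); the squared sum does not.
sum_of_residuals = sum(residuals)
sum_of_squared_residuals = sum(r ** 2 for r in residuals)

print(intercept, slope, sum_of_residuals, sum_of_squared_residuals)
```

The plain sum of residuals comes out at (numerically) zero even though no point lies exactly on the line, which is why the squared residuals are used as the error measure.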

R2 understanding

  • Coefficient of Determination (R^2)

  • SST = SSE + SSR

  • SST - total sum of squares

  • SSE - sum of squared errors

  • SSR - regression sum of squares

  • proportion of the variation in y explained by the model

  • SSR/SST

  • (1 - (SSE/SST))

  • between 0 and 1

  • 0 means the model explains none of the variation

  • 1 means the model explains all of the variation

  • explain variation
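
A small sketch of the SST / SSE / SSR bookkeeping above. The observed values are made up, and the predictions are assumed to come from a least squares line with an intercept (here 21.5 + 7.5 * x for x = 1..5). The identity SST = SSE + SSR relies on that assumption.

```python
# Decompose total variation into explained and unexplained parts.

def sums_of_squares(ys, preds):
    """Return (SST, SSE, SSR) for observed values and model predictions."""
    y_mean = sum(ys) / len(ys)
    sst = sum((y - y_mean) ** 2 for y in ys)             # total variation
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))   # unexplained (errors)
    ssr = sum((p - y_mean) ** 2 for p in preds)          # explained by the model
    return sst, sse, ssr

ys = [30, 35, 45, 50, 60]                # hypothetical observations
preds = [29.0, 36.5, 44.0, 51.5, 59.0]   # least squares line 21.5 + 7.5 * x

sst, sse, ssr = sums_of_squares(ys, preds)
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(sst, sse, ssr, r2_from_ssr, r2_from_sse)
```

Both formulas give the same R^2 here because the predictions come from a least squares fit; for arbitrary predictions only 1 - SSE/SST is the usual definition.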

Multiple Linear Regression

  • use multiple inputs to predict one output

  • predict one continuous variable

  • given two or more independent variables

  • hyperplane

  • linear relationship

  • normality of errors

    • errors are normally distributed
    • needed only for inference
    • diagnose with a normal Q-Q plot
  • uncorrelated errors

    • errors should be uncorrelated
    • iid
    • one obvious violation would be time series data
    • the error today predicts the error tomorrow
    • diagnose with an index plot
  • homoscedasticity

    • constant variance of errors
    • the violation is heteroskedasticity: unequal scatter of the residuals across the range of measured values
    • diagnose with a plot of fitted values versus residuals
  • little or no MultiCollinearity

    • independent variables should not be highly correlated with each other
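
The diagnostics above are plots, but each has a rough numeric stand-in. The sketch below is only an illustration on synthetic data (fixed seed, made-up coefficients). The names `qq_corr`, `lag1_corr` and `spread_corr` are hypothetical summaries chosen here, not standard test statistics.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)           # fixed seed, synthetic data only
n = 200
X = rng.normal(size=(n, 2))              # two independent variables
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Fit the multiple linear regression (intercept column added by hand).
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
residuals = y - fitted

# Normality: the points of a normal Q-Q plot are (theoretical quantile,
# sorted residual). A correlation near 1 means they fall on a line.
normal_quantiles = np.array(
    [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
)
qq_corr = np.corrcoef(np.sort(residuals), normal_quantiles)[0, 1]

# Uncorrelated errors: does the residual at index i predict the one at i + 1?
lag1_corr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Homoscedasticity: does the size of the residuals grow with the fitted value?
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]

print(qq_corr, lag1_corr, spread_corr)
```

On this data the errors are generated as independent normal noise with constant variance, so `qq_corr` should be close to 1 and the other two close to 0. With time series data `lag1_corr` would typically move away from 0.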

Matrix Inversion

  • matrix algebra
  • matrix inversion method
  • design matrix
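
A minimal sketch of the matrix inversion method, assuming the standard normal equations beta = (X^T X)^{-1} X^T y with X as the design matrix (a column of ones for the intercept plus one column per input). The toy data is made up so that y = 1 + 2*x1 + 3*x2 exactly.

```python
import numpy as np

# Hypothetical toy data: 5 observations, 2 independent variables.
inputs = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
])
y = np.array([9.0, 8.0, 19.0, 18.0, 29.0])   # equals 1 + 2*x1 + 3*x2

# Design matrix: intercept column of ones, then the inputs.
X = np.column_stack([np.ones(len(y)), inputs])

# Normal equations via an explicit inverse (the textbook form).
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same system solved without forming the inverse (usually more stable).
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_inv, beta_solve)
```

The explicit inverse mirrors the formula, but solving the linear system directly gives the same coefficients and tends to behave better when X^T X is close to singular, for example under multicollinearity.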

Gradient Descent

  • numerical optimization
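
A minimal sketch of gradient descent for simple linear regression, assuming mean squared error as the loss. The learning rate, step count and data are illustrative choices, and `gradient_descent` is a hypothetical helper name.

```python
# Batch gradient descent on mean squared error (pure Python).

def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Return (intercept, slope) found by repeatedly stepping downhill."""
    b0, b1 = 0.0, 0.0                     # start from an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_b0 = (2.0 / n) * sum(errors)
        grad_b1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

xs = [1, 2, 3, 4, 5]              # hypothetical inputs
ys = [30, 35, 45, 50, 60]         # hypothetical outputs

b0, b1 = gradient_descent(xs, ys)
print(b0, b1)
```

On this data the iterates approach the same intercept and slope as the closed-form least squares solution (21.5 and 7.5). A learning rate that is too large makes the updates diverge instead.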

Adjusted R2 understanding
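
Adjusted R2 corrects R2 for the number of predictors, so adding an uninformative input no longer looks like an improvement. A minimal sketch using the standard definition. The names n (number of observations) and p (number of predictors, excluding the intercept) are assumed here, and the example numbers are made up.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R^2, more predictors: the adjusted value drops.
few_predictors = adjusted_r2(0.9, n=20, p=3)
many_predictors = adjusted_r2(0.9, n=20, p=10)
print(few_predictors, many_predictors)
```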

Polynomial Regression

  • polynomial vs simple linear regression
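
A short sketch of the comparison, assuming polynomial regression is fitted as ordinary least squares on powers of x (the model stays linear in its coefficients). The data is made up so that y = 1 + 2x + 3x^2 exactly.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x**2            # hypothetical quadratic data

def fit_polynomial(x, y, degree):
    """Least squares fit on columns 1, x, x^2, ..., x^degree."""
    design = np.vander(x, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return coeffs, float(residuals @ residuals)

linear_coeffs, linear_sse = fit_polynomial(x, y, degree=1)
quad_coeffs, quad_sse = fit_polynomial(x, y, degree=2)

print(linear_coeffs, linear_sse)
print(quad_coeffs, quad_sse)
```

The straight line leaves a clearly nonzero sum of squared errors on this curved data, while the degree 2 fit recovers the coefficients. Raising the degree further can only lower the training error, which is where overfitting comes in.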

Assumptions of Linear Regression

Feature Selection

  • what is causing the MultiCollinearity issue?
  • overfitting
  • correlation coefficient
  • sequential forward feature Selection (greedy algo)
  • elbow plot
  • parsimonious model - simple model
  • backward stepwise selection (recursive feature elimination)
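
A minimal sketch of sequential forward selection as a greedy loop, on synthetic data where only columns 2 and 0 actually drive y. Scoring by training SSE is a simplification assumed here; a validation score is the more usual choice. `sse_of_fit` and `forward_select` are hypothetical helper names.

```python
import numpy as np

def sse_of_fit(X, y, cols):
    """Sum of squared errors of a least squares fit on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    return float(residuals @ residuals)

def forward_select(X, y, k):
    """Greedily add, k times, the feature that lowers the SSE the most."""
    selected, remaining, history = [], list(range(X.shape[1])), []
    for _ in range(k):
        best = min(remaining, key=lambda c: sse_of_fit(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
        history.append(sse_of_fit(X, y, selected))
    return selected, history

rng = np.random.default_rng(0)            # fixed seed, synthetic data only
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

selected, history = forward_select(X, y, k=4)
print(selected)
print(history)
```

Plotting `history` against the number of selected features gives the elbow plot: the error drops sharply for the first two features and then flattens, which suggests the parsimonious two-feature model.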

MultiCollinearity

  • VIF (variance inflation factor)
  • Principal Component Analysis
  • orthogonal vectors are linearly independent (the converse need not hold)
  • covariance matrix
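
A minimal sketch of VIF, assuming the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. The data is synthetic: column 2 is built as a noisy copy of column 0, so those two should show inflated values. `vif` is a hypothetical helper name.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        residuals = target - A @ beta
        sse = residuals @ residuals
        sst = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - sse / sst
        factors.append(1.0 / (1.0 - r2))
    return factors

rng = np.random.default_rng(0)            # fixed seed, synthetic data only
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 0.1 * rng.normal(size=200)      # nearly a copy of x0
X = np.column_stack([x0, x1, x2])

vifs = vif(X)
print(vifs)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity; the exact cutoff is a convention, not part of the definition.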

EigenValue Decomposition

Singular Value Decomposition

Principal Component Analysis
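
A short sketch tying the last three headings together, assuming PCA is computed on centered data: the eigenvalue decomposition of the covariance matrix and the singular value decomposition of the data give the same component variances. The data is synthetic and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, synthetic data only
n = 300
z = rng.normal(size=(n, 2))
X = np.column_stack([
    z[:, 0],
    z[:, 0] + 0.1 * z[:, 1],              # strongly correlated with column 0
    rng.normal(size=n),
])
Xc = X - X.mean(axis=0)                   # center each column

# Route 1: eigenvalue decomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
order = np.argsort(eigvals)[::-1]         # sort by variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of the centered data.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = S**2 / (n - 1)

# Principal component scores: the data expressed in the new axes.
scores = Xc @ eigvecs

print(eigvals)
print(svd_variances)
```

The component directions are orthogonal and the scores are uncorrelated with each other, which is why regressing on principal components sidesteps multicollinearity among the original inputs.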