Simple Linear Regression

Learning Objectives Coverage

LO1: Describe a simple linear regression model, how the least squares criterion is used to estimate regression coefficients, and the interpretation of these coefficients

Core Concept

Simple linear regression extends the correlation analysis from Topic 9 into a predictive framework. Where correlation tells us whether and how strongly two variables are related, regression tells us the specific nature of that relationship and allows us to make predictions. The model Y = b₀ + b₁X + ε forms the basis for beta estimation in equity analysis, factor models in portfolio management, and yield curve modeling in fixed income.

Key components:

Dependent variable (Y): The outcome being predicted
Independent variable (X): The predictor variable
Intercept (b₀): Expected value of Y when X equals zero
Slope (b₁): Expected change in Y for a one-unit change in X
Error term (ε): Unexplained variation

Formulas & Calculations

Main formula: Y = b₀ + b₁X + ε
- b₁ = Σ[(X - X̄)(Y - Ȳ)] / Σ[(X - X̄)²]
- b₀ = Ȳ - b₁X̄
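HP 12C steps:
1. Clear the statistics registers with [f] CLEAR [Σ], then enter each pair as Y [ENTER] X [Σ+]
2. Intercept b₀: key 0 [g] [ŷ,r] (the predicted Y at X = 0)
3. Slope b₁: key 1 [g] [ŷ,r] and subtract b₀ (the 12C has no dedicated slope or intercept key, and [f] CLEAR [REG] erases the registers rather than computing anything)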
Common variations: Standardized coefficients, natural log transformations
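
The two coefficient formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name least_squares and the sample data are hypothetical, and the data are constructed to lie exactly on Y = 2 + 1.2X so the recovered coefficients are easy to check.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x fitted by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation; slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 2 + 1.2x
market = [1.0, 2.0, 3.0, 4.0, 5.0]
stock = [2.0 + 1.2 * m for m in market]
b0, b1 = least_squares(market, stock)
print(round(b0, 6), round(b1, 6))
```

Because the slope divides by Σ(X - X̄)², the sketch refuses to fit when every X value is identical.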

Practical Examples

Traditional Finance Example: Predicting stock returns (Y) based on market returns (X)
- Given: Stock returns = 2% + 1.2 × Market returns
- If market return = 5%, predicted stock return = 2% + 1.2(5%) = 8%
Calculation walkthrough: Least squares minimizes Σε² = Σ(Y - Ŷ)²
Interpretation: Slope coefficient represents systematic risk (beta), intercept represents alpha
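
To see what "least squares minimizes Σ(Y - Ŷ)²" means in practice, the sketch below fits a line to a small set of hypothetical returns and then checks that nudging the intercept or slope away from the fitted values always raises the sum of squared residuals. The data, the helper name sse, and the nudge sizes are illustrative assumptions; the final line reproduces the 2% + 1.2(5%) prediction from the example.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical market (x) and stock (y) returns, in percent
xs = [-2.0, 0.0, 1.0, 3.0, 5.0]
ys = [-0.5, 2.4, 2.9, 5.8, 7.9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
best_sse = sse(b0, b1, xs, ys)

# Any nudged line should have a larger sum of squared residuals
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.1, -0.05)]
nudged_sse = [sse(b0 + d0, b1 + d1, xs, ys) for d0, d1 in nudges]

# Prediction from the equation quoted in the example: 2% + 1.2 x 5%
predicted_return = 2.0 + 1.2 * 5.0
print(best_sse, min(nudged_sse), predicted_return)
```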

DeFi Application

Protocol example: Predicting Uniswap V3 TVL based on ETH price movements
Implementation: Smart contracts can use linear regression oracles for dynamic fee adjustments
Advantages/Challenges: Real-time data availability vs. gas costs for on-chain calculations

LO2: Explain the assumptions underlying the simple linear regression model, and describe how residuals and residual plots indicate if these assumptions may have been violated

Core Concept

Definition: Four critical assumptions must hold for valid regression results: linearity, homoscedasticity, independence, and normality
Why it matters: Violated assumptions can produce biased or inefficient estimates, misstated standard errors, and invalid statistical tests
Key components:
- Linearity: Relationship is truly linear
- Homoscedasticity: Constant error variance
- Independence: Errors are uncorrelated
- Normality: Errors follow normal distribution

Formulas & Calculations

Residual calculation: e = Y - Ŷ = Y - (b₀ + b₁X)
Durbin-Watson test: DW = Σ(eₜ - eₜ₋₁)² / Σeₜ²
HP 12C steps: Calculate residuals by subtracting predicted values from actual values
Common variations: Breusch-Pagan test for homoscedasticity, Jarque-Bera test for normality
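
The residual and Durbin-Watson formulas above translate directly into code. The sketch below is illustrative only: the function names and the two residual series are hypothetical, chosen to show the usual reading that DW near 2 suggests no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def residuals(b0, b1, xs, ys):
    """Residuals e = Y - (b0 + b1*X) for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def durbin_watson(e):
    """DW = sum of squared changes in residuals / sum of squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r ** 2 for r in e)
    return num / den


# Hypothetical residual series in time order
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # runs of same-signed errors
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(dw_persistent, dw_alternating)
```

The persistent series yields a statistic well below 2 and the alternating series one well above 2, which is the pattern a residual plot over time would also reveal.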

Practical Examples

Traditional Finance Example: Testing beta stability in CAPM model
Calculation walkthrough: Plot residuals vs. fitted values to check for patterns
Interpretation: Random scatter indicates valid assumptions; patterns suggest violations

DeFi Application

Protocol example: Validating assumptions in yield farming return predictions
Implementation: Automated residual analysis in DeFi analytics platforms
Advantages/Challenges: Continuous monitoring vs. computational complexity

LO3: Calculate and interpret measures of fit and formulate and evaluate tests of fit and of regression coefficients in a simple linear regression

Core Concept

Definition: Measures of fit assess how well the regression model explains the variation in the dependent variable
Why it matters: Determines model reliability for investment decisions and risk management
Key components:
- R²: Coefficient of determination
- Adjusted R²: R² adjusted for degrees of freedom
- F-statistic: Overall model significance
- t-statistics: Individual coefficient significance

Formulas & Calculations

R² formula: R² = SSR/SST = 1 - (SSE/SST)
- SST = Σ(Y - Ȳ)² (Total Sum of Squares)
- SSR = Σ(Ŷ - Ȳ)² (Regression Sum of Squares)
- SSE = Σ(Y - Ŷ)² (Error Sum of Squares)
F-statistic: F = MSR/MSE = [SSR/1] / [SSE/(n-2)]
t-statistic: t = (b₁ - 0) / sb₁
HP 12C steps: With the data pairs entered, key any X value, press [g] [ŷ,r], then [x≷y] to display the correlation r; square it for R²
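
The measures of fit above can be computed together from one pass over the data. The sketch below is a minimal example on hypothetical data; the function name fit_statistics is an assumption, and the standard error of the slope is taken as sb₁ = √[MSE / Σ(X - X̄)²], the usual expression for simple linear regression, which this section does not spell out.

```python
import math


def fit_statistics(xs, ys):
    """Return R^2, F, and the slope t-statistic for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)                # total
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # regression
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # error

    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)      # standard error of the slope
    return {
        "r2": ssr / sst,
        "r2_alt": 1 - sse / sst,
        "F": (ssr / 1) / mse,
        "t": (b1 - 0) / s_b1,
    }


# Hypothetical sample
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.8, 4.1, 5.9, 8.2, 9.7, 12.3]
result = fit_statistics(xs, ys)
print(result)
```

With a single independent variable the F-statistic equals the square of the slope t-statistic, so the two tests lead to the same conclusion.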

Practical Examples

Traditional Finance Example: Portfolio beta has R² = 0.64, meaning 64% of return variation is explained by market movements
Calculation walkthrough: F-test determines if the relationship is statistically significant
Interpretation: Higher R² indicates better fit, but consider overfitting risks

DeFi Application

Protocol example: Measuring goodness of fit in automated market maker price prediction models
Implementation: Dynamic R² monitoring for rebalancing strategies
Advantages/Challenges: Real-time model validation vs. gas optimization

LO4: Describe the use of analysis of variance (ANOVA) in regression analysis, interpret ANOVA results, and calculate and interpret the standard error of estimate in a simple linear regression

Core Concept

Definition: ANOVA decomposes total variation into explained (regression) and unexplained (error) components
Why it matters: Provides framework for testing overall model significance and quantifying prediction accuracy
Key components:
- Total variation (SST)
- Explained variation (SSR)
- Unexplained variation (SSE)
- Standard error of estimate (SEE)

Formulas & Calculations

ANOVA identity: SST = SSR + SSE
Standard error of estimate: SEE = √[SSE/(n-2)]
Mean square regression: MSR = SSR/1
Mean square error: MSE = SSE/(n-2)
HP 12C steps: The standard deviation function divides by n-1, so applying it to the residuals only approximates SEE; multiply the result by √[(n-1)/(n-2)] to get SEE
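
A small worked ANOVA table makes the decomposition concrete. The sketch below uses a hypothetical five-point data set chosen so the sums of squares are round numbers; the function name anova_table and the dictionary keys are assumptions made for illustration.

```python
import math


def anova_table(xs, ys):
    """Sums of squares, mean squares, F, and SEE for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - y_bar) ** 2 for y in ys)
    msr = ssr / 1            # one independent variable
    mse = sse / (n - 2)      # n - 2 degrees of freedom
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "F": msr / mse,
            "SEE": math.sqrt(mse)}


# Hypothetical data: Y-bar = 4, fitted line Y = 2.2 + 0.6X
table = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"{'Source':<12}{'df':>4}{'SS':>8}{'MS':>8}")
print(f"{'Regression':<12}{1:>4}{table['SSR']:>8.2f}{table['MSR']:>8.2f}")
print(f"{'Error':<12}{3:>4}{table['SSE']:>8.2f}{table['MSE']:>8.2f}")
print(f"{'Total':<12}{4:>4}{table['SST']:>8.2f}")
```

For this data set SSR = 3.6 and SSE = 2.4, which add to SST = 6.0, giving F = 4.5 and SEE = √0.8.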

Practical Examples

Traditional Finance Example: ANOVA table shows F = 25.6, p < 0.001, indicating significant relationship
Calculation walkthrough: SEE = $2.50 means roughly 68% of actual values fall within ±$2.50 of the predicted values, assuming normally distributed errors
Interpretation: Lower SEE indicates more precise predictions

DeFi Application

Protocol example: ANOVA analysis of liquidity provision returns in automated market makers
Implementation: Standard error bands for impermanent loss prediction
Advantages/Challenges: Continuous recalibration vs. transaction costs

LO5: Calculate and interpret the predicted value for the dependent variable, and a prediction interval for it, given an estimated linear regression model and a value for the independent variable

Core Concept

Definition: Prediction intervals provide range estimates for future observations, accounting for both parameter uncertainty and inherent variability
Why it matters: Enables risk-aware decision making in volatile DeFi and traditional markets
Key components:
- Point prediction: Single best estimate
- Prediction interval: Range with specified confidence level
- Confidence interval: Range for mean response
- Standard error of prediction

Formulas & Calculations

Point prediction: Ŷ = b₀ + b₁X
Prediction interval: Ŷ ± t(α/2,n-2) × SEE × √[1 + 1/n + (X-X̄)²/Σ(X-X̄)²]
Confidence interval for mean: Ŷ ± t(α/2,n-2) × SEE × √[1/n + (X-X̄)²/Σ(X-X̄)²]
HP 12C steps: Use normal distribution for large samples, t-distribution for small samples
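
The two interval formulas differ only by the leading 1 under the square root, which the sketch below makes explicit. It is a minimal illustration on hypothetical data: the function name interval_half_widths is an assumption, and the critical t-value is passed in from a table (3.182 for a two-tailed 5% test with 3 degrees of freedom) rather than computed.

```python
import math


def interval_half_widths(xs, ys, x_new, t_crit):
    """Return (y_hat, prediction half-width, confidence half-width)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(sse / (n - 2))

    # Shared term: 1/n + (X - X-bar)^2 / sum of (X - X-bar)^2
    shared = 1 / n + (x_new - x_bar) ** 2 / sxx
    y_hat = b0 + b1 * x_new
    pred_half = t_crit * see * math.sqrt(1 + shared)   # new observation
    conf_half = t_crit * see * math.sqrt(shared)       # mean response
    return y_hat, pred_half, conf_half


# Hypothetical data; n = 5 so df = 3 and the table value is t = 3.182
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
at_mean = interval_half_widths(xs, ys, x_new=3, t_crit=3.182)
far_out = interval_half_widths(xs, ys, x_new=6, t_crit=3.182)
print(at_mean)
print(far_out)
```

Both half-widths grow as the new X moves away from X̄, and the prediction interval is always the wider of the two.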

Practical Examples

Traditional Finance Example: Predict stock price with 95% confidence: $52.30 ± $4.20
Calculation walkthrough: Wider intervals for X values far from mean
Interpretation: Prediction intervals are wider than confidence intervals

DeFi Application

Protocol example: Predicting token price ranges for automated rebalancing
Implementation: Dynamic prediction intervals in yield optimization strategies
Advantages/Challenges: Adaptive confidence levels vs. computational overhead

LO6: Describe different functional forms of simple linear regressions

Core Concept

Definition: Transformations of variables to capture non-linear relationships while maintaining linear regression framework
Why it matters: Many financial relationships are non-linear but can be linearized through transformations
Key components:
- Log-linear models
- Linear-log models
- Log-log models
- Polynomial terms

Formulas & Calculations

Log-linear: ln(Y) = b₀ + b₁X (exponential growth)
Linear-log: Y = b₀ + b₁ln(X) (diminishing returns)
Log-log: ln(Y) = b₀ + b₁ln(X) (elasticity model)
HP 12C steps: Use [LN] function to transform variables before regression
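
Each functional form is an ordinary least squares fit applied to transformed variables. The sketch below shows the log-log case, assuming hypothetical data generated from Y = 3·X^0.8 so that the true elasticity is known in advance; the helper fit_line and the variable names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least squares (b0, b1) for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1


# Hypothetical data generated from Y = 3 * X**0.8 (true elasticity 0.8)
x_raw = [1.0, 2.0, 4.0, 8.0, 16.0]
y_raw = [3.0 * x ** 0.8 for x in x_raw]

# Transform both variables, then run the usual regression
ln_x = [math.log(x) for x in x_raw]
ln_y = [math.log(y) for y in y_raw]
b0, b1 = fit_line(ln_x, ln_y)      # ln(Y) = b0 + b1*ln(X)

elasticity = b1                    # % change in Y per 1% change in X
scale = math.exp(b0)               # back-transformed intercept
print(round(elasticity, 6), round(scale, 6))
```

Transforming only Y gives the log-linear form and transforming only X gives the linear-log form; the fitting step is unchanged.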

Practical Examples

Traditional Finance Example: Stock returns vs. log of market cap (size effect)
Calculation walkthrough: ln(Price) = 2.5 + 0.3×Time captures compound growth
Interpretation: Log transformations often stabilize variance and linearize relationships
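
The compound-growth reading of ln(Price) = 2.5 + 0.3×Time can be checked numerically. In the sketch below the function name price is a hypothetical helper; the coefficients come from the walkthrough. Because the model is linear in ln(Price), each one-unit step in Time multiplies the price by e^0.3, a growth rate of roughly 35% per period rather than exactly 30%.

```python
import math


def price(t):
    """Back-transform the log-linear model ln(Price) = 2.5 + 0.3*t."""
    return math.exp(2.5 + 0.3 * t)


# Ratio of consecutive prices is constant: that is compound growth
ratios = [price(t + 1) / price(t) for t in range(5)]
growth_per_period = math.exp(0.3) - 1

print([round(r, 4) for r in ratios])
print(round(growth_per_period, 4))
```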

DeFi Application

Protocol example: Log-log model for liquidity vs. trading volume in AMMs
Implementation: Transformed variables in yield curve modeling
Advantages/Challenges: Better model fit vs. interpretation complexity

Core Concepts Summary (80/20 Principle)

Must-Know Concepts

Least Squares Estimation: Minimizes sum of squared residuals to find best-fit line
R² Interpretation: Proportion of variation in Y explained by X
Assumption Violations: Check residual plots for patterns indicating problems
Statistical Significance: Use t-tests for coefficients, F-test for overall model
Prediction vs. Confidence Intervals: Prediction intervals are wider due to additional uncertainty

Quick Reference Table

Concept	Formula	When to Use	DeFi Equivalent
Simple Regression	Y = b₀ + b₁X + ε	Linear relationships	Token price vs. TVL
R²	SSR/SST	Model fit assessment	AMM efficiency metrics
F-statistic	MSR/MSE	Overall significance	Protocol risk models
Prediction Interval	Ŷ ± t×SEE×√[…]	Forecasting	Yield range prediction

Comprehensive Formula Sheet

Essential Formulas

Formula 1: Simple Linear Regression
Y = b₀ + b₁X + ε
Where: Y = dependent variable, X = independent variable, 
       b₀ = intercept, b₁ = slope, ε = error term
Used for: Modeling linear relationships

Formula 2: Least Squares Coefficients
b₁ = Σ[(X - X̄)(Y - Ȳ)] / Σ[(X - X̄)²]
b₀ = Ȳ - b₁X̄
Where: X̄, Ȳ = sample means
Used for: Coefficient estimation

Formula 3: Coefficient of Determination
R² = SSR/SST = 1 - (SSE/SST)
Where: SST = total sum of squares, SSR = regression sum of squares,
       SSE = error sum of squares
Used for: Measuring goodness of fit

Formula 4: F-statistic
F = MSR/MSE = [SSR/1] / [SSE/(n-2)]
Where: MSR = mean square regression, MSE = mean square error
Used for: Testing overall model significance

Formula 5: Standard Error of Estimate
SEE = √[SSE/(n-2)]
Where: n = sample size
Used for: Measuring prediction accuracy

Formula 6: Prediction Interval
Ŷ ± t(α/2,n-2) × SEE × √[1 + 1/n + (X-X̄)²/Σ(X-X̄)²]
Used for: Forecasting with uncertainty bounds

HP 12C Calculator Sequences

Operation 1: Linear Regression Setup
RPN Steps: [f] CLEAR [Σ], then for each data pair key Y [ENTER] X [Σ+]
Example: Intercept = 0 [g] [ŷ,r]; slope = 1 [g] [ŷ,r] minus the intercept

Operation 2: Correlation Coefficient
RPN Steps: key any X value, [g] [ŷ,r], [x≷y] (displays correlation)
Example: r = 0.85 indicates strong positive relationship

Operation 3: Prediction Calculation
RPN Steps: key the X value, [g] [ŷ,r] (displays predicted Y)
Example: Predict Y when X = 10 with 10 [g] [ŷ,r]

Practice Problems

Basic Level (Understanding)

Problem: A regression of stock returns (Y) on market returns (X) yields: Y = 0.02 + 1.3X. The R² = 0.56.
- Given: Regression equation and R²
- Find: Interpret the coefficients and R²
- Solution:
  - Intercept (0.02): Stock has 2% expected return when market return is 0%
  - Slope (1.3): For each 1 percentage point increase in market return, the stock's expected return increases by 1.3 percentage points
  - R² (0.56): 56% of stock return variation is explained by market movements
- Answer: The stock has above-market sensitivity (beta > 1) and modest explanatory power
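
The interpretation above can be backed by a few lines of arithmetic. The sketch below assumes only the figures given in the problem; the 5% market return used for the prediction is a hypothetical input. In a simple linear regression R² equals the squared correlation, so the correlation is the square root of R² carrying the sign of the slope.

```python
import math

# Figures from the problem statement
intercept, beta, r_squared = 0.02, 1.3, 0.56

# Correlation between stock and market returns (sign follows the slope)
correlation = math.copysign(math.sqrt(r_squared), beta)

# Share of return variation not explained by the market
unexplained_share = 1 - r_squared

# Predicted stock return for a hypothetical 5% market return
predicted = intercept + beta * 0.05

print(round(correlation, 4), round(unexplained_share, 2), round(predicted, 4))
```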

Intermediate Level (Application)

Problem: A DeFi protocol’s TVL (Y, in millions) is regressed against token price (X, in dollars): Y = 50 + 15X, SEE = $25M, n = 30.
- Given: Regression equation, standard error, sample size
- Find: 95% prediction interval when token price = $10
- Solution:
  - Point prediction: Ŷ = 50 + 15(10) = $200M
  - t₀.₀₂₅,₂₈ ≈ 2.048
  - Prediction interval: 200 ± 2.048 × 25 × √[1 + 1/30 + (10-X̄)²/Σ(X-X̄)²]
  - Approximating the square-root term as 1 (X̄ and Σ(X-X̄)² are not given): 200 ± 2.048 × 25 = 200 ± 51.2
- Answer: TVL prediction ranges from approximately $148.8M to $251.2M with 95% confidence
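
The ± 51.2 figure corresponds to treating the square-root term as 1. The sketch below reproduces that arithmetic and, as a hedged comparison, also computes the narrowest interval the full formula allows, which occurs when the token price equals X̄ so that only the 1 + 1/n part remains. X̄ and Σ(X-X̄)² are not given in the problem, so the actual interval cannot be computed; variable names are illustrative.

```python
import math

# Figures from the problem statement
y_hat, t_crit, see, n = 200.0, 2.048, 25.0, 30

# Approximation used in the solution: square-root term taken as 1
approx_half = t_crit * see
approx_interval = (y_hat - approx_half, y_hat + approx_half)

# Smallest possible half-width under the full formula (X equal to X-bar)
min_half = t_crit * see * math.sqrt(1 + 1 / n)
min_interval = (y_hat - min_half, y_hat + min_half)

print(approx_interval)
print(min_interval)
```

Since the full formula can only widen the interval, the approximate range slightly understates the true uncertainty.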

Advanced Level (Analysis)

Problem: Analyze a yield farming return model using log transformations. Original model: ln(Yield) = 2.5 + 0.8×ln(Risk), R² = 0.72, F = 45.6
- Given: Log-log regression with goodness-of-fit measures
- Find: Interpret the elasticity coefficient and evaluate model adequacy
- Solution:
  - Elasticity interpretation: 1% increase in risk leads to 0.8% increase in yield
  - Model fit: 72% of yield variation explained by risk
  - F-test: Highly significant relationship (F = 45.6 >> F₀.₀₅,₁,₂₈ ≈ 4.2, taking 28 denominator degrees of freedom as an assumption because the problem does not state n)
  - Economic meaning: Diminishing returns to risk-taking
- Answer: Model shows strong risk-return relationship with less than proportional yield increases for additional risk, consistent with efficient market theory

DeFi Applications & Real-World Examples

Traditional Finance Context

Institution Example: Banks use regression to model loan default rates based on credit scores
Market Application: Beta estimation for portfolio risk management
Historical Case: CAPM validation studies using market index regression

DeFi Parallels

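Protocol Implementation: Compound sets interest rates as a piecewise-linear function of utilization; regression can be used off-chain to study how rates and utilization respond to market conditions
Smart Contract Logic: Automated market makers set prices through deterministic invariant formulas; regression is more commonly applied off-chain to analyze the resulting price and volume data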
Advantages: Real-time recalibration, transparent algorithms, 24/7 operation
Limitations: Gas costs, oracle dependencies, model risk in volatile markets

Case Studies

Case 1: Uniswap V3 Liquidity Prediction
- Background: AMM needs to predict optimal liquidity ranges
- Analysis: Regression of trading volume on price volatility and TVL
- Outcomes: Improved capital efficiency through dynamic range adjustment
- Lessons learned: Non-linear relationships require careful transformation

Common Pitfalls & Exam Tips

Frequent Mistakes

Mistake 1: Confusing correlation with causation - regression shows association, not causation
Mistake 2: Ignoring assumption violations - always check residual plots
Mistake 3: Over-interpreting R² - high R² doesn’t guarantee good predictions outside sample range

Exam Strategy

Time management: Allocate 4-5 minutes per regression problem
Question patterns: Often combined with hypothesis testing and confidence intervals
Quick checks: Verify R² is between 0 and 1, check units in predictions

Key Takeaways

Essential Points

✓ Simple linear regression models Y = b₀ + b₁X + ε where b₁ represents the marginal effect
✓ R² measures explained variation; higher values indicate better model fit
✓ Four key assumptions: linearity, homoscedasticity, independence, normality
✓ F-test evaluates overall model significance; t-tests evaluate individual coefficients
✓ Prediction intervals are wider than confidence intervals due to additional uncertainty

Memory Aids

Mnemonic: “LINE” for assumptions (Linearity, Independence, Normality, Equal variance)
Visual: Scatter plot with best-fit line and residual plots
Analogy: Regression is like finding the “average” relationship between variables

Cross-References & Additional Resources

Related Topics

Prerequisite: Statistical Measures of Asset Returns, Hypothesis Testing
Related: Parametric and Non-Parametric Tests of Independence
Advanced: Multiple Linear Regression (not in Finance Certification 1)

Source Materials

Primary Reading: Volume 1, Chapter 10, Simple Linear Regression
Key Sections: Least squares estimation, assumption testing, ANOVA
Practice Questions: End-of-chapter problems 1-15

External Resources

Videos: Khan Academy statistics series on regression
Articles: “Regression Analysis in Finance” - Finance Research Foundation
Tools: Excel regression analysis, R statistical software, Python scipy

Review Checklist

Before moving on, ensure you can:

Explain each learning objective in your own words
Calculate regression coefficients using least squares method
Complete ANOVA table and interpret F-statistic
Check regression assumptions using residual analysis
Calculate and interpret prediction intervals
Identify appropriate functional form transformations
Apply concepts to both traditional finance and DeFi scenarios
