Simple Linear Regression
Learning Objectives Coverage
LO1: Describe a simple linear regression model, how the least squares criterion is used to estimate regression coefficients, and the interpretation of these coefficients
Core Concept
Simple linear regression extends the correlation analysis from Topic 9 into a predictive framework. Where correlation tells us whether and how strongly two variables are related, regression tells us the specific nature of that relationship and allows us to make predictions. The model Y = b₀ + b₁X + ε forms the basis for beta estimation in equity analysis, factor models in portfolio management, and yield curve modeling in fixed income.
Key components:
- Dependent variable (Y): The outcome being predicted
- Independent variable (X): The predictor variable
- Intercept (b₀): Expected value of Y when X equals zero
- Slope (b₁): Expected change in Y for a one-unit change in X
- Error term (ε): Variation in Y that X does not explain
Formulas & Calculations
- Main formula: Y = b₀ + b₁X + ε
- HP 12C steps:
- Clear the statistics registers with [f] CLEAR [Σ], then enter each pair as Y [ENTER] X [Σ+]
- Display b₀: 0 [g] [ŷ,r] (the predicted Y at X = 0)
- Display b₁: 1 [g] [ŷ,r], then subtract b₀ (the predicted Y at X = 1 minus the intercept)
- Common variations: Standardized coefficients, natural log transformations
Practical Examples
- Traditional Finance Example: Predicting stock returns (Y) based on market returns (X)
- Given: Stock returns = 2% + 1.2 × Market returns
- If market return = 5%, predicted stock return = 2% + 1.2(5%) = 8%
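- Calculation walkthrough: Least squares chooses b₀ and b₁ to minimize the sum of squared residuals, Σe² = Σ(Y - Ŷ)²
- Interpretation: The slope estimates the stock's sensitivity to the market (beta, its systematic risk); the intercept estimates alpha when both series are measured as excess returns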
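The least squares criterion can be sketched in a few lines of Python. This is only an illustration: the return series are made-up numbers, and `fit_line` is a hypothetical helper name rather than anything from the curriculum.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the fitted line through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical market (X) and stock (Y) returns, in percent
market = [1.0, 2.0, 3.0, 4.0, 5.0]
stock = [3.0, 4.6, 5.4, 7.0, 8.0]

b0, b1 = fit_line(market, stock)
predicted = b0 + b1 * 5.0  # predicted stock return when the market returns 5%
print(f"alpha (b0) = {b0:.2f}, beta (b1) = {b1:.2f}, prediction = {predicted:.2f}")
```

With real data the same two formulas apply unchanged; only the input lists differ.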
DeFi Application
- Protocol example: Predicting Uniswap V3 TVL based on ETH price movements
- Implementation: A regression fitted off-chain could feed an oracle that smart contracts read to adjust fees dynamically
- Advantages/Challenges: Real-time data availability vs. gas costs for on-chain calculations
LO2: Explain the assumptions underlying the simple linear regression model, and describe how residuals and residual plots indicate if these assumptions may have been violated
Core Concept
- Definition: Four critical assumptions must hold for valid regression results: linearity, homoscedasticity, independence, and normality
- Why it matters: Violated assumptions can bias the coefficient estimates or their standard errors, which makes t-tests, F-tests, and intervals unreliable
- Key components:
- Linearity: Relationship is truly linear
- Homoscedasticity: Constant error variance
- Independence: Errors are uncorrelated
- Normality: Errors follow normal distribution
Formulas & Calculations
- Residual calculation: e = Y - Ŷ = Y - (b₀ + b₁X)
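- Durbin-Watson test: DW = Σ(eₜ - eₜ₋₁)² / Σeₜ², with the numerator summed from t = 2; values near 2 suggest no first-order autocorrelation in the residuals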
- HP 12C steps: Calculate residuals by subtracting predicted values from actual values
- Common variations: Breusch-Pagan test for homoscedasticity, Jarque-Bera test for normality
Practical Examples
- Traditional Finance Example: Testing beta stability in CAPM model
- Calculation walkthrough: Plot residuals vs. fitted values to check for patterns
- Interpretation: Random scatter indicates valid assumptions; patterns suggest violations
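As a rough sketch of the residual checks described here, the snippet below computes residuals and the Durbin-Watson statistic. The data and the coefficients (taken as given from a prior least squares fit) are hypothetical, and five observations are far too few for a meaningful test; the point is only the mechanics.

```python
def residuals(xs, ys, b0, b1):
    """e = Y - (b0 + b1 * X) for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def durbin_watson(e):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r ** 2 for r in e)
    return num / den


# Hypothetical data and assumed least squares coefficients for it
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 4.6, 5.4, 7.0, 8.0]
b0, b1 = 1.88, 1.24

e = residuals(xs, ys, b0, b1)
dw = durbin_watson(e)
print("residuals:", [round(r, 2) for r in e])
print(f"Durbin-Watson = {dw:.3f}")
```

Plotting `e` against the fitted values (with any charting tool) is the visual counterpart: a shapeless cloud around zero is what the assumptions predict.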
DeFi Application
- Protocol example: Validating assumptions in yield farming return predictions
- Implementation: Automated residual analysis in DeFi analytics platforms
- Advantages/Challenges: Continuous monitoring vs. computational complexity
LO3: Calculate and interpret measures of fit and formulate and evaluate tests of fit and of regression coefficients in a simple linear regression
Core Concept
- Definition: Measures of fit assess how well the regression model explains the variation in the dependent variable
- Why it matters: Determines model reliability for investment decisions and risk management
- Key components:
- R²: Coefficient of determination
- Adjusted R²: R² adjusted for degrees of freedom
- F-statistic: Overall model significance
- t-statistics: Individual coefficient significance
Formulas & Calculations
- R² formula: R² = SSR/SST = 1 - (SSE/SST)
- SST = Σ(Y - Ȳ)² (Total Sum of Squares)
- SSR = Σ(Ŷ - Ȳ)² (Regression Sum of Squares)
- SSE = Σ(Y - Ŷ)² (Error Sum of Squares)
- F-statistic: F = MSR/MSE = [SSR/1] / [SSE/(n-2)]
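- t-statistic: t = (b₁ - 0) / s_b₁ with n - 2 degrees of freedom, where s_b₁ = SEE / √Σ(X - X̄)²
- HP 12C steps: Key any X value, press [g] [ŷ,r], then [x≷y] to display the correlation coefficient r; square it for R²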
Practical Examples
- Traditional Finance Example: Portfolio beta has R² = 0.64, meaning 64% of return variation is explained by market movements
- Calculation walkthrough: F-test determines if the relationship is statistically significant
- Interpretation: Higher R² indicates better fit, but consider overfitting risks
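The fit measures above can be computed together, as in this minimal sketch. The data are hypothetical and `fit_statistics` is an illustrative name. One thing the sketch makes visible is that, in a simple (one-predictor) regression, the F-statistic equals the square of the slope's t-statistic.

```python
import math


def fit_statistics(xs, ys):
    """Fit Y on X by least squares and return R², F, and the slope t-statistic."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    ssr = sst - sse                                       # explained variation

    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)  # standard error of the slope
    return {"r2": ssr / sst, "F": (ssr / 1) / mse, "t_b1": b1 / se_b1}


# Hypothetical sample
stats = fit_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.6, 5.4, 7.0, 8.0])
print({k: round(v, 3) for k, v in stats.items()})
```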
DeFi Application
- Protocol example: Measuring goodness of fit in automated market maker price prediction models
- Implementation: Dynamic R² monitoring for rebalancing strategies
- Advantages/Challenges: Real-time model validation vs. gas optimization
LO4: Describe the use of analysis of variance (ANOVA) in regression analysis, interpret ANOVA results, and calculate and interpret the standard error of estimate in a simple linear regression
Core Concept
- Definition: ANOVA decomposes total variation into explained (regression) and unexplained (error) components
- Why it matters: Provides framework for testing overall model significance and quantifying prediction accuracy
- Key components:
- Total variation (SST)
- Explained variation (SSR)
- Unexplained variation (SSE)
- Standard error of estimate (SEE)
Formulas & Calculations
- ANOVA identity: SST = SSR + SSE
- Standard error of estimate: SEE = √[SSE/(n-2)]
- Mean square regression: MSR = SSR/1
- Mean square error: MSE = SSE/(n-2)
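- HP 12C steps: Enter the residuals with [Σ+] and press [g] [s] for their sample standard deviation, then multiply by √[(n-1)/(n-2)], because the built-in function divides by n - 1 rather than n - 2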
Practical Examples
- Traditional Finance Example: ANOVA table shows F = 25.6, p < 0.001, indicating significant relationship
- Calculation walkthrough: SEE = 2.50 means a typical residual (actual value minus predicted value) is about 2.50, in the units of Y
- Interpretation: Lower SEE indicates more precise predictions
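An ANOVA table for a simple regression follows mechanically from SSR, SSE, and n. The sketch below uses hypothetical sums of squares (chosen so that R² = 0.64), and `anova_table` is an illustrative helper name.

```python
import math


def anova_table(ssr, sse, n):
    """Build the ANOVA quantities for a one-predictor regression."""
    sst = ssr + sse          # ANOVA identity
    msr = ssr / 1            # one regression degree of freedom
    mse = sse / (n - 2)      # n - 2 residual degrees of freedom
    return {
        "SST": sst,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "SEE": math.sqrt(mse),
        "R2": ssr / sst,
    }


# Hypothetical inputs: SSR = 64, SSE = 36, n = 20 observations
table = anova_table(ssr=64.0, sse=36.0, n=20)
for name, value in table.items():
    print(f"{name:>3} = {value:.3f}")
```

The computed F would then be compared with the critical F value for 1 and n - 2 degrees of freedom from a table.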
DeFi Application
- Protocol example: ANOVA analysis of liquidity provision returns in automated market makers
- Implementation: Standard error bands for impermanent loss prediction
- Advantages/Challenges: Continuous recalibration vs. transaction costs
LO5: Calculate and interpret the predicted value for the dependent variable, and a prediction interval for it, given an estimated linear regression model and a value for the independent variable
Core Concept
- Definition: Prediction intervals provide range estimates for future observations, accounting for both parameter uncertainty and inherent variability
- Why it matters: Enables risk-aware decision making in volatile DeFi and traditional markets
- Key components:
- Point prediction: Single best estimate
- Prediction interval: Range with specified confidence level
- Confidence interval: Range for mean response
- Standard error of prediction
Formulas & Calculations
- Point prediction: Ŷ = b₀ + b₁X
- Prediction interval: Ŷ ± t(α/2,n-2) × SEE × √[1 + 1/n + (X-X̄)²/Σ(X-X̄)²]
- Confidence interval for mean: Ŷ ± t(α/2,n-2) × SEE × √[1/n + (X-X̄)²/Σ(X-X̄)²]
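- HP 12C steps: The calculator has no built-in t-distribution, so take the critical value from a t-table with n - 2 degrees of freedom; for large samples the normal critical value (1.96 at 95%) is a close approximation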
Practical Examples
- Traditional Finance Example: Predict stock price with 95% confidence: 4.20
- Calculation walkthrough: Wider intervals for X values far from mean
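- Interpretation: Prediction intervals are wider than confidence intervals because they also include the error variance of a single new observation (the extra 1 under the square root)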
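The two interval formulas differ only by the extra 1 under the square root, which this sketch makes concrete. All inputs are hypothetical (ten X values, an assumed SEE of 1.5), and the critical value 2.306 is the tabulated two-tailed 95% t-value for 8 degrees of freedom, passed in by hand because the Python standard library has no t-distribution.

```python
import math


def interval_half_widths(x_new, xs, see, t_crit):
    """Return (confidence half-width, prediction half-width) at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    distance = (x_new - x_bar) ** 2 / sxx  # grows as x_new moves away from the mean
    ci = t_crit * see * math.sqrt(1 / n + distance)       # mean response
    pi = t_crit * see * math.sqrt(1 + 1 / n + distance)   # single new observation
    return ci, pi


xs = [float(x) for x in range(1, 11)]  # hypothetical X sample, mean 5.5
ci_near, pi_near = interval_half_widths(5.5, xs, see=1.5, t_crit=2.306)
ci_far, pi_far = interval_half_widths(12.0, xs, see=1.5, t_crit=2.306)

print(f"at the mean : CI ±{ci_near:.2f}, PI ±{pi_near:.2f}")
print(f"far from it : CI ±{ci_far:.2f}, PI ±{pi_far:.2f}")
```

Both intervals widen as X moves away from X̄, and the prediction interval is wider than the confidence interval at every X.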
DeFi Application
- Protocol example: Predicting token price ranges for automated rebalancing
- Implementation: Dynamic prediction intervals in yield optimization strategies
- Advantages/Challenges: Adaptive confidence levels vs. computational overhead
LO6: Describe different functional forms of simple linear regressions
Core Concept
- Definition: Transformations of variables to capture non-linear relationships while maintaining linear regression framework
- Why it matters: Many financial relationships are non-linear but can be linearized through transformations
- Key components:
- Log-linear models
- Linear-log models
- Log-log models
- Polynomial terms
Formulas & Calculations
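- Log-linear: ln(Y) = b₀ + b₁X (exponential growth; a one-unit change in X changes Y by roughly 100·b₁ percent)
- Linear-log: Y = b₀ + b₁ln(X) (diminishing returns; a 1% change in X changes Y by roughly b₁/100 units)
- Log-log: ln(Y) = b₀ + b₁ln(X) (elasticity model; a 1% change in X changes Y by roughly b₁ percent)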
- HP 12C steps: Use [LN] function to transform variables before regression
Practical Examples
- Traditional Finance Example: Stock returns vs. log of market cap (size effect)
- Calculation walkthrough: ln(Price) = 2.5 + 0.3×Time captures compound growth; each unit of time multiplies the predicted price by e^0.3 ≈ 1.35
- Interpretation: Log transformations often stabilize variance and linearize relationships
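To illustrate the log-linear form, the sketch below generates a noise-free price series from ln(Price) = 2.5 + 0.3×Time, regresses ln(Price) on Time, and recovers the coefficients. The series is synthetic by construction and `fit_line` is a hypothetical helper; real data would include noise and give only approximate recovery.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1


times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
prices = [math.exp(2.5 + 0.3 * t) for t in times]  # synthetic exponential path

# Transform the dependent variable, then run an ordinary linear regression
log_prices = [math.log(p) for p in prices]
b0, b1 = fit_line(times, log_prices)

growth_per_period = math.exp(b1) - 1  # exact per-period growth implied by b1
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, growth per period = {growth_per_period:.2%}")
```

The slope of 0.3 is a continuously compounded rate, so the exact per-period growth is e^0.3 - 1, slightly above the 30% that a direct reading of the coefficient suggests.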
DeFi Application
- Protocol example: Log-log model for liquidity vs. trading volume in AMMs
- Implementation: Transformed variables in yield curve modeling
- Advantages/Challenges: Better model fit vs. interpretation complexity
Core Concepts Summary (80/20 Principle)
Must-Know Concepts
- Least Squares Estimation: Minimizes sum of squared residuals to find best-fit line
- R² Interpretation: Proportion of variation in Y explained by X
- Assumption Violations: Check residual plots for patterns indicating problems
- Statistical Significance: Use t-tests for coefficients, F-test for overall model
- Prediction vs. Confidence Intervals: Prediction intervals are wider due to additional uncertainty
Quick Reference Table
| Concept | Formula | When to Use | DeFi Equivalent |
|---|---|---|---|
| Simple Regression | Y = b₀ + b₁X + ε | Linear relationships | Token price vs. TVL |
| R² | SSR/SST | Model fit assessment | AMM efficiency metrics |
| F-statistic | MSR/MSE | Overall significance | Protocol risk models |
| Prediction Interval | Ŷ ± t×SEE×√[…] | Forecasting | Yield range prediction |
Comprehensive Formula Sheet
Essential Formulas
Formula 1: Simple Linear Regression
Y = b₀ + b₁X + ε
Where: Y = dependent variable, X = independent variable,
b₀ = intercept, b₁ = slope, ε = error term
Used for: Modeling linear relationships
Formula 2: Least Squares Coefficients
b₁ = Σ[(X - X̄)(Y - Ȳ)] / Σ[(X - X̄)²]
b₀ = Ȳ - b₁X̄
Where: X̄, Ȳ = sample means
Used for: Coefficient estimation
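For readers who want to see where these two expressions come from, a short derivation follows. It uses standard calculus: set the partial derivatives of the sum of squared residuals to zero. The notation S(b₀, b₁) for that sum is introduced here for convenience.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right)^2 \\[4pt]
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right) = 0
  \;\Longrightarrow\; b_0 = \bar{Y} - b_1 \bar{X} \\[4pt]
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i \left( Y_i - b_0 - b_1 X_i \right) = 0
  \;\Longrightarrow\; b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
```

The second result follows after substituting b₀ = Ȳ - b₁X̄ into the second condition and collecting terms.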
Formula 3: Coefficient of Determination
R² = SSR/SST = 1 - (SSE/SST)
Where: SST = total sum of squares, SSR = regression sum of squares,
SSE = error sum of squares
Used for: Measuring goodness of fit
Formula 4: F-statistic
F = MSR/MSE = [SSR/1] / [SSE/(n-2)]
Where: MSR = mean square regression, MSE = mean square error
Used for: Testing overall model significance
Formula 5: Standard Error of Estimate
SEE = √[SSE/(n-2)]
Where: n = sample size
Used for: Measuring prediction accuracy
Formula 6: Prediction Interval
Ŷ ± t(α/2,n-2) × SEE × √[1 + 1/n + (X-X̄)²/Σ(X-X̄)²]
Used for: Forecasting with uncertainty bounds
HP 12C Calculator Sequences
Operation 1: Linear Regression Setup
RPN Steps: [f] CLEAR [Σ], then for each pair: Y [ENTER] X [Σ+]; 0 [g] [ŷ,r] displays b₀; 1 [g] [ŷ,r] minus b₀ gives b₁
Example: Calculate slope and intercept from data points
Operation 2: Correlation Coefficient
RPN Steps: any X value [g] [ŷ,r], then [x≷y] (displays correlation)
Example: r = 0.85 indicates strong positive relationship
Operation 3: Prediction Calculation
RPN Steps: X value [g] [ŷ,r] (displays predicted Y)
Example: Predict Y when X = 10 by keying 10 [g] [ŷ,r]
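A statistics-mode calculator only needs a handful of running sums to produce regression results. The sketch below mirrors that idea in Python: it accumulates n, Σx, Σx², Σy, Σy², and Σxy, then derives the slope, intercept, and correlation from them. It illustrates the arithmetic and is not a description of the calculator's internal implementation; the data are hypothetical.

```python
import math


def accumulate(pairs):
    """Build the running sums from (x, y) pairs."""
    n = sx = sx2 = sy = sy2 = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sx2 += x * x
        sy += y
        sy2 += y * y
        sxy += x * y
    return n, sx, sx2, sy, sy2, sxy


def regression_from_sums(n, sx, sx2, sy, sy2, sxy):
    """Slope, intercept, and correlation computed from the sums alone."""
    b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b0 = (sy - b1 * sx) / n
    r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return b0, b1, r


pairs = [(1.0, 3.0), (2.0, 4.6), (3.0, 5.4), (4.0, 7.0), (5.0, 8.0)]
b0, b1, r = regression_from_sums(*accumulate(pairs))
y_at_10 = b0 + b1 * 10  # prediction for X = 10
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, r = {r:.4f}, Y(10) = {y_at_10:.2f}")
```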
Practice Problems
Basic Level (Understanding)
- Problem: A regression of stock returns (Y) on market returns (X) yields: Y = 0.02 + 1.3X. The R² = 0.56.
- Given: Regression equation and R²
- Find: Interpret the coefficients and R²
- Solution:
- Intercept (0.02): Expected stock return is 2% when the market return is 0%
- Slope (1.3): Each 1 percentage point increase in market return is associated with a 1.3 percentage point increase in stock return
- R² (0.56): 56% of stock return variation is explained by market movements
- Answer: The stock has above-market sensitivity (beta > 1), and the market explains a moderate share (just over half) of its return variation
Intermediate Level (Application)
- Problem: A DeFi protocol’s TVL (Y, in millions) is regressed against token price (X, in dollars): Y = 50 + 15X, SEE = $25M, n = 30.
- Given: Regression equation, standard error, sample size
- Find: 95% prediction interval when token price = $10
- Solution:
- Point prediction: Ŷ = 50 + 15(10) = $200M
- t₀.₀₂₅,₂₈ ≈ 2.048
- Prediction interval: 200 ± 2.048 × 25 × √[1 + 1/30 + (10-X̄)²/Σ(X-X̄)²]
- Because X̄ and Σ(X-X̄)² are not given, approximate the square-root term as 1: 200 ± 2.048 × 25 = 200 ± 51.2
- Answer: TVL is predicted to fall between $148.8M and $251.2M with 95% confidence
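The arithmetic for this problem can be checked with a few lines of Python. The first margin treats the square-root term as 1. The second is an added sensitivity check, not part of the original solution, that keeps the 1/n term while still assuming the token price sits at its sample mean (so the distance term is zero).

```python
import math

y_hat = 50 + 15 * 10   # point prediction, in $ millions
t_crit = 2.048         # two-tailed 95% t-value, 28 degrees of freedom
see = 25.0             # standard error of estimate, in $ millions
n = 30

# Approximation used in the solution: square-root term treated as 1
approx_margin = t_crit * see
approx_interval = (y_hat - approx_margin, y_hat + approx_margin)

# Assumption for comparison: X equals the sample mean, so only 1 + 1/n remains
margin_with_n = t_crit * see * math.sqrt(1 + 1 / n)

print(f"approximate interval: {approx_interval[0]:.1f} to {approx_interval[1]:.1f}")
print(f"margin keeping the 1/n term: {margin_with_n:.2f}")
```

Keeping the 1/n term widens the margin by less than one million, so the approximation costs little here.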
Advanced Level (Analysis)
- Problem: Analyze a yield farming return model using log transformations. Original model: ln(Yield) = 2.5 + 0.8×ln(Risk), R² = 0.72, F = 45.6, n = 30
- Given: Log-log regression with goodness-of-fit measures
- Find: Interpret the elasticity coefficient and evaluate model adequacy
- Solution:
- Elasticity interpretation: 1% increase in risk leads to 0.8% increase in yield
- Model fit: 72% of yield variation explained by risk
- F-test: Highly significant relationship (F = 45.6 >> F₀.₀₅,₁,₂₈ ≈ 4.2)
- Economic meaning: Diminishing returns to risk-taking
- Answer: Model shows strong risk-return relationship with less than proportional yield increases for additional risk, consistent with efficient market theory
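The elasticity reading of a log-log slope is an approximation that holds for small changes, which a quick calculation makes concrete. The snippet uses only the numbers given in the problem. The last line relies on the fact that in a one-predictor regression the slope's t-statistic is the square root of F.

```python
import math

b1 = 0.8        # log-log slope (elasticity of yield with respect to risk)
f_stat = 45.6   # reported F-statistic

# Exact percentage change in yield for a 1% and a 100% increase in risk
pct_for_1pct = (1.01 ** b1 - 1) * 100
pct_for_doubling = (2.0 ** b1 - 1) * 100

# In simple regression, |t| for the slope equals sqrt(F)
t_slope = math.sqrt(f_stat)

print(f"1% more risk   -> {pct_for_1pct:.3f}% more yield")
print(f"100% more risk -> {pct_for_doubling:.1f}% more yield")
print(f"implied |t| for the slope = {t_slope:.3f}")
```

Doubling risk raises predicted yield by well under 100%, which is the diminishing-returns pattern the solution describes.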
DeFi Applications & Real-World Examples
Traditional Finance Context
- Institution Example: Banks use regression to model loan default rates based on credit scores
- Market Application: Beta estimation for portfolio risk management
- Historical Case: CAPM validation studies using market index regression
DeFi Parallels
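- Protocol Implementation: Compound sets borrowing rates with piecewise-linear functions of pool utilization; regression can be applied off-chain to study how observed rates and utilization move together
- Smart Contract Logic: Automated market makers set prices from invariant formulas such as constant product, not from regression; regression is typically an off-chain tool for analyzing the resulting prices and volumes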
- Advantages: Real-time recalibration, transparent algorithms, 24/7 operation
- Limitations: Gas costs, oracle dependencies, model risk in volatile markets
Case Studies
- Case 1: Uniswap V3 Liquidity Prediction
- Background: AMM needs to predict optimal liquidity ranges
- Analysis: Regression of trading volume on price volatility and TVL
- Outcomes: Improved capital efficiency through dynamic range adjustment
- Lessons learned: Non-linear relationships require careful transformation
Common Pitfalls & Exam Tips
Frequent Mistakes
- Mistake 1: Confusing correlation with causation - regression shows association, not causation
- Mistake 2: Ignoring assumption violations - always check residual plots
- Mistake 3: Over-interpreting R² - high R² doesn’t guarantee good predictions outside sample range
Exam Strategy
- Time management: Allocate 4-5 minutes per regression problem
- Question patterns: Often combined with hypothesis testing and confidence intervals
- Quick checks: Verify R² is between 0 and 1, check units in predictions
Key Takeaways
Essential Points
✓ Simple linear regression models Y = b₀ + b₁X + ε where b₁ represents the marginal effect
✓ R² measures explained variation; higher values indicate better model fit
✓ Four key assumptions: linearity, homoscedasticity, independence, normality
✓ F-test evaluates overall model significance; t-tests evaluate individual coefficients
✓ Prediction intervals are wider than confidence intervals due to additional uncertainty
Memory Aids
- Mnemonic: “LINE” for assumptions (Linearity, Independence, Normality, Equal variance)
- Visual: Scatter plot with best-fit line and residual plots
- Analogy: Regression is like finding the “average” relationship between variables
Cross-References & Additional Resources
Related Topics
- Prerequisite: Statistical Measures of Asset Returns, Hypothesis Testing
- Related: Parametric and Non-Parametric Tests of Independence
- Advanced: Multiple Linear Regression (not in Finance Certification 1)
Source Materials
- Primary Reading: Volume 1, Chapter 10, Simple Linear Regression
- Key Sections: Least squares estimation, assumption testing, ANOVA
- Practice Questions: End-of-chapter problems 1-15
External Resources
- Videos: Khan Academy statistics series on regression
- Articles: “Regression Analysis in Finance” - Finance Research Foundation
- Tools: Excel regression analysis, R statistical software, Python scipy
Review Checklist
Before moving on, ensure you can:
- Explain each learning objective in your own words
- Calculate regression coefficients using least squares method
- Complete ANOVA table and interpret F-statistic
- Check regression assumptions using residual analysis
- Calculate and interpret prediction intervals
- Identify appropriate functional form transformations
- Apply concepts to both traditional finance and DeFi scenarios