Parametric and Non-Parametric Tests of Independence

Learning Objectives Coverage

LO1: Explain parametric and nonparametric tests of the hypothesis that the population correlation coefficient equals zero, and determine whether the hypothesis is rejected at a given level of significance

Core Concept

This topic applies the hypothesis testing framework specifically to questions about relationships between variables. Tests of independence determine whether two variables have a statistically significant relationship, using either parametric (distribution-based) or nonparametric (rank-based) methods. The correlation concepts from Topic 3 are now placed into a formal inferential framework, enabling us to distinguish genuine relationships from noise.

Formulas & Calculations

  • Pearson correlation: r = sXY/(sX × sY)
  • Test statistic: t = r√(n-2)/√(1-r²) with df = n-2 (exam focus)
  • Spearman rank: rs = 1 - (6Σdi²)/(n(n²-1))
  • HP 12C steps:
    Correlation t-test:
    [r] ENTER [n] ENTER 2 - √x ×
    1 ENTER [r] x² - √x ÷
    

Practical Examples

  • Traditional Finance Example: Testing correlation between two mutual funds
    • Sample: 36 monthly returns, r = 0.43
    • t = 0.43√(36-2)/√(1-0.43²) = 0.43×5.831/0.903 = 2.78
    • Critical value at 5%: ±2.032
    • Decision: Reject H₀, significant correlation exists
  • Interpretation: Funds move together, important for diversification decisions
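
The mutual fund test above can be checked with a short Python sketch. The function name `correlation_t_stat` is only an illustrative choice, and the critical value is hard-coded from a t-table (df = 34, two-tailed 5%) rather than computed, so the block needs nothing beyond the standard library.

```python
import math


def correlation_t_stat(r: float, n: int) -> float:
    """Return t = r * sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Mutual fund example from the notes: 36 monthly returns, r = 0.43.
t_stat = correlation_t_stat(0.43, 36)

# Two-tailed 5% critical value for df = 34, read from a t-table (not computed here).
critical = 2.032
reject_h0 = abs(t_stat) > critical

print(f"t = {t_stat:.2f}, df = {36 - 2}, reject H0: {reject_h0}")
```

Swapping in a different r, n, and table value reproduces any of the correlation tests in these notes.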

DeFi Application

  • Protocol example: Testing correlation between ETH and DeFi token returns
  • Implementation: Use Spearman rank due to non-normal crypto returns
  • Advantages/Challenges: High volatility and outliers make nonparametric tests more reliable
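
To see why a rank-based measure holds up better when outliers are present, here is a minimal sketch on contrived numbers (not real token returns). Five points fall steadily and one extreme observation is added. The helper names `pearson_r`, `ranks`, and `spearman_rs` are illustrative, and the rank helper assumes no tied values.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rs(x, y):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Contrived data: y falls as x rises, except for one extreme last observation.
x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 100]

r = pearson_r(x, y)
rs = spearman_rs(x, y)
print(f"Pearson r = {r:.2f}, Spearman rs = {rs:.2f}")
```

On this data the single outlier pulls Pearson to about 0.63, while Spearman stays slightly negative at about -0.14, because ranking caps how much any one observation can contribute.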

LO2: Explain tests of independence based on contingency table data

Core Concept

  • Definition: Chi-square test analyzes relationships between categorical variables using contingency tables
  • Why it matters: Evaluates dependencies between discrete outcomes like investment decisions, risk categories, or protocol types
  • Key components: Observed frequencies, expected frequencies, chi-square statistic, degrees of freedom

Formulas & Calculations

  • Chi-square statistic: χ² = Σ[(Oij - Eij)²/Eij] (exam focus)
  • Expected frequency: Eij = (Row i total × Column j total)/Grand total
  • Degrees of freedom: df = (r-1)(c-1)
  • Standardized residual: (Oij - Eij)/√Eij
  • HP 12C steps: Manual calculation required for each cell
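
Since the calculator route is cell-by-cell, a small Python sketch may be easier for checking work. It applies the three formulas above to a hypothetical 2×2 table. The counts are made up for illustration, and the function name `chi_square_independence` is an assumption, not a library call.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Eij = (row i total * column j total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # chi2 = sum over all cells of (O - E)^2 / E
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )

    # df = (rows - 1) * (columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical counts, for illustration only.
observed = [
    [30, 20],
    [20, 30],
]
chi2, df, expected = chi_square_independence(observed)
print(f"chi2 = {chi2:.3f}, df = {df}")
print("expected:", expected)
```

Here every expected count is 25, so χ² = 4 × (5²/25) = 4.0 with df = 1, which exceeds the 5% critical value of 3.841 from a chi-square table.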

Practical Examples

  • Traditional Finance Example: ETF classification analysis
    • 1,594 ETFs classified by size and investment type
    • 3×3 contingency table (large/mid/small × growth/blend/value)
    • χ² = 32.08 with df = 4
    • Critical value at 5%: 9.488
    • Decision: Reject independence, size and style are related
  • Interpretation: Investment style depends on market cap focus

DeFi Application

  • Protocol example: Testing relationship between protocol type (DEX/Lending/Yield) and risk level (Low/Medium/High)
  • Implementation: Create contingency table from protocol classifications
  • Advantages/Challenges: On-chain transparency allows complete population analysis
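
A minimal sketch of the table-building step, assuming each protocol has already been labelled with a type and a risk level. The nine labelled pairs below are invented placeholders, not real protocol classifications.

```python
from collections import Counter

# Hypothetical (protocol type, risk level) labels, one pair per protocol.
protocols = [
    ("DEX", "Low"), ("DEX", "Medium"), ("DEX", "Medium"),
    ("Lending", "Low"), ("Lending", "Low"), ("Lending", "High"),
    ("Yield", "High"), ("Yield", "High"), ("Yield", "Medium"),
]

types = ["DEX", "Lending", "Yield"]   # table rows
risks = ["Low", "Medium", "High"]     # table columns

# Counter returns 0 for combinations that never occur, so empty cells are handled.
counts = Counter(protocols)
table = [[counts[(t, r)] for r in risks] for t in types]

for t, row in zip(types, table):
    print(f"{t:<8}", row)
```

A table this small only illustrates the layout. The chi-square approximation is usually considered unreliable when expected cell counts are very low (a common rule of thumb is at least 5 per cell), so a real test needs far more protocols.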

Core Concepts Summary (80/20 Principle)

Must-Know Concepts

  1. Pearson Correlation: Measures linear relationship, assumes normality
  2. Spearman Rank: Robust to outliers, works with non-normal data
  3. Chi-Square Test: Tests independence for categorical variables
  4. Test Selection: Parametric when its assumptions are met, nonparametric when robustness is needed

Quick Reference Table

Test Type   | Data Type      | Assumption    | Test Statistic      | DeFi Use Case
Pearson     | Continuous     | Normal        | t = r√(n-2)/√(1-r²) | Stable token correlations
Spearman    | Ranked/Ordinal | None          | rs formula          | Volatile token analysis
Chi-square  | Categorical    | Independence  | χ² = Σ(O-E)²/E      | Protocol type vs risk
Contingency | Discrete       | Random sample | Same as χ²          | Wallet behavior patterns

Comprehensive Formula Sheet

Essential Formulas

Pearson Correlation Coefficient:
r = Σ[(xi - x̄)(yi - ȳ)]/√[Σ(xi - x̄)²Σ(yi - ȳ)²]
Alternative: r = sXY/(sX × sY)
Where: sXY = covariance, sX, sY = standard deviations
Used for: Linear relationship between normally distributed variables

Correlation t-test:
t = r√(n-2)/√(1-r²)
df = n - 2
Where: r = sample correlation, n = sample size
Used for: Testing H₀: ρ = 0

Spearman Rank Correlation:
rs = 1 - (6Σdi²)/(n(n²-1))
Where: di = difference in ranks for observation i
Used for: Non-normal data or ordinal variables

Chi-Square Test of Independence:
χ² = ΣΣ[(Oij - Eij)²/Eij]
Expected: Eij = (Row i total × Column j total)/Grand total
df = (r-1)(c-1)
Used for: Testing independence of categorical variables

Standardized Residual:
zij = (Oij - Eij)/√Eij
Used for: Identifying which cells contribute most to χ²

HP 12C Calculator Sequences

Pearson Correlation t-statistic:
RPN Steps: [r] ENTER [n] ENTER 2 - √x × 1 ENTER [r] x² - √x ÷
Example: 0.43 ENTER 36 ENTER 2 - √x × 1 ENTER 0.43 x² - √x ÷ = 2.78

Spearman Rank (manual):
1. Rank X values: smallest = 1
2. Rank Y values: smallest = 1
3. Calculate di = rank(Xi) - rank(Yi)
4. Square each di
5. Sum all di²
6. Apply formula: 1 - (6×sum)/(n×(n²-1))
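
The six manual steps above can be sketched in Python as follows. The two return series are hypothetical, and the ranking helper assumes no tied values (with ties, average ranks are normally used and the shortcut formula is only approximate).

```python
def ranks(values):
    """Steps 1-2: rank values so that the smallest gets rank 1 (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rs(x, y):
    """Steps 3-6: rank differences, squares, sum, then the rs formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d = [a - b for a, b in zip(rank_x, rank_y)]   # step 3
    sum_d2 = sum(di ** 2 for di in d)             # steps 4-5
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))    # step 6


# Hypothetical returns in percent, for illustration only.
x = [1.2, -0.5, 3.1, 0.4, 2.0]
y = [0.8, -1.1, 1.5, 0.9, 2.7]

print("rank(X):", ranks(x))
print("rank(Y):", ranks(y))
print("rs =", spearman_rs(x, y))
```

For this data the ranks are [3, 1, 5, 2, 4] and [2, 1, 4, 3, 5], the squared differences sum to 4, and rs = 1 - 24/120 = 0.8.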

Chi-square cell calculation:
RPN Steps: [O] ENTER [E] - x² [E] ÷
Example: 425 ENTER 400 - x² 400 ÷ = 1.5625

Practice Problems

Basic Level (Understanding)

  1. Problem: Test correlation between two assets with r = 0.35, n = 25
    • Given: r = 0.35, n = 25, α = 5% (two-tailed)
    • Find: Test statistic and decision
    • Solution:
      • t = 0.35√(25-2)/√(1-0.35²) = 0.35×4.796/0.937 = 1.79
      • Critical values: ±2.069 (df = 23)
      • |1.79| < 2.069
    • Answer: Fail to reject H₀; correlation not significant at 5%

Intermediate Level (Application)

  1. Problem: Compare Pearson and Spearman for crypto returns with outliers
    • Given: 30 daily returns, Pearson r = 0.65, Spearman rs = 0.45
    • Find: Which correlation is more appropriate and test significance
    • Solution:
      • Outliers present → Spearman more appropriate
      • t = 0.45√(30-2)/√(1-0.45²) = 0.45×5.292/0.893 = 2.67 (df = 28)
      • Critical value at 5%: ±2.048
    • Answer: Spearman shows significant correlation; more reliable with outliers

Advanced Level (Analysis)

  1. Problem: Analyze DeFi protocol categorization (3×4 contingency table)
    • Given:
      • Rows: Protocol type (DEX/Lending/Derivatives)
      • Columns: TVL quartiles (Q1/Q2/Q3/Q4)
      • 500 total protocols
    • Find: Test independence and identify patterns
    • Solution:
      • Calculate expected frequencies for each cell
      • Compute χ² statistic
      • df = (3-1)(4-1) = 6
      • Critical value at 5%: 12.592
      • Calculate standardized residuals
    • Answer: If χ² > 12.592, protocol type and TVL are dependent
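
The problem gives no cell counts, so the sketch below runs the solution steps on hypothetical counts chosen to sum to 500, with 125 protocols in each TVL quartile. The critical value 12.592 is hard-coded from the problem (df = 6, 5% level). Everything else is an illustrative assumption.

```python
import math

row_labels = ["DEX", "Lending", "Derivatives"]
col_labels = ["Q1", "Q2", "Q3", "Q4"]

# Hypothetical observed counts (rows = protocol type, columns = TVL quartile).
observed = [
    [70, 55, 45, 30],
    [35, 45, 50, 70],
    [20, 25, 30, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
residuals = []
for i, row in enumerate(observed):
    res_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi2 += (o - e) ** 2 / e                          # cell contribution
        res_row.append((o - e) / math.sqrt(e))            # standardized residual
    residuals.append(res_row)

df = (len(row_labels) - 1) * (len(col_labels) - 1)
critical = 12.592  # 5% critical value for df = 6, from a chi-square table
reject_independence = chi2 > critical

print(f"chi2 = {chi2:.2f}, df = {df}, reject independence: {reject_independence}")
for label, res_row in zip(row_labels, residuals):
    print(f"{label:<12}", [round(z, 2) for z in res_row])
```

With these made-up counts χ² = 32.0 > 12.592, so independence would be rejected. The largest residuals (about ±2.83) sit in the DEX Q1, DEX Q4, and Lending Q4 cells, which is where the pattern would be read from.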

DeFi Applications & Real-World Examples

Traditional Finance Context

  • Institution Example: Portfolio managers test correlations for diversification benefits
  • Market Application: Risk managers use contingency tables for stress test scenarios
  • Historical Case: 2008 crisis revealed hidden correlations missed by normal-period analysis

DeFi Parallels

  • Protocol Implementation: Yearn Finance uses correlation analysis for vault strategy selection
  • Smart Contract Logic: Risk assessment protocols categorize positions using contingency analysis
  • Advantages: Complete transaction history enables robust correlation estimates
  • Limitations: Short history and regime changes affect correlation stability

Case Studies

  1. Case 1: LP Position Risk Classification
    • Background: Categorizing Uniswap v3 positions by concentration and impermanent loss
    • Analysis: Chi-square test on 2×3 table (concentrated/wide × low/medium/high IL)
    • Outcomes: Strong dependence found (χ² = 45.3, p < 0.001)
    • Lessons learned: Position width significantly affects IL risk profile

Common Pitfalls & Exam Tips

Frequent Mistakes

  • Mistake 1: Using Pearson with non-normal data - Check distribution first
  • Mistake 2: Wrong df for chi-square - Remember (r-1)(c-1) not r×c
  • Mistake 3: Interpreting correlation as causation - Correlation ≠ causation

Exam Strategy

  • Time management: 3-4 minutes for correlation tests, 5-6 for contingency tables
  • Question patterns: Often asks to choose between parametric/nonparametric
  • Quick checks: Spearman values always between -1 and 1

Key Takeaways

Essential Points

  ✓ Pearson tests linear relationships assuming normality
  ✓ Spearman uses ranks, robust to outliers and non-normality
  ✓ Chi-square tests independence of categorical variables
  ✓ Expected frequencies = (row total × column total)/grand total
  ✓ Choice of test depends on data characteristics and assumptions

Memory Aids

  • Mnemonic: “PRSC” - Pearson Regular, Spearman Ranks, Chi-square Categories
  • Visual: Contingency table with observed over expected in each cell
  • Analogy: Correlation like dance partners - moving together (positive) or opposite (negative)

Cross-References & Additional Resources

Source Materials

  • Primary Reading: Volume 1, Chapter 9, Pages 1-24
  • Key Sections: Correlation tests (p.5-12), Contingency tables (p.15-20)
  • Practice Questions: End-of-chapter problems 1-15

External Resources

  • Videos: StatQuest’s “Pearson vs Spearman Correlation”
  • Articles: “A Guide to Appropriate Use of Correlation” - BMJ Statistics
  • Tools: Python pandas.DataFrame.corr(method='spearman'), R's chisq.test()

Review Checklist

Before moving on, ensure you can:

  • Calculate and test Pearson correlation coefficient
  • Apply Spearman rank correlation for non-normal data
  • Construct contingency tables and calculate expected frequencies
  • Perform chi-square test of independence
  • Choose appropriate test based on data characteristics