Parametric and Non-Parametric Tests of Independence
Learning Objectives Coverage
LO1: Explain parametric and nonparametric tests of the hypothesis that the population correlation coefficient equals zero, and determine whether the hypothesis is rejected at a given level of significance
Core Concept
This topic applies the hypothesis testing framework specifically to questions about relationships between variables. Tests of independence determine whether two variables have a statistically significant relationship, using either parametric (distribution-based) or nonparametric (rank-based) methods. The correlation concepts from Topic 3 are now placed into a formal inferential framework, enabling us to distinguish genuine relationships from noise.
Formulas & Calculations
- Pearson correlation: r = sXY/(sX × sY)
- Test statistic: t = r√(n-2)/√(1-r²) with df = n-2
- Spearman rank: rs = 1 - (6Σdi²)/(n(n²-1))
- HP 12C steps:
Correlation t-test: [r] ENTER [n] ENTER 2 - √x × 1 ENTER [r] x² - √x ÷
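The keystroke sequence evaluates t = r√(n-2)/√(1-r²) with df = n - 2. A minimal Python sketch of the same arithmetic follows; the function name `correlation_t_stat` is our own illustration, not something from the source.

```python
import math


def correlation_t_stat(r: float, n: int) -> tuple[float, int]:
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 2:
        raise ValueError("need more than 2 observations")
    df = n - 2
    numerator = r * math.sqrt(df)          # r x sqrt(n - 2)
    denominator = math.sqrt(1.0 - r ** 2)  # sqrt(1 - r^2)
    return numerator / denominator, df


t, df = correlation_t_stat(0.43, 36)
print(f"t = {t:.3f}, df = {df}")  # t = 2.777, df = 34
```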
Practical Examples
- Traditional Finance Example: Testing correlation between two mutual funds
- Sample: 36 monthly returns, r = 0.43
- t = 0.43√(36-2)/√(1-0.43²) ≈ 2.78
- Critical value at 5% (two-tailed, df = 34): ±2.032
- Decision: Reject H₀, significant correlation exists
- Interpretation: Funds move together, important for diversification decisions
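The decision step in the fund example compares |t| with a two-tailed critical value. A minimal sketch, assuming the critical value is looked up in a t-table (df = n - 2) rather than computed; `corr_significance` is a hypothetical helper name.

```python
import math


def corr_significance(r: float, n: int, t_critical: float) -> tuple[float, bool]:
    """Two-tailed test of H0: rho = 0; returns (t, reject_h0).

    t_critical comes from a t-table with df = n - 2; it is not computed here.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    return t, abs(t) > t_critical


# Mutual fund example: 36 monthly returns, r = 0.43, critical value 2.032 (df = 34)
t, reject = corr_significance(0.43, 36, 2.032)
print(f"t = {t:.2f}, reject H0: {reject}")  # t = 2.78, reject H0: True
```

The same call with r = 0.35, n = 25 and t_critical = 2.069 gives t ≈ 1.79 and reject = False, matching the basic practice problem.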
DeFi Application
- Protocol example: Testing correlation between ETH and DeFi token returns
- Implementation: Use Spearman rank due to non-normal crypto returns
- Advantages/Challenges: High volatility and outliers make nonparametric tests more reliable
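To see why outliers favor the rank-based test, the sketch below compares Pearson r and Spearman rs on made-up returns where one token return is an extreme outlier. The data are hypothetical, not real ETH or DeFi token prices; Spearman is computed as Pearson r on ranks (average ranks for ties), which equals the 6Σdi² formula when there are no ties.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson r = sum of cross-deviations / sqrt(product of sums of squares)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """Rank from smallest (1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rs computed as Pearson r on the ranks (handles ties)."""
    return pearson(ranks(x), ranks(y))


# Hypothetical daily returns (%); the last token return is an extreme outlier.
eth = [1.0, 2.0, 3.0, 4.0, 5.0]
token = [1.0, 2.0, 3.0, 4.0, 100.0]
print(f"Pearson  r  = {pearson(eth, token):.3f}")   # 0.725
print(f"Spearman rs = {spearman(eth, token):.3f}")  # 1.000
```

The relationship is perfectly monotonic, so rs = 1, while the single outlier drags Pearson r down to about 0.73.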
LO2: Explain tests of independence based on contingency table data
Core Concept
- Definition: Chi-square test analyzes relationships between categorical variables using contingency tables
- Why it matters: Evaluates dependencies between discrete outcomes like investment decisions, risk categories, or protocol types
- Key components: Observed frequencies, expected frequencies, chi-square statistic, degrees of freedom
Formulas & Calculations
- Chi-square statistic: χ² = Σ[(Oij - Eij)²/Eij]
- Expected frequency: Eij = (Row i total × Column j total)/Grand total
- Degrees of freedom: df = (r-1)(c-1), where r = number of rows and c = number of columns
- Standardized residual: (Oij - Eij)/√Eij
- HP 12C steps: Manual calculation required for each cell
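Since the per-cell work is repetitive, a short script can replace the manual calculation. A minimal sketch of the formulas above (expected frequencies, χ², df and standardized residuals); the function name and the 2×3 counts are hypothetical.

```python
import math


def chi_square_independence(observed: list[list[float]]):
    """Return (chi2, df, expected, standardized_residuals) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    # Eij = (row i total x column j total) / grand total
    expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]
    chi2 = 0.0
    residuals = []
    for obs_row, exp_row in zip(observed, expected):
        res_row = []
        for o, e in zip(obs_row, exp_row):
            chi2 += (o - e) ** 2 / e              # cell contribution to chi-square
            res_row.append((o - e) / math.sqrt(e))  # standardized residual
        residuals.append(res_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected, residuals


# Hypothetical 2 x 3 table of counts
chi2, df, expected, resid = chi_square_independence([[20, 30, 50], [30, 30, 40]])
print(f"chi2 = {chi2:.3f}, df = {df}")  # chi2 = 3.111, df = 2
```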
Practical Examples
- Traditional Finance Example: ETF classification analysis
- 1,594 ETFs classified by size and investment type
- 3×3 contingency table (large/mid/small × growth/blend/value)
- χ² = 32.08 with df = 4
- Critical value at 5%: 9.488
- Decision: Reject independence, size and style are related
- Interpretation: Investment style depends on market cap focus
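The critical value 9.488 is the point where P(χ² > x) = 0.05 for df = 4. Python's standard library has no chi-square distribution, so this sketch uses the closed-form upper-tail probability that holds only for even df, P(χ²ₖ > x) = e^(-x/2) Σ (x/2)^i / i! for i = 0 … k/2 - 1. It is a substitute for a table or for `scipy.stats.chi2.sf`, not a general implementation.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x); closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


print(f"{chi2_sf_even_df(9.488, 4):.4f}")  # 0.0500, the 5% critical value for df = 4
print(chi2_sf_even_df(32.08, 4))           # ETF example: p-value far below 0.05
```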
DeFi Application
- Protocol example: Testing relationship between protocol type (DEX/Lending/Yield) and risk level (Low/Medium/High)
- Implementation: Create contingency table from protocol classifications
- Advantages/Challenges: On-chain transparency allows complete population analysis
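Building the contingency table amounts to counting (type, risk) pairs. A minimal sketch, assuming each protocol already carries a type label and a risk label; the records below are invented for illustration, not real protocol classifications.

```python
from collections import Counter

# Hypothetical (protocol_type, risk_level) labels, one per protocol.
records = [
    ("DEX", "Low"), ("DEX", "Medium"), ("DEX", "Low"),
    ("Lending", "Medium"), ("Lending", "High"), ("Lending", "Medium"),
    ("Yield", "High"), ("Yield", "High"), ("Yield", "Medium"),
]
types = ["DEX", "Lending", "Yield"]
risks = ["Low", "Medium", "High"]

counts = Counter(records)
# Counter returns 0 for pairs that never occur, so empty cells are filled automatically.
table = [[counts[(ptype, risk)] for risk in risks] for ptype in types]

for ptype, row in zip(types, table):
    print(ptype, row)
```

The resulting rows of observed counts feed directly into the χ² calculation. With only nine records, many expected frequencies fall below the commonly cited minimum of 5, so a table this small would not support a reliable χ² test.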
Core Concepts Summary (80/20 Principle)
Must-Know Concepts
- Pearson Correlation: Measures linear relationship; its t-test assumes normally distributed variables
- Spearman Rank: Robust to outliers, works with non-normal data
- Chi-Square Test: Tests independence for categorical variables
- Test Selection: Use parametric tests when their assumptions are met, nonparametric tests when robustness to outliers or non-normality is needed
Quick Reference Table
| Test Type | Data Type | Assumption | Test Statistic | DeFi Use Case |
|---|---|---|---|---|
| Pearson | Continuous | Normal | t = r√(n-2)/√(1-r²) | Stable token correlations |
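| Spearman | Ranked/Ordinal | No distributional assumption | rs = 1 - 6Σdi²/(n(n²-1)); t-test on rs for large n | Volatile token analysis |
| Chi-square | Categorical | Random sample; expected counts not too small (commonly ≥ 5) | χ² = Σ(O-E)²/E, df = (r-1)(c-1) | Protocol type vs risk |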
| Contingency | Discrete | Random sample | Same as χ² | Wallet behavior patterns |
Comprehensive Formula Sheet
Essential Formulas
Pearson Correlation Coefficient:
r = Σ[(xi - x̄)(yi - ȳ)]/√[Σ(xi - x̄)²Σ(yi - ȳ)²]
Alternative: r = sXY/(sX × sY)
Where: sXY = covariance, sX, sY = standard deviations
Used for: Linear relationship between normally distributed variables
Correlation t-test:
t = r√(n-2)/√(1-r²)
df = n - 2
Where: r = sample correlation, n = sample size
Used for: Testing H₀: ρ = 0
Spearman Rank Correlation:
rs = 1 - (6Σdi²)/(n(n²-1))
Where: di = difference in ranks for observation i
Used for: Non-normal data or ordinal variables
Chi-Square Test of Independence:
χ² = ΣΣ[(Oij - Eij)²/Eij]
Expected: Eij = (Row i total × Column j total)/Grand total
df = (r-1)(c-1), where r = number of rows, c = number of columns
Used for: Testing independence of categorical variables
Standardized Residual:
zij = (Oij - Eij)/√Eij
Used for: Identifying which cells contribute most to χ²
HP 12C Calculator Sequences
Pearson Correlation t-statistic:
RPN Steps: [r] ENTER [n] ENTER 2 - √x × 1 ENTER [r] x² - √x ÷
Example: 0.43 ENTER 36 ENTER 2 - √x × 1 ENTER 0.43 x² - √x ÷ ≈ 2.78
Spearman Rank (manual):
1. Rank X values: smallest = 1
2. Rank Y values: smallest = 1
3. Calculate di = rank(Xi) - rank(Yi)
4. Square each di
5. Sum all di²
6. Apply formula: 1 - (6×sum)/(n×(n²-1))
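The six manual steps translate directly into code. A minimal sketch that applies the 6Σdi² formula and refuses tied data, since the formula is exact only without ties; the function name and sample values are hypothetical.

```python
def spearman_no_ties(x: list[float], y: list[float]) -> float:
    """rs = 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only when neither series has ties."""
    n = len(x)
    if len(set(x)) < n or len(set(y)) < n:
        raise ValueError("ties present: use average ranks and Pearson-on-ranks instead")
    # Steps 1-2: rank each series, smallest value = 1
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    # Steps 3-5: rank differences, squared and summed
    sum_d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    # Step 6: apply the formula
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical paired observations: rank differences are 0, -1, 1, -1, 1, so sum d^2 = 4
print(spearman_no_ties([10, 20, 30, 40, 50], [12, 25, 22, 48, 40]))  # 0.8
```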
Chi-square cell calculation:
RPN Steps: [O] ENTER [E] - x² [E] ÷
Example: 425 ENTER 400 - x² 400 ÷ = 1.5625
Practice Problems
Basic Level (Understanding)
- Problem: Test correlation between two assets with r = 0.35, n = 25
- Given: r = 0.35, n = 25, α = 5% (two-tailed)
- Find: Test statistic and decision
- Solution:
- t = 0.35√(25-2)/√(1-0.35²) = 0.35×4.796/0.937 = 1.79
- Critical values: ±2.069 (df = 23)
- |1.79| < 2.069
- Answer: Fail to reject H₀; correlation not significant at 5%
Intermediate Level (Application)
- Problem: Compare Pearson and Spearman for crypto returns with outliers
- Given: 30 daily returns, Pearson r = 0.65, Spearman rs = 0.45
- Find: Which correlation is more appropriate and test significance
- Solution:
- Outliers present → Spearman more appropriate
- t = 0.45√(30-2)/√(1-0.45²) ≈ 2.67
- Critical value at 5% (two-tailed, df = 28): ±2.048
- Answer: 2.67 > 2.048, so reject H₀; the Spearman correlation is significant and is more reliable here because of the outliers
Advanced Level (Analysis)
- Problem: Analyze DeFi protocol categorization (3×4 contingency table)
- Given:
- Rows: Protocol type (DEX/Lending/Derivatives)
- Columns: TVL quartiles (Q1/Q2/Q3/Q4)
- 500 total protocols
- Find: Test independence and identify patterns
- Solution:
- Calculate expected frequencies for each cell
- Compute χ² statistic
- df = (3-1)(4-1) = 6
- Critical value at 5%: 12.592
- Calculate standardized residuals
- Answer: If χ² > 12.592, protocol type and TVL are dependent
DeFi Applications & Real-World Examples
Traditional Finance Context
- Institution Example: Portfolio managers test correlations for diversification benefits
- Market Application: Risk managers use contingency tables for stress test scenarios
- Historical Case: 2008 crisis revealed hidden correlations missed by normal-period analysis
DeFi Parallels
- Protocol Implementation: Yearn Finance uses correlation analysis for vault strategy selection
- Smart Contract Logic: Risk assessment protocols categorize positions using contingency analysis
- Advantages: Complete transaction history enables robust correlation estimates
- Limitations: Short history and regime changes affect correlation stability
Case Studies
- Case 1: LP Position Risk Classification
- Background: Categorizing Uniswap v3 positions by concentration and impermanent loss
- Analysis: Chi-square test on 2×3 table (concentrated/wide × low/medium/high IL)
- Outcomes: Strong dependence found (χ² = 45.3, p < 0.001)
- Lessons learned: Position width significantly affects IL risk profile
Common Pitfalls & Exam Tips
Frequent Mistakes
- Mistake 1: Using Pearson with non-normal data - Check distribution first
- Mistake 2: Wrong df for chi-square - Remember (r-1)(c-1) not r×c
- Mistake 3: Interpreting correlation as causation - Correlation ≠ causation
Exam Strategy
- Time management: 3-4 minutes for correlation tests, 5-6 for contingency tables
- Question patterns: Often asks to choose between parametric/nonparametric
- Quick checks: Spearman values always between -1 and 1
Key Takeaways
Essential Points
✓ Pearson tests linear relationships assuming normality
✓ Spearman uses ranks, robust to outliers and non-normality
✓ Chi-square tests independence of categorical variables
✓ Expected frequencies = (row total × column total)/grand total
✓ Choice of test depends on data characteristics and assumptions
Memory Aids
- Mnemonic: “PRSC” - Pearson Regular, Spearman Ranks, Chi-square Categories
- Visual: Contingency table with observed over expected in each cell
- Analogy: Correlation is like a pair of dance partners, moving together (positive) or in opposite directions (negative)
Cross-References & Additional Resources
Related Topics
- Prerequisite: Hypothesis Testing (Topic 8) for test framework
- Related: Simple Linear Regression (Topic 10) extends correlation to prediction
- Advanced: Big Data Techniques (Topic 11) for large-scale correlation analysis
Source Materials
- Primary Reading: Volume 1, Chapter 9, Pages 1-24
- Key Sections: Correlation tests (p.5-12), Contingency tables (p.15-20)
- Practice Questions: End-of-chapter problems 1-15
External Resources
- Videos: StatQuest’s “Pearson vs Spearman Correlation”
- Articles: Mukaka, “A Guide to Appropriate Use of Correlation Coefficient in Medical Research” - Malawi Medical Journal (Statistics Corner)
- Tools: Python pandas.DataFrame.corr(method='spearman'), R’s chisq.test()
Review Checklist
Before moving on, ensure you can:
- Calculate and test Pearson correlation coefficient
- Apply Spearman rank correlation for non-normal data
- Construct contingency tables and calculate expected frequencies
- Perform chi-square test of independence
- Choose appropriate test based on data characteristics