Estimation and Inference
Learning Objectives Coverage
LO1: Compare and contrast simple random, stratified random, cluster, convenience, and judgmental sampling and their implications for sampling error in an investment problem
Core Concept
Sampling methods are techniques for selecting a subset of observations from a population to make inferences about the entire population. Proper sampling methodology ensures valid statistical inference and reduces sampling error in investment analysis. The key distinction is between probability sampling (where every member of the population has a known, non-zero chance of selection) and non-probability sampling (where selection depends on researcher judgment).
Formulas & Calculations
- Sampling Error: The difference between the observed value of a sample statistic and the population parameter it estimates
- Sample size determination: Larger samples reduce sampling error proportionally to 1/√n
- HP 12C steps: Not directly applicable for sampling method selection
Practical Examples
- Traditional Finance Example: Bond index construction using stratified sampling
- Divide bonds by issuer type (agency, Treasury, corporate)
- Further stratify by maturity intervals (10 buckets) and coupon rates (2 levels)
- Creates 60 strata (3 × 10 × 2) requiring minimum 60 issues
- Calculation walkthrough: Select proportional samples from each stratum based on market value weights
- Interpretation: Ensures all bond characteristics are represented in the portfolio
DeFi Application
- Protocol example: Uniswap v3 liquidity provider analysis
- Implementation: Stratify LPs by position size, fee tier, and price range concentration
- Advantages/Challenges: On-chain data provides complete population access vs. traditional sampling limitations
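The stratified approach described above can be sketched in a few lines. The LP data here is invented for illustration (the field names, strata, and lognormal sizes are assumptions, not real Uniswap data); the point is proportional allocation across strata:

```python
import random

# Hypothetical Uniswap v3 LP positions, stratified by fee tier.
# All field names and values are invented for this illustration.
population = (
    [{"fee_tier": 0.05, "size": random.lognormvariate(10, 1)} for _ in range(600)]
    + [{"fee_tier": 0.30, "size": random.lognormvariate(9, 1)} for _ in range(300)]
    + [{"fee_tier": 1.00, "size": random.lognormvariate(8, 1)} for _ in range(100)]
)

def stratified_sample(pop, key, n):
    """Proportional-allocation stratified sample of size n."""
    strata = {}
    for row in pop:
        strata.setdefault(row[key], []).append(row)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(pop))  # proportional allocation
        sample.extend(random.sample(members, n_h))
    return sample

sample = stratified_sample(population, "fee_tier", 100)
print(len(sample))  # 100, split 60/30/10 across the three fee tiers
```

With a 60/30/10 population split, a sample of 100 lands exactly 60, 30, and 10 observations in the three strata, mirroring the proportional market-value weighting in the bond index example.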
LO2: Explain the central limit theorem and its importance for the distribution and standard error of the sample mean
Core Concept
The Central Limit Theorem (CLT) is arguably the most important result in statistics for financial analysts. It states that for any population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the shape of the underlying population distribution. This is what justifies using z-tests and t-tests in hypothesis testing, even when the original data are skewed or fat-tailed (as is typical for asset returns, discussed in Topic 3). As a rule of thumb, the CLT applies reliably when n ≥ 30.
Formulas & Calculations
- Main formula: Standard Error = σ/√n (population σ known) or s/√n (population σ unknown)
- HP 12C steps for the standard error: [σ or s] ENTER [n] √x ÷
- Common variations: Apply the finite population correction when sampling more than 5% of the population
Practical Examples
- Traditional Finance Example: Euro-Asia-Africa Equity Index analysis
- Population: 1,258 daily returns, mean = 0.035%, standard deviation = 1.26%
- Sample of 30: Standard error = 1.26%/√30 = 0.23%
- Sample of 100: Standard error = 1.26%/√100 = 0.126%
- Calculation walkthrough: As sample size increases from 30 to 100, standard error decreases by 45%
- Interpretation: Larger samples provide more precise estimates of population parameters
DeFi Application
- Protocol example: Analyzing Aave lending rates across different market conditions
- Implementation: Calculate average rates from hourly snapshots, apply CLT for confidence intervals
- Advantages/Challenges: High-frequency on-chain data allows for large sample sizes, improving estimate precision
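The CLT claim above is easy to verify by simulation. This sketch draws sample means from a deliberately skewed (lognormal) population, a stand-in for asset returns, and checks that their spread matches σ/√n; the population and seed are arbitrary choices for the demonstration:

```python
import random
import statistics

# Sample means from a heavily skewed (lognormal) population: their spread
# should approach sigma/sqrt(n), per the CLT, even though the population
# itself is far from normal.
random.seed(42)
population = [random.lognormvariate(0, 1) for _ in range(100_000)]
sigma = statistics.pstdev(population)

n = 30
means = [statistics.fmean(random.sample(population, n)) for _ in range(2_000)]

empirical_se = statistics.pstdev(means)      # spread of the sample means
theoretical_se = sigma / n ** 0.5            # CLT prediction
print(f"empirical SE {empirical_se:.3f} vs theoretical {theoretical_se:.3f}")
```

The two figures agree closely, and plotting the 2,000 means would show the familiar bell shape emerging from a strongly right-skewed population.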
LO3: Describe the use of resampling (bootstrap, jackknife) to estimate the sampling distribution of a statistic
Core Concept
Resampling methods are computational techniques that repeatedly draw samples from original data to estimate sampling distributions. The bootstrap method (covered in more depth in the simulation topic) draws samples with replacement, while the jackknife method uses a leave-one-out approach. These methods are invaluable when analytical formulas for standard errors do not exist or are too complex to derive.
Formulas & Calculations
- Bootstrap Standard Error: sX̄ = √[1/(B-1) × Σ(θ̂b - θ̄)²]
- B = number of resamples (typically 1,000-10,000)
- θ̂b = statistic from resample b
- θ̄ = mean of all resample statistics
- HP 12C steps: Not directly applicable; requires computational methods
- Common variations: Percentile bootstrap for confidence intervals
Practical Examples
- Traditional Finance Example: Rarely traded stock with 12 monthly returns
- Generate 1,000 bootstrap samples
- Calculate mean for each resample
- Standard error from bootstrap = 0.04408
- Calculation walkthrough: Draw 12 returns with replacement, calculate mean, repeat 1,000 times
- Interpretation: Provides standard error estimate despite small sample size
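The walkthrough above maps directly to code. The 12 monthly returns below are invented for illustration (not the figures from the reading), so the resulting standard error will differ from the 0.04408 in the example, but the procedure is identical:

```python
import random
import statistics

# Bootstrap standard error of the mean for a small sample.
# The 12 monthly returns are illustrative, not from the reading.
returns = [0.02, -0.05, 0.01, 0.08, -0.03, 0.04,
           0.00, 0.06, -0.01, 0.03, -0.04, 0.05]

random.seed(1)
B = 1_000
boot_means = [
    statistics.fmean(random.choices(returns, k=len(returns)))  # draw with replacement
    for _ in range(B)
]
theta_bar = statistics.fmean(boot_means)
# Bootstrap SE: sqrt( (1/(B-1)) * sum((theta_b - theta_bar)^2) )
boot_se = (sum((t - theta_bar) ** 2 for t in boot_means) / (B - 1)) ** 0.5
print(f"bootstrap SE of the mean: {boot_se:.5f}")
```

For the sample mean the bootstrap SE should land close to the analytical s/√n, which is a useful sanity check before trusting the method on statistics that have no closed-form standard error.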
DeFi Application
- Protocol example: Estimating confidence intervals for impermanent loss in Uniswap pools
- Implementation: Bootstrap historical price paths to simulate IL distributions
- Advantages/Challenges: Handles non-normal distributions common in crypto markets
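The jackknife mentioned in the core concept deserves a sketch too. It recomputes the statistic with each observation left out in turn; for the sample mean the jackknife SE reproduces s/√n exactly, which makes it a good self-check (the returns below are the same illustrative figures, not source data):

```python
import statistics

# Jackknife (leave-one-out) standard error of the sample mean.
# se_jack = sqrt( (n-1)/n * sum((theta_i - theta_bar)^2) )
returns = [0.02, -0.05, 0.01, 0.08, -0.03, 0.04,
           0.00, 0.06, -0.01, 0.03, -0.04, 0.05]
n = len(returns)

# Recompute the mean n times, each time dropping one observation.
loo_means = [statistics.fmean(returns[:i] + returns[i + 1:]) for i in range(n)]
loo_bar = statistics.fmean(loo_means)
jack_se = ((n - 1) / n * sum((m - loo_bar) ** 2 for m in loo_means)) ** 0.5

analytic_se = statistics.stdev(returns) / n ** 0.5
print(f"jackknife SE {jack_se:.5f} vs analytic {analytic_se:.5f}")
```

Unlike the bootstrap, the jackknife is deterministic (no random resampling), which is why it needs only n recomputations rather than thousands.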
Core Concepts Summary (80/20 Principle)
Must-Know Concepts
- Central Limit Theorem: Sample means are approximately normally distributed regardless of the population distribution when n ≥ 30
- Standard Error: Measures precision of sample estimates, decreases with √n
- Stratified Sampling: Divides population into homogeneous groups for more representative samples
- Bootstrap: Resampling with replacement to estimate complex statistics’ distributions
Quick Reference Table
| Concept | Formula | When to Use | DeFi Equivalent |
|---|---|---|---|
| Standard Error | σ/√n or s/√n | Estimating sample mean precision | TVL volatility estimates |
| CLT Application | n ≥ 30 for normality | Making population inferences | Protocol usage statistics |
| Stratified Sampling | Proportional allocation | Heterogeneous populations | LP position analysis by size |
| Bootstrap | Resample with replacement | No analytical formula exists | Yield farming return distributions |
Comprehensive Formula Sheet
Essential Formulas
Sample Variance:
s² = Σ(Xi - X̄)² / (n - 1)
Where: Xi = individual observations, X̄ = sample mean, n = sample size
Used for: Estimating population variance from sample data
Standard Error (known σ):
σX̄ = σ / √n
Where: σ = population standard deviation, n = sample size
Used for: Calculating precision of sample mean estimates
Standard Error (unknown σ):
sX̄ = s / √n
Where: s = sample standard deviation, n = sample size
Used for: Real-world applications where population σ is unknown
Bootstrap Standard Error:
sX̄ = √[1/(B-1) × Σ(θ̂b - θ̄)²]
Where: B = number of resamples, θ̂b = resample statistic, θ̄ = mean of resamples
Used for: Estimating standard error of complex statistics
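As a quick cross-check of the formula sheet, this sketch computes the sample variance and standard error by hand and compares against Python's statistics module (the data points are invented):

```python
import math
import statistics

# Verify the formula-sheet definitions on invented data.
data = [4.0, 7.0, 9.0, 3.0, 6.0, 8.0]
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance, divisor n - 1
se = math.sqrt(s2) / math.sqrt(n)                  # standard error of the mean

assert abs(s2 - statistics.variance(data)) < 1e-12  # matches the library value
print(f"s² = {s2:.4f}, SE = {se:.4f}")
```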
HP 12C Calculator Sequences
Standard Error Calculation:
RPN Steps: [std dev] ENTER [sample size] √x ÷
Example: 15.5 ENTER 25 √x ÷ = 3.1
Sample Variance (manual):
RPN Steps: Use the Σ+ key for data entry, then g s for the sample standard deviation; press ENTER × to square it for the variance
Example: Clear the statistics registers (f CLEAR Σ), enter each value followed by Σ+, then g s ENTER ×
Coefficient of Variation:
RPN Steps: [std dev] ENTER [mean] ÷ 100 ×
Example: 3.5 ENTER 25 ÷ 100 × = 14%
Practice Problems
Basic Level (Understanding)
- Problem: Calculate the standard error for a sample of 36 observations with population σ = 12
- Given: n = 36, σ = 12
- Find: Standard error of the sample mean
- Solution: σX̄ = σ/√n = 12/√36 = 12/6 = 2
- Answer: The standard error is 2, meaning sample means typically deviate by 2 units from the population mean
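The basic problem is a one-line check in code:

```python
import math

# Basic problem: n = 36, population sigma = 12.
se = 12 / math.sqrt(36)
print(se)  # 2.0
```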
Intermediate Level (Application)
- Problem: A DeFi protocol’s daily returns have mean 0.5% and standard deviation 3%. What’s the standard error for 50-day average returns?
- Given: μ = 0.5%, σ = 3%, n = 50
- Find: Standard error of 50-day average
- Solution:
- σX̄ = 3%/√50 = 3%/7.07 = 0.424%
- By CLT, distribution is approximately normal
- Answer: 0.424% standard error; by the CLT, about 95% of 50-day averages fall within ±0.83% (1.96 × 0.424%) of the true mean
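The intermediate solution can be reproduced directly:

```python
import math

# Intermediate problem: sigma = 3%, n = 50.
sigma = 0.03
n = 50
se = sigma / math.sqrt(n)
lo, hi = -1.96 * se, 1.96 * se  # 95% band around the true mean, via the CLT
print(f"SE = {se:.5f}, 95% band = ({lo:.5f}, {hi:.5f})")
```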
Advanced Level (Analysis)
- Problem: Compare simple random vs. stratified sampling for a crypto index with 60% large-cap (σ = 2%), 30% mid-cap (σ = 4%), 10% small-cap (σ = 8%)
- Given: Population proportions and volatilities by market cap
- Find: Relative efficiency of stratified sampling
- Solution:
- Simple random: the variance of the sample mean uses the total population variance, which includes between-strata dispersion
- Stratified (proportional allocation): the variance of the sample mean uses only the weighted within-strata variances, Σ wh σh² / n
- Efficiency gain = the reduction in standard error from eliminating the between-strata component
- Answer: Stratified sampling reduces the standard error by roughly 25% in this setup; the exact gain depends on how far the strata mean returns diverge
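The comparison above can be made concrete. The within-strata volatilities come from the problem, but the strata mean returns (mu values) are invented assumptions needed to quantify the between-strata component, so treat the printed reduction as illustrative rather than the problem's 25% figure:

```python
import math

# Stratified vs. simple-random sampling efficiency.
# weights and sigmas are from the problem; mus are hypothetical assumptions.
weights = [0.60, 0.30, 0.10]   # large-, mid-, small-cap proportions
sigmas  = [0.02, 0.04, 0.08]   # within-stratum volatility
mus     = [0.00, 0.03, 0.09]   # hypothetical mean returns per stratum
n = 100

within = sum(w * s ** 2 for w, s in zip(weights, sigmas))

# Proportional-allocation stratified SE uses only within-strata variance.
se_strat = math.sqrt(within / n)

# Simple random sampling also carries the between-strata variance.
mu = sum(w * m for w, m in zip(weights, mus))
between = sum(w * (m - mu) ** 2 for w, m in zip(weights, mus))
se_srs = math.sqrt((within + between) / n)

reduction = 1 - se_strat / se_srs
print(f"stratified SE {se_strat:.5f}, SRS SE {se_srs:.5f}, "
      f"reduction {reduction:.1%}")
```

With these assumed strata means the reduction comes out near 20%; widening the gap between strata means increases the gain, which is exactly why stratification pays off most in heterogeneous populations.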
DeFi Applications & Real-World Examples
Traditional Finance Context
- Institution Example: Investment banks use stratified sampling for risk assessment across diverse portfolios
- Market Application: Index fund managers employ sampling techniques to track benchmarks efficiently
- Historical Case: LTCM failure highlighted importance of proper sampling across market conditions
DeFi Parallels
- Protocol Implementation: Compound uses exponentially weighted sampling for interest rate models
- Smart Contract Logic: Chainlink oracles aggregate price samples using robust statistical methods
- Advantages: Complete on-chain data eliminates sampling bias from data availability
- Limitations: Gas costs limit complex resampling implementations on-chain
Case Studies
- Case 1: Yield Farming Strategy Analysis
- Background: Evaluating returns across multiple DeFi protocols
- Analysis: Bootstrap 1,000 portfolio combinations from historical yields
- Outcomes: 95% confidence interval for expected returns: [12%, 28%] APY
- Lessons learned: Resampling captures correlation structures missed by simple averaging
Common Pitfalls & Exam Tips
Frequent Mistakes
- Mistake 1: Using sample standard deviation instead of standard error for inference - Remember to divide by √n
- Mistake 2: Applying CLT with small samples (n < 30) - Results may not be reliable
- Mistake 3: Ignoring stratification benefits in heterogeneous populations - Can significantly improve precision
Exam Strategy
- Time management: 3-4 minutes per sampling question
- Question patterns: Often combined with hypothesis testing in subsequent topics
- Quick checks: Verify standard error decreases as n increases
Key Takeaways
Essential Points
✓ Central Limit Theorem enables the normal distribution assumption for large samples (n ≥ 30)
✓ Standard error = σ/√n measures sampling precision
✓ Stratified sampling improves efficiency for heterogeneous populations
✓ Bootstrap provides distribution estimates without analytical formulas
✓ Sampling method choice significantly impacts inference validity
Memory Aids
- Mnemonic: “CLT at 30” - Central Limit Theorem reliable at n = 30+
- Visual: Bell curve emerges from any distribution shape as n increases
- Analogy: Sampling is like tasting soup - stirring (randomization) ensures representative taste
Cross-References & Additional Resources
Related Topics
- Prerequisite: Statistical Measures of Asset Returns (Topic 3)
- Related: Hypothesis Testing (Topic 8) builds directly on these concepts
- Advanced: Big Data Techniques (Topic 11) extends to massive samples
Source Materials
- Primary Reading: Volume 1, Chapter 7, Pages 1-28
- Key Sections: CLT explanation (p.12-15), Bootstrap methods (p.20-24)
- Practice Questions: End-of-chapter problems 1-15
External Resources
- Videos: Khan Academy’s Central Limit Theorem series
- Articles: “Bootstrap Methods and Their Application” by Davison & Hinkley
- Tools: R’s boot package, Python’s scipy.stats.bootstrap
Review Checklist
Before moving on, ensure you can:
- Distinguish between all five sampling methods and their appropriate uses
- Calculate standard error given population or sample standard deviation
- Explain why CLT is crucial for statistical inference
- Describe when to use bootstrap vs. analytical methods
- Apply sampling concepts to DeFi protocol analysis