Estimation and Inference
Learning Objectives Coverage
LO1: Compare and contrast simple random, stratified random, cluster, convenience, and judgmental sampling and their implications for sampling error in an investment problem
Core Concept
Sampling methods are techniques for selecting a subset of observations from a population to make inferences about the entire population. Proper sampling methodology ensures valid statistical inference and reduces sampling error in investment analysis. The key distinction is between probability sampling (where every member of the population has a known, non-zero chance of selection) and non-probability sampling (where selection depends on researcher judgment).
Formulas & Calculations
- Sampling Error: The difference between the observed value of a sample statistic and the population parameter it estimates
- Sample size determination: Larger samples reduce sampling error proportionally to 1/√n
- HP 12C steps: Not directly applicable for sampling method selection
Practical Examples
- Traditional Finance Example: Bond index construction using stratified sampling
- Divide bonds by issuer type (agency, Treasury, corporate)
- Further stratify by maturity intervals (10 buckets) and coupon rates (2 levels)
- Creates 60 strata (3 × 10 × 2) requiring minimum 60 issues
- Calculation walkthrough: Select proportional samples from each stratum based on market value weights
- Interpretation: Ensures all bond characteristics are represented in the portfolio
DeFi Application
- Protocol example: Uniswap v3 liquidity provider analysis
- Implementation: Stratify LPs by position size, fee tier, and price range concentration
- Advantages/Challenges: On-chain data provides complete population access vs. traditional sampling limitations
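The stratified approach described above can be sketched in a few lines. The LP data here is invented for illustration (the field names, strata, and lognormal sizes are assumptions, not real Uniswap data); the point is proportional allocation across strata:

```python
import random

# Hypothetical Uniswap v3 LP positions, stratified by fee tier.
# All field names and values are invented for this illustration.
population = (
    [{"fee_tier": 0.05, "size": random.lognormvariate(10, 1)} for _ in range(600)]
    + [{"fee_tier": 0.30, "size": random.lognormvariate(9, 1)} for _ in range(300)]
    + [{"fee_tier": 1.00, "size": random.lognormvariate(8, 1)} for _ in range(100)]
)

def stratified_sample(pop, key, n):
    """Proportional-allocation stratified sample of size n."""
    strata = {}
    for row in pop:
        strata.setdefault(row[key], []).append(row)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(pop))  # proportional allocation
        sample.extend(random.sample(members, n_h))
    return sample

sample = stratified_sample(population, "fee_tier", 100)
print(len(sample))  # 100, split 60/30/10 across the three fee tiers
```

With a 60/30/10 population split, a sample of 100 lands exactly 60, 30, and 10 observations in the three strata, mirroring the proportional market-value weighting in the bond index example.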
LO2: Explain the central limit theorem and its importance for the distribution and standard error of the sample mean
Core Concept
The Central Limit Theorem (CLT) is arguably the most important result in statistics for financial analysts. It states that for any population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the shape of the underlying population distribution. This is what justifies using z-tests and t-tests in hypothesis testing, even when the original data are skewed or fat-tailed (as is typical for asset returns, discussed in Topic 3). As a rule of thumb, the CLT applies reliably when n ≥ 30.
Formulas & Calculations
- Main formula: Standard Error = σ/√n (population σ known) or s/√n (population σ unknown)
- HP 12C steps for the standard error: [σ or s] ENTER [n] √x ÷
- Common variations: Apply the finite population correction when sampling more than 5% of the population
Practical Examples
- Traditional Finance Example: Euro-Asia-Africa Equity Index analysis
- Population: 1,258 daily returns, mean = 0.035%, standard deviation = 1.26%
- Sample of 30: Standard error = 1.26%/√30 = 0.23%
- Sample of 100: Standard error = 1.26%/√100 = 0.126%
- Calculation walkthrough: As sample size increases from 30 to 100, standard error decreases by 45%
- Interpretation: Larger samples provide more precise estimates of population parameters
DeFi Application
- Protocol example: Analyzing Aave lending rates across different market conditions
- Implementation: Calculate average rates from hourly snapshots, apply CLT for confidence intervals
- Advantages/Challenges: High-frequency on-chain data allows for large sample sizes, improving estimate precision
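The CLT claim above is easy to verify by simulation. This sketch draws sample means from a deliberately skewed (lognormal) population, a stand-in for asset returns, and checks that their spread matches σ/√n; the population and seed are arbitrary choices for the demonstration:

```python
import random
import statistics

# Sample means from a heavily skewed (lognormal) population: their spread
# should approach sigma/sqrt(n), per the CLT, even though the population
# itself is far from normal.
random.seed(42)
population = [random.lognormvariate(0, 1) for _ in range(100_000)]
sigma = statistics.pstdev(population)

n = 30
means = [statistics.fmean(random.sample(population, n)) for _ in range(2_000)]

empirical_se = statistics.pstdev(means)      # spread of the sample means
theoretical_se = sigma / n ** 0.5            # CLT prediction
print(f"empirical SE {empirical_se:.3f} vs theoretical {theoretical_se:.3f}")
```

The two figures agree closely, and plotting the 2,000 means would show the familiar bell shape emerging from a strongly right-skewed population.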
LO3: Describe the use of resampling (bootstrap, jackknife) to estimate the sampling distribution of a statistic
Core Concept
Resampling methods are computational techniques that repeatedly draw samples from original data to estimate sampling distributions. The bootstrap method (covered in more depth in the simulation topic) draws samples with replacement, while the jackknife method uses a leave-one-out approach. These methods are invaluable when analytical formulas for standard errors do not exist or are too complex to derive.
Formulas & Calculations
- Bootstrap Standard Error: sX̄ = √[1/(B-1) × Σ(θ̂b - θ̄)²]
- B = number of resamples (typically 1,000-10,000)
- θ̂b = statistic from resample b
- θ̄ = mean of all resample statistics
- HP 12C steps: Not directly applicable; requires computational methods
- Common variations: Percentile bootstrap for confidence intervals
Practical Examples
- Traditional Finance Example: Rarely traded stock with 12 monthly returns
- Generate 1,000 bootstrap samples
- Calculate mean for each resample
- Standard error from bootstrap = 0.04408
- Calculation walkthrough: Draw 12 returns with replacement, calculate mean, repeat 1,000 times
- Interpretation: Provides standard error estimate despite small sample size
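The walkthrough above maps directly to code. The 12 monthly returns below are invented for illustration (not the figures from the reading), so the resulting standard error will differ from the 0.04408 in the example, but the procedure is identical:

```python
import random
import statistics

# Bootstrap standard error of the mean for a small sample.
# The 12 monthly returns are illustrative, not from the reading.
returns = [0.02, -0.05, 0.01, 0.08, -0.03, 0.04,
           0.00, 0.06, -0.01, 0.03, -0.04, 0.05]

random.seed(1)
B = 1_000
boot_means = [
    statistics.fmean(random.choices(returns, k=len(returns)))  # draw with replacement
    for _ in range(B)
]
theta_bar = statistics.fmean(boot_means)
# Bootstrap SE: sqrt( (1/(B-1)) * sum((theta_b - theta_bar)^2) )
boot_se = (sum((t - theta_bar) ** 2 for t in boot_means) / (B - 1)) ** 0.5
print(f"bootstrap SE of the mean: {boot_se:.5f}")
```

For the sample mean the bootstrap SE should land close to the analytical s/√n, which is a useful sanity check before trusting the method on statistics that have no closed-form standard error.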
DeFi Application
- Protocol example: Estimating confidence intervals for impermanent loss in Uniswap pools
- Implementation: Bootstrap historical price paths to simulate IL distributions
- Advantages/Challenges: Handles non-normal distributions common in crypto markets
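The jackknife mentioned in the core concept deserves a sketch too. It recomputes the statistic with each observation left out in turn; for the sample mean the jackknife SE reproduces s/√n exactly, which makes it a good self-check (the returns below are the same illustrative figures, not source data):

```python
import statistics

# Jackknife (leave-one-out) standard error of the sample mean.
# se_jack = sqrt( (n-1)/n * sum((theta_i - theta_bar)^2) )
returns = [0.02, -0.05, 0.01, 0.08, -0.03, 0.04,
           0.00, 0.06, -0.01, 0.03, -0.04, 0.05]
n = len(returns)

# Recompute the mean n times, each time dropping one observation.
loo_means = [statistics.fmean(returns[:i] + returns[i + 1:]) for i in range(n)]
loo_bar = statistics.fmean(loo_means)
jack_se = ((n - 1) / n * sum((m - loo_bar) ** 2 for m in loo_means)) ** 0.5

analytic_se = statistics.stdev(returns) / n ** 0.5
print(f"jackknife SE {jack_se:.5f} vs analytic {analytic_se:.5f}")
```

Unlike the bootstrap, the jackknife is deterministic (no random resampling), which is why it needs only n recomputations rather than thousands.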
Core Concepts Summary (80/20 Principle)
Must-Know Concepts
- Central Limit Theorem: Sample means are approximately normally distributed regardless of the population distribution when n ≥ 30
- Standard Error: Measures precision of sample estimates, decreases with √n
- Stratified Sampling: Divides population into homogeneous groups for more representative samples
- Bootstrap: Resampling with replacement to estimate complex statistics’ distributions
Quick Reference Table
| Concept | Formula | When to Use | DeFi Equivalent |
|---|---|---|---|
| Standard Error | σ/√n or s/√n | Estimating sample mean precision | TVL volatility estimates |
| CLT Application | n ≥ 30 for normality | Making population inferences | Protocol usage statistics |
| Stratified Sampling | Proportional allocation | Heterogeneous populations | LP position analysis by size |
| Bootstrap | Resample with replacement | No analytical formula exists | Yield farming return distributions |
Comprehensive Formula Sheet
Essential Formulas
Sample Variance:
s² = Σ(Xi - X̄)² / (n - 1)
Where: Xi = individual observations, X̄ = sample mean, n = sample size
Used for: Estimating population variance from sample data
Standard Error (known σ):
σX̄ = σ / √n
Where: σ = population standard deviation, n = sample size
Used for: Calculating precision of sample mean estimates
Standard Error (unknown σ):
sX̄ = s / √n
Where: s = sample standard deviation, n = sample size
Used for: Real-world applications where population σ is unknown
Bootstrap Standard Error:
sX̄ = √[1/(B-1) × Σ(θ̂b - θ̄)²]
Where: B = number of resamples, θ̂b = resample statistic, θ̄ = mean of resamples
Used for: Estimating standard error of complex statistics
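As a quick cross-check of the formula sheet, this sketch computes the sample variance and standard error by hand and compares against Python's statistics module (the data points are invented):

```python
import math
import statistics

# Verify the formula-sheet definitions on invented data.
data = [4.0, 7.0, 9.0, 3.0, 6.0, 8.0]
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance, divisor n - 1
se = math.sqrt(s2) / math.sqrt(n)                  # standard error of the mean

assert abs(s2 - statistics.variance(data)) < 1e-12  # matches the library value
print(f"s² = {s2:.4f}, SE = {se:.4f}")
```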
HP 12C Calculator Sequences
Standard Error Calculation:
RPN Steps: [std dev] ENTER [sample size] √x ÷
Example: 15.5 ENTER 25 √x ÷ = 3.1
Sample Variance (manual):
RPN Steps: Use the Σ+ key for data entry, then g s for the sample standard deviation; press ENTER × to square it for the variance
Example: Clear the statistics registers (f CLEAR Σ), enter each value followed by Σ+, then g s ENTER ×
Coefficient of Variation:
RPN Steps: [std dev] ENTER [mean] ÷ 100 ×
Example: 3.5 ENTER 25 ÷ 100 × = 14%
Practice Problems
Basic Level (Understanding)
- Problem: Calculate the standard error for a sample of 36 observations with population σ = 12
- Given: n = 36, σ = 12
- Find: Standard error of the sample mean
- Solution: σX̄ = σ/√n = 12/√36 = 12/6 = 2
- Answer: The standard error is 2, meaning sample means typically deviate by 2 units from the population mean
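The basic problem is a one-line check in code:

```python
import math

# Basic problem: n = 36, population sigma = 12.
se = 12 / math.sqrt(36)
print(se)  # 2.0
```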
Intermediate Level (Application)
- Problem: A DeFi protocol’s daily returns have mean 0.5% and standard deviation 3%. What’s the standard error for 50-day average returns?
- Given: μ = 0.5%, σ = 3%, n = 50
- Find: Standard error of 50-day average
- Solution:
- σX̄ = 3%/√50 = 3%/7.07 = 0.424%
- By CLT, distribution is approximately normal
- Answer: 0.424% standard error; by the CLT, about 95% of 50-day averages fall within ±0.83% (1.96 × 0.424%) of the true mean
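The intermediate solution can be reproduced directly:

```python
import math

# Intermediate problem: sigma = 3%, n = 50.
sigma = 0.03
n = 50
se = sigma / math.sqrt(n)
lo, hi = -1.96 * se, 1.96 * se  # 95% band around the true mean, via the CLT
print(f"SE = {se:.5f}, 95% band = ({lo:.5f}, {hi:.5f})")
```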
Advanced Level (Analysis)
- Problem: Compare simple random vs. stratified sampling for a crypto index with 60% large-cap (σ = 2%), 30% mid-cap (σ = 4%), 10% small-cap (σ = 8%)
- Given: Population proportions and volatilities by market cap
- Find: Relative efficiency of stratified sampling
- Solution:
- Simple random: the variance of the sample mean uses the total population variance, which includes between-strata dispersion
- Stratified (proportional allocation): the variance of the sample mean uses only the weighted within-strata variances, Σ wh σh² / n
- Efficiency gain = the reduction in standard error from eliminating the between-strata component
- Answer: Stratified sampling reduces the standard error by roughly 25% in this setup; the exact gain depends on how far the strata mean returns diverge
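The comparison above can be made concrete. The within-strata volatilities come from the problem, but the strata mean returns (mu values) are invented assumptions needed to quantify the between-strata component, so treat the printed reduction as illustrative rather than the problem's 25% figure:

```python
import math

# Stratified vs. simple-random sampling efficiency.
# weights and sigmas are from the problem; mus are hypothetical assumptions.
weights = [0.60, 0.30, 0.10]   # large-, mid-, small-cap proportions
sigmas  = [0.02, 0.04, 0.08]   # within-stratum volatility
mus     = [0.00, 0.03, 0.09]   # hypothetical mean returns per stratum
n = 100

within = sum(w * s ** 2 for w, s in zip(weights, sigmas))

# Proportional-allocation stratified SE uses only within-strata variance.
se_strat = math.sqrt(within / n)

# Simple random sampling also carries the between-strata variance.
mu = sum(w * m for w, m in zip(weights, mus))
between = sum(w * (m - mu) ** 2 for w, m in zip(weights, mus))
se_srs = math.sqrt((within + between) / n)

reduction = 1 - se_strat / se_srs
print(f"stratified SE {se_strat:.5f}, SRS SE {se_srs:.5f}, "
      f"reduction {reduction:.1%}")
```

With these assumed strata means the reduction comes out near 20%; widening the gap between strata means increases the gain, which is exactly why stratification pays off most in heterogeneous populations.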
DeFi Applications & Real-World Examples
Traditional Finance Context
- Institution Example: Investment banks use stratified sampling for risk assessment across diverse portfolios
- Market Application: Index fund managers employ sampling techniques to track benchmarks efficiently
- Historical Case: LTCM failure highlighted importance of proper sampling across market conditions
DeFi Parallels
- Protocol Implementation: Compound uses exponentially weighted sampling for interest rate models
- Smart Contract Logic: Chainlink oracles aggregate price samples using robust statistical methods
- Advantages: Complete on-chain data eliminates sampling bias from data availability
- Limitations: Gas costs limit complex resampling implementations on-chain
Case Studies
- Case 1: Yield Farming Strategy Analysis
- Background: Evaluating returns across multiple DeFi protocols
- Analysis: Bootstrap 1,000 portfolio combinations from historical yields
- Outcomes: 95% confidence interval for expected returns: [12%, 28%] APY
- Lessons learned: Resampling captures correlation structures missed by simple averaging
Common Pitfalls & Exam Tips
Frequent Mistakes
- Mistake 1: Using sample standard deviation instead of standard error for inference - Remember to divide by √n
- Mistake 2: Applying CLT with small samples (n < 30) - Results may not be reliable
- Mistake 3: Ignoring stratification benefits in heterogeneous populations - Can significantly improve precision
Exam Strategy
- Time management: 3-4 minutes per sampling question
- Question patterns: Often combined with hypothesis testing in subsequent topics
- Quick checks: Verify standard error decreases as n increases
Key Takeaways
Essential Points
✓ Central Limit Theorem enables the normal distribution assumption for large samples (n ≥ 30)
✓ Standard error = σ/√n measures sampling precision
✓ Stratified sampling improves efficiency for heterogeneous populations
✓ Bootstrap provides distribution estimates without analytical formulas
✓ Sampling method choice significantly impacts inference validity
Memory Aids
- Mnemonic: “CLT at 30” - Central Limit Theorem reliable at n = 30+
- Visual: Bell curve emerges from any distribution shape as n increases
- Analogy: Sampling is like tasting soup - stirring (randomization) ensures representative taste
Cross-References & Additional Resources
Related Topics
- Prerequisite: Statistical Measures of Asset Returns (Topic 3)
- Related: Hypothesis Testing (Topic 8) builds directly on these concepts
- Advanced: Big Data Techniques (Topic 11) extends to massive samples
Source Materials
- Primary Reading: Volume 1, Chapter 7, Pages 1-28
- Key Sections: CLT explanation (p.12-15), Bootstrap methods (p.20-24)
- Practice Questions: End-of-chapter problems 1-15
External Resources
- Videos: Khan Academy’s Central Limit Theorem series
- Articles: “Bootstrap Methods and Their Application” by Davison & Hinkley
- Tools: R’s boot package, Python’s scipy.stats.bootstrap
Review Checklist
Before moving on, ensure you can:
- Distinguish between all five sampling methods and their appropriate uses
- Calculate standard error given population or sample standard deviation
- Explain why CLT is crucial for statistical inference
- Describe when to use bootstrap vs. analytical methods
- Apply sampling concepts to DeFi protocol analysis