[Statistics] Exponential Families

희김 2025. 4. 21. 15:22

Exponential Families

1. Definition

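An exponential family is a family of probability distributions whose densities are built around the exponential function.

Exponential families include many commonly used probability models and play a central role in statistical theory and statistical modeling.

All models in an exponential family can be written in the same general form, and they share very useful properties for point estimation, hypothesis testing, and statistical modeling.

A probability model $\{f_\theta : \theta \in \Theta\}$ is called an exponential family if each of its members can be written as follows.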

$$
f_{\theta}(x) = h(x)c(\theta)\exp\bigg(\sum_{j=1}^kw_j(\theta)T_j(x)\bigg), \quad x \in \mathbb{R}
$$

where $h(x) \geq 0$ and $T_j(x)$ are functions of $x$ that do not depend on $\theta$,

and $c(\theta) > 0$ and $w_j(\theta)$ are functions of the possibly vector-valued $\theta$ that do not depend on $x$.

To show that a given set of densities is an exponential family, we need to identify $h(x), T_j(x), c(\theta), w_j(\theta)$ and show that the density can be written in the general form above.

Let us write several distributions in exponential family form.

2. Examples of Exponential Families

(1) Binomial Distribution $\text{Bin}(n, p)$

$$
f_p(x) = {n \choose x}p^x(1-p)^{n-x}\mathbb{1}(x \in \{0,1,\dots,n\})
$$

Exponential Family Form:

$$
f_p(x) = {n \choose x}(1-p)^n \exp\left(\log\left(\frac{p}{1-p}\right) x\right) \mathbb{1}(x \in \{0,1,\dots,n\})
$$

$\theta = p$
$c(\theta) = (1-p)^n$
$w_1(\theta) = \log\left(\frac{p}{1-p}\right)$
$T_1(x) = x$
$h(x) = {n \choose x}\mathbb{1}(x \in \{0,1,\dots,n\})$
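
As a quick sanity check, the factorization can be verified numerically. The sketch below is only an illustration using Python's standard library. The function names `binom_pmf` and `binom_expfam` and the values of `n` and `p` are assumptions made for this example. The binomial coefficient is treated as part of $h(x)$ because it depends on $x$ only.

```python
import math

def binom_pmf(x, n, p):
    """Binomial pmf written in the usual form."""
    if x not in range(n + 1):
        return 0.0
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_expfam(x, n, p):
    """Same pmf written as h(x) * c(theta) * exp(w_1(theta) * T_1(x))."""
    if x not in range(n + 1):
        return 0.0
    h = math.comb(n, x)            # h(x): depends on x only
    c = (1 - p)**n                 # c(theta): depends on p only
    w = math.log(p / (1 - p))      # w_1(theta): the log-odds
    T = x                          # T_1(x)
    return h * c * math.exp(w * T)

n, p = 10, 0.3                     # illustrative values
max_gap = max(abs(binom_pmf(x, n, p) - binom_expfam(x, n, p))
              for x in range(n + 1))
print(max_gap)                     # differences are at floating-point rounding level
```

Both functions should agree on every point of the support, which is what the exponential family form claims.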

(2) Poisson Distribution $\text{Poisson}(\lambda)$

$$
f_{\lambda}(x) = e^{-\lambda}\dfrac{\lambda^x}{x!}\mathbb{1}(x \in \{0,1,2,\dots\})
$$

Exponential Family Form:

$$
f_{\lambda}(x) = \exp\left( -\lambda + x \log \lambda - \log x! \right) \mathbb{1}(x \in \mathbb{N}_0)
$$

$\theta = \lambda$
$c(\theta) = e^{-\lambda}$
$w_1(\theta) = \log \lambda$
$T_1(x) = x$
$h(x) = \frac{1}{x!}\mathbb{1}(x \in \mathbb{N}_0)$

(3) Normal Distribution $N(\mu, \sigma^2)$

$$
f_{\mu, \sigma^2}(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)
$$

Exponential Family Form:

$$
f_{\mu, \sigma^2}(x) = \dfrac{1}{\sqrt{2\pi}} \dfrac{1}{\sigma} \exp\left( -\dfrac{\mu^2}{2\sigma^2} \right) \exp\left( -\dfrac{1}{2\sigma^2}x^2 + x\dfrac{\mu}{\sigma^2} \right)
$$

$\theta = (\mu, \sigma^2)^\top$
$c(\theta) = \dfrac{1}{\sigma} \exp\left( -\dfrac{\mu^2}{2\sigma^2} \right)$
$w_1(\theta) = -\dfrac{1}{2\sigma^2}$
$w_2(\theta) = \dfrac{\mu}{\sigma^2}$
$T_1(x) = x^2$, $T_2(x) = x$
$h(x) = \dfrac{1}{\sqrt{2\pi}}$
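
The two-parameter case can be checked the same way. The following is a minimal sketch, assuming the decomposition listed above. `normal_pdf` and `normal_expfam` are hypothetical names, and the parameter values and test points are arbitrary.

```python
import math

def normal_pdf(x, mu, sigma2):
    """N(mu, sigma^2) density in the usual form."""
    return math.exp(-(x - mu)**2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def normal_expfam(x, mu, sigma2):
    """Same density as h(x) * c(theta) * exp(w_1 T_1 + w_2 T_2)."""
    h = 1 / math.sqrt(2 * math.pi)                            # h(x)
    c = math.exp(-mu**2 / (2 * sigma2)) / math.sqrt(sigma2)   # c(theta)
    w = (-1 / (2 * sigma2), mu / sigma2)                      # w_1, w_2
    T = (x**2, x)                                             # T_1, T_2
    return h * c * math.exp(sum(wj * Tj for wj, Tj in zip(w, T)))

mu, sigma2 = 1.5, 0.8                                         # illustrative values
xs = [-2.0, -0.5, 0.0, 1.5, 3.0]
gaps = [abs(normal_pdf(x, mu, sigma2) - normal_expfam(x, mu, sigma2)) for x in xs]
print(max(gaps))                                              # rounding-level differences
```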

(4) Beta Distribution $\text{Beta}(\alpha, \beta)$

$$
f_{\alpha,\beta}(x) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1}(1 - x)^{\beta - 1} \mathbb{1}(x \in (0,1))
$$

Exponential Family Form:

$$
f_{\alpha,\beta}(x) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \exp\left( (\alpha - 1) \log x + (\beta - 1) \log(1 - x) \right) \mathbb{1}(x \in (0,1))
$$

$\theta = (\alpha, \beta)^\top$
$c(\theta) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}$
$w_1(\theta) = \alpha - 1$, $w_2(\theta) = \beta - 1$
$T_1(x) = \log x$, $T_2(x) = \log(1 - x)$
$h(x) = \mathbb{1}(x \in (0,1))$

(5) Gamma Distribution $\text{Gamma}(\alpha, \beta)$

$$
f(x) = \dfrac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x} \mathbb{1}(x > 0)
$$

Exponential Family Form:

$$
f(x) = \dfrac{\beta^\alpha}{\Gamma(\alpha)} \exp\left( (\alpha - 1) \log x - \beta x \right) \mathbb{1}(x > 0)
$$

$\theta = (\alpha, \beta)^\top$
$c(\theta) = \dfrac{\beta^\alpha}{\Gamma(\alpha)}$
$w_1(\theta) = \alpha - 1$, $w_2(\theta) = -\beta$
$T_1(x) = \log x$, $T_2(x) = x$
$h(x) = \mathbb{1}(x > 0)$

3. Natural Parameterization of Exponential Families

One major advantage of an exponential family is that it admits a natural parameterization (also called the canonical parameterization).

General form:

$$
f_\theta(x) = h(x) \cdot c(\theta) \cdot \exp\left( \sum_{j=1}^k w_j(\theta) T_j(x) \right)
$$

Substituting $\eta_j = w_j(\theta)$ in this expression gives the following:

▶ Canonical Form (Natural Parameterization)

$$
f_\eta(x) = h(x)\cdot\exp\bigg(\sum_{j=1}^k\eta_jT_j(x) - A(\eta)\bigg)
$$

Here $A(\eta)$ is the log of the normalizing constant, defined by $A(\eta) = -\log c(\theta(\eta))$.

💡 Advantages

- The natural parameter $\eta$ greatly simplifies likelihood calculations, point estimation, and the identification of sufficient statistics.
- The MLE can be computed by convex optimization.
- The sufficient statistic is simply $T(x)$, and the MLE is characterized by $\mathbb{E}_\eta[T(X)] = \bar{T}$.

Ex. $\text{Ber}(p)$

Original parameter: $\theta = p$
Natural parameter: $\eta = \log\left(\dfrac{p}{1-p}\right)$
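
To make the Bernoulli example concrete, here is a small sketch of the canonical form. It assumes $h(x) = 1$, $T(x) = x$, and $A(\eta) = \log(1 + e^{\eta})$, which follows from $A(\eta) = -\log c(\theta(\eta))$ with $c(\theta) = 1 - p$. The function names are hypothetical.

```python
import math

def logit(p):
    """Natural parameter: eta = log(p / (1 - p))."""
    return math.log(p / (1 - p))

def sigmoid(eta):
    """Inverse map from the natural parameter back to p."""
    return 1 / (1 + math.exp(-eta))

def A(eta):
    """Bernoulli log-partition function: log(1 + e^eta) = -log(1 - p)."""
    return math.log1p(math.exp(eta))

def bernoulli_canonical(x, eta):
    """Canonical form h(x) * exp(eta * T(x) - A(eta)) with h(x) = 1, T(x) = x."""
    return math.exp(eta * x - A(eta))

p = 0.3                                  # illustrative value
eta = logit(p)
print(bernoulli_canonical(1, eta))       # should match p
print(bernoulli_canonical(0, eta))       # should match 1 - p
```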

4. Sufficient Statistic in Exponential Families

Another major advantage of an exponential family is that its sufficient statistic is clearly defined and has a simple form.

Recall the canonical form:

$$
f_\eta(x) = h(x)\cdot\exp\left(\eta^\top T(x) - A(\eta)\right)
$$

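Here $T(x)$ is the sufficient statistic: it carries all the information about the parameter $\eta$ that the observed data $x$ contain.

In other words, the data $x$ can be summarized by the single statistic $T(x)$, and any inference about the parameter $\eta$ can be based on $T(x)$ alone.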
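
A small illustration of this idea, assuming i.i.d. Bernoulli data, where $T(x) = x$ and the sufficient statistic of a sample is the sum of the observations. The samples and the function name `bernoulli_likelihood` are made up for this sketch.

```python
import math

def bernoulli_likelihood(xs, p):
    """Likelihood of an i.i.d. Bernoulli(p) sample."""
    return math.prod(p**x * (1 - p)**(1 - x) for x in xs)

sample_a = [1, 1, 0, 0, 1, 0]      # illustrative data
sample_b = [0, 1, 0, 1, 0, 1]      # different observations, same sum T = 3

for p in (0.2, 0.5, 0.9):
    la = bernoulli_likelihood(sample_a, p)
    lb = bernoulli_likelihood(sample_b, p)
    print(p, la, lb)               # the two likelihoods agree for every p
```

Because the two samples share the same value of the sufficient statistic, they lead to the same likelihood function and therefore to the same inference about $p$.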

5. Log-partition Function $A(\eta)$

The function $A(\eta)$, which plays the role of the normalizing constant, is defined as follows:

$$
A(\eta)=\log\int h(x)\exp(\eta^\top T(x))dx
$$

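✨ Key property: $A(\eta)$ is a convex function

Because $A(\eta)$ is convex in $\eta$, the negative log-likelihood is convex as well, so maximum likelihood becomes a convex optimization problem: the optimization is stable, and the solution is unique when $A$ is strictly convex.
The gradient of $A(\eta)$ is given by: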

$$
\nabla A(\eta) = \mathbb{E}_\eta[T(X)]
$$

In other words, the gradient $\nabla A(\eta)$ is the expected value of the sufficient statistic under the distribution with natural parameter $\eta$.
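
The gradient identity can be checked numerically. The sketch below assumes the Poisson family in canonical form ($h(x) = 1/x!$, $T(x) = x$, $\eta = \log\lambda$) and approximates the infinite sum by truncating it at a hypothetical cutoff `x_max`. Derivatives are estimated with finite differences, so the agreement is approximate.

```python
import math

def log_partition(eta, x_max=100):
    """A(eta) = log sum_x h(x) exp(eta * x), with h(x) = 1/x!, truncated at x_max."""
    return math.log(sum(math.exp(eta * x - math.lgamma(x + 1))
                        for x in range(x_max + 1)))

def expected_T(eta, x_max=100):
    """E_eta[T(X)] computed directly from the canonical-form pmf."""
    a = log_partition(eta, x_max)
    return sum(x * math.exp(eta * x - math.lgamma(x + 1) - a)
               for x in range(x_max + 1))

eta = math.log(3.0)                # corresponds to lambda = 3 (illustrative)
eps = 1e-4
# Central finite differences for the first and second derivatives of A
grad_fd = (log_partition(eta + eps) - log_partition(eta - eps)) / (2 * eps)
curv_fd = (log_partition(eta + eps) - 2 * log_partition(eta)
           + log_partition(eta - eps)) / eps**2

print(grad_fd, expected_T(eta))    # both should be close to lambda = 3
print(curv_fd)                     # positive, consistent with convexity of A
```

For the Poisson family $A(\eta) = e^{\eta}$, so both the gradient and the curvature equal $\lambda$ here, which are the mean and the variance of $T(X) = X$.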

6. Maximum Likelihood Estimation (MLE)

Given $n$ independent observations $x_1, \dots, x_n$, the likelihood is:

$$
L(\eta) = \prod_{i=1}^n f_{\eta}(x_i) = \prod_{i=1}^n h(x_i) \exp\left( \eta^\top T(x_i) - A(\eta) \right)
$$

The log-likelihood is:

$$
\ell(\eta) = \eta^\top\left(\sum_{i=1}^nT(x_i)\right) - nA(\eta) + \sum_{i=1}^n\log h(x_i)
$$

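The MLE is the $\hat{\eta}$ that maximizes the log-likelihood. Setting the gradient $\nabla \ell(\eta) = \sum_{i=1}^n T(x_i) - n\nabla A(\eta)$ to zero shows that $\hat{\eta}$ is characterized by the sample mean of the sufficient statistic: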

$$
\mathbb{E}_{\hat{\eta}}[T(X)] = \bar{T} = \dfrac{1}{n}\sum_{i=1}^nT(x_i)
$$

That is, the MLE is equivalent to moment matching. This is one of the key advantages of an exponential family.
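
As an illustration of moment matching, the sketch below fits the natural parameter of a Bernoulli model with Newton's method. It assumes $A(\eta) = \log(1 + e^{\eta})$, so that $A'(\eta)$ is the sigmoid function. The data and the function name `fit_bernoulli_eta` are hypothetical, and the sample must contain both zeros and ones for a finite MLE to exist.

```python
import math

def sigmoid(eta):
    return 1 / (1 + math.exp(-eta))

def fit_bernoulli_eta(xs, n_iter=25):
    """Newton's method on the concave log-likelihood in eta."""
    t_bar = sum(xs) / len(xs)          # sample mean of T(x) = x
    eta = 0.0
    for _ in range(n_iter):
        mean = sigmoid(eta)            # A'(eta)  = E_eta[T(X)]
        var = mean * (1 - mean)        # A''(eta) = Var_eta[T(X)]
        eta += (t_bar - mean) / var    # Newton step for (1/n) * log-likelihood
    return eta

xs = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]    # illustrative data: 7 ones out of 10
eta_hat = fit_bernoulli_eta(xs)
print(eta_hat, sigmoid(eta_hat))       # sigmoid(eta_hat) should match the sample mean 0.7
```

At the solution the model mean $\mathbb{E}_{\hat{\eta}}[T(X)]$ equals the sample mean, so in this simple case the optimizer lands on $\hat{\eta} = \log(\bar{x}/(1-\bar{x}))$.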

7. Conjugate Prior

Exponential families are also very convenient in Bayesian analysis.

This is because every exponential family has a natural conjugate prior.

General form of the conjugate prior:

$$
\pi(\eta) \propto \exp\left(\eta^\top\tau - \lambda A(\eta)\right)
$$

Here $\tau$ plays the role of a prior sufficient statistic, and $\lambda$ represents the "strength" of the prior. With this prior, the posterior keeps the same form.

That is,

$$
\pi(\eta \mid x_1,\dots,x_n) \propto \exp\left(\eta^\top\left(\tau + \sum_{i=1}^nT(x_i)\right) - (\lambda + n)A(\eta)\right)
$$

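→ The posterior is obtained as prior + data: the sufficient statistics of the data are simply added to those of the prior.

Thanks to this structure, exponential families are also very powerful in Bayesian inference and can be used efficiently for posterior analysis, computing predictive distributions, decision theory, and more.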
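
The update rule can be sketched for the Bernoulli case. Under the assumption $A(\eta) = \log(1 + e^{\eta})$, changing variables from $\eta$ to $p$ turns the prior $\exp(\eta\tau - \lambda A(\eta))$ into a $\text{Beta}(\tau, \lambda - \tau)$ density on $p$, which requires $0 < \tau < \lambda$. The prior values, the data, and the function names below are hypothetical.

```python
def update_prior(tau, lam, ts):
    """Conjugate update: add the sum of T(x_i) to tau and the sample size to lam."""
    return tau + sum(ts), lam + len(ts)

def to_beta_params(tau, lam):
    """Bernoulli case: the prior on eta corresponds to p ~ Beta(tau, lam - tau)."""
    return tau, lam - tau

tau0, lam0 = 2.0, 4.0                  # hypothetical prior, i.e. Beta(2, 2) on p
xs = [1, 0, 1, 1, 0, 1, 1, 1]          # illustrative Bernoulli data, T(x) = x
tau_n, lam_n = update_prior(tau0, lam0, xs)
alpha_n, beta_n = to_beta_params(tau_n, lam_n)
post_mean = alpha_n / (alpha_n + beta_n)
print(tau_n, lam_n)                    # updated hyperparameters
print(alpha_n, beta_n, post_mean)      # equivalent Beta posterior and its mean
```

This reproduces the familiar Beta-Bernoulli update, where successes are added to the first Beta parameter and failures to the second.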

8. Generalized Linear Models

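The Generalized Linear Model (GLM) is a very powerful statistical model that extends linear regression so that many kinds of response variables can be handled.

The core idea of a GLM can be summarized by the following three components:

✅ The three components of a GLM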

Random Component
- The distribution of the response variable $Y$ belongs to an exponential family
- Examples: Gaussian, Bernoulli, Poisson, etc.
Systematic Component (or Linear Predictor)
- A linear function of the explanatory variables $X_1, \dots, X_p$
- $$
  \eta = X^\top\beta
  $$
- With the canonical link, $\eta$ coincides with the natural parameter discussed above
Link Function
- Connects the mean $\mu = \mathbb{E}[Y]$ to the linear predictor $\eta$
- $$
  g(\mu) = \eta
  $$
- Or, equivalently: $\mu = g^{-1}(\eta)$

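💡 Why does the exponential family matter?

Restricting the distribution of the response variable in a GLM to an exponential family has the following benefits:

- With the canonical link, the log-likelihood is concave in $\beta$ (the negative log-likelihood is convex), so estimation is stable
- The MLE can be computed with efficient methods such as Newton-Raphson and Fisher scoring
- A sufficient statistic exists, so estimation is also clean from a theoretical point of view
- Many commonly used distributions are included, so the model adapts flexibly to real data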
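
As a rough illustration of the Newton-Raphson idea, the sketch below fits a Poisson regression with the log link, one covariate, and an intercept, using only the standard library. For the canonical link, Newton-Raphson and Fisher scoring coincide. The data and the function name `fit_poisson_glm` are hypothetical, and the code omits safeguards such as step-size control that a real implementation would typically include.

```python
import math

def fit_poisson_glm(xs, ys, n_iter=50):
    """Newton-Raphson for log E[Y] = b0 + b1 * x with Poisson responses."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0      # start from the intercept-only fit
    for _ in range(n_iter):
        mus = [math.exp(b0 + b1 * x) for x in xs]  # mu_i = g^{-1}(eta_i)
        # Score (gradient of the log-likelihood)
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Negative Hessian (Fisher information), a 2x2 symmetric matrix
        h00 = sum(mus)
        h01 = sum(x * m for x, m in zip(xs, mus))
        h11 = sum(x * x * m for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]           # illustrative covariate values
ys = [1, 2, 2, 4, 6, 8, 13]                        # illustrative counts
b0, b1 = fit_poisson_glm(xs, ys)
mus = [math.exp(b0 + b1 * x) for x in xs]
score0 = sum(y - m for y, m in zip(ys, mus))
score1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
print(b0, b1)
print(score0, score1)                              # both scores should be near zero
```

The vanishing score equations are the regression version of moment matching: the fitted means reproduce the observed totals of the sufficient statistics.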

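📦 Examples: common GLMs

| Model | Response distribution | Link function | Application |
|---|---|---|---|
| Linear Regression | $Y \sim N(\mu, \sigma^2)$ | $g(\mu) = \mu$ | Predicting continuous values |
| Logistic Regression | $Y \sim \text{Bernoulli}(p)$, $\mu = p$ | $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$ | Binary classification |
| Poisson Regression | $Y \sim \text{Poisson}(\lambda)$, $\mu = \lambda$ | $g(\mu) = \log(\mu)$ | Predicting event counts |
| Gamma Regression | $Y \sim \text{Gamma}(\alpha, \beta)$ | $g(\mu) = 1/\mu \;\text{or}\; \log(\mu)$ | Predicting positive continuous values |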

🔁 A unified mathematical structure

Thanks to the structure of the exponential family, a GLM can always be written in the following form:

$$
f_{Y}(y|\theta) = h(y)\exp(\theta\cdot T(y) -A(\theta) )
$$

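Here $\theta = \eta = X^\top \beta$ (under the canonical link), and $T(y)$ is the sufficient statistic.

With this, all statistical computations in a GLM share a single structure, so a wide range of problems can be approached within the same framework.

I wrote this up while studying exponential families.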