--- title: "Integration with algebraic.mle and algebraic.dist" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Integration with algebraic.mle and algebraic.dist} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) old_opts <- options(digits = 4) ``` ## The Three-Package Ecosystem The `likelihood.model` package is part of an ecosystem of three interoperable packages: - **`algebraic.dist`** — probability distribution objects with a common interface (`params`, `sampler`, `expectation`, etc.) - **`algebraic.mle`** — maximum likelihood estimation infrastructure, defining the `mle` class and generics like `params`, `nparams`, `observed_fim`, `rmap`, `marginal`, and `expectation` - **`likelihood.model`** — likelihood model specification and Fisherian inference When you fit a model with `likelihood.model`, the result is a `fisher_mle` object that inherits from `algebraic.mle::mle`. This means all of the `algebraic.mle` generics work on it directly: ```{r load, message=FALSE, warning=FALSE} library(likelihood.model) library(algebraic.mle) ``` ## Basic Interface: `params`, `nparams`, `observed_fim` Let's fit a Weibull model and explore the MLE through the `algebraic.mle` interface: ```{r basic-fit} set.seed(42) true_shape <- 2.5 true_scale <- 3.0 n <- 200 df <- data.frame(x = rweibull(n, shape = true_shape, scale = true_scale)) model <- weibull_uncensored("x") mle_result <- fit(model)(df, par = c(1, 1)) # algebraic.mle generics — these work because fisher_mle inherits from mle params(mle_result) nparams(mle_result) ``` The MLE estimates (roughly 2.29 for shape, 2.96 for scale) are close to the true values (2.5, 3.0), demonstrating good recovery with $n = 200$ observations. The `params()` and `nparams()` generics work identically to how they work on native `algebraic.mle::mle` objects — because `fisher_mle` inherits from that class, the entire `algebraic.mle` interface is available without any adapter code. The `observed_fim()` function returns the observed Fisher information matrix, computed as the negative Hessian of the log-likelihood at the MLE: ```{r observed-fim} observed_fim(mle_result) ``` The off-diagonal element (roughly $-31$) reveals a negative correlation between the shape and scale estimators — when one is overestimated, the other tends to be underestimated. This is a common feature of location-scale families: the parameters trade off against each other in explaining the data. This should be positive definite at the MLE — we can verify: ```{r fim-pd} eigen(observed_fim(mle_result))$values ``` Both eigenvalues are positive, confirming the MLE sits at a proper maximum of the log-likelihood. The ratio of the eigenvalues (roughly 2.3:1) tells us the likelihood surface is moderately elongated — the curvature is steeper in one direction than the other, meaning one linear combination of the parameters is better determined than the other. For comparison, `vcov()` is the inverse of the observed FIM: ```{r vcov-vs-fim} vcov(mle_result) solve(observed_fim(mle_result)) ``` The numerical equality confirms that `vcov()` is computed as `solve(observed_fim())`, the classical relationship $\widehat{\text{Var}}(\hat\theta) = I(\hat\theta)^{-1}$. 
The inverse relationship is worth noting because the two matrices carry different conceptual meaning — the FIM measures *information content* (how much the data tell us about the parameters), while the variance-covariance matrix measures *estimation uncertainty* — yet they are matrix inverses of each other.

## Parameter Transformations via `rmap`

One powerful feature of `algebraic.mle` is the `rmap()` function, which transforms MLE parameters using the delta method. This propagates uncertainty through nonlinear transformations.

For a Weibull distribution with shape $k$ and scale $\lambda$, the mean lifetime is:

$$E[T] = \lambda \, \Gamma\!\left(1 + \tfrac{1}{k}\right)$$

We can transform our (shape, scale) MLE to get an MLE for the mean lifetime:

```{r rmap}
# Transform Weibull (shape, scale) -> mean lifetime
mean_life_mle <- rmap(mle_result, function(p) {
  c(mean_lifetime = p[2] * gamma(1 + 1 / p[1]))
})
params(mean_life_mle)
se(mean_life_mle)
```

The delta method gives an estimated mean lifetime of roughly 2.62 with an SE of about 0.085. This propagates the joint uncertainty in (shape, scale) through the nonlinear $\Gamma$-function transformation — something not possible with simple error propagation formulas that treat parameters independently. The off-diagonal covariance we saw earlier is correctly accounted for.

Compare to the true mean lifetime:

```{r rmap-compare}
true_mean <- true_scale * gamma(1 + 1 / true_shape)
cat("True mean lifetime:", true_mean, "\n")
cat("Estimated mean lifetime:", params(mean_life_mle), "\n")
cat("95% CI:", confint(mean_life_mle), "\n")
```

The true mean lifetime (about 2.66) falls within the 95% confidence interval, as expected. The estimate slightly undershoots mainly because the scale estimate (roughly 2.96) sits below the true 3.0; the shape estimate (roughly 2.29 versus the true 2.5) contributes a little as well, since the mean lifetime $\lambda\,\Gamma(1 + 1/k)$ also depends on shape through the $\Gamma$ function.

We can also transform to multiple derived quantities at once:

```{r rmap-multi}
# Derive mean, variance, and median of the Weibull distribution
derived_mle <- rmap(mle_result, function(p) {
  k <- p[1]; lam <- p[2]
  c(
    mean = lam * gamma(1 + 1/k),
    var = lam^2 * (gamma(1 + 2/k) - gamma(1 + 1/k)^2),
    median = lam * log(2)^(1/k)
  )
})
params(derived_mle)
se(derived_mle)
```

Notice that the variance has the largest SE (roughly 0.15) relative to its estimate — second moments are inherently harder to estimate than first moments like the mean and median. This is a general phenomenon: higher-order moments amplify estimation uncertainty because they involve larger powers of the data.

## Marginal Distributions

The `marginal()` function extracts the sampling distribution of a single parameter from a multivariate MLE:

```{r marginal}
# Marginal for shape parameter
shape_mle <- marginal(mle_result, 1)
params(shape_mle)
se(shape_mle)
confint(shape_mle)

# Marginal for scale parameter
scale_mle <- marginal(mle_result, 2)
params(scale_mle)
se(scale_mle)
confint(scale_mle)
```

Both 95% CIs contain the true values (shape: 2.5; scale: 3.0). The marginal distributions are useful for reporting single-parameter uncertainties, though they discard the correlation between parameters. For inference that respects the joint structure — such as a confidence region for (shape, scale) simultaneously — you would use the full multivariate MLE object.

## Monte Carlo Expectations

The `expectation()` function computes Monte Carlo expectations over the MLE's asymptotic sampling distribution. This is useful for computing quantities that are nonlinear functions of the parameters.
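Conceptually, this amounts to drawing parameter vectors from the multivariate normal approximation $N(\hat\theta, \widehat{\text{Var}}(\hat\theta))$ and averaging the function over the draws. Below is a minimal hand-rolled sketch of that idea (not necessarily how `expectation()` is implemented internally), assuming the `MASS` package is available for `mvrnorm()`:

```{r expectation-sketch, eval=requireNamespace("MASS", quietly = TRUE)}
# Hand-rolled Monte Carlo sketch: draw from N(theta_hat, vcov(theta_hat))
# and average f(p) = p[1]^2 over the draws; compare with expectation() below
set.seed(123)
draws <- MASS::mvrnorm(10000, mu = params(mle_result), Sigma = vcov(mle_result))
mean(apply(draws, 1, function(p) p[1]^2))
```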
```{r expectation}
set.seed(123)

# E[shape^2] under the asymptotic distribution
e_shape_sq <- expectation(mle_result, function(p) p[1]^2,
                          control = list(n = 10000L))
cat("E[shape^2]:", e_shape_sq, "\n")
cat("shape^2 at MLE:", params(mle_result)[1]^2, "\n")

# Probability that shape > 2 (under asymptotic distribution)
pr_shape_gt_2 <- expectation(mle_result, function(p) as.numeric(p[1] > 2),
                             control = list(n = 10000L))
cat("P(shape > 2):", pr_shape_gt_2, "\n")
```

$E[\hat k^2] \approx 5.25$ versus the plug-in value $\hat k^2 \approx 5.24$. The small difference (roughly 0.01) is approximately $\text{Var}(\hat k)$, as expected from the identity $E[X^2] = \text{Var}(X) + (E[X])^2$. The probability $P(\hat k > 2) \approx 0.99$ gives strong evidence that the true shape exceeds 2, which would be useful in practice to confirm the hazard rate is increasing (a Weibull shape $> 1$ implies increasing hazard; $> 2$ implies the hazard rate itself is accelerating).

## Mean Squared Error

Under standard MLE asymptotics, bias is zero and MSE equals the variance-covariance matrix:

```{r mse}
mse(mle_result)
all.equal(mse(mle_result), vcov(mle_result))
```

The `TRUE` result confirms that the MSE equals the variance-covariance matrix under the assumption of zero asymptotic bias. In finite samples (especially small $n$), a bootstrap-based `mse()` can differ from `vcov()` because the squared-bias component is no longer negligible.

## Bootstrap Integration

The `sampler()` generic creates a bootstrap sampling distribution. The resulting `fisher_boot` object also satisfies the `mle` interface:

```{r bootstrap, cache=TRUE}
set.seed(42)
boot_sampler <- sampler(model, df = df, par = c(1, 1))
boot_result <- boot_sampler(n = 200)

# Same algebraic.mle generics work
params(boot_result)
nparams(boot_result)
se(boot_result)
bias(boot_result)
```

The bootstrap bias (roughly 0.006 for shape, $-0.006$ for scale) is negligible compared to the standard errors, consistent with the MLE being asymptotically unbiased. The bootstrap SEs are slightly smaller than the asymptotic SEs — both methods agree well with $n = 200$ observations, but they need not agree exactly because they estimate the sampling variability in different ways (resampling versus the curvature of the log-likelihood). The bootstrap provides an alternative (non-parametric) estimate of the sampling distribution.

Compare bootstrap and asymptotic confidence intervals:

```{r boot-compare}
cat("Asymptotic 95% CI:\n")
confint(mle_result)

cat("\nBootstrap percentile 95% CI:\n")
confint(boot_result, type = "perc")
```

The asymptotic and bootstrap CIs are close but not identical. The bootstrap percentile intervals are slightly asymmetric (the upper tails extend a bit further), capturing the mild skewness in the finite-sample distribution that the symmetric Wald intervals miss. For well-behaved likelihoods with moderate $n$, the practical difference is small — but in small-sample or non-regular problems, the bootstrap can be substantially more accurate.

## Using `algebraic.dist` Distributions

```{r dist-check, include=FALSE}
has_dist <- requireNamespace("algebraic.dist", quietly = TRUE)
```

```{r dist-section, eval=has_dist}
library(algebraic.dist)
```

The `algebraic.dist` package provides distribution objects with a consistent interface. We can compare the MLE's asymptotic sampling distribution to theoretical distribution objects. For example, the MLE of an exponential rate parameter $\hat\lambda$ is asymptotically normal with variance $\lambda^2/n$.
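This follows directly from the exponential log-likelihood: with $n$ i.i.d. observations, $\ell(\lambda) = n \log \lambda - \lambda \sum_i t_i$, so the Fisher information is

$$I_n(\lambda) = -E\!\left[\ell''(\lambda)\right] = \frac{n}{\lambda^2}, \qquad \text{hence} \qquad \hat\lambda \;\dot\sim\; N\!\left(\lambda,\; \frac{\lambda^2}{n}\right).$$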
We can create this theoretical distribution:

```{r dist-compare, eval=has_dist}
set.seed(42)
lambda_true <- 2.0
n_exp <- 200
df_exp <- data.frame(t = rexp(n_exp, rate = lambda_true))

model_exp <- exponential_lifetime("t")
mle_exp <- fit(model_exp)(df_exp)
cat("MLE:", params(mle_exp), "\n")
cat("SE:", se(mle_exp), "\n")

# Theoretical asymptotic distribution of the MLE
asymp_var <- lambda_true^2 / n_exp
asymp_dist <- normal(mu = lambda_true, var = asymp_var)
cat("\nTheoretical asymptotic distribution:\n")
cat(" Mean:", params(asymp_dist)[1], "\n")
cat(" Variance:", params(asymp_dist)[2], "\n")

# Compare: sample from the MLE's estimated distribution
mle_sampler <- sampler(mle_exp)
set.seed(1)
mle_samples <- mle_sampler(5000)

# vs. sample from the theoretical distribution
dist_sampler <- sampler(asymp_dist)
set.seed(1)
dist_samples <- dist_sampler(5000)

cat("\nMLE sampler mean:", mean(mle_samples), "\n")
cat("Theoretical sampler mean:", mean(dist_samples), "\n")
cat("MLE sampler sd:", sd(mle_samples), "\n")
cat("Theoretical sampler sd:", sd(dist_samples), "\n")
```

The MLE sampler is centered at $\hat\lambda$ (the estimate from our data), while the theoretical distribution is centered at the true $\lambda = 2.0$. The gap between these centers reflects sampling variability — our particular data set yielded a rate estimate somewhat different from the truth. The standard deviations are comparable, with the MLE-based SE slightly smaller because it uses the estimated (smaller) rate rather than the true rate in the variance formula $\lambda^2/n$.

Distribution objects from `algebraic.dist` also support computing exact expectations via numerical integration, which we can compare to our Monte Carlo estimates:

```{r dist-expect, eval=has_dist}
# Exact E[X^2] for the asymptotic distribution
exact_e_x2 <- expectation(asymp_dist, function(x) x^2)
cat("Exact E[lambda_hat^2]:", exact_e_x2, "\n")

# Compare to Monte Carlo estimate from the MLE
set.seed(42)
mc_e_x2 <- expectation(mle_exp, function(p) p[1]^2,
                       control = list(n = 50000L))
cat("Monte Carlo E[lambda_hat^2]:", mc_e_x2, "\n")
```

The exact $E[\hat\lambda^2]$ (from the theoretical distribution centered at $\lambda = 2$) differs noticeably from the Monte Carlo estimate based on the MLE. This is not an error — they are expectations under *different* distributions. The theoretical distribution uses the true parameter, while the MLE-based distribution is centered at $\hat\lambda$. This distinction is important in practice: the MLE-based quantities are what you can compute from data alone, while the theoretical ones require knowing the true parameter (which is usually unknown).
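Both quantities can also be checked in closed form via the identity $E[X^2] = \text{Var}(X) + (E[X])^2$ applied to the two normal distributions. A quick sketch, reusing the objects defined above:

```{r second-moment-check, eval=has_dist}
# Closed-form second moments, Var + mean^2:
# theoretical distribution, centered at the true rate with variance lambda^2 / n
lambda_true^2 / n_exp + lambda_true^2
# MLE-based asymptotic distribution, centered at the estimate with its estimated variance
se(mle_exp)^2 + params(mle_exp)^2
```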
## Summary

The `fisher_mle` objects returned by `likelihood.model` are fully compatible with the `algebraic.mle` interface:

| Generic | What it does on `fisher_mle` |
|---------|------------------------------|
| `params()` | Parameter estimates (same as `coef()`) |
| `nparams()` | Number of parameters |
| `se()` | Standard errors |
| `vcov()` | Variance-covariance matrix |
| `observed_fim()` | Observed Fisher information ($-H$) |
| `bias()` | Asymptotic bias (zero) |
| `mse()` | Mean squared error (equals `vcov` asymptotically) |
| `aic()` | Akaike information criterion |
| `confint()` | Wald confidence intervals |
| `rmap()` | Delta method parameter transformations |
| `marginal()` | Marginal sampling distributions |
| `expectation()` | Monte Carlo expectations over sampling distribution |
| `sampler()` | Asymptotic or bootstrap sampling function |

```{r cleanup, include=FALSE}
options(old_opts)
```