---
title: "Likelihood contributions model"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Likelihood contributions model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
old_opts <- options(digits = 2)
```

## Installation

You can install the development version of `likelihood.model` from [GitHub](https://github.com/queelius/likelihood.model) with:

```{r install-likelihood-model, eval = FALSE}
if (!require(devtools)) {
  install.packages("devtools")
}
devtools::install_github("queelius/likelihood.model")
```

```{r load-packages, message=FALSE, warning=FALSE, echo = FALSE}
library(dplyr)
library(likelihood.model)
library(generics)
```

## Exponential series system

Consider a series system with $m$ components, where the $p$-th component in the $i$-th system has lifetime $T_{ip}$, exponentially distributed with rate $\lambda_p$, and the $T_{ip}$ are independent across all $i$ and $p$. The system lifetime is the minimum of the component lifetimes, i.e., $T_i = \min\{T_{i1}, \ldots, T_{im}\}$.

Let's draw a sample from this model with $m = 3$, $\lambda = (1.1, 1.2, 1.3)$, and sample size $n = 150$.

```{r data-gen-exp-series}
n <- 150
rates <- c(1.1, 1.2, 1.3)
set.seed(1235)
df <- data.frame(
  t1 = rexp(n, rates[1]),
  t2 = rexp(n, rates[2]),
  t3 = rexp(n, rates[3])
)
# the system lifetime is the minimum of the component lifetimes
df$t <- apply(df, 1, min)
# map each observation to the corresponding index (component) that was minimum
df$k <- apply(df, 1, which.min)
# mask the lifetimes of the components that did not cause the failure
for (i in 1:n) {
  for (p in 1:3) {
    if (p == df$k[i]) next
    df[i, p] <- NA
  }
}
head(df)
```

Column $t_k$ is the failure time of the $k$-th component, column $t$ is the series system lifetime (equal to $t_k$ for the failed component), and column $k$ is the component that caused the system failure. The component cause of failure is observed, while the lifetimes of the other components are not observed (`NA`).

In what follows, we consider two cases of masking or censoring, i.e., *incomplete data* of some form:

1. We observe the system lifetime and the component cause of failure (columns $t$ and $k$).
2. We observe the system lifetime subject to a right-censoring mechanism and, when the system failure is observed, the component cause of failure.

## Case 1: Observed system failure time and component cause of failure

Suppose we have a series system: whenever a component fails, the system fails, so the system lifetime equals the minimum of the component lifetimes. Suppose the component lifetimes are exponentially distributed with rate $\lambda_k$ for component $k$. In this case, we assume we can observe the system lifetime and the component that caused the system failure, which means we know a few things about each system $i$ in the sample:

1. The component cause of failure $K_i$.
2. The system failure time $T_i$.
3. The component failure time $T_{i K_i}$, which equals $T_i$.
4. Each of the non-failed components $p \neq K_i$ survived at least until $T_{i K_i}$.

We might be tempted to estimate each component's parameter from the failure times of the systems in which that component caused the failure. However, this is incorrect: by conditioning on the component that caused the system failure, we are not observing a random sample of that component's lifetime, but a conditional sample.
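We can see the problem empirically before deriving it. As a quick check, the naive estimator of component 1's rate (the reciprocal of the mean lifetime among the systems where component 1 caused the failure) should land near the sum of the rates, $1.1 + 1.2 + 1.3 = 3.6$, rather than near $\lambda_1 = 1.1$:

```{r naive-component-estimate}
# Naive estimate of component 1's rate: treat the lifetimes of the systems in
# which component 1 caused the failure as an ordinary exponential sample.
# We expect a value near the sum of the rates (about 3.6), not near 1.1.
1 / mean(df$t[df$k == 1])
```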
To see why, note that if $k = 1$, then we are observing $T_{i1} \mid T_{i1} < T_{i2} \text{ and } T_{i1} < T_{i3}$. What is this distribution? We can derive it as
$$
f_{T_i|K_i}(t_i | k_i) = f_{T_i,K_i}(t_i,k_i) / f_{K_i}(k_i) = f_{K_i|T_i}(k_i|t_i) f_{T_i}(t_i) / f_{K_i}(k_i).
$$
Because exponential lifetimes have constant failure rates (the memoryless property), the probability that a given component is the cause of a system failure does not depend on the time of the system failure, so
$$
f_{K_i|T_i}(k_i|t_i) = f_{K_i}(k_i).
$$
That means that
$$
f_{T_i|K_i}(t_i | k_i) = f_{T_i}(t_i),
$$
which in the exponential case is just the pdf of the series system lifetime. Since the minimum of independent exponentially distributed random variables is exponentially distributed with a failure rate equal to the sum of the components' failure rates, the naive estimator above estimates the sum of the failure rates (the failure rate of the series system), not $\lambda_1$.

Instead, we treat this as a kind of right-censoring problem: with $m = 3$ components, when component $1$ is the cause of failure at $t_i$, we observe an exact failure time $T_{i1} = t_i$ and right-censor the other two components at $t_i$. The log-likelihood contribution for this observation is
$$
L_i = \log f_1(t_i|\lambda_1) + \log R_2(t_i|\lambda_2) + \log R_3(t_i|\lambda_3),
$$
which simplifies to
$$
L_i = \log \lambda_1 - (\lambda_1 + \lambda_2 + \lambda_3) t_i,
$$
which has score
$$
s_i = \frac{1}{\lambda_1} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - t_i \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
$$
and Hessian
$$
H_i = -\frac{1}{\lambda_1^2} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
$$
To get the total log-likelihood, score, and Hessian, we simply sum the contributions over the (i.i.d.) observations, and that is exactly what the likelihood contribution model does. There are two ways to configure the model to dispatch the `loglik` and `score` functions for each observation type: define methods named after the type (as we do below), or pass the functions directly to the constructor (sketched after the next code block). Either way, dispatch is based on the observation type returned by `obs_type`.

```{r observed-likelihood, cache=TRUE}
score.observed <- function(df, rates, ...) {
  # score for component j: n_j / lambda_j - sum(t)
  rep(-sum(df$t), length(rates)) + (df |> dplyr::count(k))$n / rates
}

loglik.observed <- function(df, rates, ...) {
  sum(log(rates[df$k])) - sum(rates) * sum(df$t)
}

model.observed <- likelihood_contr_model$new(
  obs_type = function(df) {
    "observed"
  }
)
```
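The second way is to pass the contribution functions directly to the constructor via its `logliks` and `scores` arguments (and `hess_logliks`, when available), rather than relying on `loglik.observed` and `score.observed` being found by name. The sketch below is not evaluated and assumes these arguments accept lists keyed by observation type:

```{r observed-likelihood-direct, eval=FALSE}
# Sketch (not run): configure an equivalent model by passing the contribution
# functions directly to the constructor. This assumes the `logliks` and
# `scores` arguments take lists keyed by observation type.
model.observed2 <- likelihood_contr_model$new(
  obs_type = function(df) "observed",
  logliks = list(observed = loglik.observed),
  scores = list(observed = score.observed)
)
```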
Now that we have the likelihood model, we can use it to estimate the parameters. We use the `optim` function to do the numerical optimization, with `control = list(fnscale = -1)` so that `optim` maximizes rather than minimizes the log-likelihood.

```{r mle-optim-observed, cache=TRUE}
optim(rates, fn = loglik(model.observed), df = df, control = list(fnscale = -1))
```

The estimates are close to the true rates $\lambda = (1.1, 1.2, 1.3)$, and convergence code 0 indicates successful optimization.

Note that `likelihood_model` objects have a default method for fitting MLEs, `fit`. Internally, this method uses `optim` to maximize the log-likelihood, but it packs in some additional information and returns the result as a `fisher_mle` object.

```{r mle-observed-likelihood-model, cache=TRUE}
rates.hat <- fit(model.observed)(df, par = rates)
summary(rates.hat)
```

The 95% confidence intervals all contain the true parameter values, and the standard errors are small relative to the estimates, reflecting the information in $n = 150$ observations.

We implement `hess_loglik.observed` for completeness.

```{r hess-loglike}
hess_loglik.observed <- function(df, rates, ...) {
  p <- length(rates)
  H <- matrix(0, p, p)
  counts <- df |> dplyr::count(k)
  for (j in 1:p) {
    H[j, j] <- -counts$n[j] / rates[j]^2
  }
  H
}

# Observed information from MLE (negative Hessian)
cat("Observed information matrix (from MLE):\n")
print(-rates.hat$hessian)

# Computed directly via hess_loglik
cat("\nHessian at MLE:\n")
print(hess_loglik(model.observed)(df, coef(rates.hat)))
```

Up to the sign convention (the observed information is the negative of the log-likelihood Hessian), the two matrices agree, confirming that our analytical `hess_loglik.observed` implementation is correct.

Next, we use the bootstrap to estimate the sampling distribution of the MLE.

```{r bootstrap, cache=TRUE}
model.samp <- sampler(model.observed, df = df, par = rates)
boot_result <- model.samp(n = 500)

cat("Bootstrap MLE:\n")
print(coef(boot_result))
cat("\nBootstrap standard errors:\n")
print(se(boot_result))
cat("\nBootstrap bias:\n")
print(bias(boot_result))
cat("\nBootstrap covariance:\n")
print(vcov(boot_result))
```

The bootstrap standard errors should be close to the asymptotic standard errors from the Fisher information matrix, and the bootstrap bias estimates should be negligible (all under 0.01), confirming that the MLE is nearly unbiased for this sample size.

## Case 2: Right-censoring

We now add right-censoring to the model. We assume that the right-censoring times are independent of the component lifetimes. In particular, we consider the data described earlier, but with right-censoring at time $\tau = 0.5$. This right-censoring means that for a censored observation we only know the event occurred after time $\tau$; we do not know exactly when (and it may still be ongoing).

As a first stab, you might be inclined to just ignore the censored data. However, this is not a good idea, because it leads to biased estimates: we would be estimating the parameters of the truncated distribution $T_i \mid T_i < \tau$ rather than modeling the observed data $S_i$, where $S_i = T_i$ if $T_i < \tau$ and $S_i = \tau$ otherwise, along with the indicator of whether the observation was censored. This is a well-known problem in statistics, and it is why we need *survival analysis* techniques to handle right-censored data.

This is similar to the earlier problem, where we said it was a mistake to focus only on the component cause of failure and estimate each component's parameters independently. In that case, we would be estimating component $j$'s parameters from the conditional sample $T_{ij} \mid K_i = j$ rather than from the full data on the system lifetime $T_i = \min\{T_{i1}, \ldots, T_{im}\}$ and its cause of failure. In either case, the censoring or masking mechanism must not be ignored, otherwise we will get biased estimates.
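To see the bias concretely, we can keep only the systems that failed before $\tau = 0.5$ and fit the complete-data model from Case 1 to them as if nothing had been censored. Because this is a sample from the truncated distribution $T_i \mid T_i < 0.5$, we expect the rate estimates to be noticeably larger than the true rates:

```{r naive-truncation-bias}
# Drop the would-be censored systems and fit the Case 1 model as if the
# remaining data were a complete sample. The estimates should be inflated
# relative to the true rates (1.1, 1.2, 1.3).
df_trunc <- df[df$t <= 0.5, ]
coef(fit(model.observed)(df_trunc, par = rates))
```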
Here's what that right-censored data looks like, with right-censoring time $\tau = 0.5$.

```{r}
# if df$t > .5, then the system was right-censored
df.censored <- df |> mutate(
  censored = t > .5,
  t = ifelse(censored, .5, t)
)
# for censored systems, the cause of failure and component lifetimes are unknown
df.censored[df.censored[, "censored"], "k"] <- NA
df.censored[df.censored[, "censored"], paste0("t", 1:3)] <- rep(NA, 3)
head(df.censored)
```

We define the contribution functions for the censored observations with:

```{r}
loglik.censored <- function(df, rates, ...) {
  -sum(rates) * sum(df$t)
}

score.censored <- function(df, rates, ...) {
  rep(-sum(df$t), length(rates))
}

hess_loglik.censored <- function(df, rates, ...) {
  p <- length(rates)
  matrix(0, p, p)
}

model.censored <- likelihood_contr_model$new(
  obs_type = function(df) {
    ifelse(df$censored, "censored", "observed")
  }
)

mle.censored <- fit(model.censored)(df.censored, par = rates)
summary(mle.censored)
```

Recall that the true parameters are $\lambda = (`r rates`)'$. The estimates are reasonable, particularly given the relatively small sample size and the loss of information due to censoring.

### Inference on censored estimates

Let's examine the censored estimates more closely and compare them to the uncensored case. Censoring reduces information, so we expect wider confidence intervals:

```{r case2-inference}
cat("Uncensored estimates:\n")
cat("  Estimates:", coef(rates.hat), "\n")
cat("  Std errors:", se(rates.hat), "\n")

cat("\nCensored estimates:\n")
cat("  Estimates:", coef(mle.censored), "\n")
cat("  Std errors:", se(mle.censored), "\n")

cat("\n95% Confidence Intervals (uncensored):\n")
print(confint(rates.hat))
cat("\n95% Confidence Intervals (censored):\n")
print(confint(mle.censored))

cat("\nTrue parameters:", rates, "\n")
```

As expected, censoring inflates the standard errors and widens the confidence intervals, because censored observations carry less information than exact ones. Both sets of intervals still contain the true parameter values.

We can also use Fisherian likelihood inference to examine the strength of evidence for the censored estimates:

```{r case2-fisherian}
# Support at MLE (should be 0)
s_at_mle <- support(mle.censored, coef(mle.censored), df.censored, model.censored)
cat("Support at MLE:", s_at_mle, "\n")

# Evidence for MLE vs true parameters
ev <- evidence(model.censored, df.censored, coef(mle.censored), rates)
cat("Evidence (MLE vs true):", ev, "\n")

# Likelihood interval for rate 1 (first component)
li <- likelihood_interval(mle.censored, df.censored, model.censored, k = 8, param = 1)
cat("\n1/8 likelihood interval for lambda_1:\n")
print(li)
```

An evidence value near 0 means the MLE and the true parameters are about equally well supported by the data, which is expected when the MLE is close to the truth. The $1/8$ likelihood interval for $\lambda_1$ provides a plausible range for the parameter without making probability statements --- it contains all parameter values whose likelihood is at least $1/8$ of the maximum.
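To make the likelihood interval concrete, here is a rough sketch (not evaluated) that traces the relative likelihood for $\lambda_1$ over a grid, holding the other rates fixed at their MLEs. Fixing the nuisance rates is only a crude approximation to the interval reported by `likelihood_interval`, and the sketch assumes the wrapper returned by `loglik` takes the data frame and parameter vector in that order, as in the `hess_loglik` example above:

```{r likelihood-interval-sketch, eval=FALSE}
# Rough sketch (not run): relative likelihood for lambda_1 on a grid, with
# lambda_2 and lambda_3 held at their MLEs. Grid values with relative
# likelihood of at least 1/8 approximate the 1/8 likelihood interval.
ll <- loglik(model.censored)
ll_max <- ll(df.censored, coef(mle.censored))
lambda1_grid <- seq(0.2, 3, by = 0.01)
rel_lik <- sapply(lambda1_grid, function(l1) {
  par <- coef(mle.censored)
  par[1] <- l1
  exp(ll(df.censored, par) - ll_max)
})
range(lambda1_grid[rel_lik >= 1 / 8])
```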
## Case 3: Single-parameter model (exponential rate)

The `likelihood_contr_model` also works well with single-parameter models. Here we estimate a single exponential rate from data where some observations are exact and others are right-censored, using a separate contribution type for each. This example also demonstrates that the `score` function works correctly with scalar parameters (a case that requires careful handling of R's dimension-dropping behavior).

```{r single-param}
# Define contribution types for a single-rate exponential
loglik.exp_exact <- function(df, par, ...) {
  lambda <- par[1]
  sum(log(lambda) - lambda * df$t)
}

loglik.exp_right <- function(df, par, ...) {
  lambda <- par[1]
  -lambda * sum(df$t)
}

score.exp_exact <- function(df, par, ...) {
  lambda <- par[1]
  nrow(df) / lambda - sum(df$t)
}

score.exp_right <- function(df, par, ...) {
  -sum(df$t)
}

# Simulate right-censored exponential data
set.seed(42)
lambda_true <- 2.0
n_single <- 200
censor_time_single <- 0.4
raw_lifetimes <- rexp(n_single, lambda_true)
df_single <- data.frame(
  t = pmin(raw_lifetimes, censor_time_single),
  type = ifelse(raw_lifetimes > censor_time_single, "exp_right", "exp_exact")
)
cat("Exact observations:", sum(df_single$type == "exp_exact"), "\n")
cat("Right-censored:", sum(df_single$type == "exp_right"), "\n")

model_single <- likelihood_contr_model$new(
  obs_type = function(df) df$type,
  assumptions = c("exponential lifetimes", "non-informative right censoring")
)

# Fit and verify
mle_single <- fit(model_single)(df_single, par = c(lambda = 1))
cat("\nMLE:", coef(mle_single), "(true:", lambda_true, ")\n")

# Closed-form MLE for comparison: d / T
d <- sum(df_single$type == "exp_exact")
total_T <- sum(df_single$t)
cat("Closed-form MLE:", d / total_T, "\n")

cat("95% CI:", confint(mle_single)[1, ], "\n")
```

The `optim`-based MLE matches the closed-form estimate $\hat\lambda = d/T$ (number of exact observations divided by total observation time), and the 95% confidence interval contains the true value $\lambda = 2$.

## Best Practices

### When to use which model type

| Scenario | Recommended approach |
|----------|---------------------|
| Standard distribution, possibly censored | `likelihood_name()` |
| Heterogeneous observation types | `likelihood_contr_model` |
| Known analytical derivatives | Specialized model (e.g., `weibull_uncensored`) |
| Closed-form MLE exists | Custom class with `fit` override (e.g., `exponential_lifetime`) |
| Standard distribution as a contribution | Define `loglik.<type>` using `likelihood_name` internally |

### Providing analytical derivatives

Numerical differentiation (the default fallback) works but is slow --- each gradient evaluation requires $2r \times p$ function evaluations (where $r = 6$ and $p$ is the number of parameters). When analytical derivatives are available, providing `score.<type>` and `hess_loglik.<type>` functions gives 10--100x speedups:

```{r analytical-example, eval=FALSE}
# Slow (default): numerical differentiation of loglik.my_type
loglik.my_type <- function(df, par, ...) { ... }

# Fast: provide analytical derivatives
score.my_type <- function(df, par, ...) { ... }
hess_loglik.my_type <- function(df, par, ...) { ... }
```

The `likelihood_contr_model` checks for these functions in this order:

1. Functions passed directly via the `logliks`, `scores`, and `hess_logliks` arguments.
2. Functions found by name in the global environment (`loglik.<type>`, etc.).
3. Numerical differentiation of the log-likelihood (last resort).

### Performance tip: the S3 cache

The S3 wrappers (`loglik(model)`, `score(model)`, `hess_loglik(model)`) cache the data frame split and the resolved dispatcher functions. During optimization, `optim` calls the returned function hundreds of times with the same `df` but varying `par` --- caching eliminates the repeated `split()` and `obs_type()` overhead. Always use the S3 interface for optimization:

```{r cache-tip, eval=FALSE}
# Good: cached S3 wrapper
ll <- loglik(model)
result <- optim(par, fn = ll, df = df, control = list(fnscale = -1))

# Also good: using fit(), which does this internally
result <- fit(model)(df, par = par)

# Slower: calling the R6 method directly in a loop (no caching)
# model$loglik(df, par) -- re-splits df every call
```
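As a rough way to see the effect of the cache (not evaluated here), one could time repeated calls through the cached wrapper against repeated calls to the R6 method; the exact numbers depend on the data and the machine:

```{r cache-benchmark, eval=FALSE}
# Rough sketch (not run): the cached wrapper avoids re-splitting df and
# re-resolving the dispatch on every call, so the first loop should be
# noticeably faster than calling the R6 method directly.
ll <- loglik(model.censored)
system.time(for (i in 1:200) ll(df.censored, rates))
system.time(for (i in 1:200) model.censored$loglik(df.censored, rates))
```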
```{r cleanup, include=FALSE}
options(old_opts)
```