--- title: "Likelihood functions described" output: rmarkdown::html_vignette bibliography: references.bib csl: proc_b.csl vignette: > %\VignetteIndexEntry{Likelihood functions described} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} header-includes: \usepackage{amsmath} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ### Introduction {#top} This vignette gives details of the log-likelihood functions in this package. The likelihood functions are expressed in terms of survival functions; _S_(_t_), _f_(_t_), etc... The qualitative and quantitative behaviour of these survival functions depends on the probability distribution from which they arise (_Weibull, Gumbel_, etc...) and the value of the location and scale parameters (_a, b_) determining where and over what range mortality is distributed along the time axis. The location and scale parameters can be made functions themselves, e.g., of infectious dose, host age or gender, etc... The next sections outline how the likelihood functions are derived. These are followed by details of each likelihood function in this package and the assumptions they make about the data analysed. Jump to: [Likelihood functions in this package](#likelihood_functions_in_this_package) ### Log-likelihood expressions: no censoring Data collected during the course of a survival experiment provides information on the frequency of individuals dying between sampling intervals. These can be used to estimate the probability density function, _f_(_t_), for the probability an individual alive at time _t_₀ will die at time _t_. If the probability of dying at any one time _t_ is equal for all _n_ members of a population and the death of any one individual _i_ has no effect on the timing of death of any other individuals, the overall likelihood _L_ of dying at time _t_ is the product of the probability of each individual dying at time _t_, \begin{equation} L=\prod_{i=1}^{n}L_i \end{equation} where $L_i=f(t_i)$. It is often more convenient to work with the likelihood expression after log-transformation, \begin{equation} \log L = \sum_{i=1}^{n} \log L_i = \sum_{i=1}^{n} \log f \left( t_i \right) \end{equation} Maximum likelihood techniques can be applied to this expression to find the combination of which probability distribution and values of their associated location and scale parameters are most likely to describe the observed pattern for the frequency of individuals dying at times _t_ in an experiment. The observed data will provide maximum information for estimating the likelihood when the time of death of all individuals in an experiment is known. In this case, the frequencies of individuals dying at each time _t_ will sum to 1. However empirical studies are frequently terminated before all, or even most, individuals die. Consequently these individuals do not experience the event of interest and do contribute towards the estimation of _f_(_t_) as the time of their death is unknown. These are known as censored individuals, or more precisely, as right-censored individuals. They include individuals removed from populations during the course of an experiment, e.g. to control for infection success, those that escape, or are accidently killed, where the timing of the event is known. The next section indicates how likelihood models can adapt to right-censored individuals. ### Log-likelihood expressions: with censoring Although the time of death of right-censored individuals is unknown, it is known they survived at least until the time _t_ when they were censored. This latter information can contribute towards likelihood expressions as the probability density function _f_(_t_) equals the probability of surviving until time _t_, _S_(_t_), multiplied by the probability of dying at time _t_ for those surviving until time _t_, _h_(_t_), \begin{equation} f(t)=S(t) \cdot h(t) \end{equation} This expression can allow for right-censored individuals, \begin{equation} f(t) = \left[ h(t) \right]^d \cdot S(t) \end{equation} where _d_ is a death indicator taking a value of '1' for individuals dying during the experiment and '0' for those censored during or at the end of the experiment. Hence individuals dying during the experiment contribute towards the estimation of _f_(_t_) via _h_(_t_) and _S_(_t_), whereas those censored only contribute via _S_(_t_). By substitution, the log-likelihood expression above becomes \begin{equation} \log L = \sum_{i=1}^{n} \left\{ d \log \left[ h \left( t_i \right) \right] + \log \left[ S \left( t_i \right) \right] \right\} \end{equation} which can be used to analyse survival data, allowing for right-censored individuals. ### Log-likelihood expressions: relative survival, with censoring The likelihood expression above are suitable for situations in which the observed patterns of mortality in a host population are assumed to be determined by single source of mortality, background mortality. This is assumed to be the case for individuals in uninfected or control populations. Hence the __observed__ pattern of mortality in an uninfected treatment is expected to equal, \begin{align} S_{OBS.UNINF}(t) &= S_1(t) \\ \\ f_{OBS.UNINF}(t) &= f_1(t) \\ \\ h_{OBS.UNINF}(t) &= f_{OBS.UNINF}(t) \;/\; S_{OBS.UNINF}(t) \\ \\ &= h_1(t) \end{align} Under the assumptions of relative survival, infected individuals are assumed to experience two independent and mutually exclusive sources of mortality; background mortality and mortality due to infection. Consequently the __observed__ pattern of mortality in an infected treatment is expected to equal, \begin{align} S_{OBS.INF}(t) &= S_1(t) \cdot S_2(t) \\ \\ f_{OBS.INF}(t) &= f_1(t) \cdot S_2(t) + f_2(t) \cdot S_1(t) \\ \\ h_{OBS.INF}(t) &= f_{OBS.INF}(t) \;/\; S_{OBS.INF}(t) \\ \\ &= h_1(t) + h_2(t) \end{align} where indices '1' and '2' indicate background mortality and mortality due to infection, respectively. Here it assumed all the individuals in an infected treatment are infected and virulence is homogeneous, i.e., as for background mortality, a single hazard function with a particular combination of values for its location and scale parameters can describe the pattern of mortality due to infection experienced by every member of the infected treatment or population. These expressions can be subtituted into the likelihood expression above to get an expression for the likelihood of observing mortality in an infected treatment at time _t_. As infected and uninfected treatments are assumed to experience the same background mortality, the likelihood expressions for the observed mortality in each treatment can be combined as, \begin{equation} \log L = \sum_{i=1}^{n} \left\{ d \log \left[ h_1(t_i) + g \cdot h_2(t_i) \right] + \log \left[ S_1(t_i) \right] + g \cdot \log \left[ S_2(t_i) \right] \right\} \end{equation} where _g_ is an infection-treatment indicator taking a value of '1' for infected treatments and '0' for uninfected treatments. Hence individuals in the infected treatments contribute towards estimates of background mortality and mortality due to infection, whereas the uninfected treatments only contribute towards the estimation of background mortality. ## Likelihood functions in this package {#likelihood_functions_in_this_package} The following describes the negative log-likelihood (_nll_) functions available in this package for estimating the rate of background mortality and virulence. The negative of the log-likelihood is returned as this is required as input for the maximum likelihood estimation of parameters by the function _mle2_ in the package _bbmle_ by Ben Bolker and the R Core Development team. In each case ; * Indices '1' and '2' denote background mortality and mortality due to infection, respectively * _d_ a death indicator; '1' died during experiment, '0' right-censored during or at end of experiment * _g_ an infection-treatment indicator; '1' infected treatment, '0' uninfected treatment. * All models assume each member of an uninfected treatment is uninfected, but not all models assume each member of an infected treatment is infected. * Some models assume all members of an infected treatment are infected, but allow for variation in virulence among members for the population; this variation can be discreately or continously distributed, and be directly observed or assumed to be present. * There is also a model allowing for recovery from infection. ##### The functions: {#back} [nll_basic](#nll_basic) | [nll_basic_logscale](#nll_basic_logscale) | [nll_controls](#nll_controls) | [nll_exposed_infected](#nll_exposed_infected) | [nll_frailty](#nll_frailty) | [nll_frailty_correlated](#nll_frailty_correlated) | [nll_frailty_logscale](#nll_frailty_logscale) | [nll_frailty_shared](#nll_frailty_shared) | [nll_proportional_virulence](#nll_proportional_virulence) | [nll_recovery](#nll_recovery) | [nll_recovery_II](#nll_recovery_II) | [nll_two_inf_subpops_obs](#nll_two_inf_subpops_obs) | [nll_two_inf_subpops_unobs](#nll_two_inf_subpops_unobs) ### nll_basic {#nll_basic} This function is for the 'basic' log-likelihood expression described above. It assumes all the individuals in the infected population are infected and they all experience the same pattern of mortality due to infection, i.e. virulence is homogeneous for the population. \begin{align} S_{OBS.INF}(t) &= S_1(t) \cdot S_2(t) \\ \\ f_{OBS.INF}(t) &= f_1(t) \cdot S_2(t) + f_2(t) \cdot S_1(t) \\ \\ h_{OBS.INF}(t) &= f_{OBS.INF}(t) \;/\; S_{OBS.INF}(t) \\ \\ &= h_1(t) + h_2(t) \end{align} Hence the observed rate of mortality in the infected population at time _t_ is equal to the the rate of background mortality at time time _t_, _h_₁(_t_), plus the rate of mortality due to infection at time _t_, _h_₂(_t_), that is, the pathogen's virulence at time _t_. The log-likelihood expression for the probability of dying at time _t_ in the infected and uninfected treatments is, \begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[h_1(t_i) + g \cdot h_2(t_i)\right] +\log\left[S_1(t_i)\right] + g \cdot \log\left[S_2(t_i)\right] \right\rbrace \end{equation} [back to list](#back) ### nll_basic_logscale {#nll_basic_logscale} As for _nll_basic_, except input values of location and scale parameters are assumed to be on a logscale. [back to list](#back) ### nll_controls {#nll_controls} This function is for uninfected or control data only. It assumes the observed pattern of mortality is due only to background mortality, \begin{align} S_{OBS.UNINF}(t) &= S_1(t) \\ \\ f_{OBS.UNINF}(t) &= f_1(t) \\ \\ h_{OBS.UNINF}(t) &= f_1(t) \;/\; S_1(t) \\ \\ &= h_1(t) \end{align} The log-likelihood expression for the probability of dying at time _t_ in the uninfected treatment is, \begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[ h_1(t_i)\right] +\log\left[ S_1(t_i) \right] \right\rbrace \end{equation} [back to list](#back) ### nll_exposed_infected {#nll_exposed_infected} Exposure to infection does not garantee infection. This function assumes only a proportion _p_ of the individuals exposed to infection experience both background mortality and an increased rate of mortality due to infection, while the remaining proportion (1 - _p_) experience only background mortality. The expressions describing the observed patterns of mortality in the infected treatment are, \begin{align} S_{OBS.INF}(t) &= p \cdot [S_1(t) \cdot S_2(t)] + (1-p) \cdot [S_1(t)] \\ \\ f_{OBS.INF}(t) &= p \cdot [f_1(t) \cdot S_2(t) + f_2(t) \cdot S_1(t)] + (1-p) \cdot f_1(t) \\ \\ h_{OBS.INF}(t) &= S_{OBS.INF}(t) \;/\; f_{OBS.INF}(t) \\ \\ \end{align} where _p_ is a constant to be estimated; 0 ≤ _p_ ≤ 1. Here the pattern of mortality experienced by members of the treatment exposed to infection is no longer homogeneous as it varies according to whether an individual is infected or not. The overall log-likelihood expression for the observed pattern of mortality in the infected and uninfected treatments is, \begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} infected & \Rightarrow d \log \left[ h_{OBS.INF}(t_i)\right] +\log\left[ S_{OBS.INF}(t_i) \right] \\ \\ uninfected & \Rightarrow d \log \left[ h_1(t_i)\right] +\log\left[ S_1(t_i) \right] \\ \end{cases} \\ \\ \end{align} This type of model is sometimes referred to as a 'cure' model [@Lambert_2007]. This is not because infected hosts recover from infection, but rather because there will be proportionately fewer infected individuals in the 'infected' population over time due to their higher rate of mortality, i.e., the 'infected' population is progressively 'cured' of its infected members. [back to list](#back) ### nll_frailty {#nll_frailty} This function assumes all the individuals in an infected treatment are infected, but there is unobserved variation in the pathogen's virulence. It is assumed there is an underlying rate of mortality due to infection, _h~V~_(_t_), which is multiplied by a constant, e.g., $\lambda$, where $\lambda$ is distributed as a continuous random variable with a mean value of 1. 'Frail' individuals with values of $\lambda$ > 1 tend to die earlier than those with $\lambda$ < 1 [@Aalen_1988; @Hougaard_1984]. In this package $\lambda$ is assumed to follow either the gamma or inverse Gaussian distribution. In the case of the gamma distribution, the hazard function for mortality due to infection, _h_~2~(_t_), is \begin{equation} h_2(t) = \frac{h_V(t)}{1 + \theta H_V(t)} \end{equation} where _H~V~_(_t_) is the cumulative hazard function for the underlying pattern of virulence at time _t_, _h~V~_(_t_), and $\theta$ is a constant describing the variance of the distribution in the rate of mortality and a parameter to be estimated. The corresponding cumulative survival function for mortality due to infection, _S_~2~(_t_), is \begin{equation} S_2(t) = [1 + \theta H_v(t)]^{-1/\theta} \end{equation} In the case of the inverse Gaussian distribution, the hazard function for mortality due to infection, _h_~2~(_t_), is \begin{equation} h_2(t) = \frac{h_V(t)}{[1 + 2 \theta H_V(t)]^{1/2}} \end{equation} and the cumulative survival function, _S_~2~(_t_), \begin{equation} S_2(t) = \exp \left\{ \frac{1}{\theta} \cdot \left( 1 - [1 + 2 \theta H_V(t)]^{1/2} \right) \right\} \end{equation} In both cases the overall log-likelihood expression for the observed mortality in infected and uninfected treatments is, \begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[ h_1(t_i) + g \cdot h_2(t_i) \right] +\log\left[ S_1(t_i) \right] + g \cdot \log[S_2(t_i)] \right\rbrace \end{equation} with the expressions for _h_~2~(_t_) and _S_~2~(_t_) corresponding with the probability distribution describing the unobserved variation in virulence. [back to list](#back) ### nll_frailty_correlated {#nll_frailty_correlated} This function allows for separate, but positively correlated, frailty effects acting on background mortality and mortality due to infection, where the strength of this correlation is a variable to be estimated [@Zahl_1997]. The observed rate of mortality due to infection at time _t_ in an infected population, _h~OBS.INF~_(_t_), equals \begin{equation} h_{OBS.INF}(t) = h_1(t) + h_2(t) - \rho \sqrt{\theta_1} \sqrt{\theta_2} \frac{h_1(t)H_V(t) + h_2(t)H_B(t)}{1 + \theta_B H_B(t) + \theta_V H_V(t)} \end{equation} where _h_~1~(_t_) and _h_~2~(_t_) are the population wide rates of mortality due to the background mortality and that due to infection at time _t_, respectively. _H~B~_(_t_) and _H~V~_(_t_) are the cumulative hazard functions for the underlying background mortality and the underlying virulence of the pathogen at time _t_, respectively; $\theta_B$ and $\theta_V$ are the variances of the unobserved variation in the background mortality and that due to infection, respectively, and constants to be estimated. $\rho$ is the strength of the positive correlation between the two frailty effects ($\rho$ ≥ 0). Estimates of $\rho$ will tend towards zero as the difference in the variance of the two frailty effects increases. The population wide rate of background mortality at time _t_, _h_~1~(_t_), is \begin{equation} h_1(t) = \frac{h_B(t)}{1 + \theta_B H_B(t)} \\ \end{equation} where _h~B~_(_t_) and _H~B~_(_t_) are the hazard and cumulative hazard functions for the underlying rate of background mortality at time _t_, respectively. The population wide rate of mortality due to infection at time _t_, _h_~2~(_t_), is \begin{equation} h_2(t) = \frac{h_V(t)}{1 + \theta_V H_V(t)} \end{equation} where _h~V~_(_t_) and _H~V~_(_t_) are the hazard and cumulative hazard functions for the underlying rate of mortality due to infection at time _t_, respectively. The observed cumulative survival of infected hosts at time _t_, _S~OBS.INF~_(_t_), is given by \begin{equation} S_{OBS.INF}(t) = \left[S_1(t)^{-\theta_B} + S_2(t)^{-\theta_V} - 1 \right]^{- \rho / \sqrt{\theta_B} \sqrt{\theta_V}} \cdot S_1(t)^{1 - \rho \sqrt{\theta_B} \sqrt{\theta_V}} \cdot S_2(t)^{1 - \rho \sqrt{\theta_V} \sqrt{\theta_B}} \\ \\ \end{equation} where \begin{equation} S_1(t) = \left[ 1 + \theta_B H_B(t) \right]^{-1 / \theta_B} \\ \end{equation} and \begin{equation} S_2(t) = \left[ 1 + \theta_V H_V(t) \right]^{-1 / \theta_V} \\ \end{equation} The expressions _S_~1~(_t_) and _S_~2~(_t_) correct for an error in the original paper where the cumulative hazard terms were multiplied by $\sqrt{\theta_i}$, instead of $\theta_i$; _i_ = _U, V_ [@Zahl_1997]. [back to list](#back) ### nll_frailty_logscale {#nll_frailty_logscale} This function is identical to the function nll_frailty except it values for location and scale parameters are on a logscale; this can help convergence during maximum likelihood estimation. [back to list](#back) ### nll_frailty_shared {#nll_frailty_shared} In this function it is assumed there is an underlying rate of background mortality, _h_~1~(_t_), and an underlying rate of mortality due to infection, _h_~2~(_t_), and both are multiplied by a constant, e.g., $\lambda$, \begin{equation} h(t) = \lambda \left[ h_1(t) + h_2(t) \right] \end{equation} where $\lambda$ is distributed as a continuous random variable drawn from a gamma distribution with a mean of 1 and variance $\theta$ [@Zahl_1997]. In this case, the observed rate of mortality in the infected treatment, _h~OBS.INF~_(_t_), at time _t_ will be, \begin{equation} h_{OBS.INF}(t) = \frac{h_1(t) + h_2(t)}{1 + \theta [H_1(t) + H_2(t)]} \end{equation} the observed cumulative survival at time _t_, _S~OBS.INF~_(_t_), will be, \begin{equation} S_{OBS.INF}(t) = \left(1 + \theta \left[H_1(t) + H_2(t) \right] \right)^{-1/\theta} \end{equation} and the corresponding likelihood model, \begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[\frac{h_1(t_i) + gh_2(t_i)}{1 + \theta[H_1(t_i) + gH_2(t_i)]} \right] +\log \left[(1 + \theta[H_1(t_i) + gH_2(t_i)])^{-1/\theta} \right] \right\rbrace \end{equation} where _g_ is an infection indicator taking a value of '1' for an infected treatment and '0' for an uninfected or control treatment. See the original paper [@Zahl_1997] for a discussion of the shortcomings of this type of model. [back to list](#back) ### nll_proportional_virulence {#nll_proportional_virulence} This function assumes a proportional hazards relationship for virulence among infected treatments within an experiment, such that, \begin{equation} h_A(t) \;/\; h_B(t) = c \end{equation} where _h~A~_(_t_) and _h~B~_(_t_) are the hazard functions describing pathogen virulence in infected treatments _A_ and _B_ at time _t_, respectively, and _c_ is a constant. A treatment of infected hosts is chosen as the reference population. The observed patterns of survival in this treatment are described in the same way as for the function _nll\_basic_, \begin{align} S_{OBS.REF}(t) &= S_1(t) \cdot S_{REF}(t) \\ \\ f_{OBS.REF}(t) &= f_1(t) \cdot S_{REF}(t) + f_{REF}(t) \cdot S_1(t) \\ \\ h_{OBS.REF}(t) &= f_{OBS.REF}(t) \;/\; S_{OBS.REF}(t) \\ \\ &= h_1(t) + h_{REF}(t) \end{align} where the indice _REF_ indicates the reference treatment or population of infected hosts. The patterns of survival and mortality in other infected treatments are estimated as a function of those for the reference population, \begin{align} S_{OBS.ALT}(t) &= S_1(t) \cdot \left[ S_{REF}(t) \right]^\theta \\ \\ f_{OBS.ALT}(t) &= f_1(t) \cdot \left[ S_{REF}(t) \right]^\theta + \theta \cdot \left[ S_{REF}(t) \right]^{\theta - 1} \cdot f_{REF}(t) \cdot S_1(t) \\ \\ h_{OBS.ALT}(t) &= f_{OBS.ALT}(t) \;/\; S_{OBS.ALT}(t) \\ \\ &= h_1(t) + \theta \cdot h_{REF}(t) \end{align} where the indice _ALT_ indicates and alternative infected treatment, and $\theta$ is a constant scaling the pathogen's virulence relative to that in the reference population; $\theta > 0$ The overall log-likelihood expression for the observed pattern of mortality in the infected and uninfected treatments is, \begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} uninf & \Rightarrow d \log \left[ h_1(t_i)\right] +\log \left[ S_1(t_i) \right] \\ \\ ref & \Rightarrow d \log \left[ h_1(t_i) + h_{REF}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{REF}(t_i) \right] \\ \\ alt & \Rightarrow d \log \left[ h_1(t_i) + \theta h_{REF}(t_i) \right] +\log \left[ S_1(t_i) \right] + \theta \log \left[ S_{REF}(t_i) \right] \\ \end{cases} \\ \\ \end{align} where _uninf, ref_ and _alt_ refer to the uninfected, infected reference and alternative infected treatments, respectively. [back to list](#back) ### nll_recovery {#nll_recovery} This function allows for infected hosts that recover from infection. It assumes that all hosts in an infected treatment are initially infected and experience increased mortality due to infection. However hosts can recover from infection, such that, recovered individuals only experience background mortality equal to that of hosts in a matching uninfected or control treatment. The timing of recovery from infection is assumed to follow a probability distribution and to be independent of the probability distributions for the timing of background mortality and mortality due to infection. Hence the pattern of events in an infected treatment at time _t_, _S~INF.POP~_(_t_), can be expressed as the product of three independent probability distributions, \begin{equation} S_{INF.POP}(t) = S_1(t) \cdot S_2(t) \cdot S_3(t) \end{equation} where _S_~1~(_t_) is the cumulative survival function for background mortality at time _t_, _S_~2~(_t_) is the cumulative survival function for mortality due to infection at time _t_, and _S_~3~(_t_) is the cumulative probaility that an infection 'survives' until time _t_. Here the index 'INF.POP' is used rather than 'OBS.INF' as recovery from infection may not be an observed event. Differentiating the above with respect to time and taking the negative gives the probability density function, _f~INF.POP~_(_t_), for events occurring in the population at time _t_, \begin{equation} \begin{aligned} f_{INF.POP}(t) &= f_1 \left(t\right) \cdot S_2\left(t\right) \cdot S_3\left(t\right) \\ &+f_2 \left(t\right) \cdot S_1\left(t\right) \cdot S_3\left(t\right) \\ &+f_3 \left(t\right) \cdot S_1\left(t\right) \cdot S_2\left(t\right) \end{aligned} \end{equation} where the sum of the first two expressions gives the probability an infected host is infected and dies at time _t_, from either background mortality or mortality due to infection, and corresponds with data collected on the time of death of infected hosts. The third expression describes the probability an infected host is alive and recovers from infection at time _t_, and corresponds with data collected on the timing of recovery of infected hosts. Hence the expression can be estimated when the timing of recovery is known. This will not be the case if a host's recovery status is only determined after the host has died or been censored, as the data collected correspond with the time hosts recovered and subsequently survived before dying or being censored. However it is assumed recovered individuals experience the same background mortality as uninfected hosts. This information can be used to estimate the likelihood a recovered individual dying at time _t_, recovered at an earlier time and subsequently survived until time _t_, when it died or was censored. For example, in an experiment recording survival daily, the probability a recovered individual dies on the second day _t_~2~ can be estimated from, \begin{equation*} \begin{aligned} & \left[ f_3(t_2) \cdot S_1(t_2) \cdot S_2(t_2) \right] \cdot h_1(t_2) \; + \nonumber \\ & \left[ f_3(t_1) \cdot S_1(t_1) \cdot S_2(t_1) \right] \cdot \left[ S_1(t_2)\;/\;S_1(t_1) \right] \cdot h_1(t_2) \end{aligned} \end{equation*} where the first line gives the probability an individual survives until and recovers on the second day, multiplied by the background rate of mortality on day 2. The second line gives the probability an individual recovered on the first day, survived background mortality from day 1 to day 2, _S_~1~(_t_~2~) / _S_~1~(_t_~1~), and died of background mortality on day 2. Multiplication by _h_~1~(_t_~2~) would be omitted in cases where a recovered individual was censored at the end of day 2. Hence observed data for the times when recovered individuals die or are censored can be used to estimate the unobserved distribution of recovery times. The _nll_recovery_ function assumes all the individuals in an infected treatment were initially infected and their recovery status is known at the time of their death or censoring, i.e., still infected vs. recovered. This information is used along with data on the timing of death or censoring in a matching uninfected or control treatment to calculate the likelihood of the events being described by the model. NB This function requires the data to be specified in a specific format different to that of other functions. [back to list](#back) ### nll_recovery_II {#nll_recovery_II} This is essentially the same model _nll_recovery_, except it assumes there was no background mortality, such that, _S_~1~(_t_) = 1 and _f_~1~(_t_) = 0 throughout the experiment. In such cases the only observed dynamics will be the death of infected hosts due to infection, before they recover from infection, _f_~2~(_t_)_S_~1~(_t_)_S_~3~(_t_), or _f_~2~(_t_)_S_~3~(_t_) as _S_~1~(_t_) = 1. As recovered hosts are assumed to only experience the same background mortality as individuals in a matching control treatment, any recovered individuals will all survive to be right-censored at the end of the experiment. [back to list](#back) ### nll_two_inf_subpops_obs {#nll_two_inf_subpops_obs} This function assumes all the individuals in an infected treatment are infected but there is data identifying the presence of two discrete subpopulations, e.g., _A_ and _B_ in proportions _p_ and (1 - _p_), respectively, where the additional rate of mortality due to infection in the two subpopulations is different, e.g., hosts dying with/without visible signs of infection or those with viral titres above/below a threshold value. The observed patterns of mortality in the infected treatment as a whole will be, \begin{align} S_{OBS.INF}(t) &= p \cdot [ S_1(t) \cdot S_{2A}(t)] + (1-p) \cdot [S_1(t) \cdot S_{2B}(t)] \\ \\ f_{OBS.INF}(t) &= p \cdot [f_1(t) \cdot S_{2A}(t) + f_{2A}(t) \cdot S_1(t)] + (1-p) \cdot [f_1(t) \cdot S_{2B}(t) + f_{2B}(t) \cdot S_1(t)] \\ \\ h_{OBS.INF}(t) &= S_{OBS.INF}(t) \;/\; f_{OBS.INF}(t) \\ \\ \end{align} where the indices _2A_ and _2B_ denote the two discrete subpopulations of infected hosts. Mortality in the two subpopulations is assumed to act independently, consequently the likelihood expressions for each population can be estimated separately based on their membership of subpopulation _A_ or _B_, \begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} uninfected & \Rightarrow d \log \left[ h_1(t_i)\right] +\log \left[ S_1(t_i) \right] \\ \\ infected \; A & \Rightarrow d \log \left[ h_1(t_i) + h_{2A}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{2A}(t_i) \right] \\ \\ infected \; B & \Rightarrow d \log \left[ h_1(t_i) + h_{2B}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{2B}(t_i) \right] \\ \end{cases} \\ \\ \end{align} where each subpopulation shares the same background mortality as a matching control treatment. [back to list](#back) ### nll_two_inf_subpops_unobs {#nll_two_inf_subpops_unobs} This function is similar to the function _nll_two_obs_inf_subpops_, except the presence of two discrete subpopulations in proportions _p_ and (1 - _p_) is assumed rather than based on evidence, e.g., data on visual signs of infection or viral titres were not recorded. In this case, _p_ is a parameter to be estimated (0 ≤ _p_ ≤ 1). The observed patterns of mortality will be as described for _nll_two_obs_inf_subpops_, but the likelihood expression for the two presumed subpopulations can not be estimated separately as the data are not available to identify them, \begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} uninfected & \Rightarrow d \log \left[ h_1(t_i)\right] +\log \left[ S_1(t_i) \right] \\ \\ infected & \Rightarrow d \log \left[ h_1(t_i) + h_{OBS.INF}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{OBS.INF}(t_i) \right] \\ \end{cases} \\ \\ \end{align} hence the patterns of mortality and the proportions of the two supposed subpopulations needs to be inferred from the pattern of mortality for the infected population as a whole. [back to list](#back) ### [Back to top](#top) ### References