---
title: "Confidence intervals"
output: rmarkdown::html_vignette
bibliography: references.bib
csl: proc_b.csl
vignette: >
  %\VignetteIndexEntry{Confidence intervals}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
header-includes: \usepackage{amsmath}
  # devtools::install(build_vignettes = TRUE)
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = FALSE}
library(anovir)
```


The negative log-likelihood (nll) functions in this package allow various 
survival models to be fit to observed survival data by maximum likelihood 
estimation. 
The actual variables estimated are the random variables defining the location 
and scale parameters of the survival models. 

This vignette illustrates;

(i) how the estimated means, variances and covariances of these variables can be combined using the delta method to estimate the approximate variance and 95% confidence intervals of a hazard function, and

(ii) how a log-transformation of the hazard function can avoid the lower bounds of these confidence intervals from being less than zero.

Both approaches are used by the function `anovir::conf_ints_virulence` to estimate the confidence intervals of a hazard function describing a pathogen's virulence.
This function, however, is limited to estimates arising from the default form of `anovir::nll_basic` where the location and scale parameters for background mortality and mortality due to infection are described by the variables, `a1,b1` and `a2,b2`, respectively. 

Three worked examples are provided where the number of random variables 
contributing to the parameters of the hazard function are 1, 2 or 3. 

The second example estimates the 95% confidence intervals of a hazard function describing a pathogen's virulence.
The two variables estimated are those defining the location and scale parameters 
for host mortality due to infection. 
This type of analysis is compatible with the function `anovir::conf_ints_virulence` 
and illustrates the sequence of steps taken by the function.

Finally, a worked example also shows how to calculate 95% confidence intervals on the natural scale for the results when they are calculated with `nll_basic_logscale`

### The delta method

The delta method provides a means to estimate the approximate variance of a function when the function is a function of one or more random variables, and where there is an estimate for the variance of each random variable. 
For example if _g_ is a function of _f_, where _f_ is a function of the random variables, $X_1,X_2,\dots,X_n$, such that,

$$ g = f(X_1,X_2,\dots,X_n)$$
and estimates for the variance of each $X_i$ are available, then by the delta method the first order approximation for the variance of _g_ is,

\begin{align}

\textrm{Var}\left(g\right) &\approx
\textrm{Var}\left[f\left(X_1, X_2,\dots, X_n \right) \right] \\
\\
&= \sum_{i=1}^{n} 
\left( \frac{\partial f}{\partial X_i} \right)^2 
\textrm{Var}\left( X_i \right)
+ 2 \sum_{i=1}^n \sum_{j=1}^{n} 
\left(\frac{\partial f}{\partial X_i}\right)
\left(\frac{\partial f}{\partial X_j}\right)
\textrm{Cov}\left(X_i, X_j \right)

\end{align}

where $\partial f / \partial X_i$ is the expression for the first partial derivative of $f(\cdot)$ with respect to $X_i$, while $\textrm{Var}(X_i)$ and $\textrm{Cov}(X_i,X_j)$ are terms for the variance and covariances of the random variables, respectively.

This first order approximation is equivalent to the function being described by a tangent line intersecting it at its mean. 
Higher order approximations are equivalent to describing the function around its mean with polynomial expressions to a higher degree. 
Such expressions are progressively more flexible and provide a better description of the function.
However the added value of approximations of order two or above rapidly goes towards zero if the function is approximately linear about its mean.
In such cases there is little benefit to going beyond a first order approximation.


### Setting confidence interval bounds

The approximate nature of a first order approximation for the variance of a hazard function means it can yield lower estimates that go below zero. 
This situation can be remedied by a back-calculation of the confidence intervals estimated on a logscale.

Let _L_(_t_) be the log transformation of the hazard function _h_(_t_),

$$ L(t) = \log \left[h(t)\right] $$
Define the lower and upper bounds for $L\left(t\right)$, based on its estimator,
$\hat{L}\left(t\right)$,

$$\left[\hat{L}(t)-A,\; \hat{L}(t)+A \right] $$
where _A_ is the distance from the mean to the lower or upper bounds of the confidence interval. The back-transformation from the log-scale gives the lower and upperbounds as, 
$$\left[\exp\left(\hat{L}[t]-A\right),\; \exp\left( \hat{L}[t]+A \right) \right] $$
this equals,

$$\left[ \exp\left(\hat{L}\left[t\right]\right) \cdot \exp \left( -A \right),\; \exp\left( \hat{L}\left[t\right] \right) \cdot \exp\left( A \right) \right]$$

The back-transformation of $\hat{L}\left(t\right)$ is, 
$$ \hat{h}\left(t\right) = \exp \left(\hat{L}\left[t\right] \right)$$

which can be substituted into the bounds above to give,

$$\left[ \hat{h}\left(t\right) \cdot \exp \left(-A\right),\; \hat{h}\left(t\right) \cdot \exp\left( A\right) \right]$$
For a 95% confidence interval, 

\begin{align}

A &= 1.96 \cdot \sqrt{\textrm{Var}\left[\hat{L}\left(t\right)\right]} \\
&= 1.96 \cdot \sqrt{\textrm{Var}\left[\log\left(\hat{h}\left[t\right]\right)\right]}

\end{align}

Use the delta method to estimate $\textrm{Var}\left[\log\left(\hat{h}\left[t\right]\right)\right]$ as a function of $\hat{h}\left(t\right)$; 

\begin{align}

X &= \hat{h}\left(t\right)  \\
Y &= \log \left(X\right) \\
\textrm{Var}\left(Y\right) &\approx \textrm{Var}\left(\log\left[X\right]\right) \\
\textrm{Var}\left(\log \left[X\right]\right)& = \left( \frac{\partial \log\left[X\right]}{\partial X}\right)^2 \cdot \textrm{Var}\left(X\right) \\
\textrm{Var}\left(\log \left[\hat{h}\left(t\right)\right]\right)&= \left( \frac{1}{X}\right)^2 \cdot \textrm{Var}\left(X\right) \\
& = \left( \frac{1}{\hat{h}\left(t\right)}\right)^2 \cdot \textrm{Var}\left(\hat{h}\left[t\right]\right)

\end{align}


Consequently the _A_ term above becomes;

\begin{align}

A &= 1.96 \cdot \sqrt{\left( \frac{1}{\hat{h}\left(t\right)} \right)^2 \cdot \textrm{Var}\left(\hat{h}\left[t\right]\right)} \\
&= \frac{1.96 \cdot \sqrt{\textrm{Var}\left[\hat{h}\left(t\right)\right]}}{\hat{h}\left(t\right)}

\end{align}

and the 95% confidence intervals,

$$\hat{h}\left(t\right) \cdot \exp \left( \frac{\pm 1.96 \cdot\sqrt{\textrm{Var}\left[\hat{h}\left(t\right)\right]}}{\hat{h}\left(t\right)} \right) $$

---

## Worked example: 1 random variable


This example illustrates the estimation of the 95% confidence intervals of a hazard function based on a single random variable.

Sy et al. [@Sy_2014] recorded the survival of caged adult female _Aedes aegypti_ mosquitoes over a three week period. 
The functions `anovir::nll_basic` and `bbmle::mle2` were used to estimate patterns of mosquito mortality based on survival functions for the Weibull distribution.

The variables estimated were _a1, b1, a2, b2_ which defined the location and scale parameters for background mortality and mortality due to infection, respectively.

An initial analysis suggested the scale parameter for background mortality (_b1_) was close to one, such that, the background rate of mortality could be described by the exponential distribution. 
In this case the hazard function for background mortality at time _t_, $\textit{h1}\left(t\right)$ is, 

$$\textit{h1} \left(t\right) =  \frac{1}{\exp \left(\textit{a1}\right)}$$
where _a1_ is the location parameter and a random variable to be estimated.

In this case the approximate variance of the hazard function by the delta method is,

\begin{align}
\textrm{Var}\left[\textit{h1}\left(t\right)\right] &\approx \left[ \frac{\partial \textit{h1}\left(t\right)}{\partial \textit{a1}}\right]^2 \textrm{Var}\left(\textit{a1}\right) \\
\\
&= \left[ \left( \frac{-1}{\exp \left[ \mu_\textit{a1}  \right]} \right)^2 \right] \sigma^2_\textit{a1}
\end{align}

where $\mu_{\textit{a1}}$ and $\sigma^2_{\textit{a1}}$ are the mean and variance of _a1_, respectively.

A revised model `m02` set, _b1_ = 1.0, and gave the following estimates for parameters; _a1, a2_ and _b2_


```{r include = FALSE}

  sy_times <- c(1,3,4,5,6,7,8,9,10,11,12,13,14,15,17,18,19,20,21,23,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,7,21,22,23,7,14,21,22,23)
  sy_censor <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1)
  sy_inf <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1)
  sy_fq <- c(5,8,8,4,3,2,4,3,3,1,3,3,4,2,3,4,5,2,2,1,7,4,8,9,4,5,4,2,3,6,10,7,13,5,11,5,9,11,5,5,3,4,2,3,4,205,143,6,7,4,103,66)

  data_sy <- data.frame(sy_times, sy_inf, sy_censor, sy_fq)
  
  data_sy$fq <- data_sy$sy_fq
```

```{r include = FALSE, eval = TRUE, echo = FALSE, message = FALSE, warning = FALSE}

    m01_prep_function <- function(a1, b1, a2, b2){
      nll_basic(
        a1, b1, a2, b2,
        data = data_sy,
        time = sy_times,
        censor = sy_censor,
        infected_treatment = sy_inf,
        d1 = 'Weibull', d2 = 'Weibull')
        }

```
```{r include = FALSE, eval = TRUE, echo = FALSE, message = FALSE, warning = FALSE}
    m02 <- mle2(m01_prep_function,
             start = list(a1 = 4.8, a2 = 3.6, b2 = 0.6),
             fixed = list(b1 = 1.0)
             )
```

```{r include = TRUE}
  coef(m02)
```
and the variance-covariance matrix
```{r include = TRUE}
  vcov(m02)
```

The lower and upper bounds of the 95% confidence intervals for the hazard function can be estimated as follows;

(i) based on the delta method
```{r include = TRUE}
  mu_a1 <- coef(m02)[[1]] ; mu_a1
  var_a1 <- vcov(m02)[1,1] ; var_a1

  h1t <- 1/(exp(mu_a1)) ; h1t
  var_h1t <- (-1/exp(mu_a1))^2 * var_a1 ; var_h1t

  lower_ci <- h1t - 1.96*sqrt(var_h1t)
  upper_ci <- h1t + 1.96*sqrt(var_h1t)
  lower_ci ; h1t ; upper_ci
```

(ii) based on delta method and a log transformation of the hazard function
```{r include = TRUE}
  lower_ci2 <- h1t*exp(-1.96*sqrt(var_h1t)/h1t)
  upper_ci2 <- h1t*exp(+1.96*sqrt(var_h1t)/h1t)

  lower_ci2 ; h1t ; upper_ci2
```


---

## Worked example: 2 random variables 

In this example the confidence intervals for virulence are estimated for a Fréchet hazard function based on three random variables. 

An analysis of a subset of the data from Blanford et al [@Blanford_2012] found the virulence of three different isolates of the fungal pathogen \textit{Ma} were best described by hazard functions for the Fréchet distribution. 
Here the 95% confidence intervals for the virulence of strain _Ma07_ are estimated.

### Part I: estimate parameter variables

```{r include = TRUE, warning = FALSE}

library(anovir)

data01 <- data_blanford 

data01 <- subset(data01,
    (data01$block == 5) & (
      (data01$treatment == 'cont') |
      (data01$treatment == 'Ma07') 
    ) &
    (data01$day > 0)
    )

# create column 'g' as index of infected treatment
  data01$g <- data01$inf

  m01_prep_function <- function(a1, b1, a2, b2){
    nll_basic(a1, b1, a2, b2, data = data01, time = t, censor = censor,
              infected_treatment = g, d1 = 'Weibull', d2 = 'Fréchet')
    }

  m01 <- mle2(m01_prep_function,
              start = list(a1 = 2, b1 = 0.5, a2 = 3, b2 = 1)
              )

  summary(m01)

```

### Part II: calculate confidence intervals

```{r includes = TRUE}

# view estimates
  coef(m01) ; vcov(m01)

# assign means, variances & covariances 
# means
  a2 <- m01@coef[[3]]
  b2 <- m01@coef[[4]]
# variances
  var_a2 <- m01@vcov[3,3]
  var_b2 <- m01@vcov[4,4]
# covariances
  cov_a2_b2 <- m01@vcov[3,4]

# specify timespan to calculate
  t <- 1:14

# write an 'expression' for the Fréchet hazard function
  # in terms of 'a2', 'b2' 
  hazard_expression <- expression(
    (1 / (b2 * t)) * exp(-((log(t) - a2) / b2) - exp(-((log(t) - a2) / b2))) / (1 - exp(-exp(-((log(t) - a2) / b2))))
  )

# prepare expression for 'stats::deriv` 
  # NB needs names of variables for partial differentiation 
  dh2_dt <- stats::deriv(hazard_expression, c('a2', 'b2'))

  dh2_dt
  
# evaluate (calculate) hazard function 'h2'
  h2 <- eval(dh2_dt)

# collect estimate of hazard function 'h2' 
# and 'gradients' of partial derivatives
  dh2 <- attr(h2, 'gradient')

# rename columns for clarity
  colnames(dh2) <- c('dh2_da2', 'dh2_db2')

# collect time, estimates of hazard function at time 't'
# and partial derivatives 
  dh2 <- cbind(t, h2, dh2)  

# calculate variance of hazard function
# using delta function  
  var_h2 <- 
    dh2[,'dh2_da2']^2 * var_a2 +
    dh2[,'dh2_db2']^2 * var_b2 +
    2 * dh2[,'dh2_da2'] * dh2[,'dh2_db2'] * cov_a2_b2

# calculate standard deviation of estimate
  sd_h2 <- sqrt(var_h2)

# bind to other estimates
  dh2 <- cbind(dh2, sd_h2)

# calculate lower & upper bounds of 95% confidence interval
  lower_ci <- dh2[,'h2'] * exp(-1.96 * dh2[,'sd_h2'] / dh2[,'h2'])
  upper_ci <- dh2[,'h2'] * exp( 1.96 * dh2[,'sd_h2'] / dh2[,'h2'])

# bind together
  dh2_ma07 <- cbind(dh2, lower_ci, upper_ci)

```

```{r includes = TRUE, echo = FALSE, fig.height = 3.25, fig.width = 3.25, fig.show = 'hold'}

# plot data
  plot(dh2_ma07[,'t'], dh2_ma07[,'h2'],
    type = 'l', col = 'red', lty = 'solid',
    xlim = c(0, 14), ylim = c(0, 0.6),
    xlab = 'Time(days)', ylab = 'h(t) ±95% c.i.',
    main = 'Virulence: Ma07',
    axes = FALSE
    )
  axis(side = 1, seq(0, 14, by = 7))
  axis(side = 2, seq(0, 0.6, by = 0.1))
  lines(dh2_ma07[,'t'], dh2_ma07[,'lower_ci'],
    type = 'l', col = 'grey', lty = 'solid'
    )
  lines(dh2_ma07[,'t'], dh2_ma07[,'upper_ci'],
    type = 'l', col = 'grey', lty = 'solid'
    )
```

### Part III: compare results with those of `conf_ints_virulence`

The calculations made in Part II of this example could have been done by `anovir::conf_ints_virulence` using the output of `bbmle::mle2` 
for the location and scale parameters, their variance and covariance. 

```{r include = TRUE}

# values for a2, b2, var_a2, var_b2, cov_a2b2 were assigned 
# from results of 'bbmle::mle2' above
ls()

# tail of calculations above
tail(dh2_ma07)

# specify mle2object 'm01', Frechet distribution & maximum time
ci_matrix01 <- conf_ints_virulence(
  a2 = a2,
  b2 = b2,
  var_a2 = var_a2,
  var_b2 = var_b2,
  cov_a2b2 = cov_a2_b2,
  d2 = "Frechet", 
  tmax = 14)  

# tail of calculations from 'conf_ints_virulence'
tail(ci_matrix01)

# are the results the same? 
identical(dh2_ma07, ci_matrix01)

```


---

## Worked example: 3 random variables

In this example the confidence intervals for virulence are estimated for a Weibull hazard function based on three random variables. 

Analyses of the data from the study by Lorenz & Koella [@Lorenz_2011; @Lorenz_data_2011] suggested background mortality could be described by the Gumbel distribution and mortality due to infection by the Weibull distribution. 

The location parameter describing mortality due to infection `a2` was made a linear function of log[dose], requiring two random variables to estimated for its intercept and regression coefficient.


### Part I: estimate parameter variables

```{r includes = TRUE, warning = FALSE}

library(anovir)

# get & rename data
  data01 <- data_lorenz 
  head(data01)

# create new version of function 'nll_basic'
  nll_basic2 <- nll_basic

# check location/definition of parameter function a2 ('pfa2')
  body(nll_basic2)[[4]]

# replace 'a2' making 'pfa2' a linear function of log(dose)
  body(nll_basic2)[[4]] <- substitute(pfa2 <- a2i + a2ii * log(data01$Infectious.dose))

# check new version of 'pfa2'
  body(nll_basic2)[[4]]

# update formals, including names for time, censor, etc..
  formals(nll_basic2) <- alist(a1 = a1, b1 = b1, a2i = a2i, a2ii = a2ii, b2 = b2,
    data = data01, time = t, censor = censored, infected_treatment = g,
    d1 = 'Gumbel', d2 = 'Weibull')

# prepare 'prep_function' for analysis
  m01_prep_function <- function(a1, b1, a2i, a2ii, b2){
    nll_basic2(a1, b1, a2i, a2ii, b2)
  }

# send to 'bbmle::mle2' with starting values for variables
  m01 <- mle2(m01_prep_function,
      start = list(a1 = 23, b1 = 4, a2i = 4, a2ii = -0.1, b2 = 0.2)
      )

# summary of results
  summary(m01)

```

### Part II: calculate confidence intervals

```{r includes = TRUE}

# collect & assign results in mleobject 'm01'
  coef(m01)
  vcov(m01)

# means
           a2i <- m01@coef[[3]]
          a2ii <- m01@coef[[4]]
            b2 <- m01@coef[[5]]
# variances
       var_a2i <- m01@vcov[3,3]
      var_a2ii <- m01@vcov[4,4]
        var_b2 <- m01@vcov[5,5]
# covariances
  cov_a2i_a2ii <- m01@vcov[3,4]
    cov_a2i_b2 <- m01@vcov[3,5]
   cov_a2ii_b2 <- m01@vcov[4,5]

# specify timespan to calculate
  t <- 1:28

# specify dose treatment to calculate
  dose <- 5000

# write an 'expression' for hazard function
  hazard_expression <- expression(
    1 / (b2 * t) * exp(((log(t) - (a2i + a2ii * log(dose)))) / b2)
    )

# prepare expression for 'stats::deriv` 
# has names of variables for partial differentiation 
  dh2_dt <- stats::deriv(hazard_expression, c('a2i', 'a2ii', 'b2'))

# evaluate (calculate) hazard function 'h2'
  h2 <- eval(dh2_dt)

# collect estimate of hazard function 'h2' 
# and 'gradients' of partial derivatives
  dh2 <- attr(h2, 'gradient')

# rename columns for clarity
  colnames(dh2) <- c('dh2_da2i', 'dh2_da2ii', 'dh2_db2')

# collect time, estimates of hazard function at time 't'
# and partial derivatives 
  dh2 <- cbind(t, h2, dh2)

# calculate variance of hazard function
# using delta function  
  var_h2 <- 
    dh2[,'dh2_da2i']^2 * var_a2i +
    dh2[,'dh2_da2ii']^2 * var_a2ii +
    dh2[,'dh2_db2']^2 * var_b2 +
    2 * dh2[,'dh2_da2i'] * dh2[,'dh2_da2ii'] * cov_a2i_a2ii +
    2 * dh2[,'dh2_da2i'] * dh2[,'dh2_db2'] * cov_a2i_b2 +
    2 * dh2[,'dh2_da2ii'] * dh2[,'dh2_db2'] * cov_a2ii_b2 

# calculate standard deviation of estimate
  sd_h2 <- sqrt(var_h2)

# bind to other estimates
  dh2 <- cbind(dh2, sd_h2)

# calculate lower & upper bounds of 95% confidence interval
  lower_ci <- dh2[,'h2'] * exp(-1.96 * dh2[,'sd_h2'] / dh2[,'h2'])
  upper_ci <- dh2[,'h2'] * exp( 1.96 * dh2[,'sd_h2'] / dh2[,'h2'])

# bind together
  dh2_5000 <- cbind(dh2, lower_ci, upper_ci)

```

Repeat calculations for dose = 160000; not shown. 

Plot results;

```{r includes = TRUE, echo = FALSE}

# specify dose treatment to calculate
  dose <- 160000

# write an 'expression' for hazard function
  hazard_expression <- expression(
    1 / (b2 * t) * exp(((log(t) - (a2i + a2ii * log(dose)))) / b2)
    )

# prepare expression for 'stats::deriv` 
# has names of variables for partial differentiation 
  dh2_dt <- stats::deriv(hazard_expression, c('a2i', 'a2ii', 'b2'))

# evaluate (calculate) hazard function 'h2'
  h2 <- eval(dh2_dt)

# collect estimate of hazard function 'h2' 
# and 'gradients' of partial derivatives
  dh2 <- attr(h2, 'gradient')

# rename columns for clarity
  colnames(dh2) <- c('dh2_da2i', 'dh2_da2ii', 'dh2_db2')

# collect time, estimates of hazard function at time 't'
# and partial derivatives 
  dh2 <- cbind(t, h2, dh2)

# calculate variance of hazard function
# using delta function  
  var_h2 <- 
    var_a2i * dh2[,'dh2_da2i']^2 +
    var_a2ii * dh2[,'dh2_da2ii']^2 +
    var_b2 * dh2[,'dh2_db2']^2 +
    2 * cov_a2i_a2ii * dh2[,'dh2_da2i'] * dh2[,'dh2_da2ii'] +
    2 * cov_a2i_b2 * dh2[,'dh2_da2i'] * dh2[,'dh2_db2'] +
    2 * cov_a2ii_b2 * dh2[,'dh2_da2ii'] * dh2[,'dh2_db2']

# calculate standard deviation of estimate
  sd_h2 <- sqrt(var_h2)

# bind to other estimates
  dh2 <- cbind(dh2, sd_h2)

# calculate lower & upper bounds of 95% confidence interval
  lower_ci <- dh2[,'h2'] * exp(-1.96 * dh2[,'sd_h2'] / dh2[,'h2'])
  upper_ci <- dh2[,'h2'] * exp( 1.96 * dh2[,'sd_h2'] / dh2[,'h2'])

# bind together
  dh2_160000 <- cbind(dh2, lower_ci, upper_ci)
```

```{r includes = TRUE, echo = FALSE, fig.height = 3.25, fig.width = 3.25, fig.show = 'hold'}

# plot data
  plot(dh2_5000[,'t'], dh2_5000[,'h2'],
    type = 'l', col = 'red', lty = 'solid',
    xlim = c(0, 28), ylim = c(0, 1.2),
    xlab = 'Time(days)', ylab = 'h(t) ±95% c.i.',
    main = 'Virulence: dose 5000',
    axes = FALSE
    )
  axis(side = 1, seq(0, 28, by = 7))
  axis(side = 2, seq(0, 1.2, by = 0.2))
  lines(dh2_5000[,'t'], dh2_5000[,'lower_ci'],
    type = 'l', col = 'grey', lty = 'solid'
    )
  lines(dh2_5000[,'t'], dh2_5000[,'upper_ci'],
    type = 'l', col = 'grey', lty = 'solid'
    )

# plot data
  plot(dh2_160000[,'t'], dh2_160000[,'h2'],
    type = 'l', col = 'red', lty = 'solid',
    xlim = c(0, 28), ylim = c(0, 1.2),
    xlab = 'Time(days)', ylab = 'h(t) ±95% c.i.',
    main = 'Virulence: dose 160000',
    axes = FALSE
    )
  axis(side = 1, seq(0, 28, by = 7))
  axis(side = 2, seq(0, 1.2, by = 0.2))
  lines(dh2_160000[,'t'], dh2_160000[,'lower_ci'],
    type = 'l', col = 'grey', lty = 'solid'
    )
  lines(dh2_160000[,'t'], dh2_160000[,'upper_ci'],
    type = 'l', col = 'grey', lty = 'solid'
    )

```


---

## Confidence intervals back-calculated from `nll_basic_logscale` 

The `nll_basic_logscale` function assumes the values of the input variables are on a log-scale. 
Within the function these values are exponentiated before being sent to the likelihood function. 
The use of `conf_ints_virulence` will not provide the correct confidence intervals with the output from `nll_basic_logscale`.

The following code returns the same confidence intervals as that estimated for the same data without a log-transformation. 

### Part I: estimate parameter variables 

```{r includes = TRUE}

library(anovir)

  data01 <- subset(data_blanford,
    (data_blanford$block == 3) &
      ((data_blanford$treatment == 'cont') |
      (data_blanford$treatment == 'Bb06')) &
    (data_blanford$day > 0)
    )

    l01_prep_function_log <- function(a1 = a1, b1 = b1, a2 = a2, b2 = b2){
      nll_basic_logscale(
        a1 = a1, b1 = b1, a2 = a2, b2 = b2,
        data = data01,
        time = t,
        censor = censor,
        infected_treatment = inf,
        d1 = 'Weibull', d2 = 'Weibull')
        }

    l01 <- mle2(l01_prep_function_log,
             start = list(a1 = 1, b1 = 1, a2 = 1, b2 = 1)
             )

    summary(l01)
```


### Part II: calculate confidence intervals

```{r include = TRUE}

# collect and assign results
  coef(l01)
  vcov(l01)

        a2 <- coef(l01)[[3]]
        b2 <- coef(l01)[[4]]

    var_a2 <- vcov(l01)[3,3]
    var_b2 <- vcov(l01)[4,4]

  cov_a2_b2 <- vcov(l01)[3,4]

# set timescale for calculations
  t <- 1:15

# NB need to exponentiate the terms with 'a2' and 'b2'
# in hazard expression
  hazard_expression <- expression(
    (1/(exp(b2) * t)) * exp((log(t) - exp(a2)) / exp(b2))
  )

# remaining steps are the same as in Example 2 above
  dh2_dt <- stats::deriv(hazard_expression, c('a2', 'b2'))
  h2 <- eval(dh2_dt)
  dh2 <- attr(h2, 'gradient')
  colnames(dh2) <- c('dh2_da2', 'dh2_db2')
  dh2 <- cbind(t, h2, dh2) 
  var_h2 <- 
    dh2[,'dh2_da2']^2 * var_a2 +
    dh2[,'dh2_db2']^2 * var_b2 +
    2 * dh2[,'dh2_da2'] * dh2[,'dh2_db2'] * cov_a2_b2
  sd_h2 <- sqrt(var_h2)
  dh2 <- cbind(dh2, sd_h2)
  lower_ci <- dh2[,'h2'] * exp(-1.96 * dh2[,'sd_h2'] / dh2[,'h2'])
  upper_ci <- dh2[,'h2'] * exp( 1.96 * dh2[,'sd_h2'] / dh2[,'h2'])

  dh2_backlog <- cbind(dh2, lower_ci, upper_ci)
  dh2_backlog <- round(dh2_backlog,4)
  dh2_backlog

  # the confidence intervals are the same as those
  # estimated on the help page of
  # `conf_ints_virulence` for variables that are 
  # not assumed to be on a logscale

```


### References