---
title: "Predictability: Binary, Ordinal, and Continuous"
author: "Donny Williams"
date: "5/20/2020"
bibliography: ../inst/REFERENCES.bib
output: 
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Predictability: Binary, Ordinal, and Continuous}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

# Background
This vignette describes a new feature to **BGGM** (`2.0.0`) that allows for 
computing network predictability for binary and ordinal data. Currently 
the available option is Bayesian $R^2$ [@gelman_r2_2019].


### R packages
```{r, eval = FALSE, message=FALSE}
# need the developmental version
if (!requireNamespace("remotes")) { 
  install.packages("remotes")   
}   

# install from github
remotes::install_github("donaldRwilliams/BGGM")
library(BGGM)
```

# Binary
The first example looks at Binary data, consisting of 1190 observations and 6 variables. The data are called `women_math` and the variable descriptions are provided in **BGGM**.

The model is estimated with
```{r, eval=FALSE}
# binary data
Y <- women_math

# fit model
fit <- estimate(Y, type = "binary")
```

and then predictability is computed
```{r, eval=FALSE}
r2 <- predictability(fit)

# print
r2

#> BGGM: Bayesian Gaussian Graphical Models 
#> --- 
#> Metric: Bayes R2
#> Type: binary 
#> --- 
#> Estimates:
#> 
#>  Node Post.mean Post.sd Cred.lb Cred.ub
#>     1     0.016   0.012   0.002   0.046
#>     2     0.103   0.023   0.064   0.150
#>     3     0.155   0.030   0.092   0.210
#>     4     0.160   0.021   0.118   0.201
#>     5     0.162   0.022   0.118   0.202
#>     6     0.157   0.028   0.097   0.208
#> ---
```


There are then two options for plotting. The first is with error bars, denoting the credible interval (i.e., `cred`),
```{r, message=FALSE, eval=FALSE}
plot(r2,
     type = "error_bar",
     size = 4,
     cred = 0.90)
```

![](../man/figures/binary_r2_error.png)

and the second is with a ridgeline plot
```{r, message=FALSE, eval=FALSE}
plot(r2,
     type = "ridgeline",
     cred = 0.50)
```

![](../man/figures/binary_r2_ridge.png)

# Ordinal
In the following, the `ptsd` data is used (5-level Likert). The variable descriptions are provided in **BGGM**. This is based on the polychoric partial correlations, with $R^2$ computed from the corresponding correlations (due to the correspondence between the correlation matrix and multiple regression).

```{r, eval=FALSE}
Y <- ptsd

fit <- estimate(Y + 1, type = "ordinal")
```

The only change is switching type from `"binary` to `ordinal`. One important 
point is the `+ 1`. This is required because for the ordinal approach the first 
category must be 1 (in `ptsd` the first category is coded as 0).

```{r, eval=FALSE}
r2 <- predictability(fit)

# print 
r2 

#> BGGM: Bayesian Gaussian Graphical Models 
#> --- 
#> Metric: Bayes R2
#> Type: ordinal 
#> --- 
#> Estimates:
#> 
#>  Node Post.mean Post.sd Cred.lb Cred.ub
#>     1     0.487   0.049   0.394   0.585
#>     2     0.497   0.047   0.412   0.592
#>     3     0.509   0.047   0.423   0.605
#>     4     0.524   0.049   0.441   0.633
#>     5     0.495   0.047   0.409   0.583
#>     6     0.297   0.043   0.217   0.379
#>     7     0.395   0.045   0.314   0.491
#>     8     0.250   0.042   0.173   0.336
#>     9     0.440   0.048   0.358   0.545
#>    10     0.417   0.044   0.337   0.508
#>    11     0.549   0.048   0.463   0.648
#>    12     0.508   0.048   0.423   0.607
#>    13     0.504   0.047   0.421   0.600
#>    14     0.485   0.043   0.411   0.568
#>    15     0.442   0.045   0.355   0.528
#>    16     0.332   0.039   0.257   0.414
#>    17     0.331   0.045   0.259   0.436
#>    18     0.423   0.044   0.345   0.510
#>    19     0.438   0.044   0.354   0.525
#>    20     0.362   0.043   0.285   0.454
#> ---
```

Here is the `error_bar` plot.
```{r, eval=FALSE}
plot(r2)
```

![](../man/figures/ordinal_r2_error.png)

Note that the plot object is a `ggplot` which allows for further customization (e.g,. adding the variable names, a title, etc.).

# Continuous
It is quite common to compute predictability assuming that the data are Gaussian. In the context of Bayesian GGMs, this was introduced in [@Williams2019]. This can also be implemented in **BGGM**.

```{r, eval=FALSE}
# fit model
fit <- estimate(Y)

# predictability
r2 <- predictability(fit)
```

`type` is missing which indicates that `continuous` is the default.

# Note
$R^2$ for binary and ordinal data is computed for the underlying latent variables. This is also the case
when `type = "mixed` (a semi-parametric copula). In future releases, there will be support for predicting 
the variables on the observed scale.

# References