This vignette can be cited as:

```
To cite package 'statsExpressions' in publications use:

  Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes
  and Expressions with Statistical Details. Journal of Open Source
  Software, 6(61), 3236, https://doi.org/10.21105/joss.03236

A BibTeX entry for LaTeX users is

  @Article{,
    doi = {10.21105/joss.03236},
    url = {https://doi.org/10.21105/joss.03236},
    year = {2021},
    publisher = {{The Open Journal}},
    volume = {6},
    number = {61},
    pages = {3236},
    author = {Indrajeet Patil},
    title = {{statsExpressions: {R} Package for Tidy Dataframes and Expressions with Statistical Details}},
    journal = {{Journal of Open Source Software}},
  }
```

The `{statsExpressions}` package has two key aims: to provide a
consistent syntax for statistical analysis with tidy data, and to
provide statistical expressions (i.e., pre-formatted in-text
statistical results) for plotting functions. Currently, it supports
common types of statistical approaches and tests: parametric,
nonparametric, robust, and Bayesian versions of the *t*-test, one-way
ANOVA, correlation analyses, contingency table analyses, and
meta-analyses. The functions are pipe-friendly and compatible with tidy
data.

Statistical packages exhibit substantial diversity in terms of their syntax and expected input and output data type. For example, some functions expect vectors as inputs, while others expect dataframes. Depending on whether it is a repeated measures design or not, functions from the same package might expect data to be in wide or tidy format. Some functions can internally omit missing values, while others do not. Furthermore, the statistical test objects returned by the test functions might not have all required information (e.g., degrees of freedom, significance, Bayes factor, etc.) accessible in a consistent data type. Depending on the specific test object and statistic in question, details may be returned as a list, a matrix, an array, or a dataframe. This diversity can make it difficult to easily access all needed information for hypothesis testing and estimation, and to switch from one statistical approach to another.
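To make this concrete, here is a minimal sketch that uses only functions shipped with base R (the `stats` package) and the built-in `mtcars` data. The particular tests and variables are chosen purely for illustration. Three tests that answer closely related questions accept their inputs in different forms and store their degrees of freedom in different places:

```r
# vector interface: two numeric vectors, one per group
tt <- t.test(mtcars$wt[mtcars$am == 0], mtcars$wt[mtcars$am == 1])

# formula + data frame interface
kw <- kruskal.test(wt ~ cyl, data = mtcars)
fit <- aov(wt ~ factor(cyl), data = mtcars)

# the returned objects do not share a class
class(tt)
class(kw)
class(fit)

# degrees of freedom: a named numeric inside a list for the `htest` objects...
tt$parameter
kw$parameter

# ...but a column of a data frame nested in a list for the `aov` fit
aov_table <- summary(fit)[[1]]
aov_table$Df
```

Any code that wants to report these three results side by side has to special-case each object type.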

This is where `{statsExpressions}` comes in: it can be thought of as a
unified portal through which most of the functionality in these
underlying packages can be accessed, with a simpler interface and with
tidy data format.

Unlike `{broom}` (Robinson, Hayes, & Couch, 2021) or `{parameters}`
(Lüdecke, Ben-Shachar, Patil, & Makowski, 2020), the goal of
`{statsExpressions}` is not to convert model objects into tidy
dataframes, but to provide a consistent and easy syntax to carry out
statistical tests. Additionally, none of these packages return
statistical expressions.

The package offers functions that allow users to choose a statistical approach without changing the syntax (i.e., by only specifying a single argument). The functions always require a dataframe in tidy format (Wickham et al., 2019), and work with missing data. Moreover, they always return a dataframe that can be further utilized downstream in the workflow (such as visualization).

Function | Parametric | Non-parametric | Robust | Bayesian |
---|---|---|---|---|
`one_sample_test` | ✅ | ✅ | ✅ | ✅ |
`two_sample_test` | ✅ | ✅ | ✅ | ✅ |
`oneway_anova` | ✅ | ✅ | ✅ | ✅ |
`corr_test` | ✅ | ✅ | ✅ | ✅ |
`contingency_table` | ✅ | ✅ | - | ✅ |
`meta_analysis` | ✅ | - | ✅ | ✅ |
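As a rough sketch of how the columns of this table map onto code, the snippet below runs the same `corr_test()` call once per approach and changes only the `type` argument. It assumes that `{statsExpressions}` and its dependencies are installed and that `type` accepts the values `"parametric"`, `"nonparametric"`, `"robust"`, and `"bayes"`; the choice of `mtcars`, `wt`, and `mpg` is illustrative.

```r
library(statsExpressions)

# assumed values of `type`, one per column of the table above
approaches <- c("parametric", "nonparametric", "robust", "bayes")

# the call is identical each time; only `type` changes
results <- lapply(approaches, function(approach) {
  corr_test(mtcars, x = wt, y = mpg, type = approach)
})
names(results) <- approaches

# each result should be a dataframe that carries an `expression` column
vapply(results, is.data.frame, logical(1))
vapply(results, function(d) "expression" %in% names(d), logical(1))
```

Because every result is a dataframe, the outputs can be compared or stored with ordinary dataframe tools, whichever approach produced them.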

`{statsExpressions}` internally relies on the `stats` package for
parametric and non-parametric tests (R Core Team, 2021), the `WRS2`
package for robust statistics (Mair & Wilcox, 2020), and the
`BayesFactor` package for Bayesian statistics (Morey & Rouder, 2020).
The random-effects meta-analysis is carried out using the `metafor`
(parametric) (Viechtbauer, 2010), `metaplus` (robust) (Beath, 2016),
and `metaBMA` (Bayesian) (Heck et al., 2019) packages. Additionally, it
relies on `easystats` packages (Ben-Shachar, Lüdecke, & Makowski, 2020;
Lüdecke et al., 2020; Lüdecke, Ben-Shachar, Patil, Waggoner, &
Makowski, 2021; Lüdecke, Waggoner, & Makowski, 2019; Makowski,
Ben-Shachar, & Lüdecke, 2019; Makowski, Ben-Shachar, Patil, & Lüdecke,
2020) to compute appropriate effect size/posterior estimates and their
confidence/credible intervals.

To illustrate the simplicity of this syntax, let’s say we want to run a one-way ANOVA. If we first run a non-parametric ANOVA and then decide to run a robust ANOVA instead, the syntax remains the same and the statistical approach can be modified by changing a single argument:

```
library(statsExpressions)
library(magrittr) # provides the %>% pipe

mtcars %>% oneway_anova(cyl, wt, type = "nonparametric")
#> # A tibble: 1 × 15
#>   parameter1 parameter2 statistic df.error   p.value
#>   <chr>      <chr>          <dbl>    <int>     <dbl>
#> 1 wt         cyl             22.8        2 0.0000112
#>   method                       effectsize      estimate conf.level conf.low
#>   <chr>                        <chr>              <dbl>      <dbl>    <dbl>
#> 1 Kruskal-Wallis rank sum test Epsilon2 (rank)    0.736       0.95    0.624
#>   conf.high conf.method          conf.iterations n.obs expression
#>       <dbl> <chr>                          <int> <int> <list>
#> 1         1 percentile bootstrap             100    32 <language>

mtcars %>% oneway_anova(cyl, wt, type = "robust")
#> # A tibble: 1 × 12
#>   statistic    df df.error p.value
#>       <dbl> <dbl>    <dbl>   <dbl>
#> 1      12.7     2     12.2 0.00102
#>   method
#>   <chr>
#> 1 A heteroscedastic one-way ANOVA for trimmed means
#>   effectsize                         estimate conf.level conf.low conf.high
#>   <chr>                                 <dbl>      <dbl>    <dbl>     <dbl>
#> 1 Explanatory measure of effect size     1.05       0.95    0.843      1.50
#>   n.obs expression
#>   <int> <list>
#> 1    32 <language>
```

These functions are also compatible with other popular data
manipulation packages. For example, we can use a combination of
`dplyr` and `{statsExpressions}` to repeat the same statistical
analysis across grouping variables.

```
library(dplyr)

# running one-sample proportion test for `vs` at all levels of `am`
mtcars %>%
  group_by(am) %>%
  group_modify(~ contingency_table(.x, vs), .keep = TRUE) %>%
  ungroup()
#> # A tibble: 2 × 14
#>      am statistic    df p.value method   effec…¹ estim…² conf.…³ conf.…⁴ conf.…⁵
#>   <dbl>     <dbl> <dbl>   <dbl> <chr>    <chr>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1     0    1.32       1   0.251 Chi-squ… Pearso…  0.254     0.95       0       1
#> 2     1    0.0769     1   0.782 Chi-squ… Pearso…  0.0767    0.95       0       1
#> # … with 4 more variables: conf.method <chr>, conf.distribution <chr>,
#> #   n.obs <int>, expression <list>, and abbreviated variable names ¹effectsize,
#> #   ²estimate, ³conf.level, ⁴conf.low, ⁵conf.high
```

In addition to other details contained in the dataframe, there is also
a column titled `expression`, which contains pre-formatted text with
statistical details. These expressions (Figure 1) attempt to follow the
gold standard in statistical reporting for both Bayesian (Doorn et al.,
2020) and Frequentist (American Psychological Association and others,
2019) frameworks.

This expression can be easily displayed in a plot (Figure 2).
Displaying statistical results in the context of a visualization is
indeed a philosophy adopted by the `{ggstatsplot}` package (Patil,
2021), and `{statsExpressions}` functions as its statistical processing
backend.

```
# needed libraries
library(statsExpressions)
library(ggplot2)

# creating a dataframe
res <- oneway_anova(iris, Species, Sepal.Length, type = "nonparametric")

# create a boxplot and use the `expression` column to display results in the subtitle
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_boxplot() +
  labs(
    x = "Sepal length (in cm)",
    y = "Species",
    title = "Kruskal-Wallis Rank Sum Test",
    subtitle = res$expression[[1]]
  )
```
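The expression is not tied to `ggplot2`. As a hedged sketch, the snippet below pulls the expression out of the dataframe and inspects it, then passes it to base graphics, on the assumption that the stored object is a plotmath-compatible language object (as the `<language>` entries in the printed tibbles suggest). The name `stat_expr` is introduced here only for illustration.

```r
library(statsExpressions)

res <- oneway_anova(iris, Species, Sepal.Length, type = "nonparametric")

# `expression` is a list column, so `[[1]]` extracts the single stored expression
stat_expr <- res$expression[[1]]

# inspect what kind of object it is
is.language(stat_expr)

# base graphics can typeset language objects as plot annotations
boxplot(Sepal.Length ~ Species, data = iris, horizontal = TRUE)
title(main = "Kruskal-Wallis Rank Sum Test", sub = stat_expr)
```

Keeping the expression as a language object rather than a character string is what lets plotting functions typeset italics, subscripts, and Greek letters.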

`{statsExpressions}` is licensed under the GNU General Public License
(v3.0), with all source code stored on GitHub. In the spirit of honest
and open science, requests and suggestions for fixes, feature updates,
as well as general questions and concerns are encouraged via direct
interaction with contributors and developers by filing an issue, while
respecting the *Contribution Guidelines*.

I would like to acknowledge the support of Mina Cikara, Fiery Cushman,
and Iyad Rahwan during the development of this project.
`{statsExpressions}` relies heavily on the `easystats` ecosystem, a
collaborative project created to facilitate the usage of `R` for
statistical analyses. Thus, I would like to thank the members of
easystats as well as the users.

American Psychological Association and others. (2019). *Publication
Manual of the American Psychological Association* (7th ed.). American
Psychological Association.

Beath, K. J. (2016). metaplus: An
R package for the analysis of robust meta-analysis and
meta-regression. *R Journal*, *8*(1), 5–16. doi:10.32614/RJ-2016-001

Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and
standardized parameters. *Journal of Open Source Software*,
*5*(56), 2815. doi:10.21105/joss.02815

Doorn, J. van, Bergh, D. van den, Böhm, U., Dablander, F., Derks, K.,
Draws, T., Etz, A., et al. (2020). The JASP guidelines for conducting
and reporting a Bayesian analysis. *Psychonomic Bulletin &
Review*, 1–14. doi:10.3758/s13423-020-01798-5

Heck, D. W., Gronau, Q. F., & Wagenmakers, E.-J. (2019).
*metaBMA: Bayesian model averaging for random and fixed effects
meta-analysis*. Retrieved from https://CRAN.R-project.org/package=metaBMA

Lüdecke, D., Ben-Shachar, M. S., Patil, I., & Makowski, D. (2020).
parameters: Extracting, computing and
exploring the parameters of statistical models using R.
*Journal of Open Source Software*, *5*(53), 2445. doi:10.21105/joss.02445

Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., &
Makowski, D. (2021). performance: An
R package for assessment, comparison and testing of
statistical models. *Journal of Open Source Software*,
*6*(60), 3139. doi:10.21105/joss.03139

Lüdecke, D., Waggoner, P., & Makowski, D. (2019). insight: A unified interface to access information
from model objects in R. *Journal of Open Source
Software*, *4*(38), 1412. doi:10.21105/joss.01412

Mair, P., & Wilcox, R. (2020). Robust
Statistical Methods in R Using the WRS2 Package. *Behavior
Research Methods*, *52*, 464–488. doi:10.3758/s13428-019-01246-w

Makowski, D., Ben-Shachar, M. S., & Lüdecke, D. (2019). bayestestR:
Describing effects and their uncertainty, existence and significance
within the Bayesian framework. *Journal of Open Source Software*,
*4*(40), 1541. doi:10.21105/joss.01541

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2020).
Methods and algorithms for correlation analysis in R. *Journal of
Open Source Software*, *5*(51), 2306. doi:10.21105/joss.02306

Morey, R. D., & Rouder, J. N. (2020). *BayesFactor: Computation
of bayes factors for common designs*. Retrieved from https://richarddmorey.github.io/BayesFactor/

Patil, I. (2021). Visualizations with statistical details: The
’ggstatsplot’ approach. *PsyArxiv*. doi:10.31234/osf.io/p7mku

R Core Team. (2021). *R: A language and environment for statistical
computing*. Vienna, Austria: R Foundation for Statistical Computing.
Retrieved from https://www.R-project.org/

Robinson, D., Hayes, A., & Couch, S. (2021). *Broom: Convert
statistical objects into tidy tibbles*. Retrieved from https://CRAN.R-project.org/package=broom

Viechtbauer, W. (2010). Conducting meta-analyses in R with
the metafor package. *Journal of
Statistical Software*, *36*(3), 1–48. Retrieved from https://www.jstatsoft.org/v36/i03/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D.,
François, R., Grolemund, G., et al. (2019). Welcome to the tidyverse. *Journal of Open Source
Software*, *4*(43), 1686. doi:10.21105/joss.01686