This package provides a complete analysis pipeline for the WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS) as described in Riley et al. (2016) <doi:10.2105/AJPH.2015.302962>. Imports raw survey data ('CSV', 'Excel', 'Stata', 'SPSS'), applies WHO-standard cleaning and recoding, sets up complex survey designs, computes all standard NCD indicators (tobacco, alcohol, diet, physical activity, anthropometry, blood pressure, biochemical), and generates publication-ready tables, visualisations, and 'Word'/'HTML' reports (fact sheet, data book, country report).
Download, navigate and analyse the Student-Life dataset. The Student-Life dataset contains passive and automatic sensing data from the phones of a class of 48 Dartmouth College students. It was collected over a 10-week term. Additionally, the dataset contains ecological momentary assessment results along with pre-study and post-study mental health surveys. The intended use is to assess mental health, academic performance and behavioral trends. The raw dataset and additional information are available at <https://studentlife.cs.dartmouth.edu/>.
This package provides methods for inference using stacked multiple imputations augmented with weights. The vignette provides example R code for implementation in general multiple imputation settings. For additional details about the estimation algorithm, we refer the reader to Beesley, Lauren J and Taylor, Jeremy M G (2020) "A stacked approach for chained equations multiple imputation incorporating the substantive model" <doi:10.1111/biom.13372>, and Beesley, Lauren J and Taylor, Jeremy M G (2021) "Accounting for not-at-random missingness through imputation stacking" <arXiv:2101.07954>.
This package provides a pilot matching design to automatically stratify and match large datasets. The manual_stratify() function allows users to manually stratify a dataset based on categorical variables of interest, while the auto_stratify() function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match() function then performs optimal matching of the data set in parallel within strata.
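The quantile-based stratification step described above can be illustrated with a minimal Python sketch. It assumes the prognostic scores have already been computed; the function name `stratify_by_quantiles` is hypothetical and is not part of the package, and fitting the prognostic model on the pilot set is not shown.

```python
def stratify_by_quantiles(scores, n_strata):
    """Assign each score a stratum in 0..n_strata-1 using rank-based quantiles.

    Hypothetical helper for illustration only; not the package's API.
    """
    if n_strata < 1:
        raise ValueError("n_strata must be at least 1")
    n = len(scores)
    # Indices of the observations, ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    strata = [0] * n
    for rank, i in enumerate(order):
        # Equal-sized rank bins approximate the score quantiles.
        strata[i] = rank * n_strata // n
    return strata


prognostic_scores = [0.9, 0.1, 0.5, 0.3]  # made-up scores
print(stratify_by_quantiles(prognostic_scores, 2))  # [1, 0, 1, 0]
```

Matching would then be carried out separately within each stratum, which is what makes the procedure easy to parallelise.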
This package provides tools for testing, monitoring and dating structural changes in (linear) regression models. It features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
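As a rough illustration of a fluctuation process, the following Python sketch computes an OLS-based CUSUM path for the simplest possible case, a mean-only model. This is a simplified assumption made for the example: the function name `ols_cusum` is hypothetical, and the package itself handles general linear regressions and boundary-crossing tests that are not reproduced here.

```python
import math


def ols_cusum(y):
    """OLS-based CUSUM process for the mean-only model y_i = mu + e_i.

    Returns the scaled cumulative sums of the OLS residuals.
    Hypothetical helper for illustration only.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(y) / n
    residuals = [v - mean for v in y]
    # Residual standard deviation with n - 1 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 1))
    if sigma == 0:
        raise ValueError("series is constant; process is undefined")
    process, running = [], 0.0
    for r in residuals:
        running += r
        process.append(running / (sigma * math.sqrt(n)))
    return process


series = [0, 0, 0, 0, 1, 1, 1, 1]  # made-up data with a level shift
path = ols_cusum(series)
peak = max(range(len(path)), key=lambda k: abs(path[k]))
print(peak)  # 3
```

The path returns to zero at the end of the sample, and its largest excursion occurs at the last observation before the shift, which is the kind of pattern a fluctuation test looks for.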
Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.
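To make the flat-table-to-star transformation concrete, here is a minimal Python sketch. It assumes the flat table is a list of dictionaries; the function name `flat_to_star`, the dimension names and the sample rows are all hypothetical and do not reflect the package's interface.

```python
def flat_to_star(rows, dimensions, measures):
    """Split a flat table into dimension tables and a fact table.

    rows: list of dicts (the flat table).
    dimensions: dict mapping a dimension name to its attribute columns.
    measures: list of measure columns kept in the fact table.
    Hypothetical helper for illustration only.
    """
    lookup = {name: {} for name in dimensions}  # attribute tuple -> surrogate key
    facts = []
    for row in rows:
        fact = {}
        for name, attrs in dimensions.items():
            values = tuple(row[a] for a in attrs)
            # Reuse the key of an existing dimension member or create the next one.
            key = lookup[name].setdefault(values, len(lookup[name]) + 1)
            fact[name + "_key"] = key
        for m in measures:
            fact[m] = row[m]
        facts.append(fact)
    dim_tables = {}
    for name, attrs in dimensions.items():
        table = []
        for values, key in lookup[name].items():
            record = dict(zip(attrs, values))
            record["key"] = key
            table.append(record)
        dim_tables[name] = table
    return dim_tables, facts


flat = [  # made-up sample data
    {"city": "Oslo", "country": "NO", "product": "A", "sales": 10},
    {"city": "Oslo", "country": "NO", "product": "B", "sales": 5},
    {"city": "Rome", "country": "IT", "product": "A", "sales": 7},
]
dims, facts = flat_to_star(
    flat, {"where": ["city", "country"], "what": ["product"]}, ["sales"]
)
print(dims["where"])
print(facts)
```

Each distinct combination of dimension attributes is stored once and referenced from the fact table by a surrogate key.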
This package provides a lightweight, reproducible workflow for selecting and executing appropriate statistical analyses in one-way or two-way experimental designs. The package automatically checks for data normality, conducts parametric (ANOVA) or non-parametric (Kruskal-Wallis) tests, performs post-hoc comparisons with Compact Letter Displays (CLD), and generates publication-ready boxplots, faceted plots, and heatmaps. It is designed for researchers seeking fast, automated statistical summaries and visualization. Based on established statistical methods including Shapiro and Wilk (1965) <doi:10.2307/2333709>, Kruskal and Wallis (1952) <doi:10.1080/01621459.1952.10483441>, Tukey (1949) <doi:10.2307/3001913>, Fisher (1925) <ISBN:0050021702>, and Wickham (2016) <ISBN:978-3-319-24277-4>.
Statistical performance measures used in the econometric literature to evaluate conditional covariance/correlation matrix estimates (MSE, MAE, Euclidean distance, Frobenius distance, Stein distance, asymmetric loss function, eigenvalue loss function and the loss function defined in Eq. (4.6) of Engle et al. (2016) <doi:10.2139/ssrn.2814555>). Additionally, compute Eq. (3.1) and (4.2) of Li et al. (2016) <doi:10.1080/07350015.2015.1092975> to compare the factor loading matrix. The statistical performance measures implemented have been previously used in, for instance, Laurent et al. (2012) <doi:10.1002/jae.1248>, Amendola et al. (2015) <doi:10.1002/for.2322> and Becker et al. (2015) <doi:10.1016/j.ijforecast.2013.11.007>.
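Three of the simpler measures listed above can be sketched in Python using their textbook element-wise definitions. This is an assumption made for illustration: the function names are hypothetical, and the package's exact scaling conventions (for example, whether the Frobenius distance is squared, or whether only the lower triangle is used) may differ.

```python
import math


def _differences(true, estimate):
    """Element-wise differences between two equally sized matrices (lists of rows)."""
    if len(true) != len(estimate) or any(
        len(a) != len(b) for a, b in zip(true, estimate)
    ):
        raise ValueError("matrices must have the same shape")
    return [t - e for row_t, row_e in zip(true, estimate) for t, e in zip(row_t, row_e)]


def mse(true, estimate):
    """Mean squared error over all matrix entries."""
    d = _differences(true, estimate)
    return sum(x * x for x in d) / len(d)


def mae(true, estimate):
    """Mean absolute error over all matrix entries."""
    d = _differences(true, estimate)
    return sum(abs(x) for x in d) / len(d)


def frobenius_distance(true, estimate):
    """Frobenius norm of the difference matrix (not squared in this sketch)."""
    return math.sqrt(sum(x * x for x in _differences(true, estimate)))


sigma_true = [[1.0, 0.0], [0.0, 1.0]]  # made-up covariance matrices
sigma_hat = [[2.0, 0.0], [0.0, 3.0]]
print(mse(sigma_true, sigma_hat))  # 1.25
print(mae(sigma_true, sigma_hat))  # 0.75
print(frobenius_distance(sigma_true, sigma_hat))
```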
This package provides an efficient method to recover the missing block of an approximately low-rank matrix. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design [Cai T, Cai TT, Zhang A (2016) <doi:10.1080/01621459.2015.1021005>]. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. The main function in our package, smc.FUN(), is for recovery of the missing block A22 of an approximately low-rank matrix A given the other blocks A11, A12, A21.
This package creates and fits staged event tree probability models, which are probabilistic graphical models capable of representing asymmetric conditional independence statements for categorical variables. Includes functions to create, plot and fit staged event trees from data, as well as many efficient structure learning algorithms. References: Carli F, Leonelli M, Riccomagno E, Varando G (2022). <doi:10.18637/jss.v102.i06>. Collazo R. A., Görgen C. and Smith J. Q. (2018, ISBN:9781498729604). Görgen C., Bigatti A., Riccomagno E. and Smith J. Q. (2018) <arXiv:1705.09457>. Thwaites P. A., Smith, J. Q. (2017) <arXiv:1510.00186>. Barclay L. M., Hutton J. L. and Smith J. Q. (2013) <doi:10.1016/j.ijar.2013.05.006>. Smith J. Q. and Anderson P. E. (2008) <doi:10.1016/j.artint.2007.05.004>.
This package provides functions to calculate step- and cadence-based metrics from timestamped accelerometer and wearable device data. Supports CSV and AGD files from ActiGraph devices, CSV files from Fitbit devices, and step counts derived with R package GGIR <https://github.com/wadpac/GGIR>, with automatic handling of epoch lengths from 1 to 60 seconds. Metrics include total steps, cadence peaks, minutes and steps in predefined cadence bands, and time and steps in moderate-to-vigorous physical activity (MVPA). Methods and thresholds are informed by the literature, e.g., Tudor-Locke and Rowe (2012) <doi:10.2165/11599170-000000000-00000>, Barreira et al. (2012) <doi:10.1249/MSS.0b013e318254f2a3>, and Tudor-Locke et al. (2018) <doi:10.1136/bjsports-2017-097628>. The package record is also available on Zenodo (2023) <doi:10.5281/zenodo.7858094>.
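The kind of summary described above can be sketched in Python for the simplest case of one step count per 60-second epoch. Everything named here is an assumption for illustration: the function `cadence_summary` is hypothetical, the cadence bands and the 100 steps/min MVPA threshold are commonly used values rather than the package's confirmed defaults, and peak cadence is taken as the mean of the highest (not necessarily consecutive) minutes.

```python
# Assumed cadence bands in steps/min; (120, None) means 120 and above.
BANDS = [(0, 0), (1, 19), (20, 39), (40, 59), (60, 79), (80, 99), (100, 119), (120, None)]


def band_label(low, high):
    """Readable label such as '0', '1-19' or '120+'."""
    if high is None:
        return f"{low}+"
    return str(low) if low == high else f"{low}-{high}"


def cadence_summary(steps_per_minute, peak_n=30, mvpa_threshold=100):
    """Summarise integer step counts recorded in 60-second epochs.

    Hypothetical helper for illustration only.
    """
    minutes_in_band = {band_label(lo, hi): 0 for lo, hi in BANDS}
    for steps in steps_per_minute:
        for lo, hi in BANDS:
            if steps >= lo and (hi is None or steps <= hi):
                minutes_in_band[band_label(lo, hi)] += 1
                break
    # Peak cadence: mean of the peak_n highest minute-level values.
    top = sorted(steps_per_minute, reverse=True)[:peak_n]
    return {
        "total_steps": sum(steps_per_minute),
        "peak_cadence": sum(top) / len(top) if top else 0.0,
        "mvpa_minutes": sum(1 for s in steps_per_minute if s >= mvpa_threshold),
        "minutes_in_band": minutes_in_band,
    }


minutes = [0, 10, 50, 105, 130, 110]  # made-up step counts
summary = cadence_summary(minutes, peak_n=3)
print(summary["total_steps"])   # 405
print(summary["peak_cadence"])  # 115.0
print(summary["mvpa_minutes"])  # 3
```

Shorter epochs would first need to be aggregated to whole minutes (or scaled to steps per minute) before a summary like this applies.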
The C++ header files of the Stan project are provided by this package. There is a shared object containing part of the CVODES library, but it is not accessible from R. r-stanheaders is only useful for developers who want to utilize the LinkingTo directive of their package's DESCRIPTION file to build on the Stan library without incurring unnecessary dependencies.
The Stan project develops a probabilistic programming language that implements full or approximate Bayesian statistical inference via Markov Chain Monte Carlo or variational methods and implements (optionally penalized) maximum likelihood estimation via optimization. The Stan library includes an advanced automatic differentiation scheme, templated statistical and linear algebra functions that can handle the automatically differentiable scalar types (and doubles, ints, etc.), and a parser for the Stan language. The r-rstan package provides user-facing R functions to parse, compile, test, estimate, and analyze Stan models.
stJoincount facilitates the application of join count analysis to spatial transcriptomic data generated from the 10x Genomics Visium platform. This tool first converts a labeled spatial tissue map into a raster object, in which each spatial feature is represented by a pixel coded by label assignment. This process includes automatic calculation of optimal raster resolution and extent for the sample. A neighbors list is then created from the rasterized sample, in which adjacent and diagonal neighbors for each pixel are identified. After adding binary spatial weights to the neighbors list, a multi-categorical join count analysis is performed to tabulate "joins" between all possible combinations of label pairs. The function returns the observed join counts, the expected count under conditions of spatial randomness, and the variance calculated under non-free sampling. The z-score is then calculated as the difference between observed and expected counts, divided by the square root of the variance.
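The z-score calculation described in the last sentence can be written as a short Python sketch. The function name `join_count_z` and the numbers are hypothetical; computing the expected count and the variance under non-free sampling is not shown.

```python
import math


def join_count_z(observed, expected, variance):
    """z = (observed - expected) / sqrt(variance) for one label pair.

    Hypothetical helper for illustration only.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (observed - expected) / math.sqrt(variance)


# Made-up values for a single pair of labels.
print(join_count_z(observed=120, expected=100, variance=25))  # 4.0
```

A large positive z-score indicates more joins between the two labels than expected under spatial randomness, and a large negative one indicates fewer.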
This package provides a small collection of data on graduate statistics programs from the United States.
This package provides functions for creating, displaying, and evaluating stopping rules for safety monitoring in clinical studies.
An interface to explore trends in Twitter data using the Storywrangler Application Programming Interface (API), which can be found here: <https://github.com/janeadams/storywrangler>.
Explore and analyse the genealogy of textual or musical traditions, from their variants, with various stemmatological methods, mainly the disagreement-based algorithms suggested by Camps and Cafiero (2015) <doi:10.1484/M.LECTIO-EB.5.102565>.
This package provides drop-in replacements for functions from the stringr package, with the same user interface. These functions have no external dependencies and can be copied directly into your package code using the staticimports package.
Collection of stepwise procedures to conduct multiple hypotheses testing. The details of the stepwise algorithm can be found in Romano and Wolf (2007) <DOI:10.1214/009053606000001622> and Hsu, Kuan, and Yen (2014) <DOI:10.1093/jjfinec/nbu014>.
Fast multi-trait and multi-trial Genome Wide Association Studies (GWAS) following the method described in Zhou and Stephens (2014) <doi:10.1038/nmeth.2848>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.
This package provides tools for Genotype by Environment Interaction (GEI) analysis, using statistical models and visualizations to assess genotype performance across environments. It helps researchers explore interaction effects, stability, and adaptability in multi-environment trials, identifying the best-performing genotypes in different conditions. Which Win Where!
This package provides a comprehensive logging framework for R applications, with hierarchical logging levels, database integration, and contextual logging capabilities. The package supports SQLite storage for persistent logs, provides colour-coded console output for better readability, includes parallel processing support, and implements structured error reporting with JSON formatting.
This package provides various functions and tools to help fit models for estimating treatment effects in stepped wedge cluster randomized trials. Implements methods described in Kenny, Voldal, Xia, and Heagerty (2022) "Analysis of stepped wedge cluster randomized trials in the presence of a time-varying treatment effect", <doi:10.1002/sim.9511>.
This package provides functions for stratified sampling and assigning custom labels to data, ensuring randomness within groups. The package supports various sampling methods such as stratified, cluster, and systematic sampling. It allows users to apply transformations and customize the sampling process. This package can be useful for statistical analysis and data preparation tasks.
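As an illustration of stratified sampling with randomness confined to each group, here is a minimal Python sketch. The function name `stratified_sample`, its parameters and the sample records are hypothetical and are not taken from the package.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=None):
    """Draw roughly `fraction` of the records from every stratum.

    key: function returning the stratum label of a record.
    At least one record is drawn from each non-empty stratum.
    Hypothetical helper for illustration only.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for members in groups.values():
        k = max(1, round(fraction * len(members)))
        # Sampling without replacement within the stratum.
        sample.extend(rng.sample(members, k))
    return sample


# Made-up data: six records in group "a" and four in group "b".
data = [("a", i) for i in range(6)] + [("b", i) for i in range(4)]
picked = stratified_sample(data, key=lambda r: r[0], fraction=0.5, seed=1)
print(len(picked))  # 5
```

Because the draw happens separately per stratum, each group keeps its share of the sample regardless of how the overall data are ordered.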