LegATo is a suite of open-source software tools for longitudinal microbiome analysis. It is extendable to several different study forms with optimal ease-of-use for researchers. Microbiome time-series data presents distinct challenges, including complex covariate dependencies and a variety of longitudinal study designs. This toolkit will allow researchers to determine which microbial taxa are affected over time by perturbations such as onset of disease or lifestyle choices, and to predict the effects of these perturbations over time, including changes in composition or stability of commensal bacteria.
The goal of the package aldvmm is to fit adjusted limited dependent variable mixture models of health state utilities. Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with an accumulation of density mass at the limits, and a gap between 100% quality of life and the next smaller utility value. The package aldvmm uses the likelihood and expected value functions proposed by Hernandez Alava and Wailoo (2015) <doi:10.1177/1536867X1501500307> using normal component distributions and a multinomial logit model of probabilities of component membership.
This package provides tools for bioinformatics modeling using recursive transformer-inspired architectures, autoencoders, random forests, XGBoost, and stacked ensemble models. Includes utilities for cross-validation, calibration, benchmarking, and threshold optimization in predictive modeling workflows. The methodology builds on ensemble learning (Breiman 2001 <doi:10.1023/A:1010933404324>), gradient boosting (Chen and Guestrin 2016 <doi:10.1145/2939672.2939785>), autoencoders (Hinton and Salakhutdinov 2006 <doi:10.1126/science.1127647>), and recursive transformer efficiency approaches such as Mixture-of-Recursions (Bae et al. 2025 <doi:10.48550/arXiv.2507.10524>).
This package provides a Bayesian regression model for discrete response, where the conditional distribution is modelled via a discrete Weibull distribution. This package provides an implementation of Metropolis-Hastings and reversible-jump algorithms to draw samples from the posterior. It covers a wide range of regularizations through any two-parameter prior. Examples are Laplace (Lasso), Gaussian (ridge), Uniform, Cauchy and customized priors like a mixture of priors. An extensive visual toolbox is included to check the validity of the results as well as several measures of goodness-of-fit.
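As a rough illustration of the response distribution only, the sketch below evaluates a discrete Weibull probability mass function. It assumes the common type I parametrization P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ...; the function name `dweibull_pmf` and the parameter names `q` and `beta` are hypothetical and are not taken from the package.

```python
def dweibull_pmf(y: int, q: float, beta: float) -> float:
    """P(Y = y) for a type I discrete Weibull, assuming 0 < q < 1 and beta > 0."""
    if y < 0:
        return 0.0
    # Difference of the survival function at y and at y + 1.
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


# With beta = 1 the distribution reduces to a geometric distribution,
# so the probabilities are q**y * (1 - q).
probs = [dweibull_pmf(y, q=0.6, beta=1.0) for y in range(200)]
```

In a regression setting the parameters would depend on covariates through link functions; that part, and the posterior sampling, are not shown here.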
This package provides functions for behavior genetics analysis, including variance component model identification [Hunter et al. (2021) <doi:10.1007/s10519-021-10055-x>], calculation of relatedness coefficients using path-tracing methods [Wright (1922) <doi:10.1086/279872>; McArdle & McDonald (1984) <doi:10.1111/j.2044-8317.1984.tb00802.x>], inference of relatedness, pedigree conversion, and simulation of multi-generational family data [Lyu et al. (2024) <doi:10.1101/2024.12.19.629449>]. For a full overview, see [Garrison et al. (2024) <doi:10.21105/joss.06203>].
An interactive image editing tool that can be added as part of the HTML in Shiny, R Markdown or any type of HTML document. Plots and photos are often embedded in a web application or file. drawer can take screenshots of these image-like elements, or of any part of the HTML document, and send them to an image editing space called canvas, which allows users to immediately edit the screenshot(s) within the same document. Users can quickly combine and compare different screenshots, upload their own images, and assemble a scientific figure.
Implementation of some Deep Learning methods. Includes multilayer perceptron, different activation functions, regularisation strategies, stochastic gradient descent and dropout. Thanks go to the following references for helping to inspire and develop the package: Ian Goodfellow, Yoshua Bengio, Aaron Courville, Francis Bach (2016, ISBN:978-0262035613) Deep Learning. Terrence J. Sejnowski (2018, ISBN:978-0262038034) The Deep Learning Revolution. Grant Sanderson (3Blue1Brown) <https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi> Neural Networks YouTube playlist. Michael A. Nielsen <http://neuralnetworksanddeeplearning.com/> Neural Networks and Deep Learning.
An estimation method that can use computer simulations to approximate maximum-likelihood estimates even when the likelihood function cannot be evaluated directly. It can be applied whenever it is feasible to conduct many simulations, but works best when the data is approximately Poisson distributed. It was originally designed for demographic inference in evolutionary biology (Naduvilezhath et al., 2011 <doi:10.1111/j.1365-294X.2011.05131.x>, Mathew et al., 2013 <doi:10.1002/ece3.722>). It has optional support for conducting coalescent simulation using the coala package.
Multiple moderation analysis for two-instance repeated measures designs, with up to three simultaneous moderators (dichotomous and/or continuous) with an additive or multiplicative relationship. Includes analyses of simple slopes and conditional effects at (automatically determined or manually set) values of the moderator(s), as well as an implementation of the Johnson-Neyman procedure for determining regions of significance in single moderator models. Based on Montoya, A. K. (2018) "Moderation analysis in two-instance repeated measures designs: Probing methods and multiple moderator models" <doi:10.3758/s13428-018-1088-6>.
An R interface to pikchr (<https://pikchr.org>, pronounced "picture"), a PIC-like markup language for creating diagrams within technical documentation. Originally developed by Brian Kernighan, PIC has been adapted into pikchr by D. Richard Hipp, the creator of SQLite. pikchr is designed to be embedded in fenced code blocks of Markdown or other documentation markup languages, making it ideal for generating diagrams in text-based formats. This package allows R users to seamlessly integrate the descriptive syntax of pikchr for diagram creation directly within the R environment.
A central decision in a parametric regression is how to specify the relation between a dependent variable and each explanatory variable. This package provides a semi-parametric tool for comparing different transformations of an explanatory variable in a parametric regression. The functions are relevant in situations where you would use a Box-Cox or Box-Tidwell transformation. In contrast to the classic power transformations, the methods in this package allow for theory-driven user input and the possibility to compare with a non-parametric transformation.
Smoothing signals and computing their derivatives is a common requirement in signal processing workflows. Savitzky-Golay filters are an established method able to do both (Savitzky and Golay, 1964 <doi:10.1021/ac60214a047>). This package implements one-dimensional Savitzky-Golay filters that can be applied to vectors and matrices (either row-wise or column-wise). Vectorization and memory allocations have been profiled to reduce the computational footprint. Short filter lengths are implemented in the direct space, while longer filters are implemented in frequency space, using a Fast Fourier Transform (FFT).
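To make the idea of a Savitzky-Golay filter concrete, here is a minimal direct-space sketch in plain Python. It derives the smoothing weights from a least-squares polynomial fit over a symmetric window and applies them to interior points only. The function names `savgol_coeffs` and `savgol_smooth` are hypothetical, and the sketch covers neither derivatives, nor edge handling, nor the FFT path mentioned above.

```python
from fractions import Fraction


def savgol_coeffs(half_width: int, order: int) -> list[Fraction]:
    """Smoothing weights for a window of 2 * half_width + 1 points."""
    if order >= 2 * half_width + 1:
        raise ValueError("window must contain more points than the polynomial order")
    ks = range(-half_width, half_width + 1)
    n = order + 1
    # Normal-equation matrix A^T A with A[k][j] = k**j, augmented with e_0.
    rows = [
        [Fraction(sum(k ** (i + j) for k in ks)) for j in range(n)]
        + [Fraction(1 if i == 0 else 0)]
        for i in range(n)
    ]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    x = [rows[i][n] for i in range(n)]
    # Weight at offset k: first row of (A^T A)^-1 A^T, i.e. sum_j x[j] * k**j.
    return [sum(x[j] * Fraction(k) ** j for j in range(n)) for k in ks]


def savgol_smooth(signal, half_width=2, order=2):
    """Smooth interior points; edge samples are left unchanged for simplicity."""
    weights = savgol_coeffs(half_width, order)
    out = list(signal)
    for i in range(half_width, len(signal) - half_width):
        window = signal[i - half_width : i + half_width + 1]
        out[i] = float(sum(w * v for w, v in zip(weights, window)))
    return out
```

For a five-point window and a quadratic fit, `savgol_coeffs(2, 2)` yields the classic weights (-3, 12, 17, 12, -3)/35, and a signal that is itself a quadratic passes through `savgol_smooth` unchanged.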
Computes the effective range of a smoothing matrix, which is a measure of the distance to which smoothing occurs. This is motivated by the application of spatial splines for adjusting for unmeasured spatial confounding in regression models, but the calculation of effective range can be applied to smoothing matrices in other contexts. For algorithmic details, see Rainey and Keller (2024) "spconfShiny: an R Shiny application..." <doi:10.1371/journal.pone.0311440> and Keller and Szpiro (2020) "Selecting a Scale for Spatial Confounding Adjustment" <doi:10.1111/rssa.12556>.
Visualize Variance is an intuitive Shiny application tailored for agricultural research data analysis, including one-way and two-way analysis of variance, correlation, and other essential statistical tools. Users can easily upload their datasets, perform analyses, and download the results as a well-formatted document, streamlining the process of data analysis and reporting in agricultural research. The experimental design methods are based on classical work by Fisher (1925) and Scheffe (1959). The correlation visualization approaches follow methods developed by Wei & Simko (2021) and Friendly (2002) <doi:10.1198/000313002533>.
Efficient Bayesian generalized linear models with time-varying coefficients as in Helske (2022, <doi:10.1016/j.softx.2022.101016>). Gaussian, Poisson, and binomial observations are supported. The Markov chain Monte Carlo (MCMC) computations are done using Hamiltonian Monte Carlo provided by Stan, using a state space representation of the model in order to marginalise over the coefficients for efficient sampling. For non-Gaussian models, the package uses the importance sampling type estimators based on approximate marginal MCMC as in Vihola, Helske, Franks (2020, <doi:10.1111/sjos.12492>).
Genetically modified organisms (GMOs) and cell lines are widely used models in all kinds of biological research. As part of characterising these models, DNA sequencing technology and bioinformatics analyses are used systematically to study their genomes. As a result, large volumes of data are generated and various algorithms are applied to analyse this data, which introduces the challenge of representing all findings in an informative and concise manner. `gmoviz` provides users with an easy way to visualise and facilitate the explanation of complex genomic editing events on a larger, biologically-relevant scale.
VarCon is an R package which converts the positional information from the annotation of a single nucleotide variation (SNV) (either referring to the coding sequence or the reference genomic sequence). It retrieves the genomic reference sequence around the position of the single nucleotide variation. To assess whether the SNV could potentially influence binding of splicing regulatory proteins, VarCon calculates the HEXplorer score as an estimation. VarCon additionally reports splice site strengths of splice sites within the retrieved genomic sequence and any changes due to the SNV.
Experience studies are used by actuaries to explore historical experience across blocks of business and to inform assumption setting activities. This package provides functions for preparing data, creating studies, visualizing results, and beginning assumption development. Experience study methods, including exposure calculations, are described in: Atkinson & McGarry (2016) "Experience Study Calculations" <https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf>. The limited fluctuation credibility method used by the exp_stats() function is described in: Herzog (1999, ISBN:1-56698-374-6) "Introduction to Credibility Theory".
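For readers unfamiliar with limited fluctuation credibility, the following is a minimal sketch of the textbook claim-count version, assuming Poisson claim counts and the square-root rule for partial credibility. It is not the formula used by exp_stats(), which may differ in its details; the function names and the parameters `conf_level` and `tolerance` are hypothetical.

```python
from statistics import NormalDist


def full_credibility_claims(conf_level: float = 0.95, tolerance: float = 0.05) -> float:
    """Expected claims needed so the estimate is within `tolerance` of its mean
    with probability `conf_level` (Poisson claim counts assumed)."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    return (z / tolerance) ** 2


def partial_credibility(n_claims: float, conf_level: float = 0.95,
                        tolerance: float = 0.05) -> float:
    """Square-root rule: Z = min(1, sqrt(n / n_full))."""
    n_full = full_credibility_claims(conf_level, tolerance)
    return min(1.0, (n_claims / n_full) ** 0.5)
```

Under these assumptions, a 90% confidence level with a 5% tolerance gives a full-credibility standard of roughly 1,082 claims, and a block with a quarter of the standard receives a credibility of 0.5.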
Count transformation models featuring parameters interpretable as discrete hazard ratios, odds ratios, reverse-time discrete hazard ratios, or transformed expectations. An appropriate data transformation for a count outcome and regression coefficients are simultaneously estimated by maximising the exact discrete log-likelihood using the computational framework provided in package mlt; technical details are given in Siegfried & Hothorn (2020) <DOI:10.1111/2041-210X.13383>. The package also contains an experimental implementation of multivariate count transformation models with an application to multi-species distribution models <DOI:10.48550/arXiv.2201.13095>.
Collection of ancillary functions and utilities for Partial Linear Single Index Models for Environmental mixture analyses, which currently provides functions for scalar outcomes. The outputs of these functions include the single index function, single index coefficients, partial linear coefficients, mixture overall effect, exposure main and interaction effects, and differences of quartile effects. In the future, we will add functions for binary, ordinal, Poisson, survival, and longitudinal outcomes, as well as models for time-dependent exposures. See Wang et al. (2020) <doi:10.1186/s12940-020-00644-4> for an overview.
This package provides an implementation of two-dimensional functional principal component analysis (FPCA), Marginal FPCA, and Product FPCA for repeated functional data. Marginal and Product FPCA implementations are done for both dense and sparsely observed functional data. References: Chen, K., Delicado, P., & Müller, H. G. (2017) <doi:10.1111/rssb.12160>. Chen, K., & Müller, H. G. (2012) <doi:10.1080/01621459.2012.734196>. Hall, P., Müller, H.G. and Wang, J.L. (2006) <doi:10.1214/009053606000000272>. Yao, F., Müller, H. G., & Wang, J. L. (2005) <doi:10.1198/016214504000001745>.
Create network-style visualizations of pairwise relationships using custom edge glyphs built on top of ggplot2. The package supports both statistical and non-statistical data and allows users to represent directed relationships. This enables clear, publication-ready graphics for exploring and communicating relational structures in a wide range of domains. The method was first used in Abu-Akel et al. (2021) <doi:10.1371/journal.pone.0245100>. Code is released under the MIT License; included datasets are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0).
Fast, model-agnostic implementation of different H-statistics introduced by Jerome H. Friedman and Bogdan E. Popescu (2008) <doi:10.1214/07-AOAS148>. These statistics quantify interaction strength per feature, feature pair, and feature triple. The package supports multi-output predictions and can account for case weights. In addition, several variants of the original statistics are provided. The shape of the interactions can be explored through partial dependence plots or individual conditional expectation plots. DALEX explainers, meta learners (mlr3, tidymodels, caret) and most other models work out-of-the-box.
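The pairwise statistic can be sketched by brute force as follows. This is an illustrative, unweighted version of Friedman and Popescu's squared H-statistic for one feature pair, built from centered partial dependence functions evaluated at the observed rows. It costs O(n^2) predictions and is not how the package computes it; the function names and the row-as-sequence data layout are assumptions.

```python
from statistics import fmean


def partial_dependence(predict, data, features, values):
    """Average prediction with the given feature columns fixed at `values`."""
    preds = []
    for row in data:
        modified = list(row)
        for f, v in zip(features, values):
            modified[f] = v
        preds.append(predict(modified))
    return fmean(preds)


def center(vals):
    """Shift a list of values to mean zero."""
    m = fmean(vals)
    return [v - m for v in vals]


def h2_pairwise(predict, data, j, k):
    """Squared H-statistic for features j and k, evaluated on the rows of `data`."""
    pd_jk = center([partial_dependence(predict, data, (j, k), (r[j], r[k])) for r in data])
    pd_j = center([partial_dependence(predict, data, (j,), (r[j],)) for r in data])
    pd_k = center([partial_dependence(predict, data, (k,), (r[k],)) for r in data])
    # Share of the joint effect's variability not explained by the two main effects.
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0


# Toy data: a full 4 x 3 grid of two features.
grid = [(x0, x1) for x0 in (0, 1, 2, 3) for x1 in (0, 1, 2)]
h2_additive = h2_pairwise(lambda r: 2 * r[0] + r[1] ** 2, grid, 0, 1)
h2_product = h2_pairwise(lambda r: r[0] * r[1], grid, 0, 1)
```

A purely additive model gives a value of (numerically) zero, while the product model gives a clearly positive value, which is the behaviour the statistic is meant to capture.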
Rank-based tests for enrichment of KOG (euKaryotic Orthologous Groups) classes with up- or down-regulated genes based on a continuous measure. The meta-analysis is based on correlation of KOG delta-ranks across datasets (delta-rank is the difference between mean rank of genes belonging to a KOG class and mean rank of all other genes). With a binary measure (1 or 0 to indicate significant and non-significant genes), a one-tailed Fisher's exact test for over-representation of each KOG class among significant genes is performed.
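The two quantities named above can be illustrated with a short sketch. It computes a delta-rank for one class from a continuous measure and, for the binary case, a one-tailed Fisher's exact p-value from the hypergeometric distribution. The function names and argument layout are hypothetical, and the significance test the package applies to delta-ranks is not reproduced here.

```python
from math import comb


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def delta_rank(measure, in_class):
    """Mean rank of genes in the class minus mean rank of all other genes."""
    ranks = average_ranks(measure)
    inside = [r for r, flag in zip(ranks, in_class) if flag]
    outside = [r for r, flag in zip(ranks, in_class) if not flag]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def fisher_over_representation(sig_in, class_size, sig_total, n_genes):
    """One-tailed p-value P(X >= sig_in) under the hypergeometric null."""
    upper = min(class_size, sig_total)
    tail = sum(
        comb(class_size, x) * comb(n_genes - class_size, sig_total - x)
        for x in range(sig_in, upper + 1)
    )
    return tail / comb(n_genes, sig_total)


# Six genes, two of which belong to the class of interest.
measure = [2.5, -1.0, 0.3, 1.8, -0.7, 0.1]
in_class = [True, False, False, True, False, False]
dr = delta_rank(measure, in_class)
p = fisher_over_representation(sig_in=2, class_size=2, sig_total=2, n_genes=6)
```

In this toy example the two class members hold the two highest ranks, so the delta-rank is positive, and both significant genes fall in the class, which gives the smallest p-value attainable with these margins.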