An R interface for the Java Machine Learning for Language Toolkit (mallet) <http://mallet.cs.umass.edu/> to estimate probabilistic topic models, such as Latent Dirichlet Allocation. The R package can be used to read textual data into mallet from R objects, run the Java implementation of mallet directly from R, and extract results as R objects. The Mallet toolkit has many functions; this wrapper focuses on the topic modeling sub-package written by David Mimno. The package uses the rJava package to connect to a JVM.
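A minimal sketch of a typical workflow with the package's documented functions (mallet.import(), MalletLDA(), mallet.doc.topics(), mallet.topic.words(), mallet.top.words()); the example texts and stoplist path are illustrative assumptions.

```r
library(mallet)

docs <- data.frame(
  id   = c("d1", "d2"),
  text = c("topic models find latent themes in text",
           "gibbs sampling estimates topic proportions"),
  stringsAsFactors = FALSE
)

# Import documents into a MALLET instance list (a stoplist file is assumed
# to exist at this path).
instances <- mallet.import(docs$id, docs$text, "stoplist.txt")

# Create and train an LDA model with 5 topics.
topic.model <- MalletLDA(num.topics = 5)
topic.model$loadDocuments(instances)
topic.model$train(200)

# Extract results back into R as matrices.
doc.topics  <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
topic.words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)
mallet.top.words(topic.model, topic.words[1, ], 5)   # top words of topic 1
```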
Supports JSON flattening into a long data frame, where the nesting keys are stored as absolute paths. It also provides an easy way to summarize the basic structure of a JSON list. The idea of mojson is to serialize a JSON object in an absolute way, meaning that earlier key-value pairs appear in the leading rows of the resulting data frame. mojson also provides an alternative way of comparing two different JSON lists, returning left/inner/right-join style results.
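A concept sketch of this kind of flattening (plain R plus jsonlite, not mojson's own interface): each leaf value becomes one row, keyed by its absolute path.

```r
library(jsonlite)

# Recursively flatten a nested list into a long data frame of (key, value)
# pairs, where the key records the absolute path of each leaf.
flatten_long <- function(x, path = character()) {
  if (!is.list(x)) {
    return(data.frame(key = paste(path, collapse = "."),
                      value = as.character(x),
                      stringsAsFactors = FALSE))
  }
  nms <- names(x)
  if (is.null(nms)) nms <- seq_along(x)
  do.call(rbind, lapply(seq_along(x),
                        function(i) flatten_long(x[[i]], c(path, nms[i]))))
}

obj <- fromJSON('{"a": 1, "b": {"c": [2, 3], "d": "x"}}', simplifyVector = FALSE)
flatten_long(obj)   # keys: "a", "b.c.1", "b.c.2", "b.d"
```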
Unit tests for Monte Carlo methods, particularly Markov Chain Monte Carlo (MCMC) methods, implemented as extensions of the testthat package. The MCMC tests check whether the MCMC chain has the correct invariant distribution. They do not check other properties of successful samplers, such as whether the chain can reach all points, i.e. whether it is recurrent. The tests require the ability to sample from the prior and to run steps of the MCMC chain. The methodology is described in Gandy and Scott (2020) <arXiv:2001.06465>.
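A hedged concept sketch of the underlying idea (plain R, not the package's API): draw parameters from the prior, simulate data from the likelihood, apply a few MCMC steps targeting the posterior; a correct kernel leaves the marginal distribution of the parameters equal to the prior, which can be tested.

```r
set.seed(1)

# One Metropolis-Hastings step targeting the posterior of a N(theta, 1)
# likelihood with a N(0, 1) prior.
mh_step <- function(theta, y, sd_prop = 0.5) {
  log_post <- function(t) dnorm(y, t, 1, log = TRUE) + dnorm(t, 0, 1, log = TRUE)
  prop <- rnorm(1, theta, sd_prop)
  if (log(runif(1)) < log_post(prop) - log_post(theta)) prop else theta
}

n      <- 5000
theta0 <- rnorm(n)              # draws from the prior
y      <- rnorm(n, theta0, 1)   # data simulated from the likelihood
theta1 <- theta0
for (k in 1:10) theta1 <- mapply(mh_step, theta1, y)

# Under a correct kernel, theta1 is still prior-distributed; a systematic
# departure from N(0, 1) indicates a bug in the sampler.
ks.test(theta1, pnorm)
```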
Power calculations are a critical component of any research study to determine the minimum sample size necessary to detect differences between multiple groups. Here we present an R package, 'PASSED', that performs power and sample size calculations for tests of two-sample means or ratios with data following beta, gamma (Chang et al. (2011) <doi:10.1007/s00180-010-0209-1>), normal, Poisson (Gu et al. (2008) <doi:10.1002/bimj.200710403>), binomial, geometric, and negative binomial (Zhu and Lakkis (2014) <doi:10.1002/sim.5947>) distributions.
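For intuition, the quantity such functions return can be approximated by simulation; the sketch below (plain R, not the package's interface) estimates the power of a two-sample Poisson rate comparison for an assumed design.

```r
set.seed(42)

# Estimate power by repeatedly simulating two Poisson samples and testing
# for a rate difference with poisson.test().
power_sim <- function(n, lambda1, lambda2, nsim = 2000, alpha = 0.05) {
  mean(replicate(nsim, {
    x <- rpois(n, lambda1)
    y <- rpois(n, lambda2)
    poisson.test(c(sum(x), sum(y)), c(n, n))$p.value < alpha
  }))
}

power_sim(n = 50, lambda1 = 1.0, lambda2 = 1.5)   # approximate power
```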
Assessment of the prevalence of plastic debris in bird nests based on bootstrap replicates. The package allows for calculating bootstrapped 95% confidence intervals for the estimated prevalence of debris. Combined with a Bayesian approach, the resampling simulations can also be used to define appropriate sample sizes to detect the prevalence of plastics. The method has wide application and can also be applied to estimate confidence intervals and define sample sizes for the prevalence of plastics ingested by any other organism. The method is described in Tavares et al. (submitted).
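A minimal sketch of the bootstrap idea in base R (the data vector is made up for illustration): resample presence/absence records with replacement and take percentile limits of the resampled prevalence.

```r
set.seed(7)

# 1 = debris present in the nest, 0 = absent (illustrative data).
nests <- c(rep(1, 18), rep(0, 82))

# Bootstrap the prevalence and take the 2.5% and 97.5% percentiles.
boot_prev <- replicate(10000, mean(sample(nests, replace = TRUE)))
quantile(boot_prev, c(0.025, 0.975))
```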
Estimate Bayesian nested mixture models via Markov Chain Monte Carlo methods. Specifically, the package implements the common atoms model (Denti et al., 2023), its finite version (D'Angelo et al., 2023), and a hybrid finite-infinite model. All models use Gaussian mixtures with a normal-inverse-gamma prior distribution on the parameters. Additional functions are provided to help analyze the results of the fitting procedure. References: Denti, Camerlenghi, Guindani, Mira (2023) <doi:10.1080/01621459.2021.1933499>; D'Angelo, Canale, Yu, Guindani (2023) <doi:10.1111/biom.13626>.
This package provides a collection of functions for preparing data and fitting Bayesian count spatial regression models, with a specific focus on the Gamma-Count (GC) model. The GC model is well-suited for modeling dispersed count data, including under-dispersed or over-dispersed counts, or counts with equivalent dispersion, using Integrated Nested Laplace Approximations (INLA). The package includes functions for generating data from the GC model, as well as spatially correlated versions of the model. See Nadifar, Baghishani, Fallah (2023) <doi:10.1007/s13253-023-00550-5>.
The framework proposed in Jenul et al. (2022) <doi:10.1007/s10994-022-06221-9>, together with an interactive Shiny dashboard. UBayFS is an ensemble feature selection technique embedded in a Bayesian statistical framework. The method combines data and user knowledge, where the former is extracted via data-driven ensemble feature selection. The user can control the feature selection by assigning prior weights to features and penalizing specific feature combinations. UBayFS can be used for common feature selection as well as block feature selection.
The goal of the package aldvmm is to fit adjusted limited dependent variable mixture models of health state utilities. Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with an accumulation of density mass at the limits, and a gap between 100% quality of life and the next smaller utility value. The package aldvmm uses the likelihood and expected value functions proposed by Hernandez Alava and Wailoo (2015) <doi:10.1177/1536867X1501500307>, with normal component distributions and a multinomial logit model of the probabilities of component membership.
This package provides functions for behavior genetics analysis, including variance component model identification [Hunter et al. (2021) <doi:10.1007/s10519-021-10055-x>], calculation of relatedness coefficients using path-tracing methods [Wright (1922) <doi:10.1086/279872>; McArdle & McDonald (1984) <doi:10.1111/j.2044-8317.1984.tb00802.x>], inference of relatedness, pedigree conversion, and simulation of multi-generational family data [Lyu et al. (2024) <doi:10.1101/2024.12.19.629449>]. For a full overview, see [Garrison et al. (2024) <doi:10.21105/joss.06203>].
This package provides a Bayesian regression model for discrete responses, where the conditional distribution is modelled via a discrete Weibull distribution. It implements Metropolis-Hastings and Reversible-Jump algorithms to draw samples from the posterior. It covers a wide range of regularizations through any two-parameter prior; examples are Laplace (lasso), Gaussian (ridge), uniform, Cauchy, and customized priors such as a mixture of priors. An extensive visual toolbox is included to check the validity of the results, as well as several measures of goodness of fit.
CliFlo is a web portal to the New Zealand National Climate Database and provides public access (via subscription) to around 6,500 climate stations (see <https://cliflo.niwa.co.nz/> for more information). Collating and manipulating data from CliFlo (hence clifro) and importing it into R for further analysis, exploration and visualisation is now straightforward and coherent. The user is required to have an internet connection, and a current CliFlo subscription (free) if data from stations other than the public Reefton electronic weather station are sought.
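A minimal sketch following the package's documented workflow; the datatype selection and date range below are illustrative placeholders, and the default public user only has access to the Reefton electronic weather station.

```r
library(clifro)

public.user <- cf_user()               # public access (Reefton EWS only)
rain.dt     <- cf_datatype(3, 1, 1)    # an example precipitation datatype
reefton     <- cf_station()            # defaults to the Reefton EWS

rain.data <- cf_query(user = public.user, datatype = rain.dt,
                      station = reefton,
                      start_date = "2012-01-01 00",
                      end_date   = "2012-02-01 00")
```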
Implementation of some Deep Learning methods. Includes multilayer perceptron, different activation functions, regularisation strategies, stochastic gradient descent and dropout. Thanks go to the following references for helping to inspire and develop the package: Ian Goodfellow, Yoshua Bengio, Aaron Courville, Francis Bach (2016, ISBN:978-0262035613) Deep Learning. Terrence J. Sejnowski (2018, ISBN:978-0262038034) The Deep Learning Revolution. Grant Sanderson (3Blue1Brown) <https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi> Neural Networks YouTube playlist. Michael A. Nielsen <http://neuralnetworksanddeeplearning.com/> Neural Networks and Deep Learning.
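For readers new to these ideas, a minimal sketch in plain R (independent of the package's own interface) of a one-hidden-layer perceptron trained by stochastic gradient descent on a toy XOR problem:

```r
set.seed(1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Toy XOR data.
X <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2, byrow = TRUE)
y <- c(0, 1, 1, 0)

# Weights: input -> hidden (2 -> 4) and hidden -> output (4 -> 1).
W1 <- matrix(rnorm(2 * 4, sd = 0.5), 2, 4); b1 <- rep(0, 4)
W2 <- matrix(rnorm(4 * 1, sd = 0.5), 4, 1); b2 <- 0
lr <- 0.5

for (step in 1:10000) {
  i <- sample(nrow(X), 1)                 # one example per step (SGD)
  x <- X[i, , drop = FALSE]
  h <- sigmoid(x %*% W1 + b1)             # hidden activations
  p <- sigmoid(h %*% W2 + b2)             # predicted probability
  # Backpropagation of the squared-error gradient through the sigmoids.
  d2 <- (p - y[i]) * p * (1 - p)
  d1 <- (d2 %*% t(W2)) * h * (1 - h)
  W2 <- W2 - lr * t(h) %*% d2; b2 <- b2 - lr * as.numeric(d2)
  W1 <- W1 - lr * t(x) %*% d1; b1 <- b1 - lr * as.vector(d1)
}

H <- sigmoid(X %*% W1 + matrix(b1, nrow(X), length(b1), byrow = TRUE))
round(sigmoid(H %*% W2 + b2), 2)   # predictions, ideally close to 0, 1, 1, 0
```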
An interactive image editing tool that can be added as part of the HTML in Shiny, R Markdown or any other type of HTML document. Oftentimes, plots and photos are embedded in the web application or file. drawer can take screenshots of these image-like elements, or any part of the HTML document, and send them to an image editing space called canvas, allowing users to edit the screenshot(s) immediately within the same document. Users can quickly combine and compare different screenshots, upload their own images, and even assemble a scientific figure.
An estimation method that can use computer simulations to approximate maximum-likelihood estimates even when the likelihood function cannot be evaluated directly. It can be applied whenever it is feasible to conduct many simulations, but works best when the data is approximately Poisson distributed. It was originally designed for demographic inference in evolutionary biology (Naduvilezhath et al., 2011 <doi:10.1111/j.1365-294X.2011.05131.x>; Mathew et al., 2013 <doi:10.1002/ece3.722>). It has optional support for conducting coalescent simulation using the coala package.
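A hedged concept sketch of simulation-based likelihood approximation (plain R, not this package's interface): for each candidate parameter value, simulate many data sets and score how often the observed summary statistic is reproduced; the parameter with the highest simulated frequency approximates the maximum-likelihood estimate.

```r
set.seed(123)

# "Observed" data where the true model is known, so the idea can be checked.
observed <- rpois(100, lambda = 2.3)
obs_stat <- sum(observed)

grid <- seq(1, 4, by = 0.05)
approx_loglik <- sapply(grid, function(lambda) {
  # Approximate the likelihood of the observed summary statistic under
  # each candidate lambda by simulation.
  sims <- replicate(500, sum(rpois(100, lambda)))
  log(mean(abs(sims - obs_stat) <= 5) + 1e-10)
})

grid[which.max(approx_loglik)]   # typically close to the true value 2.3
```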
Multiple moderation analysis for two-instance repeated measures designs, with up to three simultaneous moderators (dichotomous and/or continuous) with additive or multiplicative relationship. Includes analyses of simple slopes and conditional effects at (automatically determined or manually set) values of the moderator(s), as well as an implementation of the Johnson-Neyman procedure for determining regions of significance in single moderator models. Based on Montoya, A. K. (2018) "Moderation analysis in two-instance repeated measures designs: Probing methods and multiple moderator models" <doi:10.3758/s13428-018-1088-6>.
A central decision in parametric regression is how to specify the relation between a dependent variable and each explanatory variable. This package provides a semi-parametric tool for comparing different transformations of an explanatory variable in a parametric regression. The functions are relevant in situations where one would use a Box-Cox or Box-Tidwell transformation. In contrast to the classic power transformations, the methods in this package allow for theory-driven user input and the possibility to compare with a non-parametric transformation.
An R interface to pikchr (<https://pikchr.org>, pronounced "picture"), a PIC-like markup language for creating diagrams within technical documentation. Originally developed by Brian Kernighan, PIC has been adapted into pikchr by D. Richard Hipp, the creator of SQLite. pikchr is designed to be embedded in fenced code blocks of Markdown or other documentation markup languages, making it ideal for generating diagrams in text-based formats. This package allows R users to seamlessly integrate the descriptive syntax of pikchr for diagram creation directly within the R environment.
Smoothing signals and computing their derivatives is a common requirement in signal processing workflows. Savitzky-Golay filters are an established method able to do both (Savitzky and Golay, 1964 <doi:10.1021/ac60214a047>). This package implements one-dimensional Savitzky-Golay filters that can be applied to vectors and matrices (either row-wise or column-wise). Vectorization and memory allocations have been profiled to reduce the computational footprint. Short filters are applied in the direct space, while longer filters are applied in the frequency domain using a Fast Fourier Transform (FFT).
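A minimal sketch of the underlying technique in base R (independent of this package's API): Savitzky-Golay smoothing is a moving least-squares polynomial fit, which reduces to convolution with a fixed set of weights.

```r
# Smoothing weights for a window of width 2 * half_width + 1 and a given
# polynomial degree: the first row of the least-squares pseudo-inverse.
sg_coef <- function(half_width, degree) {
  x <- -half_width:half_width
  A <- outer(x, 0:degree, `^`)         # Vandermonde design matrix
  solve(t(A) %*% A, t(A))[1, ]
}

set.seed(1)
t     <- seq(0, 2 * pi, length.out = 200)
noisy <- sin(t) + rnorm(200, sd = 0.2)

w        <- sg_coef(half_width = 7, degree = 3)       # 15-point cubic filter
smoothed <- stats::filter(noisy, w, sides = 2)        # NA at the edges
```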
Computes the effective range of a smoothing matrix, which is a measure of the distance to which smoothing occurs. This is motivated by the application of spatial splines for adjusting for unmeasured spatial confounding in regression models, but the calculation of effective range can be applied to smoothing matrices in other contexts. For algorithmic details, see Rainey and Keller (2024) "spconfShiny: an R Shiny application..." <doi:10.1371/journal.pone.0311440> and Keller and Szpiro (2020) "Selecting a Scale for Spatial Confounding Adjustment" <doi:10.1111/rssa.12556>.
Visualize Variance is an intuitive Shiny application tailored for agricultural research data analysis, including one-way and two-way analysis of variance, correlation, and other essential statistical tools. Users can easily upload their datasets, perform analyses, and download the results as a well-formatted document, streamlining the process of data analysis and reporting in agricultural research. The experimental design methods are based on classical work by Fisher (1925) and Scheffe (1959). The correlation visualization approaches follow methods developed by Wei & Simko (2021) and Friendly (2002) <doi:10.1198/000313002533>.
Efficient Bayesian generalized linear models with time-varying coefficients as in Helske (2022, <doi:10.1016/j.softx.2022.101016>). Gaussian, Poisson, and binomial observations are supported. The Markov chain Monte Carlo (MCMC) computations are done using Hamiltonian Monte Carlo provided by Stan, using a state space representation of the model in order to marginalise over the coefficients for efficient sampling. For non-Gaussian models, the package uses the importance sampling type estimators based on approximate marginal MCMC as in Vihola, Helske, Franks (2020, <doi:10.1111/sjos.12492>).
Analysis of Ct values from high-throughput quantitative real-time PCR (qPCR) assays across multiple conditions or replicates. The input data can be from spatially-defined formats such as ABI TaqMan Low Density Arrays or OpenArray; LightCycler from Roche Applied Science; CFX plates from Bio-Rad Laboratories; conventional 96- or 384-well plates; or microfluidic devices such as the Dynamic Arrays from Fluidigm Corporation. HTqPCR handles data loading, quality assessment, normalization, visualization and parametric or non-parametric testing for statistical significance in Ct values between features (e.g. genes, microRNAs).
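A minimal sketch of a typical workflow using the package's documented functions; the file names, path, and group labels are placeholders, and the exact arguments may need to be adapted to the instrument format.

```r
library(HTqPCR)

files <- c("sample1.txt", "sample2.txt", "sample3.txt", "sample4.txt")
raw   <- readCtData(files = files, path = "data", format = "plain")

# Quality assessment and normalization of the Ct values.
plotCtBoxes(raw)
norm <- normalizeCtData(raw, norm = "quantile")

# Test for differential Ct values between two conditions.
groups <- factor(c("control", "control", "treated", "treated"))
res    <- ttestCtData(norm, groups = groups)
head(res)
```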
Performance analysis workflow that combines the power of the R language (and the tidyverse realm) and many auxiliary tools to provide a consistent, flexible, extensible, fast, and versatile framework for the performance analysis of task-based applications that run on top of the StarPU runtime (with its MPI (Message Passing Interface) layer for multi-node support). Its goal is to provide a prototypical environment for hypothesis checking in the performance analysis of task-based applications that run on heterogeneous (multi-GPU, multi-core) multi-node HPC (High-Performance Computing) platforms.