Generate privacy-preserving synthetic datasets that mirror structure, types, factor levels, and missingness; export bundles for LLM workflows (data plus JSON schema and guidance); and build fake data directly from SQL database tables without reading real rows. Methods are related to approaches in Nowok, Raab and Dibben (2016) <doi:10.32614/RJ-2016-019> and the foundation-model overview by Bommasani et al. (2021) <doi:10.48550/arXiv.2108.07258>.
This package provides a nature-inspired metaheuristic algorithm based on the echolocation behavior of microbats that uses frequency tuning to optimize problems in both continuous and discrete dimensions. This R package makes it easy to implement the standard bat algorithm on any user-supplied function. The algorithm was first developed by Xin-She Yang in 2010 (<DOI:10.1007/978-3-642-12538-6_6>, <DOI:10.1109/CINTI.2014.7028669>).
MTrackJ is an ImageJ plugin for motion tracking and analysis (see <https://imagescience.org/meijering/software/mtrackj/>). This package reads and writes MTrackJ Data Files ('.mdf', see <https://imagescience.org/meijering/software/mtrackj/format/>). It supports 2D data and reads and writes cluster, point, and channel information. If desired, it generates track identifiers that are unique over the clusters. See the project page for more information and examples.
Allows the user to convert PDF tables to formats more amenable to analysis ('.csv', '.xml', or '.xlsx') by wrapping the PDFTables API. In order to use the package, the user needs to sign up for an API account on the PDFTables website (<https://pdftables.com/pdf-to-excel-api>). The package works by taking a PDF file as input, uploading it to PDFTables, and returning a file with the extracted data.
Defines functions to describe regression models using only pre-computed summary statistics (i.e. means, variances, and covariances) in place of individual participant data. Possible models include linear models for linear combinations, products, and logical combinations of phenotypes. Implements methods presented in Wolf et al. (2021) <doi:10.3389/fgene.2021.745901>, Wolf et al. (2020) <doi:10.1142/9789811215636_0063>, and Gasdaska et al. (2019) <doi:10.1142/9789813279827_0036>.
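As a minimal illustration of fitting a regression from summary statistics alone, the sketch below recovers the intercept and slope of a one-predictor linear model from two means, one variance, and one covariance. The function name and arguments are hypothetical and are not the package's API; the package's own methods cover far more general models.

```python
def simple_regression_from_summary(mean_x, mean_y, var_x, cov_xy):
    """Ordinary least squares for y = intercept + slope * x,
    computed from summary statistics instead of raw rows.

    Hypothetical helper for illustration only.
    """
    if var_x <= 0:
        raise ValueError("var_x must be positive")
    slope = cov_xy / var_x                # slope = Cov(x, y) / Var(x)
    intercept = mean_y - slope * mean_x   # the fitted line passes through the means
    return intercept, slope


# Summary statistics of x = [1, 2, 3, 4] and y = [3, 5, 7, 9] (sample versions).
intercept, slope = simple_regression_from_summary(
    mean_x=2.5, mean_y=6.0, var_x=5 / 3, cov_xy=10 / 3
)
print(intercept, slope)
```

Because only the ratio of covariance to variance enters the slope, either sample or population versions can be used as long as both use the same denominator.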
Collaborative writing and editing of R Markdown (or Sweave) documents. The local .Rmd (or .Rnw) is uploaded as a plain-text file to Google Drive. By taking advantage of the easily readable Markdown (or LaTeX) syntax and the well-known online interface offered by Google Docs, collaborators can easily contribute to the writing and editing process. After integrating all authors' contributions, the final document can be downloaded and rendered locally.
This package implements a variety of methods for batch correction of single-cell (RNA sequencing) data. This includes methods based on detecting mutually nearest neighbors, as well as several efficient variants of linear regression of the log-expression values. Functions are also provided to perform global rescaling to remove differences in depth between batches, and to perform a principal components analysis that is robust to differences in the numbers of cells across batches.
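The idea of detecting mutual nearest neighbors between two batches can be sketched as follows. This is a toy version under simple assumptions (Euclidean distance, brute-force search, small inputs); the function names are hypothetical and the code is not the package's implementation.

```python
from math import dist


def nearest_indices(query, candidates, k):
    """Indices of the k candidates closest to `query` (Euclidean distance)."""
    order = sorted(range(len(candidates)), key=lambda j: dist(query, candidates[j]))
    return set(order[:k])


def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Return pairs (i, j) where batch_b[j] is among the k nearest
    neighbors of batch_a[i] AND batch_a[i] is among the k nearest
    neighbors of batch_b[j]."""
    a_to_b = [nearest_indices(p, batch_b, k) for p in batch_a]
    b_to_a = [nearest_indices(q, batch_a, k) for q in batch_b]
    return sorted(
        (i, j) for i, js in enumerate(a_to_b) for j in js if i in b_to_a[j]
    )


batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.0, 1.0), (10.0, 11.0), (5.0, 5.0)]
print(mutual_nearest_pairs(batch_a, batch_b, k=1))
```

In this example the third point of `batch_b` has no partner, because neither point of `batch_a` lists it as a nearest neighbor.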
CENTIPEDE applies a hierarchical Bayesian mixture model to infer regions of the genome that are bound by particular transcription factors. It starts by identifying a set of candidate binding sites, and then aims to classify the sites according to whether each site is bound or not bound by a transcription factor. CENTIPEDE is an unsupervised learning algorithm that discriminates between two different types of motif instances using as much relevant information as possible.
The r-phylogram package is a tool for developing phylogenetic trees as deeply-nested lists known as "dendrogram" objects. It provides functions for conversion between "dendrogram" and "phylo" class objects, as well as several tools for command-line tree manipulation and import/export via Newick parenthetic text. This improves accessibility to the comprehensive range of object-specific analytical and tree-visualization functions found across a wide array of bioinformatic R packages.
Genomic analysis of model organisms often requires the use of databases based on human data or making comparisons to patient-derived resources. This requires converting genes between human and non-human analogues. The babelgene R package provides predicted gene orthologs/homologs for frequently studied model organisms in an R-friendly tidy/long format. The package integrates orthology assertion predictions sourced from multiple databases as compiled by the HGNC Comparison of Orthology Predictions (HCOP).
This package provides tools to compute marginal effects from statistical models and return the result as tidy data frames. These data frames are ready to use with the ggplot2 package. Marginal effects can be calculated for many different models. Interaction terms, splines and polynomial terms are also supported. The two main functions are ggpredict() and ggeffect(). There is a generic plot() method to plot the results using ggplot2.
When identifying putative Transcription Factor Binding Sites (TFBSs) from sequences or alignments, we are interested in the significance of certain match scores. TFMPvalue provides the accurate calculation of a p-value with a score threshold for position weight matrices, or the score with a given p-value. It is an interface to code originally made available by Helene Touzet and Jean-Stephane Varre (2007), Algorithms Mol Biol 2:15.
This package provides sleep duration estimates using a Pruned Dynamic Programming (PDP) algorithm that efficiently identifies change-points. PDP applied to physical activity data can identify transitions from wakefulness to sleep and vice versa. Baek, Jonggyu, Banker, Margaret, Jansen, Erica C., She, Xichen, Peterson, Karen E., Pitchford, E. Andrew, Song, Peter X. K. (2021) An Efficient Segmentation Algorithm to Estimate Sleep Duration from Actigraphy Data <doi:10.1007/s12561-021-09309-3>.
Perform parallel factor analysis (PARAFAC: Hitchcock, 1927) <doi:10.1002/sapm192761164> on fluorescence excitation-emission matrices: handle scattering signal and inner filter effect, scale the dataset, fit the model; perform split-half validation or jack-knifing. Modified approaches such as Whittaker interpolation, randomised split-half, and fluorescence and scattering model estimation are also available. The package has a low dependency footprint and has been tested on a wide range of R versions.
This package provides functions for analyzing and visualizing complex macroevolutionary dynamics on phylogenetic trees. It is a companion package to the command line program BAMM (Bayesian Analysis of Macroevolutionary Mixtures) and is entirely oriented towards the analysis, interpretation, and visualization of evolutionary rates. Functionality includes visualization of rate shifts on phylogenies, estimating evolutionary rates through time, comparing posterior distributions of evolutionary rates across clades, comparing diversification models using Bayes factors, and more.
Converts numbers to continued fractions and back again. A solver for Pell's Equation is provided. The method for calculating roots in continued fraction form appears, without published attribution, in such places as the web page of Professor Emeritus Jonathan Lubin, <http://www.math.brown.edu/jlubin/>, his post to Mathematics Stack Exchange, <https://math.stackexchange.com/questions/2215918>, and the pages of Professor Ron Knott, e.g., <https://r-knott.surrey.ac.uk/Fibonacci/cfINTRO.html>.
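The round trip between a number and its continued fraction, and the use of continued fractions to solve Pell's equation, can be sketched as below. This is a textbook version written for illustration, with hypothetical function names; it is not the package's code and makes no claim about the method the package uses.

```python
from fractions import Fraction
from math import isqrt


def to_continued_fraction(x, max_terms=20):
    """Terms [a0, a1, a2, ...] of the continued fraction of x."""
    x = Fraction(x)
    terms = []
    for _ in range(max_terms):
        a = x.numerator // x.denominator  # integer part (floor)
        terms.append(a)
        remainder = x - a
        if remainder == 0:                # a rational input terminates here
            break
        x = 1 / remainder
    return terms


def from_continued_fraction(terms):
    """Rebuild the exact rational value from its continued-fraction terms."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


def pell_fundamental(n):
    """Smallest positive (x, y) with x*x - n*y*y == 1, found by walking the
    convergents of the continued fraction of sqrt(n)."""
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0
    h_prev, h = 1, a0   # convergent numerators
    k_prev, k = 0, 1    # convergent denominators
    while h * h - n * k * k != 1:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


print(to_continued_fraction(Fraction(415, 93)))
print(from_continued_fraction([4, 2, 6, 7]))
print(pell_fundamental(7))
```

Using `Fraction` keeps the arithmetic exact, so a rational input always reproduces itself after the round trip.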
This package provides functions for fitting Cox proportional hazards models for grouped time-to-event data, where the shared group-specific frailties have a discrete nonparametric distribution. The methods implemented in the package are described by Gasperoni, F., Ieva, F., Paganoni, A. M., Jackson, C. H., Sharples, L. (2018) <doi:10.1093/biostatistics/kxy071>. There are also functions for simulating from these models, with a nonparametric or a parametric baseline hazard function.
This package provides a meta-package that installs and loads a set of packages from the 'easystats' ecosystem in a single step. This collection of packages provides a unifying and consistent framework for statistical modeling, visualization, and reporting. Additionally, it provides articles targeted at instructors for teaching 'easystats', and a dashboard targeted at new R users for easily conducting statistical analysis by accessing summary results, model fit indices, and visualizations with minimal programming.
Fast scalable Gaussian process approximations, particularly well suited to spatial (aerial, remote-sensed) and environmental data, described in more detail in Katzfuss and Guinness (2017) <doi:10.48550/arXiv.1708.06302>. Package also contains a fast implementation of the incomplete Cholesky decomposition (IC0), based on Schaefer et al. (2019) <doi:10.48550/arXiv.1706.02205> and MaxMin ordering proposed in Guinness (2018) <doi:10.48550/arXiv.1609.05372>.
This package provides functions to calculate indices used to score immunoglobulin A (IgA) binding of bacteria in IgA sequencing (IgA-Seq) experiments. This includes the original Kau and Palm indices and more recent methods as described in Jackson et al. (2020) <doi:10.1101/2020.08.19.257501>. Additionally, the package contains a function to simulate IgA-Seq data and an example experimental data set for method testing.
This package provides a graphical user interface with an integrated diagrammer for latent variable models from the 'lavaan' package. It offers two core functions: first, lavaangui() launches a web application that allows users to specify models by drawing path diagrams, fitting them, assessing model fit, and more; second, plot_lavaan() creates interactive path diagrams from models specified in 'lavaan'. Karch (2024) <doi:10.1080/10705511.2024.2420678> contains a tutorial.
This package provides sampling and density functions for matrix variate normal, t, and inverted t distributions; ML estimation for matrix variate normal and t distributions using the EM algorithm, including some restrictions on the parameters; and classification by linear and quadratic discriminant analysis for matrix variate normal and t distributions described in Thompson et al. (2019) <doi:10.1080/10618600.2019.1696208>. Performs clustering with matrix variate normal and t mixture models.
This package provides methods for quality control and exploratory analysis of surface water quality data collected in Massachusetts, USA. Functions are developed to facilitate data formatting for the Water Quality Exchange Network <https://www.epa.gov/waterdata/water-quality-data-upload-wqx> and reporting of data quality objectives to state agencies. Quality control methods are from Massachusetts Department of Environmental Protection (2020) <https://www.mass.gov/orgs/massachusetts-department-of-environmental-protection>.
Defines a predict function that transforms output from a Tweedie Generalized Linear Mixed Model (using 'glmmTMB'), Generalized Additive Model (using 'mgcv'), or spatio-temporal Generalized Linear Mixed Model (using package 'tinyVAST'), and returns predicted proportions (and standard errors) across a grouping variable from an equivalent multivariate-logit Tweedie model. These predicted proportions can then be used for standard plotting and diagnostics. See Thorson et al. (2022) <doi:10.1002/ecy.3637>.