This package provides a set of functions to build a scoring model from beginning to end, guiding the user through an efficient and organized development process and significantly reducing the time spent on data exploration, variable selection, feature engineering, binning, and model selection, among other recurrent tasks. The package also incorporates monotonic and customized binning, provides scaling capabilities that transform logistic coefficients into points for easier business interpretation, and calculates and visualizes classic performance metrics of a classification model.
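The scaling step mentioned in the description above (turning logistic log-odds into points) can be illustrated with the conventional "points to double the odds" (PDO) formula. This is a minimal sketch, not the package's API: the function names and the default values (20 points to double the odds, 600 points at odds of 50:1) are assumptions chosen for illustration, and odds are taken as good:bad.

```python
import math

def scaling_params(pdo=20.0, base_points=600.0, base_odds=50.0):
    """Return (factor, offset) such that score = offset + factor * ln(odds).

    pdo, base_points and base_odds are illustrative defaults, not values
    taken from the package.
    """
    factor = pdo / math.log(2)                           # points added per doubling of the odds
    offset = base_points - factor * math.log(base_odds)  # anchors base_odds at base_points
    return factor, offset

def score_from_log_odds(log_odds, factor, offset):
    """Map a model's log-odds (linear predictor) to scorecard points."""
    return offset + factor * log_odds

factor, offset = scaling_params()
score_at_base = score_from_log_odds(math.log(50.0), factor, offset)
score_at_double = score_from_log_odds(math.log(100.0), factor, offset)
```

With these assumed defaults, odds of 50:1 map to 600 points and odds of 100:1 map to 620 points, so each doubling of the odds adds `pdo` points.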
This package provides tools for obtaining, processing, and visualizing spectral reflectance data for user-defined land or water surface classes, in order to explore visually at which wavelengths the classes differ. Input should be a shapefile with polygons of surface classes (for example, different habitat types, crops, or vegetation). Pixel data for the optical bands of the Sentinel-2 L2A satellite mission are obtained through the Google Earth Engine service (<https://earthengine.google.com/>) and used as the source of spectral data.
Collection of phylogenetic tree statistics gathered from the literature. All functions have been written to maximize computation speed. The package includes umbrella functions to calculate all statistics, all balance-associated statistics, or all branching-time-related statistics. Furthermore, the treestats package supports summary statistic calculations on Ltables, provides speed-improved coding of branching times and Ltable conversion, and includes algorithms to create intermediately balanced trees. A full description can be found in Janzen (2024) <doi:10.1016/j.ympev.2024.108168>.
This package provides tools for analyzing the relationship between direct prices (based on labor values) and prices of production using Bayesian generalized linear models, panel data methods, partial least squares regression, canonical correlation analysis, and panel vector autoregression. Includes functions for model comparison, out-of-sample validation, and structural break detection. The methods use raw accounting data with explicit temporal structure, following Gomez Julian (2023) <doi:10.17605/OSF.IO/7J8KF> and standard econometric techniques for panel data analysis.
An implementation of the one-sample Wilcoxon signed-rank test for medians. It includes two functions. The first, W_stat(), computes the exact probabilities of the Wilcoxon signed-rank test statistic, W. The second, Wilcox.m.test(), allows the user to conduct the one-sample Wilcoxon signed-rank hypothesis test for medians, either exactly or using the normal approximation, based on the techniques of Bickel and Doksum (1973, ISBN:013850363X).
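The exact null distribution of the signed-rank statistic referred to above can be obtained by counting, for each value w, the subsets of the ranks 1..n that sum to w and dividing by 2^n. The following is a sketch of that counting argument only, assuming no ties and no zero differences; `signed_rank_pmf` is a hypothetical name and is not the package's W_stat().

```python
from fractions import Fraction

def signed_rank_pmf(n):
    """Exact null pmf of W+ (sum of positive ranks) for n untied, nonzero differences.

    Under the null hypothesis each rank 1..n is positive with probability 1/2,
    so P(W+ = w) = (number of subsets of {1..n} summing to w) / 2**n.
    """
    max_w = n * (n + 1) // 2
    counts = [0] * (max_w + 1)
    counts[0] = 1  # the empty subset sums to 0
    for rank in range(1, n + 1):
        # Iterate downwards so that each rank is used at most once.
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    total = 2 ** n
    return [Fraction(c, total) for c in counts]

pmf = signed_rank_pmf(4)
# Upper-tail probability P(W+ >= 9) for n = 4.
p_upper = sum(pmf[9:])
```

Using `Fraction` keeps the probabilities exact, which makes the symmetry of the distribution about n(n+1)/4 easy to check.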
The AHP method (Analytic Hierarchy Process) is a multi-criteria decision-making method addressing choice and outranking problems. The method supports the analysis of alternatives under each criterion and then provides a global performance score for each alternative in the decision context. The distinguishing feature of this package is the possibility of evaluating the alternatives using quantitative data, by numerical representation, and qualitative data, using the Saaty scale, which provides preference relations between variables through pairwise evaluation.
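To make the pairwise evaluation above concrete, the sketch below derives priority weights from a Saaty-style pairwise comparison matrix using the row geometric mean, a common approximation to the principal eigenvector that is exact when the matrix is perfectly consistent. The function name and the example matrix are assumptions for illustration and do not reflect the package's interface.

```python
import math

def priorities(matrix):
    """Approximate AHP priority weights via the row geometric mean.

    matrix[i][j] states how strongly item i is preferred to item j
    (Saaty scale), with matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical, perfectly consistent comparison of three criteria:
# the first is preferred 2x over the second and 6x over the third.
comparison = [
    [1.0, 2.0, 6.0],
    [1.0 / 2.0, 1.0, 3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]
weights = priorities(comparison)
```

For this consistent matrix the weights come out as 0.6, 0.3, and 0.1 (up to floating-point rounding); for inconsistent judgments the result is only an approximation and a consistency check would normally follow.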
Computation of the alpha-shape and alpha-convex hull of a given sample of points in the plane. The concepts of alpha-shape and alpha-convex hull generalize the definition of the convex hull of a finite set of points. The programming is based on the duality between the Voronoi diagram and Delaunay triangulation. The package also includes a function that returns the Delaunay mesh of a given sample of points and its dual Voronoi diagram in one single object.
The "Hit and Run" Markov Chain Monte Carlo method for sampling uniformly from convex shapes defined by linear constraints, and the "Shake and Bake" method for sampling from the boundary of such shapes. Includes specialized functions for sampling normalized weights with arbitrary linear constraints. Tervonen, T., van Valkenhoef, G., Basturk, N., and Postmus, D. (2012) <doi:10.1016/j.ejor.2012.08.026>. van Valkenhoef, G., Tervonen, T., and Postmus, D. (2014) <doi:10.1016/j.ejor.2014.06.036>.
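The "Hit and Run" idea described above can be sketched for a bounded polytope {x : A x <= b}: from the current interior point, draw a random direction, find the segment of that line that stays inside the constraints, and jump to a uniformly chosen point on the segment. This is an illustrative sketch with hypothetical names, not the package's implementation; it assumes the region is bounded and the starting point is strictly feasible, and it does not cover "Shake and Bake".

```python
import random

def hit_and_run(A, b, x0, n_samples, seed=0):
    """Sample points from the bounded polytope {x : A x <= b}, starting at interior point x0."""
    rng = random.Random(seed)
    x = list(x0)
    dim = len(x)
    samples = []
    for _ in range(n_samples):
        # An isotropic Gaussian vector gives a uniformly distributed direction.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        t_lo, t_hi = float("-inf"), float("inf")
        for row, bound in zip(A, b):
            ad = sum(a * di for a, di in zip(row, d))
            slack = bound - sum(a * xi for a, xi in zip(row, x))
            # Constraint along the line: t * ad <= slack.
            if ad > 1e-12:
                t_hi = min(t_hi, slack / ad)
            elif ad < -1e-12:
                t_lo = max(t_lo, slack / ad)
        # Uniform step along the feasible chord (finite because the region is bounded).
        t = rng.uniform(t_lo, t_hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples

# Hypothetical example: the triangle x >= 0, y >= 0, x + y <= 1.
A = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b = [0.0, 0.0, 1.0]
samples = hit_and_run(A, b, x0=[0.25, 0.25], n_samples=1000)
```

Successive points are correlated, so in practice a chain like this is usually thinned or run long enough for the samples to approximate the uniform distribution.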
An implementation for multivariate functional additive mixed models (multiFAMM), see Volkmann et al. (2021, <arXiv:2103.06606>). It builds on developed methods for univariate sparse functional regression models and multivariate functional principal component analysis. This package contains the function to run a multiFAMM and some convenience functions useful when working with large models. An additional package on GitHub contains more convenience functions to reproduce the analyses of the corresponding paper (<https://github.com/alexvolkmann/multifammPaper>).
The National Ecological Observatory Network (NEON) provides access to its numerous data products through its REST API, <https://data.neonscience.org/data-api/>. This package provides a high-level user interface for downloading and storing NEON data products. Unlike 'neonUtilities', this package avoids repeated downloading, provides persistent storage, and improves performance. neonstore can also construct a local 'duckdb' database of stacked tables, making it possible to work with tables that are far too big to fit into memory.
Extend the tidymodels ecosystem <https://www.tidymodels.org/> to enable the creation of predictive models with offset terms. Models with offsets are most useful when working with count data or when fitting an adjustment model on top of an existing model with a prior expectation. The former situation is common in insurance where data is often weighted by exposures. The latter is common in life insurance where industry mortality tables are often used as a starting point for setting assumptions.
Generates the 'Plus Code' of geometric objects or of data frames that contain them, with the option to specify the precision of the area. The main feature of the package is based on the open-source code developed by Google Inc. in the repository <https://github.com/google/open-location-code/blob/main/java/src/main/java/com/google/openlocationcode/OpenLocationCode.java>. For details about 'Plus Code', visit <https://maps.google.com/pluscodes/> or <https://github.com/google/open-location-code>.
It estimates the parameters of spatio-temporal models with censored or missing data using the SAEM algorithm (Delyon et al., 1999). This algorithm is a stochastic approximation of the widely used EM algorithm and is particularly valuable for models in which the E-step lacks a closed-form expression. It also provides a function to compute the observed information matrix using the method developed by Louis (1982). To assess the performance of the fitted model, case-deletion diagnostics are provided.
This package contains statistical methods to analyze graphs, such as graph parameter estimation, model selection based on the Graph Information Criterion, statistical tests to discriminate two or more populations of graphs, correlation between graphs, and clustering of graphs. References: Takahashi et al. (2012) <doi:10.1371/journal.pone.0049949>, Fujita et al. (2017) <doi:10.3389/fnins.2017.00066>, Fujita et al. (2017) <doi:10.1016/j.csda.2016.11.016>, Fujita et al. (2019) <doi:10.1093/comnet/cnz028>.
This package implements 'SplitWise', a hybrid regression approach that transforms numeric variables into either single-split (0/1) dummy variables or retains them as continuous predictors. The transformation is followed by stepwise selection to identify the most relevant variables. The default iterative mode adaptively explores partial synergies among variables to enhance model performance, while an alternative univariate mode applies simpler transformations independently to each predictor. For details, see Kurbucz et al. (2025) <doi:10.48550/arXiv.2505.15423>.
This package provides a set of functions with a common framework for age-depth model management, stratigraphic visualization, and common statistical transformations. The focus of the package is stratigraphic visualization, for which ggplot2 components are provided to reproduce the scales, geometries, facets, and theme elements commonly used in publication-quality stratigraphic diagrams. Helpers are also provided to reproduce the exploratory statistical summaries that are frequently included on stratigraphic diagrams. See Dunnington et al. (2021) <doi:10.18637/jss.v101.i07>.
Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and to generate point summaries and intervals in a tidy format. In addition, 'ggplot2' geoms and stats are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.
Since the early 1970s eyewitness testimony researchers have recognised the importance of estimating properties such as lineup bias (is the lineup biased against the suspect, leading to a rate of choosing higher than one would expect by chance?), and lineup size (how many reasonable choices are in fact available to the witness? A lineup is supposed to consist of a suspect and a number of additional members, or foils, whom a poor-quality witness might mistake for the perpetrator). Lineup measures are descriptive, in the first instance, but since the earliest articles in the literature researchers have recognised the importance of reasoning inferentially about them. This package contains functions to compute various properties of laboratory or police lineups, and is intended for use by researchers in forensic psychology and/or eyewitness testimony research. Among others, the r4lineups package includes functions for calculating lineup proportion, functional size, various estimates of effective size, diagnosticity ratio, homogeneity of the diagnosticity ratio, ROC curves for confidence x accuracy data and the degree of similarity of faces in a lineup.
The commonly used methods for relative quantification of gene expression levels obtained in real-time PCR (Polymerase Chain Reaction) experiments are the delta Ct methods, encompassing the 2^-dCt and 2^-ddCt methods, originally proposed by Kenneth J. Livak and Thomas D. Schmittgen (2001) <doi:10.1006/meth.2001.1262>. The main idea is to normalise gene expression values using an endogenous control gene, present gene expression levels in linear form by using the 2^-(value) transformation, and calculate differences in gene expression levels between groups of samples (or technical replicates of a single sample). The RQdeltaCT package offers functions that cover both methods for comparison of either independent groups of samples or groups with paired samples, together with functions for importing expression datasets, performing multi-step quality control of data, producing numerous data visualisations, enriching the standard workflow with additional useful analyses (correlation analysis, Receiver Operating Characteristic analysis, logistic regression), and conveniently exporting the obtained results in table and image formats. The package has been designed to be friendly to non-experts in R programming.
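The arithmetic behind the 2^-ddCt method described above fits in a few lines. The sketch below works on single Ct values for clarity; the function names and the example Ct values are made up for illustration, and a real analysis would typically average over samples or replicates first.

```python
def delta_ct(ct_target, ct_reference):
    """Normalise a target gene's Ct against the endogenous control gene."""
    return ct_target - ct_reference

def fold_change_ddct(ct_target_study, ct_ref_study, ct_target_control, ct_ref_control):
    """Relative expression (study vs. control group) by the 2^-ddCt method."""
    dct_study = delta_ct(ct_target_study, ct_ref_study)
    dct_control = delta_ct(ct_target_control, ct_ref_control)
    ddct = dct_study - dct_control
    return 2.0 ** (-ddct)

# Hypothetical Ct values: the target gene crosses the threshold two cycles
# earlier in the study group, with the reference gene unchanged.
fold_change = fold_change_ddct(
    ct_target_study=24.0, ct_ref_study=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
```

Here dCt is 6 in the study group and 8 in the control group, so ddCt is -2 and the fold change is 2^2 = 4, i.e. roughly fourfold higher expression in the study group.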
Capture Hi-C is a set of techniques that enable the detection of genomic interactions involving regions of interest, known as baits. By focusing on selected loci, these approaches reduce sequencing costs while maintaining high resolution at the level of restriction fragments. HiCaptuRe provides tools to import, annotate, manipulate, and export Capture Hi-C data. The package accounts for the specific structure of bait–otherEnd interactions, facilitates integration with other omics datasets, and enables comparison across samples and conditions.
pathlinkR is an R package designed to facilitate analysis of RNA-Seq results. Specifically, our aim with pathlinkR was to provide a number of tools which take a list of DE genes and perform different analyses on them, aiding with the interpretation of results. Functions are included to perform pathway enrichment, with multiple databases supported, and tools for visualizing these results. Genes can also be used to create and plot protein-protein interaction networks, all from within R.
Image segmentation is the process of identifying the borders of individual objects (in this case cells) within an image. This allows features of cells, such as marker expression and morphology, to be extracted, stored and analysed. simpleSeg provides functionality for user-friendly, watershed-based segmentation of multiplexed cellular images in R, based on the intensity of user-specified protein marker channels. simpleSeg can also be used for the normalization of single-cell data obtained from multiple images.
This package performs Complier Average Causal Effect (CACE) analysis on either a single study or a meta-analysis of datasets with binary outcomes, using either complete or incomplete noncompliance information. Our package implements the Bayesian methods proposed in Zhou et al. (2019) <doi:10.1111/biom.13028>, which introduces a Bayesian hierarchical model for estimating CACE in meta-analysis of clinical trials with noncompliance, and Zhou et al. (2021) <doi:10.1080/01621459.2021.1900859>, with an application example on Epidural Analgesia.
This package implements a new robust principal component analysis algorithm that relies upon the Cauchy distribution. The algorithm is suitable for high-dimensional data even if the sample size is less than the number of variables. The methodology is described in Fayomi A., Pantazis Y., Tsagris M. and Wood A.T.A. (2024). "Cauchy robust principal component analysis with applications to high-dimensional data sets". Statistics and Computing, 34: 26. <doi:10.1007/s11222-023-10328-x>.