In clinical trials with repeated measures designs, outcomes are often taken from subjects at fixed time-points. The focus of the trial may be to compare the mean outcome in two or more groups at some pre-specified time after enrollment. In the presence of missing data, auxiliary assumptions are necessary to perform such comparisons. One commonly employed assumption is the missing at random (MAR) assumption. The samon package allows the user to perform a (parameterized) sensitivity analysis of this assumption. In particular, it can be used to examine the sensitivity of tests of the difference in outcomes to violations of the MAR assumption. The sensitivity analysis can be performed under two scenarios: a) where the data exhibit a monotone missing data pattern (see the samon() function), and b) where, in addition to a monotone missing data pattern, the data exhibit intermittent missing values (see the samonIM() function).
This package implements data-driven identification methods for structural vector autoregressive (SVAR) models as described in Lange et al. (2021) <doi:10.18637/jss.v097.i05>. Based on an existing VAR model object (provided by, e.g., VAR() from the vars package), the structural impact matrix is obtained via data-driven identification techniques (i.e. changes in volatility (Rigobon, R. (2003) <doi:10.1162/003465303772815727>), patterns of GARCH (Normandin, M., Phaneuf, L. (2004) <doi:10.1016/j.jmoneco.2003.11.002>), independent component analysis (Matteson, D. S., Tsay, R. S. (2013) <doi:10.1080/01621459.2016.1150851>), least dependent innovations (Herwartz, H., Ploedt, M. (2016) <doi:10.1016/j.jimonfin.2015.11.001>), smooth transition in variances (Luetkepohl, H., Netsunajev, A. (2017) <doi:10.1016/j.jedc.2017.09.001>) or non-Gaussian maximum likelihood (Lanne, M., Meitz, M., Saikkonen, P. (2017) <doi:10.1016/j.jeconom.2016.06.002>)).
Assume that a temporal process is composed of contiguous segments with differing slopes and that replicated, noise-corrupted time series measurements are observed. The unknown mean of the data generating process is modelled as a piecewise linear function of time with an unknown number of change-points. The package infers the joint posterior distribution of the number and position of change-points, as well as the unknown mean parameters per time series, by MCMC sampling. A priori, the proposed model uses an overfitting number of mean parameters but, conditionally on a set of change-points, only a subset of them influences the likelihood. An exponentially decreasing prior distribution on the number of change-points gives rise to a posterior distribution concentrating on sparse representations of the underlying sequence; a Poisson prior is also available. See Papastamoulis et al. (2019) <doi:10.1515/ijb-2018-0052> for a detailed presentation of the method.
To overcome the memory limitations of fitting Linear Models (LMs) and Generalized Linear Models (GLMs) to large data sets, this package implements the Divide and Recombine (D&R) strategy. It divides the entire large data set into subsets of manageable size and then fits a model to each subset. Finally, the results from each subset are aggregated to obtain the final estimate. This package also supports fitting GLMs to data sets that cannot fit into memory and provides methods for fitting GLMs under linear regression, binomial regression, Poisson regression, and multinomial logistic regression settings. The respective models are fitted using different D&R strategies as described by: Xi, Lin, and Chen (2009) <doi:10.1109/TKDE.2008.186>, Xi, Lin and Chen (2006) <doi:10.1109/TKDE.2006.196>, Zuo and Li (2018) <doi:10.4236/ojs.2018.81003>, and Karim, M.R., Islam, M.A. (2019) <doi:10.1007/978-981-13-9776-9>.
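The divide and recombine idea described above can be illustrated for the simplest case, a straight-line least-squares fit, where adding up per-subset sufficient statistics reproduces the full-data estimate exactly. This is a minimal Python sketch under that assumption, not the package's own code; the names `Stats`, `summarize` and `recombine` are hypothetical, and GLMs generally need a more involved recombination step than the one shown here.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for y = a + b*x computed from one subset."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other):
        # Sums over disjoint subsets simply add up.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def summarize(chunk):
    """Divide step: reduce one subset of (x, y) pairs to its statistics."""
    s = Stats()
    for x, y in chunk:
        s = s.merge(Stats(1, x, y, x * x, x * y))
    return s


def recombine(parts):
    """Recombine step: add per-subset statistics, then solve for (a, b)."""
    total = Stats()
    for p in parts:
        total = total.merge(p)
    slope = (total.n * total.sxy - total.sx * total.sy) / (
        total.n * total.sxx - total.sx ** 2)
    intercept = (total.sy - slope * total.sx) / total.n
    return intercept, slope


# Toy data on the line y = 2 + 3x, split into subsets of at most 4 rows.
data = [(float(x), 2.0 + 3.0 * x) for x in range(10)]
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
intercept, slope = recombine(summarize(c) for c in chunks)
print(intercept, slope)
```

Only one subset needs to be held in memory at a time, since each is reduced to five numbers before the next is read.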
Helps find meaningful patterns in complex genetic experiments. First, gimap takes data from paired CRISPR (Clustered regularly interspaced short palindromic repeats) screens that have been pre-processed into a counts table of paired gRNA (guide Ribonucleic Acid) reads. The input data will have cell counts for how well cells grow (or don't grow) when different genes or pairs of genes are disabled. The output of the gimap package is genetic interaction scores, which are the distance between the observed CRISPR score and the expected CRISPR score. The expected CRISPR scores are the values we would expect for two unrelated genes. The further away an observed CRISPR score is from its expected score, the more we suspect genetic interaction. The work in this package is based on original research from the Alice Berger lab at Fred Hutchinson Cancer Center (2021) <doi:10.1016/j.celrep.2021.109597>.
Projects mean squared out-of-sample error for a linear regression based upon the methodology developed in Rohlfs (2022) <doi:10.48550/arXiv.2209.01493>. It takes as inputs the lm object from an estimated OLS regression (based on the "training sample") and a data.frame of out-of-sample cases (the "test sample") that have non-missing values for the same predictors. The test sample may or may not include data on the outcome variable; if it does, that variable is not used. The aim of the exercise is to project what mean squared out-of-sample error can be expected given the predictor values supplied in the test sample. Output consists of a list of three elements: the projected mean squared out-of-sample error, the projected out-of-sample R-squared, and a vector of out-of-sample "hat" or "leverage" values, as defined in the paper.
This package performs Bayesian nonparametric density estimation using Martingale posterior distributions including the Copula Resampling (CopRe) algorithm. Also included are a Gibbs sampler for the marginal Gibbs-type mixture model and an extension to include full uncertainty quantification via a predictive sequence resampling (SeqRe) algorithm. The CopRe and SeqRe samplers generate random nonparametric distributions as output, leading to complete nonparametric inference on posterior summaries. Routines for calculating arbitrary functionals from the sampled distributions are included as well as an important algorithm for finding the number and location of modes, which can then be used to estimate the clusters in the data using, for example, k-means. Implements work developed in Moya B., Walker S. G. (2022) <doi:10.48550/arxiv.2206.08418>, Fong, E., Holmes, C., Walker, S. G. (2021) <doi:10.48550/arxiv.2103.15671>, and Escobar M. D., West, M. (1995) <doi:10.1080/01621459.1995.10476550>.
Analysis of elliptical tubes with applications in biological modeling. The package is based on the references: Taheri, M., Pizer, S. M., & Schulz, J. (2024) "The Mean Shape under the Relative Curvature Condition." Journal of Computational and Graphical Statistics <doi:10.1080/10618600.2025.2535600> and arXiv <doi:10.48550/arXiv.2404.01043>. Mohsen Taheri Shalmani (2024) "Shape Statistics via Skeletal Structures", PhD Thesis, University of Stavanger, Norway <doi:10.13140/RG.2.2.34500.23685>. Key features include constructing discrete elliptical tubes, calculating transformations, validating structures under the Relative Curvature Condition (RCC), computing means, and generating simulations. Supports intrinsic and non-intrinsic mean calculations and transformations, size estimation, plotting, and random sample generation based on a reference tube. The intrinsic approach relies on the interior path of the original non-convex space, incorporating the RCC, while the non-intrinsic approach uses a basic robotic arm transformation that disregards the RCC.
Collection of tools to work with European basketball data. Functions available are related to friendly web scraping, data management and visualization. Data were obtained from <https://www.euroleaguebasketball.net/euroleague/>, <https://www.euroleaguebasketball.net/eurocup/> and <https://www.acb.com/>, following the instructions of their respective robots.txt files, when available. Box score data are available for the three leagues. Play-by-play and spatial shooting data are also available for the Spanish league. Methods for analysis include a population pyramid, 2D plots, circular plots of players' percentiles, plots of players' monthly/yearly stats, team heatmaps, team shooting plots, team four factors plots, cross-tables with the results of regular season games, maps of nationalities, combinations of lineups, possessions-related variables, timeouts, performance by periods, personal fouls, offensive rebounds and different types of shooting charts. Please see Vinue (2020) <doi:10.1089/big.2018.0124> and Vinue (2024) <doi:10.1089/big.2023.0177>.
Identifies patients' Mental Health (MH) status, Substance Use (SU) status, and concurrent MH/SU status in American/Canadian healthcare administrative databases. The detection is based on parameters of interest specified by clinicians, including the list of plausible ICD MH/SU codes (3/4/5 characters), the required number of hospital visits for MH/SU, the required number of physician service visits for MH/SU, and the maximum time span within MH visits, within SU visits, and between MH and SU visits. Methods are described in: Khan S <https://pubmed.ncbi.nlm.nih.gov/29044442/>, Keen C, et al. (2021) <doi:10.1111/add.15580>, Lavergne MR, et al. (2022) <doi:10.1186/s12913-022-07759-z>, Casillas, S M, et al. (2022) <doi:10.1016/j.abrep.2022.100464>, CIHI (2022) <https://www.cihi.ca/en>, CDC (2024) <https://www.cdc.gov>, WHO (2019) <https://icd.who.int/en>.
This package provides generalized L-moments estimation methods for the generalized extreme value ('GEV') distribution. Implements both stationary GEV and non-stationary GEV11 models where location and scale parameters vary with time. Includes various penalty functions (Martins-Stedinger, Park, Cannon, Coles-Dixon) for shape parameter regularization. Also provides model averaging estimation ('ma.gev') that combines MLE and L-moment methods with multiple weighting schemes for robust high quantile estimation. The GLME methodology is described in Shin et al. (2025a) <doi:10.48550/arXiv.2512.20385>. The non-stationary L-moment method is based on Shin et al. (2025b) <doi:10.1007/s42952-025-00325-3>. The model averaging method is described in Shin et al. (2026) <doi:10.1007/s00477-025-03167-x>. See also Hosking (1990) <doi:10.1111/j.2517-6161.1990.tb01775.x> for L-moments theory and Martins and Stedinger (2000) <doi:10.1029/1999WR900330> for penalized likelihood methods.
Companion package of Carrion-i-Silvestre & Sansó (2026): "Testing for Constant Unconditional Variance in Heavy-Tailed Time Series". It implements the Modified Iterative Cumulative Sum of Squares Algorithm, which is an extension of the Iterative Cumulative Sum of Squares (ICSS) Algorithm of Inclan and Tiao (1994), and it checks for changes in the unconditional variance of a time series while controlling for the tail index of the underlying distribution. The fourth order moment is estimated non-parametrically to avoid size problems when the innovations are non-Gaussian (see Sansó et al., 2004). Critical values and p-values are generated using a Generalized Extreme Value distribution approach. References: Carrion-i-Silvestre J.L & Sansó A (2026) <doi:10.1080/03610918.2026.2615207>, Inclan C & Tiao G.C (1994) <doi:10.1080/01621459.1994.10476824>, Sansó A & Aragó V & Carrion-i-Silvestre J.L (2004) <https://dspace.uib.es/xmlui/bitstream/handle/11201/152078/524035.pdf>.
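For orientation, the centered cumulative sum of squares underlying the original ICSS algorithm of Inclan and Tiao (1994) can be sketched in a few lines of Python. This shows only the basic statistic, max over k of sqrt(T/2)*|D_k| with D_k = C_k/C_T - k/T, for a zero-mean series; it is not the modified algorithm of the companion paper, and the function name `icss_statistic` is hypothetical.

```python
import math


def icss_statistic(series):
    """Return (max_k sqrt(T/2)*|D_k|, argmax k) for a zero-mean series.

    C_k is the cumulative sum of squares up to time k and
    D_k = C_k / C_T - k / T is its centered, normalized version.
    """
    T = len(series)
    total = sum(a * a for a in series)
    best, best_k, running = 0.0, 0, 0.0
    for k, a in enumerate(series, start=1):
        running += a * a
        d_k = running / total - k / T
        value = math.sqrt(T / 2) * abs(d_k)
        if value > best:
            best, best_k = value, k
    return best, best_k


# Deterministic toy series whose variance jumps from 1 to 9 after t = 200.
series = [1.0, -1.0] * 100 + [3.0, -3.0] * 100
stat, k = icss_statistic(series)
print(stat, k)
```

In this toy series the statistic peaks at the last observation before the variance changes, which is how the location of a candidate break is read off.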
The routine gof_test() in this package runs goodness-of-fit tests for multivariate data using various test statistics. Models under the null hypothesis can either be simple or allow for parameter estimation. p values are found via the parametric bootstrap (simulation). The routine gof_test_adjusted_pvalues() runs several tests and then finds a p value adjusted for simultaneous inference. The routine gof_power() allows the estimation of the power of the tests. hybrid_test() and hybrid_power() do the same by first generating a Monte Carlo data set under the null hypothesis and then running a number of two-sample methods. The routine run.studies() allows a user to quickly study the power of a new method and how it compares to those included in the package via a large number of case studies. For details of the methods and references see the included vignettes.
Annotates single-cell and spatial-transcriptomic (ST) data using context-matching marker datasets. It creates a unified marker list (`Markers_list`) from multiple sources: built-in curated databases ('Cellmarker2', 'PanglaoDB', 'ScType', 'scIBD', 'TCellSI', 'PCTIT', 'PCTAM'), Seurat objects with cell labels, or user-provided Excel tables. SlimR first uses adaptive machine learning for parameter optimization, and then offers two automated annotation approaches: cluster-based and per-cell. Cluster-based annotation assigns one label per cluster, with expression-based probability calculation and AUC validation. Per-cell annotation assigns labels to individual cells using three scoring methods with adaptive thresholds and ratio-based confidence filtering, plus optional UMAP spatial smoothing, making it ideal for heterogeneous clusters and rare cell types. The package also supports semi-automated workflows with heatmaps, feature plots, and combined visualizations for manual annotation. For more information, see the package documentation at <https://github.com/zhaoqing-wang/SlimR>.
This package creates geographic map tiles from geospatial map files or non-geographic map tiles from simple image files. This package provides a tile generator function for creating map tile sets for use with packages such as leaflet. In addition to generating map tiles based on a common raster layer source, it also handles the non-geographic edge case, producing map tiles from arbitrary images. These map tiles, which have a non-geographic, simple coordinate reference system (CRS), can also be used with leaflet when applying the simple CRS option. Map tiles can be created from an input file with any of the following extensions: tif, grd and nc for spatial maps and png, jpg and bmp for basic images. This package requires Python and the gdal library for Python. Windows users are recommended to install OSGeo4W (<https://trac.osgeo.org/osgeo4w/>) as an easy way to obtain the required gdal support for Python.
A genetic algorithm can be used to find the similarities between users directly, which makes collaborative filtering more efficient. By identifying the nearest neighbors of the active user before running the genetic algorithm, and by identifying suitable starting points, an effective method for user-based collaborative filtering has been developed. This package uses an optimization algorithm (a continuous genetic algorithm) to directly find the optimal similarities between active users (users for whom current recommendations are made) and others. First, the nearest neighbors and their number are determined, which fixes the number of genes in a chromosome. Each gene represents a neighbor's similarity to the active user. Because the starting points of the genetic algorithm are estimated beforehand, it converges quickly to the optimal solutions. An advantage is that the genetic algorithm does not depend on the number of data points, which helps considerably when working with big data.
This package provides a simulation-based tool designed to help researchers become familiar with multilevel variation and build sampling designs for their studies. This tool has two main objectives: First, it provides an educational tool useful for students, teachers and researchers who want to learn to use mixed-effects models. Users can experience how the mixed-effects model framework can be used to understand distinct biological phenomena by interactively exploring simulated multilevel data. Second, it offers research opportunities to those who are already familiar with mixed-effects models, as it enables the generation of data sets that users may download and use for a range of simulation-based statistical analyses such as power and sensitivity analysis of multilevel and multivariate data [Allegue, H., Araya-Ajoy, Y.G., Dingemanse, N.J., Dochtermann N.A., Garamszegi, L.Z., Nakagawa, S., Reale, D., Schielzeth, H. and Westneat, D.F. (2016) <doi:10.1111/2041-210X.12659>].
Fit multiclass Classification version of Bayesian Adaptive Smoothing Splines (CBASS) to data using reversible jump MCMC. The multiclass classification problem consists of a response variable that takes on unordered categorical values with at least three levels, and a set of inputs for each response variable. The CBASS model consists of a latent multivariate probit formulation, and the means of the latent Gaussian random variables are specified using adaptive regression splines. The MCMC alternates updates of the latent Gaussian variables and the spline parameters. All the spline parameters (variables, signs, knots, number of interactions), including the number of basis functions used to model each latent mean, are inferred. Functions are provided to process inputs, initialize the chain, run the chain, and make predictions. Predictions are made on a probabilistic basis, where, for a given input, the probabilities of each categorical value are produced. See Marrs and Francom (2023) "Multiclass classification using Bayesian multivariate adaptive regression splines" Under review.
Joint and Individual Variation Explained (JIVE) is a method for decomposing multiple datasets obtained on the same subjects into shared structure, structure unique to each dataset, and noise. The two most common implementations are R.JIVE, an iterative approach, and AJIVE, which uses principal angle analysis. JIVE estimates subspaces but interpreting these subspaces can be challenging with AJIVE or R.JIVE. We expand upon insights into AJIVE as a canonical correlation analysis (CCA) of principal component scores. This reformulation, which we call CJIVE, 1) provides an ordering of joint components by the degree of correlation between corresponding canonical variables; 2) uses a computationally efficient permutation test for the number of joint components, which provides a p-value for each component; and 3) can be used to predict subject scores for out-of-sample observations. Please cite the following article when utilizing this package: Murden, R., Zhang, Z., Guo, Y., & Risk, B. (2022) <doi:10.3389/fnins.2022.969510>.
An R interface to FLINT <https://flintlib.org/>, a C library for number theory. FLINT extends GNU MPFR <https://www.mpfr.org/> and GNU MP <https://gmplib.org/> with support for operations on standard rings (the integers, the integers modulo n, finite fields, the rational, p-adic, real, and complex numbers) as well as matrices and polynomials over rings. FLINT implements midpoint-radius interval arithmetic, also known as ball arithmetic, in the real and complex numbers, enabling computation in arbitrary precision with rigorous propagation of rounding and other errors; see Johansson (2017) <doi:10.1109/TC.2017.2690633>. Finally, FLINT provides ball arithmetic implementations of many special mathematical functions, with high coverage of reference works such as the NIST Digital Library of Mathematical Functions <https://dlmf.nist.gov/>. The R interface defines S4 classes, generic functions, and methods for representation and basic operations as well as plain R functions mirroring and vectorizing entry points in the C library.
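The midpoint-radius idea mentioned above can be illustrated with a small Python sketch. It is not FLINT's API or the R interface: the class `Ball` is hypothetical, and it uses exact rational arithmetic so that no rounding occurs at all, whereas a real implementation works with floating-point midpoints and must also fold rounding errors into the radius.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Ball:
    """Closed interval [mid - rad, mid + rad] with exact rational bounds."""
    mid: Fraction
    rad: Fraction

    def __add__(self, other):
        # Errors add: |(x + y) - (a + b)| <= ra + rb.
        return Ball(self.mid + other.mid, self.rad + other.rad)

    def __mul__(self, other):
        # |x*y - a*b| <= |a|*rb + |b|*ra + ra*rb for x in self, y in other.
        rad = (abs(self.mid) * other.rad + abs(other.mid) * self.rad
               + self.rad * other.rad)
        return Ball(self.mid * other.mid, rad)

    def contains(self, x):
        return abs(Fraction(x) - self.mid) <= self.rad


# 1/3 known only to within 1/1000, and 2 known exactly.
third = Ball(Fraction(333, 1000), Fraction(1, 1000))
two = Ball(Fraction(2), Fraction(0))
result = third * two + third  # encloses 2*(1/3) + 1/3 = 1
print(result, result.contains(1))
```

The radius grows with every operation, so the final ball is guaranteed to contain the true value even though the input was only an approximation.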
Toolbox for different kinds of spatio-temporal analyses to be performed on observed point patterns, following the growing stream of literature on point process theory. This R package implements functions to perform different kinds of analyses on point processes, proposed in the papers (Siino, Adelfio, and Mateu 2018 <doi:10.1007/s00477-018-1579-0>; Siino et al. 2018 <doi:10.1002/env.2463>; Adelfio et al. 2020 <doi:10.1007/s00477-019-01748-1>; D'Angelo, Adelfio, and Mateu 2021 <doi:10.1016/j.spasta.2021.100534>; D'Angelo, Adelfio, and Mateu 2022 <doi:10.1007/s00362-022-01338-4>; D'Angelo, Adelfio, and Mateu 2023 <doi:10.1016/j.csda.2022.107679>). The main topics include modeling, statistical inference, and simulation issues on spatio-temporal point processes on Euclidean space and linear networks. Version 1.0.0 has been updated to accompany the journal publication D'Angelo and Adelfio 2025 <doi:10.18637/jss.v113.i10>.
Due to the limited availability of observed high-resolution precipitation records of adequate length, simulations with stochastic precipitation models are used to generate series for subsequent studies [e.g. Khaliq and Cunnane, 1996, <doi:10.1016/0022-1694(95)02894-3>, Vandenberghe et al., 2011, <doi:10.1029/2009WR008388>]. This package contains an R implementation of the original Bartlett-Lewis rectangular pulse model (BLRPM), developed by Rodriguez-Iturbe et al. (1987) <doi:10.1098/rspa.1987.0039>. It contains a function for simulating a precipitation time series based on storms and cells generated by the model with given or estimated model parameters. Additionally, BLRPM parameters can be estimated from a given or simulated precipitation time series. The model simulations can be plotted in a three-layer plot including an overview of the storms and cells generated by the model (which can also be plotted individually), a continuous step-function and a discrete precipitation time series at a chosen aggregation level.
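The storm-and-cell mechanism of the original Bartlett-Lewis rectangular pulse model can be sketched in Python as follows. This is an illustration under the usual textbook formulation (Poisson storm arrivals, a first cell at each storm origin, further cells arriving at a constant rate until the storm's exponentially distributed cell-generation window closes, and exponentially distributed cell durations and intensities); it is not the package's code, and the function names and parameter names (`lam`, `gamma`, `beta`, `eta`, `mux`) are assumptions made for this example.

```python
import random


def simulate_blrpm(lam, gamma, beta, eta, mux, t_end, seed=None):
    """Return a list of (start, end, intensity) rectangular rain cells.

    lam: storm arrival rate; gamma: rate ending a storm's cell generation;
    beta: cell arrival rate within a storm; eta: rate of cell duration;
    mux: mean cell intensity. Rates are per unit time.
    """
    rng = random.Random(seed)
    cells = []
    storm_start = rng.expovariate(lam)
    while storm_start < t_end:
        storm_length = rng.expovariate(gamma)
        cell_start = storm_start  # first cell sits at the storm origin
        while cell_start < storm_start + storm_length:
            duration = rng.expovariate(eta)
            intensity = rng.expovariate(1.0 / mux)
            cells.append((cell_start, cell_start + duration, intensity))
            cell_start += rng.expovariate(beta)
        storm_start += rng.expovariate(lam)
    return cells


def aggregate(cells, t_end, step):
    """Accumulate rainfall depth into consecutive bins of width `step`."""
    n_bins = int(t_end / step)
    depth = [0.0] * n_bins
    for start, end, intensity in cells:
        first = int(start / step)
        last = min(int(end / step), n_bins - 1)
        for i in range(first, last + 1):
            overlap = min(end, (i + 1) * step) - max(start, i * step)
            if overlap > 0:
                depth[i] += intensity * overlap
    return depth


# Illustrative parameter values only; time is measured in hours.
cells = simulate_blrpm(lam=0.01, gamma=0.1, beta=0.3, eta=2.0, mux=1.0,
                       t_end=1000.0, seed=1)
hourly = aggregate(cells, t_end=1000.0, step=1.0)
print(len(cells), sum(hourly))
```

The list of cells corresponds to the continuous step-function view of the process, and `aggregate` produces the discrete series at a chosen aggregation level.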
This package contains a mixture of statistical methods including the MCMC methods to analyze normal mixtures. Additionally, model based clustering methods are implemented to perform classification based on (multivariate) longitudinal (or otherwise correlated) data. The basis for such clustering is a mixture of multivariate generalized linear mixed models. The package is primarily related to the publications Komárek (2009, Comp. Stat. and Data Anal.) <doi:10.1016/j.csda.2009.05.006> and Komárek and Komárková (2014, J. of Stat. Soft.) <doi:10.18637/jss.v059.i12>. It also implements methods published in Komárek and Komárková (2013, Ann. of Appl. Stat.) <doi:10.1214/12-AOAS580>, Hughes, Komárek, Bonnett, Czanner, García-Fiñana (2017, Stat. in Med.) <doi:10.1002/sim.7397>, Jaspers, Komárek, Aerts (2018, Biom. J.) <doi:10.1002/bimj.201600253> and Hughes, Komárek, Czanner, García-Fiñana (2018, Stat. Meth. in Med. Res) <doi:10.1177/0962280216674496>.
The debar sequence processing pipeline is designed for denoising high throughput sequencing data for the animal DNA barcode marker cytochrome c oxidase I (COI). The package is designed to detect and correct insertion and deletion errors within sequencer outputs. This is accomplished through comparison of input sequences against a profile hidden Markov model (PHMM) using the Viterbi algorithm (for algorithm details see Durbin et al. 1998, ISBN: 9780521629713). Inserted base pairs are removed and deleted base pairs are accounted for through the introduction of a placeholder character. Since the PHMM is a probabilistic representation of the COI barcode, corrections are not always perfect. For this reason, debar censors base pairs adjacent to reported indel sites, turning them into placeholder characters (the default is 7 base pairs in either direction; this feature can be disabled). Testing has shown that this censorship results in the correct sequence length being restored and erroneous base pairs being masked the vast majority of the time (>95%).
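The censoring step described above can be sketched in Python as a simple masking operation. This is an illustration only, not debar's implementation: the function name `censor_indels`, the use of "N" as the placeholder character, 0-based site indices, and the choice to mask the reported site itself along with its neighbours are all assumptions.

```python
def censor_indels(sequence, indel_sites, width=7, placeholder="N"):
    """Mask `width` bases on each side of every 0-based indel site.

    The site itself is masked as well, and the sequence length is preserved.
    Setting width=0 masks only the reported sites.
    """
    bases = list(sequence)
    for site in indel_sites:
        lo = max(0, site - width)
        hi = min(len(bases), site + width + 1)
        for i in range(lo, hi):
            bases[i] = placeholder
    return "".join(bases)


# A 20-base toy read with one reported indel at index 10 and a narrow window.
masked = censor_indels("ACGTACGTACGTACGTACGT", [10], width=2)
print(masked)  # ACGTACGTNNNNNCGTACGT
```

Because masking replaces characters in place, downstream steps that rely on the restored sequence length are unaffected.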