This package provides a unified framework for generating, submitting, and analyzing pairwise comparisons of writing quality using large language models (LLMs). The package supports live and/or batch evaluation workflows across multiple providers ('OpenAI', 'Anthropic', 'Google Gemini', 'Together AI', and locally hosted 'Ollama' models), includes bias-tested prompt templates and a flexible template registry, and offers tools for constructing forward and reversed comparison sets to analyze consistency and positional bias. Results can be modeled using Bradley-Terry (1952) <doi:10.2307/2334029> or Elo rating methods to derive writing quality scores. For information on the method of pairwise comparisons, see Thurstone (1927) <doi:10.1037/h0070288> and Heldsinger & Humphry (2010) <doi:10.1007/BF03216919>. For information on Elo ratings, see Clark et al. (2018) <doi:10.1371/journal.pone.0190393>.
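As an illustration of how pairwise comparison outcomes can be turned into Elo-style quality scores, here is a minimal Python sketch. It is not the package's implementation: the function names, the K-factor of 32, the starting rating of 1500, and the essay labels are all assumptions chosen for the example.

```python
def expected_score(rating_a, rating_b):
    """Expected probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(comparisons, k=32.0, initial=1500.0):
    """Sequentially update ratings from (winner, loser) pairs.

    k and initial are illustrative defaults, not values taken from the package.
    """
    ratings = {}
    for winner, loser in comparisons:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        gain = k * (1.0 - expected_score(r_w, r_l))
        # The winner gains exactly what the loser gives up, so the total is conserved.
        ratings[winner] = r_w + gain
        ratings[loser] = r_l - gain
    return ratings


# Hypothetical comparison outcomes: the first item in each pair was judged better.
comparisons = [
    ("essay_A", "essay_B"),
    ("essay_A", "essay_C"),
    ("essay_B", "essay_C"),
]
ratings = elo_ratings(comparisons)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Because Elo updates are sequential, the resulting scores depend on the order in which comparisons are processed; a Bradley-Terry fit does not have this property.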
Pharmacokinetics is the study of drug absorption, distribution, metabolism, and excretion. A pharmacokinetic model describes how the drug concentration changes as the drug moves through the different compartments of the body. For pharmacokinetic modeling and analysis, it is essential to understand the basic pharmacokinetic parameters. All parameters are considered, but only some of the parameters are used in the model. Therefore, we need to convert the estimated parameters to the other parameters after fitting the specific pharmacokinetic model. This package is developed to help with this conversion. For a more detailed explanation of pharmacokinetic parameters, see "Gabrielsson and Weiner" (2007), "ISBN-10: 9197651001"; "Benet and Zia-Amirhosseini" (1995) <doi:10.1177/019262339502300203>; "Mould and Upton" (2012) <doi:10.1038/psp.2012.4>; "Mould and Upton" (2013) <doi:10.1038/psp.2013.14>.
This package provides tools for fitting, predicting, and visualizing nonlinear relationships in single-level, multilevel, and longitudinal regression models. Nonlinear functional forms are represented using natural cubic splines from 'splines' and smooth terms from 'mgcv'. The package offers a unified interface for specifying nonlinear effects, interactions with time variables, random-intercept clustering structures, and additional linear covariates. Utilities are included to generate prediction grids and produce effect plots, facilitating interpretation and visualization of nonlinear relationships in applied regression workflows. The implementation builds on established methods for spline-based regression and mixed-effects modeling (Hastie and Tibshirani, 1990 <doi:10.1201/9780203738535>; Bates et al., 2015 <doi:10.18637/jss.v067.i01>; Wood, 2017 <doi:10.1201/9781315370279>). Applications include hierarchical and longitudinal data structures common in education, health, and social science research.
This package performs repeated nested cross-validation for Cox Proportional Hazards, Cox Lasso, Survival Random Forest, and their ensemble. Returns the internally validated concordance index, time-dependent area under the curve, Brier score, calibration slope, and a statistical test of whether the non-linear ensemble outperforms the baseline Cox model. In this way, it helps researchers to quantify the gain of using a more complex survival model, or justify its redundancy. Equally, it shows the performance value of the non-linear and interaction terms, and may highlight the need for further feature transformation. Further details can be found in Shamsutdinova, Stamate, Roberts, & Stahl (2022) "Combining Cox Model and Tree-Based Algorithms to Boost Performance and Preserve Interpretability for Health Outcomes" <doi:10.1007/978-3-031-08337-2_15>, where the method is described as Ensemble 1.
This package provides robust and efficient methods for estimating causal effects in a target population using a multi-source dataset, including those of Dahabreh et al. (2019) <doi:10.1111/biom.13716>, Robertson et al. (2021) <doi:10.48550/arXiv.2104.05905>, and Wang et al. (2024) <doi:10.48550/arXiv.2402.02684>. The multi-source data can be a collection of trials, observational studies, or a combination of both, which have the same data structure (outcome, treatment, and covariates). The target population can be based on an internal dataset or an external dataset where only covariate information is available. The causal estimands available are average treatment effects and subgroup treatment effects. See Wang et al. (2025) <doi:10.1017/rsm.2025.5> for a detailed guide on using the package.
This package provides functions for evaluating the stability of low-dimensional embeddings and cluster assignments in single-cell RNA sequencing (scRNA-seq) datasets. Starting from a principal component analysis (PCA) object, users can generate multiple replicates of t-Distributed Stochastic Neighbor Embedding (t-SNE) or Uniform Manifold Approximation and Projection (UMAP) embeddings. Embedding stability is quantified by computing pairwise Kendall's Tau correlations across replicates and summarizing the distribution of correlation coefficients. In addition to dimensionality reduction, scStability assesses clustering consistency using either Louvain or Leiden algorithms and calculating the Normalized Mutual Information (NMI) between all pairs of cluster assignments. For background on UMAP and t-SNE algorithms, see McInnes et al. (2020, <doi:10.21105/joss.00861>) and van der Maaten & Hinton (2008, <https://github.com/lvdmaaten/bhtsne>), respectively.
This package provides a framework for automated machine learning. Concretely, the focus is on the optimisation of bagging workflows. A bagging workflow is composed of three phases: (i) generation: which and how many predictive models to learn; (ii) pruning: after learning a set of models, the worst ones are cut off from the ensemble; and (iii) integration: how the models are combined for predicting a new observation. autoBagging optimises these processes by combining metalearning and a learning to rank approach to learn from metadata. It automatically ranks 63 bagging workflows by exploiting past performance and dataset characterization. A complete description of the method can be found in: Pinto, F., Cerqueira, V., Soares, C., Mendes-Moreira, J. (2017): "autoBagging: Learning to Rank Bagging Workflows with Metalearning" arXiv preprint arXiv:1706.09367.
Extends the Seurat classes and functions to support Genomic Data Structure (GDS) files as a DelayedArray backend for data representation. It relies on the implementation of GDS-based DelayedMatrix in the SCArray package to represent single cell RNA-seq data. The common optimized algorithms leveraging GDS-based and single cell-specific DelayedMatrix (SC_GDSMatrix) are implemented in the SCArray package. SCArray.sat introduces a new SCArrayAssay class (derived from the Seurat Assay), which wraps raw counts, normalized expressions and scaled data matrix based on GDS-specific DelayedMatrix. It is designed to integrate seamlessly with the Seurat package to provide common data analysis in the SeuratObject-based workflow. Compared with Seurat, SCArray.sat significantly reduces the memory usage without downsampling and can be applied to very large datasets.
This package provides a (mildly) opinionated set of functions to help assess medication adherence for researchers working with medication claims data. Medication adherence analyses have several complex steps that are often convoluted and can be time-intensive. The focus is to create a set of functions using "tidy principles" geared towards transparency, speed, and flexibility while working with adherence metrics. All functions perform exactly one task with an intuitive name so that a researcher can handle details (often achieved with vectorized solutions) while we handle non-vectorized tasks common to most adherence calculations such as adjusting fill dates and determining episodes of care. The methodologies referenced in this package come from Canfield SL, et al (2019) "Navigating the Wild West of Medication Adherence Reporting in Specialty Pharmacy" <doi:10.18553/jmcp.2019.25.10.1073>.
Statistical downscaling and bias correction of climate predictions. It includes implementations of commonly used methods such as Analogs, Linear Regression, Logistic Regression, and Bias Correction techniques, as well as interpolation functions for regridding and point-based applications. It facilitates the production of high-resolution and local-scale climate information from coarse-scale predictions, which is essential for impact analyses. The package can be applied in a wide range of sectors and studies, including agriculture, water management, energy, heatwaves, and other climate-sensitive applications. The package was developed within the framework of the European Union Horizon Europe projects Impetus4Change (101081555) and ASPECT (101081460), the Wellcome Trust supported HARMONIZE project (224694/Z/21/Z), and the Spanish national project BOREAS (PID2022-140673OA-I00). Implements the methods described in Duzenli et al. (2024) <doi:10.5194/egusphere-egu24-19420>.
This package provides a collection of functions for exploratory chemometrics of 2D spectroscopic data sets such as COSY (correlated spectroscopy) and HSQC (heteronuclear single quantum coherence) 2D NMR (nuclear magnetic resonance) spectra. ChemoSpec2D deploys methods aimed primarily at classification of samples and the identification of spectral features which are important in distinguishing samples from each other. Each 2D spectrum (a matrix) is treated as the unit of observation, and thus the physical sample in the spectrometer corresponds to the sample from a statistical perspective. In addition to chemometric tools, a few tools are provided for plotting 2D spectra, but these are not intended to replace the functionality typically available on the spectrometer. ChemoSpec2D takes many of its cues from ChemoSpec and tries to create consistent graphical output and to be very user friendly.
Conducts Bayesian Optimal Interval (BOIN) designs for phase I dose-finding trials. simFastBOIN provides functions for pre-computing decision tables, conducting trial simulations, and evaluating operating characteristics. The package uses vectorized operations and the Iso::pava() function for isotonic regression to achieve efficient performance while maintaining full compatibility with BOIN methodology. Version 1.3.2 adds p_saf and p_tox parameters for customizable safety and toxicity thresholds. Version 1.3.1 fixes the Date field. Version 1.2.1 adds comprehensive roxygen2 documentation and enhanced print formatting with flexible table output options. Version 1.2.0 integrated C-based PAVA for isotonic regression. Version 1.1.0 introduced conservative MTD selection (boundMTD) and flexible early stopping rules (n_earlystop_rule). Methods are described in Liu and Yuan (2015) <doi:10.1111/rssc.12089>.
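To make the role of the safety and toxicity thresholds concrete, the following Python sketch computes the BOIN escalation and de-escalation boundaries from the closed-form expressions in Liu and Yuan (2015). It is a hedged illustration, not simFastBOIN code: the function `boin_boundaries` is hypothetical, and the defaults of 0.6 and 1.4 times the target rate are the conventional BOIN choices, assumed here rather than taken from the package.

```python
import math


def boin_boundaries(target, p_saf=None, p_tox=None):
    """Return (lambda_e, lambda_d), the BOIN escalation and de-escalation boundaries.

    target : target dose-limiting toxicity rate
    p_saf  : highest toxicity rate regarded as sub-therapeutic
             (assumed default: 0.6 * target)
    p_tox  : lowest toxicity rate regarded as overly toxic
             (assumed default: 1.4 * target)
    """
    if p_saf is None:
        p_saf = 0.6 * target
    if p_tox is None:
        p_tox = 1.4 * target
    lambda_e = math.log((1 - p_saf) / (1 - target)) / math.log(
        target * (1 - p_saf) / (p_saf * (1 - target))
    )
    lambda_d = math.log((1 - target) / (1 - p_tox)) / math.log(
        p_tox * (1 - target) / (target * (1 - p_tox))
    )
    return lambda_e, lambda_d


def decision(n_tox, n_treated, lambda_e, lambda_d):
    """Escalate, stay, or de-escalate based on the observed toxicity rate."""
    rate = n_tox / n_treated
    if rate <= lambda_e:
        return "escalate"
    if rate >= lambda_d:
        return "de-escalate"
    return "stay"


lambda_e, lambda_d = boin_boundaries(0.30)
print(round(lambda_e, 3), round(lambda_d, 3))  # 0.236 0.358
print(decision(1, 3, lambda_e, lambda_d))      # observed rate 0.333 -> "stay"
```

Because the boundaries depend only on the target and the two thresholds, a decision table for every (toxicities, patients) pair can be computed once before any simulation starts.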
This package creates and fits staged event tree probability models, which are probabilistic graphical models capable of representing asymmetric conditional independence statements for categorical variables. Includes functions to create, plot and fit staged event trees from data, as well as many efficient structure learning algorithms. References: Carli F, Leonelli M, Riccomagno E, Varando G (2022) <doi:10.18637/jss.v102.i06>. Collazo R. A., Görgen C. and Smith J. Q. (2018, ISBN:9781498729604). Görgen C., Bigatti A., Riccomagno E. and Smith J. Q. (2018) <arXiv:1705.09457>. Thwaites P. A., Smith, J. Q. (2017) <arXiv:1510.00186>. Barclay L. M., Hutton J. L. and Smith J. Q. (2013) <doi:10.1016/j.ijar.2013.05.006>. Smith J. Q. and Anderson P. E. (2008) <doi:10.1016/j.artint.2007.05.004>.
This package provides functions to calculate step- and cadence-based metrics from timestamped accelerometer and wearable device data. Supports CSV and AGD files from ActiGraph devices, CSV files from Fitbit devices, and step counts derived with R package GGIR <https://github.com/wadpac/GGIR>, with automatic handling of epoch lengths from 1 to 60 seconds. Metrics include total steps, cadence peaks, minutes and steps in predefined cadence bands, and time and steps in moderate-to-vigorous physical activity (MVPA). Methods and thresholds are informed by the literature, e.g., Tudor-Locke and Rowe (2012) <doi:10.2165/11599170-000000000-00000>, Barreira et al. (2012) <doi:10.1249/MSS.0b013e318254f2a3>, and Tudor-Locke et al. (2018) <doi:10.1136/bjsports-2017-097628>. The package record is also available on Zenodo (2023) <doi:10.5281/zenodo.7858094>.
This package contains tools for survey statistics (especially in educational assessment) for datasets with replication designs (jackknife, bootstrap, replicate weights; see Kolenikov, 2010; Pfefferman & Rao, 2009a, 2009b, <doi:10.1016/S0169-7161(09)70003-3>, <doi:10.1016/S0169-7161(09)70037-9>; Shao, 1996, <doi:10.1080/02331889708802523>). Descriptive statistics, linear and logistic regression, path models for manifest variables with measurement error correction and two-level hierarchical regressions for weighted samples are included. Statistical inference can be conducted for multiply imputed datasets and nested multiply imputed datasets and is particularly suited for the analysis of plausible values (for details see George, Oberwimmer & Itzlinger-Bruneforth, 2016; Bruneforth, Oberwimmer & Robitzsch, 2016; Robitzsch, Pham & Yanagida, 2016). The package development was supported by BIFIE (Federal Institute for Educational Research, Innovation and Development of the Austrian School System; Salzburg, Austria).
This package provides a set of procedures for parametric and non-parametric modelling of the dependence structure of multivariate extreme-values. The statistical inference is performed with non-parametric estimators, likelihood-based estimators and Bayesian techniques. It adapts the methodologies of Beranger and Padoan (2015) <doi:10.48550/arXiv.1508.05561>, Marcon et al. (2016) <doi:10.1214/16-EJS1162>, Marcon et al. (2017) <doi:10.1002/sta4.145>, Marcon et al. (2017) <doi:10.1016/j.jspi.2016.10.004> and Beranger et al. (2021) <doi:10.1007/s10687-019-00364-0>. This package also allows for the modelling of spatial extremes using flexible max-stable processes. It provides simulation algorithms and fitting procedures relying on the Stephenson-Tawn likelihood as per Beranger et al. (2021) <doi:10.1007/s10687-020-00376-1>.
This package provides tools for causal structure learning from observational data, with emphasis on temporally ordered variables. The package implements the Temporal Peter-Clark (TPC) algorithm (Petersen, Osler & Ekstrøm, 2021; <doi:10.1093/aje/kwab087>), the Temporal Greedy Equivalence Search (TGES) algorithm (Larsen, Ekstrøm & Petersen, 2025; <doi:10.48550/arXiv.2502.06232>) and Temporal Fast Causal Inference (TFCI). It provides a unified framework for specifying background knowledge, which can be incorporated into the implemented algorithms from the R packages bnlearn (Scutari, 2010; <doi:10.18637/jss.v035.i03>) and pcalg (Kalisch et al., 2012; <doi:10.18637/jss.v047.i11>), as well as the Java library Tetrad (Scheines et al., 1998; <doi:10.1207/s15327906mbr3301_3>). The package further includes utilities for visualization, comparison, and evaluation of graph structures, facilitating performance evaluation and methodological studies.
This package implements an approach aimed at assessing the accuracy and effectiveness of raw scores obtained in scales that contain locally dependent items. The program uses as input the calibration (structural) item estimates obtained from fitting extended unidimensional factor-analytic solutions in which the existing local dependencies are included. Measures of reliability (Omega) and information are proposed at three levels: (a) total score, (b) bivariate-doublet, and (c) item-by-item deletion, and are compared to those that would be obtained if all the items had been locally independent. All the implemented procedures can be obtained from: (a) linear factor-analytic solutions in which the item scores are treated as approximately continuous, and (b) non-linear solutions in which the item scores are treated as ordered-categorical. A detailed guide can be obtained at the following url.
Reads data from Bruker OPUS binary files of Fourier-Transform infrared spectrometers of the company Bruker Optics GmbH & Co. KG. This package is released independently from Bruker, and Bruker and OPUS are registered trademarks of Bruker Optics GmbH & Co. KG. <https://www.bruker.com/en/products-and-solutions/infrared-and-raman/opus-spectroscopy-software/latest-release.html>. It lets you import both measurement data and parameters from OPUS files. The main method is `read_opus()`, which reads one or multiple OPUS files into a standardized list class. Behind the scenes, the reader parses the file header to assign spectral blocks and reads binary data from the respective byte positions, using a reverse engineering approach. Infrared spectroscopy combined with chemometrics and machine learning is an established method to scale up chemical diagnostics in various industries and scientific fields.
Integrated species distribution modeling is a rising field in quantitative ecology thanks to significant rises in the quantity of data available, increases in computational speed and the proven benefits of using such models. Despite this, general software to help ecologists construct such models in an easy-to-use framework is lacking. We therefore introduce the R package 'PointedSDMs', which provides the tools to help ecologists set up integrated models and perform inference on them. There are also functions within the package to help run spatial cross-validation for model selection, as well as generic plotting and predicting functions. An introduction to these methods is given in Isaac, Jarzyna, Keil, Dambly, Boersch-Supan, Browning, Freeman, Golding, Guillera-Arroita, Henrys, Jarvis, Lahoz-Monfort, Pagel, Pescott, Schmucki, Simmonds and O'Hara (2020) <doi:10.1016/j.tree.2019.08.006>.
Computes segregation indices, including the Index of Dissimilarity, as well as the information-theoretic indices developed by Theil (1971) <isbn:978-0471858454>, namely the Mutual Information Index (M) and Theil's Information Index (H). The M, further described by Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> and Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008>, is a measure of segregation that is highly decomposable. The package provides tools to decompose the index by units and groups (local segregation), and by within and between terms. The package also provides a method to decompose differences in segregation as described by Elbers (2021) <doi:10.1177/0049124121986204>. The package includes standard error estimation by bootstrapping, which also corrects for small sample bias. The package also contains functions for visualizing segregation patterns.
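For readers unfamiliar with the indices named above, the Index of Dissimilarity and the Mutual Information Index (M) can be sketched in a few lines of Python. This is an illustrative calculation on made-up counts, not the package's interface: the function names and the input layout (one row per unit, one column per group) are assumptions.

```python
import math


def dissimilarity(counts):
    """Index of Dissimilarity for two groups.

    counts: list of (group_a, group_b) counts, one pair per unit.
    """
    total_a = sum(a for a, _ in counts)
    total_b = sum(b for _, b in counts)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in counts)


def mutual_information(counts):
    """Mutual Information Index M between unit and group membership (natural log)."""
    n = sum(sum(row) for row in counts)
    unit_totals = [sum(row) for row in counts]
    group_totals = [sum(col) for col in zip(*counts)]
    m = 0.0
    for u, row in enumerate(counts):
        for g, c in enumerate(row):
            if c > 0:  # empty cells contribute nothing to the sum
                p_ug = c / n
                m += p_ug * math.log(p_ug / ((unit_totals[u] / n) * (group_totals[g] / n)))
    return m


# Hypothetical data: two schools, two groups.
segregated = [(10, 0), (0, 10)]   # each group confined to one school
integrated = [(5, 5), (5, 5)]     # identical composition in both schools

print(dissimilarity(segregated), mutual_information(segregated))  # 1.0 and log(2)
print(dissimilarity(integrated), mutual_information(integrated))  # 0.0 and 0.0
```

The double sum in `mutual_information` is what makes M decomposable: grouping its terms by unit yields per-unit contributions that add back up to the total.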
Anscombe's quartet is a set of four two-variable datasets that have several common summary statistics but which have very different joint distributions. This becomes apparent when the data are plotted, which illustrates the importance of using graphical displays in Statistics. This package enables the creation of datasets that have identical marginal sample means and sample variances, sample correlation, least squares regression coefficients and coefficient of determination. The user supplies an initial dataset, which is shifted, scaled and rotated in order to achieve target summary statistics. The general shape of the initial dataset is retained. The target statistics can be supplied directly or calculated based on a user-supplied dataset. The datasauRus package <https://cran.r-project.org/package=datasauRus> provides further examples of datasets that have markedly different scatter plots but share many sample summary statistics.
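One way to see how a dataset can be shifted, scaled and rotated to hit target statistics exactly is the Python sketch below. It uses a standard standardize-and-orthogonalize construction, which is an assumption for illustration and not necessarily the algorithm the package uses; `match_statistics` and its arguments are hypothetical names. The targets are the approximate values shared by Anscombe's datasets (means 9 and 7.5, variances 11 and about 4.125, correlation about 0.816).

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Shift and scale to sample mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def match_statistics(x, y, mean_x, mean_y, sd_x, sd_y, r):
    """Transform (x, y) so the sample means, SDs and correlation equal the targets.

    Requires that x and y are not perfectly correlated to begin with,
    otherwise the residual below has zero variance.
    """
    zx, zy = standardize(x), standardize(y)
    n = len(x)
    r0 = sum(a * b for a, b in zip(zx, zy)) / (n - 1)   # current sample correlation
    # Part of y that is uncorrelated with x, rescaled to unit variance.
    resid = standardize([b - r0 * a for a, b in zip(zx, zy)])
    # Recombine ("rotate") so that the correlation with x is exactly r.
    new_zy = [r * a + math.sqrt(1 - r * r) * e for a, e in zip(zx, resid)]
    new_x = [mean_x + sd_x * a for a in zx]
    new_y = [mean_y + sd_y * b for b in new_zy]
    return new_x, new_y


# Arbitrary starting data, chosen only for the example.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0]
new_x, new_y = match_statistics(x, y, 9.0, 7.5, math.sqrt(11.0), math.sqrt(4.125), 0.816)
print(round(mean(new_x), 6), round(mean(new_y), 6))
```

Since the least squares slope equals the correlation times the ratio of standard deviations, fixing the means, variances and correlation also fixes the regression coefficients and the coefficient of determination.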
This package provides a conservative, assumption-aware statistical consistency checker for published research results. Parses test statistics, effect sizes, and confidence intervals from text, PDF, HTML, and Word documents across multiple citation styles including American Psychological Association (APA), Harvard, Frontiers, PLOS ONE, Scientific Reports, Nature Human Behaviour, PeerJ, eLife, PNAS, and others. Recomputes effect sizes using all plausible variants when design is ambiguous, and validates internal consistency. Supports t-tests, F-tests/ANOVA, correlations, chi-square, z-tests, regression, and nonparametric tests. Provides 'statcheck'-compatible API functions for batch processing of files and directories. Explicitly tracks all assumptions and uncertainty in output. Detects decision errors (significance reversals) similar to 'statcheck'. Note: this package is under active development and results should be independently verified. Use is at the sole responsibility of the user. Contributions and verification reports are welcome.
Detection of differentially expressed genes (DEGs) from the comparison of two biological conditions (treated vs. untreated, diseased vs. normal, mutant vs. wild-type) among different levels of gene expression (transcriptome, translatome, proteome), using several statistical methods: Rank Product, Translational Efficiency, t-test, Limma, ANOTA, DESeq, edgeR. Possibility to plot the results with scatterplots, histograms, MA plots, standard deviation (SD) plots, coefficient of variation (CV) plots. Detection of significantly enriched post-transcriptional regulatory factors (RBPs, miRNAs, etc.) and Gene Ontology terms in the lists of DEGs previously identified for the two expression levels. Comparison of GO terms enriched only in one of the levels or in both. Calculation of the semantic similarity score between the lists of enriched GO terms coming from the two expression levels. Visual examination and comparison of the enriched terms with heatmaps, radar plots and barplots.