This package provides functions for classifying sparseness in 2 x 2 categorical data where one or more cells have zero counts. The classification uses three widely applied summary measures: Risk Difference (RD), Relative Risk (RR), and Odds Ratio (OR). It helps in selecting suitable continuity corrections for zero cells in multi-centre or meta-analysis studies. It also supports sensitivity analysis and can detect phenomena such as Simpson's paradox. The methodology is based on Subbiah and Srinivasan (2008) <doi:10.1016/j.spl.2008.06.023>.
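As an illustration of the three summary measures named above, the following is a minimal Python sketch, not the package's own interface. The function name `summary_measures`, the cell layout (a, b for group 1 and c, d for group 2), and the default correction of 0.5 added to every cell are assumptions made for this example; choosing a suitable correction is exactly the question the package addresses.

```python
def summary_measures(a, b, c, d, correction=0.5):
    """Compute RD, RR and OR for a 2 x 2 table.

    a, b: events and non-events in group 1
    c, d: events and non-events in group 2
    correction: hypothetical continuity correction, added to all four
                cells only when at least one cell is zero (an assumption
                of this sketch, not the package's rule).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    p1 = a / (a + b)  # risk in group 1
    p2 = c / (c + d)  # risk in group 2
    return {
        "RD": p1 - p2,             # risk difference
        "RR": p1 / p2,             # relative risk
        "OR": (a * d) / (b * c),   # odds ratio
    }


# A table without zero cells, then one with a zero cell.
dense = summary_measures(10, 90, 5, 95)
sparse = summary_measures(0, 10, 5, 5)
```

Without the correction, the second table would give RR = 0 and OR = 0, which is why the size of the correction can matter in sparse data.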
This is the R API for the nfer formalism (<http://nfer.io/>). nfer was developed to specify event stream abstractions for spacecraft telemetry such as the Mars Science Laboratory. Users write rules using a syntax that borrows heavily from Allen's Temporal Logic that, when applied to an event stream, construct a hierarchy of temporal intervals with data. The R API supports loading rules from a file or mining them from historical data. Traces of events or pools of intervals are provided as data frames.
Functions, examples and data from the first and the second edition of "Numerical Methods and Optimization in Finance" by M. Gilli, D. Maringer and E. Schumann (2019, ISBN:978-0128150658). The package provides implementations of optimisation heuristics (Differential Evolution, Genetic Algorithms, Particle Swarm Optimisation, Simulated Annealing and Threshold Accepting), and other optimisation tools, such as grid search and greedy search. There are also functions for the valuation of financial instruments such as bonds and options, for portfolio selection and functions that help with stochastic simulations.
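To give a flavour of one of the heuristics listed above, here is a rough Python sketch of Threshold Accepting under simple assumptions. It is not the package's implementation: the function name, its arguments, and the fixed threshold sequence are all hypothetical choices for this example. The idea is that a candidate solution is accepted whenever it is not worse than the current one by more than the current threshold, and the thresholds shrink to zero.

```python
import random


def threshold_accepting(f, x0, neighbour, thresholds,
                        steps_per_threshold=200, seed=0):
    """Minimise f, starting at x0 (illustrative sketch only).

    neighbour(x, rng) must return a randomly perturbed copy of x.
    thresholds is a decreasing sequence of non-negative numbers.
    """
    rng = random.Random(seed)
    current, f_current = x0, f(x0)
    best, f_best = current, f_current
    for tau in thresholds:
        for _ in range(steps_per_threshold):
            candidate = neighbour(current, rng)
            f_candidate = f(candidate)
            # Accept if the candidate is at most tau worse than the current solution.
            if f_candidate - f_current <= tau:
                current, f_current = candidate, f_candidate
                if f_current < f_best:
                    best, f_best = current, f_current
    return best, f_best


def objective(x):
    return (x - 3.0) ** 2


def step(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_f = threshold_accepting(objective, 10.0, step, [0.5, 0.1, 0.0])
```

Unlike Simulated Annealing, the acceptance rule here is deterministic given the candidate; only the candidate generation is random.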
This package provides a standardized workflow to reconstruct spatial configurations of altitude-bounded biogeographic systems over time. For example, tabs can model how island archipelagos expand or contract with changing sea levels or how alpine biomes shift in response to tree line movements. It provides functionality to account for various geophysical processes such as crustal deformation and other tectonic changes, allowing for a more accurate representation of biogeographic system dynamics. For more information see De Groeve et al. (2025) <doi:10.21425/fob.18.151677>.
We provide a toolbox to fit and simulate a univariate or multivariate damped random walk process that is also known as an Ornstein-Uhlenbeck process or a continuous-time autoregressive model of the first order, i.e., CAR(1) or CARMA(1, 0). This process is suitable for analyzing univariate or multivariate time series data with irregularly-spaced observation times and heteroscedastic measurement errors. When it comes to the multivariate case, the number of data points (measurements/observations) available at each observation time does not need to be the same, and the length of each time series can vary. The number of time series data sets that can be modeled simultaneously is limited to ten in this version of the package. We use Kalman-filtering to evaluate the resulting likelihood function, which leads to a scalable and efficient computation in finding maximum likelihood estimates of the model parameters or in drawing their posterior samples. Please pay attention to loading the data if this package is used for astronomical data analyses; see the details in the manual. Also see Hu and Tak (2020) <arXiv:2005.08049>.
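The damped random walk described above can be simulated exactly at irregularly spaced times, because its transition density is Gaussian. The Python sketch below assumes the parameterisation dX = -(1/tau)(X - mu) dt + sigma dW; the function name `simulate_ou` and the parameter names are hypothetical and need not match the package's conventions. Measurement errors and the Kalman-filter likelihood are not covered.

```python
import math
import random


def simulate_ou(times, mu, sigma, tau, x0, seed=None):
    """Simulate an Ornstein-Uhlenbeck path at the given observation times.

    Assumed model (for this sketch): dX = -(1/tau) (X - mu) dt + sigma dW.
    times must be increasing; x0 is the value at times[0].
    """
    rng = random.Random(seed)
    path = [x0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        decay = math.exp(-dt / tau)
        # Conditional standard deviation over a gap of length dt.
        sd = sigma * math.sqrt(tau / 2.0 * (1.0 - decay ** 2))
        mean = mu + (path[-1] - mu) * decay
        path.append(mean + rng.gauss(0.0, sd))
    return path


# Irregularly spaced observation times.
obs_times = [0.0, 0.4, 1.7, 1.9, 5.0]
path = simulate_ou(obs_times, mu=0.0, sigma=1.0, tau=2.0, x0=0.5, seed=1)
```

Because each step uses the exact transition distribution, the gaps between observation times can be of any size without introducing discretisation error.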
This package provides smooth additive quantile regression models, fitted using the methods of Fasiolo et al. (2017). Unlike in quantreg, the smoothing parameters are estimated automatically by marginal loss minimization, while the regression coefficients are estimated using either the PIRLS or the Newton algorithm. The learning rate is determined so that the Bayesian credible intervals of the estimated effects have approximately the correct coverage. The main function is qgam() which is similar to gam() in the mgcv package, but fits non-parametric quantile regression models.
Rasqal is a C library that handles Resource Description Framework (RDF) query language syntaxes, query construction and execution of queries returning results as bindings, boolean, RDF graphs/triples or syntaxes. The supported query languages are SPARQL Query 1.0, SPARQL Query 1.1, SPARQL Update 1.1 (no executing) and the Experimental SPARQL extensions (LAQRS). Rasqal can write binding query results in the SPARQL XML, SPARQL JSON, CSV, TSV, HTML, ASCII tables, RDF/XML and Turtle/N3 formats and read them in the SPARQL XML, RDF/XML and Turtle/N3 formats.
Genome-wide association studies (GWAS) are a widely used tool for identification of genetic variants associated with phenotypes and diseases, though complex diseases featuring many genetic variants with small effects present difficulties for these traditional studies. By leveraging pleiotropy, the statistical power of a single GWAS can be increased. This package provides functions for fitting graph-GPA, a statistical framework to prioritize GWAS results by integrating pleiotropy. The GGPA package provides a user-friendly interface to fit graph-GPA models, implement association mapping, and generate a phenotype graph.
Efficient methods for Bayesian inference of state space models via Markov chain Monte Carlo (MCMC) based on parallel importance sampling type weighted estimators (Vihola, Helske, and Franks, 2020, <doi:10.1111/sjos.12492>), particle MCMC, and its delayed acceptance version. Gaussian, Poisson, binomial, negative binomial, and Gamma observation densities and basic stochastic volatility models with linear-Gaussian state dynamics, as well as general non-linear Gaussian models and discretised diffusion models are supported. See Helske and Vihola (2021, <doi:10.32614/RJ-2021-103>) for details.
How to fit a straight line through a set of points with errors in both coordinates? The bfsl package implements the York regression (York, 2004 <doi:10.1119/1.1632486>). It provides unbiased estimates of the intercept, slope and standard errors for the best-fit straight line to independent points with (possibly correlated) normally distributed errors in both x and y. Other commonly used errors-in-variables methods, such as orthogonal distance regression, geometric mean regression or Deming regression are special cases of the bfsl solution.
Facilitates the identification of counterfactual queries in structural causal models via the ID* and IDC* algorithms by Shpitser, I. and Pearl, J. (2007, 2008) <doi:10.48550/arXiv.1206.5294>, <https://jmlr.org/papers/v9/shpitser08a.html>. Provides a simple interface for defining causal diagrams and counterfactual conjunctions. Construction of parallel worlds graphs and counterfactual graphs is carried out automatically based on the counterfactual query and the causal diagram. See Tikka, S. (2023) <doi:10.32614/RJ-2023-053> for a tutorial of the package.
Fitting (hierarchical) hidden Markov models to financial data via maximum likelihood estimation. See Oelschläger, L. and Adam, T. "Detecting Bearish and Bullish Markets in Financial Time Series Using Hierarchical Hidden Markov Models" (2021, Statistical Modelling) <doi:10.1177/1471082X211034048> for a reference on the method. A user guide is provided by the accompanying software paper "fHMM: Hidden Markov Models for Financial Time Series in R", Oelschläger, L., Adam, T., and Michels, R. (2024, Journal of Statistical Software) <doi:10.18637/jss.v109.i09>.
This package implements statistical methods for group factor analysis, focusing on estimating the number of global and local factors and extracting them. Several algorithms are implemented, including Canonical Correlation-based Estimation by Choi et al. (2021) <doi:10.1016/j.jeconom.2021.09.008>, Generalised Canonical Correlation Estimation by Lin and Shin (2023) <doi:10.2139/ssrn.4295429>, Circularly Projected Estimation by Chen (2022) <doi:10.1080/07350015.2022.2051520>, and the Aggregated Projection Method by Hu et al. (2025) <doi:10.1080/01621459.2025.2491154>.
Multi-modality data matrices are factorized jointly into the product of a shared sub-matrix and multiple modality-specific sub-matrices. A group sparse constraint is applied to the shared sub-matrix to capture the homogeneous and heterogeneous information. The samples are then classified by clustering the shared sub-matrix with kmeanspp(), a new version of kmeans() developed here to obtain concordant results. The package also provides cluster number estimation by rotation cost. Moreover, cluster-specific features can be retrieved using hypergeometric tests.
This package provides an extension of the shadow-test approach to computerized adaptive testing (CAT) implemented in the TestDesign package for the assessment framework involving multiple tests administered periodically throughout the year. This framework is referred to as the Multiple Administrations Adaptive Testing (MAAT) and supports multiple item pools vertically scaled and multiple phases (stages) of CAT within each test. Between phases and tests, transitioning from one item pool (and associated constraints) to another is allowed as deemed necessary to enhance the quality of measurement.
Multivariate estimation and testing, currently a package for testing parametric data. To deal with parametric data, various multivariate normality tests and outlier detection are performed and visualized using the ggplot2 package. Homogeneity tests for covariance matrices are also possible, as well as the Hotelling's T-square test and the multivariate analysis of variance test. We are exploring additional tests and visualization techniques, such as profile analysis and randomized complete block design, which will be made available and easily accessible to users in the future.
This package provides tools for the practical management of financial portfolios: backtesting investment and trading strategies, computing profit/loss and returns, analysing trades, handling lists of transactions, reporting, and more. The package provides a small set of reliable, efficient and convenient tools for processing and analysing trade/portfolio data. The manual provides all the details; it is available from <https://enricoschumann.net/R/packages/PMwR/manual/PMwR.html>. Examples and descriptions of new features are provided at <https://enricoschumann.net/notes/PMwR/>.
Calculates the Probability Plot Correlation Coefficient (PPCC) between a continuous variable X and a specified distribution. The corresponding composite hypothesis test, first introduced by Filliben (1975) <doi:10.1080/00401706.1975.10489279>, can be performed to test whether the sample X is drawn from the Normal, log-Normal, Exponential, Uniform, Cauchy, Logistic, Generalized Logistic, Gumbel (GEVI), Weibull, Generalized Extreme Value, Pearson III (Gamma 2), Mielke's Kappa or Rayleigh distribution. The PPCC test is performed with a fast Monte-Carlo simulation.
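The coefficient itself is the correlation between the ordered sample and the theoretical quantiles at a set of plotting positions. The Python sketch below covers the normal case only, using Filliben's order statistic medians as plotting positions. The function names are hypothetical, the plotting positions the package uses may differ, and the Monte-Carlo step that turns the coefficient into a test is not shown.

```python
from math import sqrt
from statistics import NormalDist


def filliben_medians(n):
    """Filliben's approximate medians of uniform order statistics."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def ppcc_normal(sample):
    """Correlation between the sorted sample and standard normal quantiles.

    Needs at least 3 distinct values; an illustrative sketch only.
    """
    x = sorted(sample)
    n = len(x)
    q = [NormalDist().inv_cdf(p) for p in filliben_medians(n)]
    mean_x, mean_q = sum(x) / n, sum(q) / n
    s_xq = sum((a - mean_x) * (b - mean_q) for a, b in zip(x, q))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_qq = sum((b - mean_q) ** 2 for b in q)
    return s_xq / sqrt(s_xx * s_qq)


sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.4, 5.6, 4.9, 5.2, 5.8]
r = ppcc_normal(sample)
```

Values of the coefficient close to 1 indicate that the sample is consistent with the hypothesised distribution; how close is close enough is what the simulation-based test decides.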
Create, transform, and summarize custom random variables with distribution functions (analogues of the p*(), d*(), q*(), and r*() functions from base R). Two types of distributions are supported: "discrete" (the random variable has a finite number of output values) and "continuous" (an infinite number of values in the form of a continuous random variable). Functions for distribution transformations and summaries are available. Implemented approaches often emphasize approximate and numerical solutions: all distributions assume finite support and finite values of the density function; some methods are implemented with simulation techniques.
The aim of the package is to provide some basic functions for doing statistics with one-dimensional Fuzzy Data (in the form of polygonal fuzzy numbers). In particular, the package contains functions for the basic operations on the class of fuzzy numbers (sum, scalar product, mean, median, Hukuhara difference) as well as for calculating (Bertoluzza) distance and sample variance. Moreover, a function to simulate fuzzy random variables and bootstrap tests for the equality of means are included. Version 2.1 fixes some bugs of previous versions.
Implementation of the tree-guided feature selection and logic aggregation approach introduced in Chen et al. (2024) <doi:10.1080/01621459.2024.2326621>. The method enables the selection and aggregation of large-scale rare binary features with a known hierarchical structure using a convex, linearly-constrained regularized regression framework. The package facilitates the application of this method to both linear regression and binary classification problems by solving the optimization problem via the smoothing proximal gradient descent algorithm (Chen et al. (2012) <doi:10.1214/11-AOAS514>).
The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (Activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of this design is that dose is escalated based on lack of activity rather than on lack of toxicity and is de-escalated only if an unacceptable level of toxicity is experienced.
Specialized toolkit for processing biological and fisheries data from Peru's anchovy (Engraulis ringens) fishery. Provides functions to analyze fishing logbooks, calculate biological indicators (length-weight relationships, juvenile percentages), generate spatial fishing indicators, and visualize regulatory measures from Peru's Ministry of Production. Features automated data processing from multiple file formats, coordinate validation, spatial analysis of fishing zones, and tools for analyzing fishing closure announcements and regulatory compliance. Includes built-in datasets of Peruvian coastal coordinates and parallel lines for analyzing fishing activities within regulatory zones.
The main purpose of this package is to provide the algorithmic complexity for short strings, an approximation of the Kolmogorov Complexity of a short string using the coding theorem method. While the database containing the complexity is provided in the data-only package acss.data, this package provides functions for accessing the data, such as prob_random, which returns the posterior probability that a given string was produced by a random process. In addition, two traditional (but problematic) measures of complexity are also provided: entropy and change complexity.
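For the entropy measure mentioned above, a minimal Python sketch of first-order Shannon entropy (in bits per symbol) is shown below. The function name `shannon_entropy` is hypothetical and this is not the package's code. One limitation is easy to see from the definition: the value depends only on symbol frequencies, so a strictly alternating string and an irregular one with the same symbol counts get the same score.

```python
from collections import Counter
from math import log2


def shannon_entropy(s):
    """First-order Shannon entropy of a non-empty string, in bits per symbol."""
    n = len(s)
    counts = Counter(s)
    return sum(-(k / n) * log2(k / n) for k in counts.values())


# Same symbol counts (four 0s, four 1s), very different structure.
regular = shannon_entropy("01010101")
irregular = shannon_entropy("00101101")
```

Both calls return the same value, which illustrates why a frequency-based measure cannot distinguish a patterned string from a less regular one.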