The strength of evidence provided by epidemiological and observational studies is inherently limited by the potential for unmeasured confounding. We focus on three key quantities: (1) the observed bound of the confidence interval closest to the null; (2) the relationship between an unmeasured confounder and the outcome, for example a plausible residual effect size for an unmeasured continuous or binary confounder; and (3) the relationship between an unmeasured confounder and the exposure, for example a realistic mean difference or prevalence difference in this hypothetical confounder between exposure groups. Building on the methods put forth by Cornfield et al. (1959), Bross (1966), Schlesselman (1978), Rosenbaum & Rubin (1983), Lin et al. (1998), Lash et al. (2009), Rosenbaum (1986), Cinelli & Hazlett (2020), VanderWeele & Ding (2017), and Ding & VanderWeele (2016), we can use these quantities to assess how an unmeasured confounder may tip our result to insignificance.
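As a rough illustration of the tipping-point logic (a minimal sketch under a linear-model simplification, not this package's API; the function name below is hypothetical), the bias introduced by an unmeasured confounder is approximately the product of its effect on the outcome and its mean difference between exposure groups, so the result tips when that product reaches the observed bound closest to the null:

    ## Hypothetical helper: mean difference in the confounder that would shift
    ## the CI bound closest to the null exactly to zero (linear-model sketch).
    tipping_mean_difference <- function(observed_bound, confounder_outcome_effect) {
      # result tips to insignificance when the induced bias equals the bound
      observed_bound / confounder_outcome_effect
    }

    # Example: a lower CI bound of 0.4 and a plausible confounder-outcome effect
    # of 0.8 imply a confounder mean difference of 0.5 would tip the result.
    tipping_mean_difference(observed_bound = 0.4, confounder_outcome_effect = 0.8)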
Learning and inference over dynamic Bayesian networks of arbitrary Markovian order. Extends some of the functionality offered by the bnlearn package to learn dynamic Bayesian networks from data and perform exact inference. It offers three structure learning algorithms for dynamic Bayesian networks: Trabelsi G. (2013) <doi:10.1007/978-3-642-41398-8_34>, Santos F.P. and Maciel C.D. (2014) <doi:10.1109/BRC.2014.6880957>, and Quesada D., Bielza C. and Larrañaga P. (2021) <doi:10.1007/978-3-030-86271-8_14>. It also offers the possibility to perform forecasts of arbitrary length. A tool for visualizing the structure of the network is also provided via the visNetwork package. Further detailed information and examples can be found in our Journal of Statistical Software paper Quesada D., Larrañaga P. and Bielza C. (2025) <doi:10.18637/jss.v115.i06>.
This package provides the probability density function (PDF), cumulative distribution function (CDF), the first-order and second-order partial derivatives of the PDF, and a fitting function for the diffusion decision model (DDM; e.g., Ratcliff & McKoon, 2008, <doi:10.1162/neco.2008.12-06-420>) with across-trial variability in the drift rate. Because the PDF, its partial derivatives, and the CDF of the DDM all contain an infinite sum, they need to be approximated. fddm implements all published approximations (Navarro & Fuss, 2009, <doi:10.1016/j.jmp.2009.02.003>; Gondan, Blurton, & Kesselmeier, 2014, <doi:10.1016/j.jmp.2014.05.002>; Blurton, Kesselmeier, & Gondan, 2017, <doi:10.1016/j.jmp.2016.11.003>; Hartmann & Klauer, 2021, <doi:10.1016/j.jmp.2021.102550>) plus new approximations. All approximations are implemented purely in C++, providing faster speed than existing packages.
This package provides a variety of latent Markov models: hidden Markov models, hidden semi-Markov models, state-space models, and continuous-time variants can be formulated and estimated within the same framework by directly maximising the likelihood function using the so-called forward algorithm. Applied researchers often need custom models that standard software does not easily support. Writing tailored R code offers flexibility but suffers from slow estimation speeds. We address these issues by providing easy-to-use functions (written in C++ for speed) for common tasks like the forward algorithm. These functions can be combined into custom models in a Lego-type approach, offering up to 10-20 times faster estimation via standard numerical optimisers. To aid in building fully custom likelihood functions, several vignettes are included that show how to simulate data from and estimate all the above model classes.
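For intuition, a plain-R sketch of the forward algorithm is shown below (a generic illustration under standard HMM assumptions, not this package's C++ implementation; function and argument names are ours):

    ## Scaled forward algorithm: HMM log-likelihood from an initial distribution
    ## `delta`, transition matrix `Gamma`, and a matrix `allprobs` of state-wise
    ## observation densities (n observations x N states).
    forward_loglik <- function(delta, Gamma, allprobs) {
      foo <- delta * allprobs[1, ]
      ll <- log(sum(foo)); foo <- foo / sum(foo)     # rescale to avoid underflow
      for (t in 2:nrow(allprobs)) {
        foo <- (foo %*% Gamma) * allprobs[t, ]
        ll <- ll + log(sum(foo)); foo <- foo / sum(foo)
      }
      ll
    }

    ## Example: 2-state Gaussian HMM evaluated on simulated data
    x <- rnorm(100)
    allprobs <- cbind(dnorm(x, -1, 1), dnorm(x, 1, 1))
    forward_loglik(delta = c(0.5, 0.5),
                   Gamma = matrix(c(0.9, 0.1, 0.1, 0.9), 2, byrow = TRUE),
                   allprobs = allprobs)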
This package provides functions to fit point process models using the Palm likelihood. First proposed by Tanaka, Ogata, and Stoyan (2008) <DOI:10.1002/bimj.200610339>, maximisation of the Palm likelihood can provide computationally efficient parameter estimation for point process models in situations where the full likelihood is intractable. This package is chiefly focused on Neyman-Scott point processes, but can also fit the void processes proposed by Jones-Todd et al. (2019) <DOI:10.1002/sim.8046>. The development of this package was motivated by the analysis of capture-recapture surveys in which individuals cannot be identified; the resulting data can conceptually be seen as a clustered point process (Stevenson, Borchers, and Fewster, 2019 <DOI:10.1111/biom.12983>). As such, some of the functions in this package are specifically for the estimation of cetacean density from two-camera aerial surveys.
This package contains functions for statistical data analysis based on spatially-clustered techniques. The package allows estimating the spatially-clustered spatial regression models presented in Cerqueti, Maranzano & Mattera (2024), "Spatially-clustered spatial autoregressive models with application to agricultural market concentration in Europe", arXiv preprint 2407.15874 <doi:10.48550/arXiv.2407.15874>. Specifically, the current release allows the estimation of the spatially-clustered linear regression model (SCLM), the spatially-clustered spatial autoregressive model (SCSAR), the spatially-clustered spatial Durbin model (SCSEM), and the spatially-clustered linear regression model with spatially-lagged exogenous covariates (SCSLX). From release 0.0.2, the library contains functions to estimate spatial clustering based on Adjacent Matrix K-Means (AMKM) as described in Zhou, Liu & Zhu (2019), "Weighted adjacent matrix for K-means clustering", Multimedia Tools and Applications, 78 (23) <doi:10.1007/s11042-019-08009-x>.
An advanced version of the package 's2dverification'. Intended for seasonal to decadal (s2d) climate forecast verification, but also applicable to other types of forecasts or general climate analysis. This package is specifically designed for comparing experimental and observational datasets. It provides functionality for data retrieval, post-processing, skill score computation against observations, and visualization. Compared to 's2dverification', 's2dv' is more compatible with the package 'startR', is able to use multiple cores for computation, and handles multi-dimensional arrays with greater flexibility. The Climate Data Operators (CDO) version used in development is 1.9.8. Implements methods described in Wilks (2011) <doi:10.1016/B978-0-12-385022-5.00008-7>, DelSole and Tippett (2016) <doi:10.1175/MWR-D-15-0218.1>, Kharin et al. (2012) <doi:10.1029/2012GL052647>, Doblas-Reyes et al. (2003) <doi:10.1007/s00382-003-0350-4>.
Flexibly implements Integral Projection Models using a mathematical(ish) syntax. This package will not help with the vital rate modeling process, but will help convert those regression models into an IPM. ipmr handles density dependence and environmental stochasticity, with a couple of options for implementing the latter. It also provides functions to avoid unintentional eviction of individuals from models, as well as model diagnostic tools, plotting functionality, stochastic/deterministic simulations, and analysis tools. Integral projection models are described in depth by Easterling et al. (2000) <doi:10.1890/0012-9658(2000)081[0694:SSSAAN]2.0.CO;2>, Merow et al. (2013) <doi:10.1111/2041-210X.12146>, Rees et al. (2014) <doi:10.1111/1365-2656.12178>, and Metcalf et al. (2015) <doi:10.1111/2041-210X.12405>. Williams et al. (2012) <doi:10.1890/11-2147.1> discuss the problem of unintentional eviction.
Implementation of a transfer learning framework employing distribution mapping based domain transfer. Uses the renowned concept of histogram matching (see Gonzalez and Fittes (1977) <doi:10.1016/0094-114X(77)90062-3>, Gonzalez and Woods (2008) <isbn:9780131687288>) and extends it to include distribution measures like kernel density estimates (KDE; see Wand and Jones (1995) <isbn:978-0-412-55270-0>, Jones et al. (1996) <doi:10.2307/2291420>). In the typical application scenario, one can use the underlying sample distributions (histogram or KDE) to generate a map between two distinct but related domains to transfer the target data to the source domain and utilize the available source data for better predictive modeling design. Suitable for cases where one-to-one sample matching is not possible, so one instead transforms the underlying data distribution to make use of the more plentiful data for modeling.
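A minimal sketch of the underlying idea, assuming plain quantile (CDF) matching between two samples and using only base R (not this package's API; names are hypothetical):

    ## Map target samples onto the source domain by matching empirical CDFs,
    ## i.e. classic quantile/histogram matching.
    map_to_source <- function(target, source) {
      p <- ecdf(target)(target)                       # target values as CDF probabilities
      quantile(source, probs = p, names = FALSE, type = 8)  # matching source quantiles
    }

    ## Example: shift/scale-mismatched domains
    source <- rnorm(500, mean = 0, sd = 1)
    target <- rnorm(300, mean = 3, sd = 2)
    mapped <- map_to_source(target, source)
    summary(mapped)   # roughly follows the source distribution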
This is a small, lightweight package that lets users investigate the distribution of genotypes in genotype-by-sequencing (GBS) data where they expect (by and large) Hardy-Weinberg equilibrium, in order to assess rates of genotyping errors and the dependence of those rates on read depth. It implements a Markov chain Monte Carlo (MCMC) sampler using Rcpp to compute a Bayesian estimate of what we call the heterozygote miscall rate for restriction-associated digest (RAD) sequencing data and other types of reduced representation GBS data. It also provides functions to generate plots of expected and observed genotype frequencies. Some background on these topics can be found in a recent paper "Recent advances in conservation and population genomics data analysis" by Hendricks et al. (2018) <doi:10.1111/eva.12659>, and another paper describing the MCMC approach is in preparation with Gordon Luikart and Thierry Gosselin.
Hypothesis tests and a sure independence screening (SIS) procedure based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance <doi:10.1080/01621459.2018.1543600>, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data in metric spaces, e.g., shape, directional, compositional, and symmetric positive definite matrix data. The ball divergence and ball covariance based distribution-free tests are implemented to detect distribution differences and associations in metric spaces <doi:10.18637/jss.v097.i06>. Furthermore, several generic non-parametric feature selection procedures based on ball correlation, BCor-SIS and all of its variants, are implemented to tackle the challenges of ultra-high-dimensional data. A fast implementation for large-scale multiple K-sample testing with ball divergence <doi:10.1002/gepi.22423> is supported, which is particularly helpful for genome-wide association studies.
Multidimensional scaling (MDS) functions for various tasks that are beyond the beta stage and way past the alpha stage. Currently, options are available for weights, restrictions, classical scaling or principal coordinate analysis, transformations (linear, power, Box-Cox, spline, ordinal), outlier mitigation (rdop), out-of-sample estimation (predict), negative dissimilarities, fast and faster executions with low memory footprints, penalized restrictions, cross-validation-based penalty selection, supplementary variable estimation (explain), additive constant estimation, mixed measurement level distance calculation, restricted classical scaling, etc. More will come in the future. References. Busing (2024) "A Simple Population Size Estimator for Local Minima Applied to Multidimensional Scaling". Manuscript submitted for publication. Busing (2025) "Node Localization by Multidimensional Scaling with Iterative Majorization". Manuscript submitted for publication. Busing (2025) "Faster Multidimensional Scaling". Manuscript in preparation. Barroso and Busing (2025) "e-RDOP, Relative Density-Based Outlier Probabilities, Extended to Proximity Mapping". Manuscript submitted for publication.
An end-to-end toolkit for land use and land cover classification using big Earth observation data. Builds satellite image data cubes from cloud collections. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Enables merging of multi-source imagery (SAR, optical, DEM). Includes functions for quality assessment of training samples using self-organized maps and for reducing imbalance in training samples. Provides machine learning algorithms including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolution neural networks, and temporal attention encoders. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference. Enables best practices for estimating area and assessing accuracy of land change. Includes object-based spatio-temporal segmentation for space-time OBIA. Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
This package provides a Bayesian latent space model for complex networks, either weighted or unweighted. Given an observed input graph, the estimates for the latent coordinates of the nodes are obtained through a Bayesian MCMC algorithm. The overall likelihood of the graph depends on a fundamental probability equation, which is defined so that ties are more likely to exist between nodes whose latent space coordinates are close. The package is mainly based on the model by Hoff, Raftery and Handcock (2002) <doi:10.1198/016214502388618906> and contains some extra features (e.g., removal of the Procrustean step, weights implemented as coefficients of the latent distances, 3D plots). The original code related to the above model was retrieved from <https://www.stat.washington.edu/people/pdhoff/Code/hoff_raftery_handcock_2002_jasa/>. Users can inspect the MCMC simulation, create and customize insightful graphical representations or apply clustering techniques.
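For intuition, the distance-based tie probability can be sketched as follows (an illustration in the spirit of the Hoff, Raftery and Handcock (2002) model, not this package's API; names are ours): the log-odds of a tie between two nodes is an intercept minus the distance between their latent coordinates.

    ## Tie probabilities from latent coordinates Z (one row per node) and an
    ## intercept alpha; the diagonal (self-ties) is ignored in practice.
    tie_probabilities <- function(Z, alpha) {
      D <- as.matrix(dist(Z))      # pairwise Euclidean distances in latent space
      plogis(alpha - D)            # inverse-logit of (alpha - distance)
    }

    ## Example: 5 nodes in a 2-dimensional latent space
    Z <- matrix(rnorm(10), ncol = 2)
    round(tie_probabilities(Z, alpha = 1), 2)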
Mapping, spatial analysis, and statistical modeling of microdata from sources such as the Demographic and Health Surveys <https://www.dhsprogram.com/> and Integrated Public Use Microdata Series <https://www.ipums.org/>. It can also be extended to other datasets. The package supports spatial correlation index construction and visualization, along with empirical Bayes approximation of regression coefficients in a multistage setup. The main functionality is repeated regression: for example, to run a regression for n groups, the group IDs should be stacked vertically into the variable passed to the `location_var` parameter. It can perform various kinds of regression, such as generalized regression models, logit, probit, and more. Additionally, it can incorporate interaction effects. The key benefit of the package is its ability to store the regression results obtained repeatedly on a dataset by group ID, along with the respective p-values, and to map those estimates.
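A minimal base-R sketch of this repeated-regression idea (not this package's API; column and function names are hypothetical):

    ## Fit the same model within each level of a grouping variable and collect
    ## coefficients and p-values per group.
    repeated_glm <- function(data, formula, group_var, family = binomial()) {
      do.call(rbind, lapply(split(data, data[[group_var]]), function(d) {
        fit <- glm(formula, data = d, family = family)
        est <- summary(fit)$coefficients
        data.frame(group = d[[group_var]][1],
                   term = rownames(est),
                   estimate = est[, "Estimate"],
                   p_value = est[, "Pr(>|z|)"],
                   row.names = NULL)
      }))
    }

    ## Example with simulated microdata grouped by region
    dat <- data.frame(region = rep(c("A", "B"), each = 100), x = rnorm(200))
    dat$y <- rbinom(200, 1, plogis(0.5 * dat$x))
    head(repeated_glm(dat, y ~ x, group_var = "region"))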
Entropy weighted k-means (ewkm) by Liping Jing, Michael K. Ng and Joshua Zhexue Huang (2007) <doi:10.1109/TKDE.2007.1048> is a weighted subspace clustering algorithm that is well suited to very high dimensional data. Weights are calculated as the importance of a variable with regard to cluster membership. The two-level variable weighting clustering algorithm tw-k-means (twkm) by Xiaojun Chen, Xiaofei Xu, Joshua Zhexue Huang and Yunming Ye (2013) <doi:10.1109/TKDE.2011.262> introduces two types of weights, the weights on individual variables and the weights on variable groups, and they are calculated during the clustering process. The feature group weighted k-means (fgkm) by Xiaojun Chen, Yunming Ye, Xiaofei Xu and Joshua Zhexue Huang (2012) <doi:10.1016/j.patcog.2011.06.004> extends this concept by grouping features and weighting the group in addition to weighting individual features.
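As an illustration of the entropy-regularised weighting idea (a sketch of the kind of update described by Jing, Ng and Huang (2007), not the package's API; names are ours), a variable's weight within a cluster decays exponentially with its within-cluster dispersion, normalised across variables:

    ## Variable weights for one cluster: dispersion is the within-cluster
    ## dispersion per variable, gamma controls the strength of the entropy penalty.
    entropy_weights <- function(dispersion, gamma) {
      w <- exp(-dispersion / gamma)
      w / sum(w)
    }

    ## Example: variables with small within-cluster dispersion get more weight
    round(entropy_weights(dispersion = c(0.2, 1.0, 5.0), gamma = 1), 3)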
This package implements the Single Transferable Vote (STV) electoral system, with clear explanatory graphics. The core function stv() uses Meek's method, the purest expression of the simple principles of STV, but one which requires electronic counting. It can handle votes expressing equal preferences for subsets of the candidates. A function stv.wig(), implementing the Weighted Inclusive Gregory method as used in Scottish council elections, is also provided, with the same options, as described in the manual. The required vote data format is an R list; a function pref.data() is provided to transform some commonly used data formats into this format. References for methodology: Hill, Wichmann and Woodall (1987) <doi:10.1093/comjnl/30.3.277>, Hill, David (2006) <https://www.votingmatters.org.uk/ISSUE22/I22P2.pdf>, Mollison, Denis (2023) <arXiv:2303.15310> (see also the package manual pref_pkg_manual.pdf).
This package provides functions that allow you to generate and compare power spectral density (PSD) plots given time series data. The fast Fourier transform (FFT) is used to take time series data, analyze the oscillations, and output the frequencies of these oscillations in the form of a PSD plot. Thus, given a time series, its dominant frequencies can be identified. Additional functions in this package allow the dominant frequencies of multiple groups of time series to be compared with each other. To see example usage with the main functions of this package, please visit this site: <https://yhhc2.github.io/psdr/articles/Introduction.html>. The mathematical operations used to generate the PSDs are described at these sites: <https://www.mathworks.com/help/matlab/ref/fft.html> and <https://www.mathworks.com/help/signal/ug/power-spectral-density-estimates-using-fft.html>.
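A minimal sketch of the underlying computation, assuming a plain one-sided periodogram built with base R's fft() (not this package's functions; names are hypothetical):

    ## One-sided periodogram estimate of the PSD for a signal x sampled at fs Hz.
    simple_psd <- function(x, fs) {
      n    <- length(x)
      X    <- fft(x)
      psd  <- (Mod(X)^2) / (fs * n)                # two-sided periodogram
      half <- 1:(floor(n / 2) + 1)
      psd  <- psd[half]
      psd[-c(1, length(psd))] <- 2 * psd[-c(1, length(psd))]   # fold to one side
      data.frame(freq = (half - 1) * fs / n, power = psd)
    }

    ## Example: a 10 Hz sine sampled at 100 Hz shows a peak near 10 Hz
    fs <- 100; t <- seq(0, 5, by = 1 / fs)
    x <- sin(2 * pi * 10 * t) + rnorm(length(t), sd = 0.2)
    spec <- simple_psd(x, fs)
    spec$freq[which.max(spec$power)]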
Accelerated destructive degradation tests (ADDT) are often used to collect the data needed to assess the long-term properties of polymeric materials. Based on the collected data, a thermal index (TI) is estimated. The TI can be useful for material rating and comparison. This package implements the traditional approach based on least squares, the parametric approach based on maximum likelihood estimation, and the semiparametric approach based on splines, together with the corresponding methods for estimating the TI of polymeric materials. The traditional approach is a two-step procedure that is currently used in industrial standards, while the parametric approach is widely used in the statistical literature. The semiparametric approach is newly developed. Both the parametric and semiparametric approaches allow one to do statistical inference such as quantifying uncertainties in estimation, hypothesis testing, and prediction. Publicly available datasets are provided for illustration. More details can be found in Jin et al. (2017).
An implementation of a taxonomy of models of restricted diffusion in biological tissues parametrized by the tissue geometry (axis, diameter, density, etc.). This is primarily used in the context of diffusion magnetic resonance (MR) imaging to model the MR signal attenuation in the presence of diffusion gradients. The goal is to provide tools to simulate the MR signal attenuation predicted by these models under different experimental conditions. The package feeds a companion shiny app available at <https://midi-pastrami.apps.math.cnrs.fr> that serves as a graphical interface to the models and tools provided by the package. Models currently available are the ones in Neuman (1974) <doi:10.1063/1.1680931>, Van Gelderen et al. (1994) <doi:10.1006/jmrb.1994.1038>, Stanisz et al. (1997) <doi:10.1002/mrm.1910370115>, Soderman & Jonsson (1995) <doi:10.1006/jmra.1995.0014> and Callaghan (1995) <doi:10.1006/jmra.1995.1055>.
Reconstruction of paleoclimate niches using phylogenetic comparative methods and projection of reconstructed niches onto paleoclimate maps. The user can specify various models of trait evolution or estimate the best fit model, include fossils, use one or multiple phylogenies for inference, and make animations of shifting suitable habitat through time. This model was first used in Lawing and Polly (2011), and further implemented in Lawing et al (2016) and Rivera et al (2020). Lawing and Polly (2011) <doi:10.1371/journal.pone.0028554> "Pleistocene climate, phylogeny and climate envelope models: An integrative approach to better understand species response to climate change" Lawing et al (2016) <doi:10.1086/687202> "Including fossils in phylogenetic climate reconstructions: A deep time perspective on the climatic niche evolution and diversification of spiny lizards (Sceloporus)" Rivera et al (2020) <doi:10.1111/jbi.13915> "Reconstructing historical shifts in suitable habitat of Sceloporus lineages using phylogenetic niche modelling.".
Time series area-level models for small area estimation. The package supplements the functionality of the sae package. Specifically, it includes EBLUP fitting of the Rao-Yu model in the original form without a spatial component. The package also offers a modified ("dynamic") version of the Rao-Yu model, replacing the assumption of stationarity. Both univariate and multivariate applications are supported. Of particular note is the allowance for covariance of the area-level sample estimates over time, as encountered in rotating panel designs such as the U.S. National Crime Victimization Survey or present in a time-series of 5-year estimates from the American Community Survey. Key references to the methods include J.N.K. Rao and I. Molina (2015, ISBN:9781118735787), J.N.K. Rao and M. Yu (1994) <doi:10.2307/3315407>, and R.E. Fay and R.A. Herriot (1979) <doi:10.1080/01621459.1979.10482505>.
Machine coded genetic algorithm (MCGA) is a fast tool for real-valued optimization problems. It uses the byte representation of variables rather than real values. It performs the classical (uniform) crossover operations on these byte representations. The mutation operator is also similar to the classical mutation operator; that is, it changes a randomly selected byte value of a chromosome by +1 or -1 with probability 1/2. In MCGAs there is no need for an encoding-decoding process, and the classical operators are directly applicable to real values. It is fast and can handle a wide search space with high precision. Using a 256-character alphabet is the main disadvantage of this algorithm, but a moderate-size population is convenient for many problems. The package also includes the multi_mcga function for multi-objective optimization problems. This function sorts the chromosomes using their ranks calculated from the non-dominated sorting algorithm.
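For intuition, byte-level mutation of a real value can be sketched in base R as follows (an illustration of the idea, not the mcga API; the helper name is hypothetical):

    ## Perturb one randomly chosen byte of the IEEE-754 representation of a
    ## double by +1 or -1, then read the mutated value back.
    mutate_byte <- function(x) {
      bytes <- writeBin(x, raw())                      # 8 bytes of a double
      i     <- sample(length(bytes), 1)                # pick a byte at random
      step  <- sample(c(-1L, 1L), 1)                   # +1 or -1, equal chance
      bytes[i] <- as.raw((as.integer(bytes[i]) + step) %% 256)
      readBin(bytes, what = "double")
    }

    set.seed(1)
    mutate_byte(3.14159)   # usually close to the input, occasionally far from it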
Calculate the probability density functions (PDFs) for two-threshold evidence accumulation models (EAMs). These are defined by the following stochastic differential equation (SDE), dx(t) = v(x(t),t)*dt + D(x(t),t)*dW, where x(t) is the accumulated evidence at time t, v(x(t),t) is the drift rate, D(x(t),t) is the noise scale, and W is the standard Wiener process. The boundary conditions of this process are the upper and lower decision thresholds, represented by b_u(t) and b_l(t), respectively, with b_u(t) > 0 and b_l(t) < 0. The initial condition of this process is x(0) = z, where b_l(0) < z < b_u(0). We represent this as the relative start point w = z/(b_u(0)-b_l(0)), defined as a ratio of the initial threshold location. This package generates the PDF using the same approach as the Python package it is based upon, PyBEAM by Murrow and Holmes (2023) <doi:10.3758/s13428-023-02162-w>. First, it converts the SDE model into the forward Fokker-Planck equation dp(x,t)/dt = -d(v(x,t)*p(x,t))/dx + 0.5*d^2(D(x,t)^2*p(x,t))/dx^2, then solves this equation using the Crank-Nicolson method to determine p(x,t). Finally, it calculates the flux at the decision thresholds, f_i(t) = 0.5*d(D(x,t)^2*p(x,t))/dx evaluated at x = b_i(t), where i indicates the relevant decision threshold, either upper (i = u) or lower (i = l). The flux at each threshold, f_i(t), is the first-passage (decision time) PDF for that threshold. Further details of this approach are discussed in the publications for this package and PyBEAM. Additionally, one can calculate the cumulative distribution functions of, and sample from, the EAMs.
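For intuition about the model these PDFs describe, the SDE can be simulated directly with a simple Euler-Maruyama scheme (an illustration only, with constant drift, noise scale, and thresholds; not this package's Fokker-Planck solver, and the names below are hypothetical):

    ## Simulate n trials of a two-threshold accumulator with drift v, noise
    ## scale D, thresholds b_u > 0 and b_l < 0, and start point z.
    simulate_eam <- function(n, v, D, b_u, b_l, z, dt = 1e-3, t_max = 10) {
      replicate(n, {
        x <- z; t <- 0
        while (x < b_u && x > b_l && t < t_max) {
          x <- x + v * dt + D * sqrt(dt) * rnorm(1)    # Euler-Maruyama step
          t <- t + dt
        }
        c(rt = t, upper = as.numeric(x >= b_u))        # response time and choice
      })
    }

    set.seed(42)
    sims <- simulate_eam(n = 500, v = 1, D = 1, b_u = 1, b_l = -1, z = 0)
    mean(sims["upper", ])   # proportion of upper-threshold decisions
    hist(sims["rt", sims["upper", ] == 1], breaks = 30,
         main = "Upper-threshold response times", xlab = "t")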