Mixture Nested Effects Models (mnem) is an extension of Nested Effects Models and allows for the analysis of single cell perturbation data provided by methods like Perturb-Seq (Dixit et al., 2016) or Crop-Seq (Datlinger et al., 2017). In those experiments each of many cells is perturbed by a knock-down of a specific gene, i.e. several cells are perturbed by a knock-down of gene A, several by a knock-down of gene B, and so forth. The observed read-out has to be multi-trait; in the case of Perturb-/Crop-Seq it consists of gene expression profiles for each cell. mnem uses a mixture model to simultaneously cluster the cell population into k clusters and infer k networks causally linking the perturbed genes for each cluster. The mixture components are inferred via an expectation maximization algorithm.
This package implements the EM algorithm with a one-step gradient descent method to estimate the parameters of the Block-Basu bivariate Pareto distribution with location and scale. It also computes parametric bootstrap confidence intervals and asymptotic confidence intervals based on the observed Fisher information for the scale and shape parameters, and exact confidence intervals for the location parameters. Details are in Biplab Paul and Arabin Kumar Dey (2023) <doi:10.48550/arXiv.1608.02199> "An EM algorithm for absolutely continuous Marshall-Olkin bivariate Pareto distribution with location and scale"; E L Lehmann and George Casella (1998) <doi:10.1007/b98854> "Theory of Point Estimation"; Bradley Efron and R J Tibshirani (1994) <doi:10.1201/9780429246593> "An Introduction to the Bootstrap"; A P Dempster, N M Laird and D B Rubin (1977) <www.jstor.org/stable/2984875> "Maximum Likelihood from Incomplete Data via the EM Algorithm".
The recovery of visual sensitivity in a dark environment is known as dark adaptation. In a clinical or research setting the recovery is typically measured after a dazzling flash of light and can be described by the Mahroo, Lamb and Pugh (MLP) model of dark adaptation. The functions in this package take dark adaptation data and use nonlinear regression to find the parameters of the model that best describe the data. They do this by first generating rapid initial objective estimates of the dark adaptation parameters and then using a multi-start algorithm to reduce the possibility of ending in a local minimum. There is also a bootstrap method to calculate parameter confidence intervals. The functions rely upon a 'dark' list or object. This object is created as the first step in the workflow and parts of the object are updated as it is processed.
This package provides a collection of functions developed to support the tutorial on using Exploratory Structural Equation Modeling (ESEM) (Asparouhov & Muthén, 2009) <https://www.statmodel.com/download/EFACFA810.pdf> with the Longitudinal Study of Australian Children (LSAC) dataset (Mohal et al., 2023) <doi:10.26193/QR4L6Q>. The package uses 'tidyverse', 'psych', 'lavaan' and 'semPlot', and provides additional functions to conduct ESEM. These include general functions to complete ESEM, such as esem_c(), creation of the target matrix (if one is used) with make_target(), and generation of the Confirmatory Factor Analysis (CFA) model syntax with esem_cfa_syntax(). The package also includes sample data from the Strengths and Difficulties Questionnaire of the Longitudinal Study of Australian Children (SDQ LSAC) in sdq_lsac(). The package vignette presents the tutorial demonstrating the use of ESEM on the SDQ LSAC data.
This package provides functions to conduct a model-agnostic asymptotic hypothesis test for the identification of interaction effects in black-box machine learning models. The null hypothesis assumes that a given set of covariates does not contribute to interaction effects in the prediction model. The test statistic is based on the difference of variances of partial dependence functions (Friedman (2008) <doi:10.1214/07-AOAS148> and Welchowski (2022) <doi:10.1007/s13253-021-00479-7>) with respect to the original black-box predictions and the predictions under the null hypothesis. The hypothesis test can be applied to any black-box prediction model, and the null hypothesis of the test can be flexibly specified according to the research question of interest. Furthermore, the test is computationally fast to apply as the null distribution does not require resampling or refitting black-box prediction models.
This package provides functions for generating pseudo-random numbers that follow a uniform distribution on [0,1]. Randomness tests were conducted using the National Institute of Standards and Technology test suite <https://csrc.nist.gov/pubs/sp/800/22/r1/upd1/final>, along with additional tests. The sequence generated depends on the initial values and parameters. The package includes a linear congruence map as the decision map and three chaotic maps to generate the pseudo-random sequence, which follows a uniform distribution. Other distributions can be generated from the uniform distribution using the Inversion Principle Method and the Box-Muller transformation. Small perturbations in seed values result in entirely different sequences of numbers due to the sensitive nature of the maps being used. The chaotic nature of the maps helps achieve randomness in the generator. Additionally, the generator is capable of producing random bits.
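The step from uniform numbers to other distributions can be illustrated with a minimal Python sketch. It is not the package's code or API: the function names are hypothetical, and the linear congruential constants (a = 1664525, c = 1013904223, m = 2^32) are a commonly used textbook choice assumed here only to have a uniform source. The inversion step is shown for the exponential distribution, and the Box-Muller step turns pairs of uniforms into pairs of standard normals.

```python
import math

def lcg_uniform(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random floats in (0, 1) from a linear congruential map.

    The constants are an assumed textbook choice, not the package's parameters.
    """
    state = seed % m
    out = []
    for _ in range(n):
        state = (a * state + c) % m
        out.append((state + 0.5) / m)  # shift by 0.5 so exactly 0 or 1 never occurs
    return out

def exponential_by_inversion(u, rate=1.0):
    """Inversion method: if U ~ Uniform(0, 1), then -ln(1 - U) / rate ~ Exp(rate)."""
    return [-math.log(1.0 - x) / rate for x in u]

def box_muller(u):
    """Map consecutive pairs of uniforms to pairs of standard normal variates."""
    z = []
    for u1, u2 in zip(u[0::2], u[1::2]):
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        z.extend([radius * math.cos(angle), radius * math.sin(angle)])
    return z

u = lcg_uniform(seed=12345, n=10000)
expo = exponential_by_inversion(u, rate=2.0)  # exponential draws with rate 2
norm = box_muller(u)                          # approximately N(0, 1) draws
```

Any uniform source, including a chaotic map whose output has been made uniform, could replace `lcg_uniform` here; the two transformations only assume inputs strictly between 0 and 1.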
The strength of evidence provided by epidemiological and observational studies is inherently limited by the potential for unmeasured confounding. We focus on three key quantities: the observed bound of the confidence interval closest to the null, the relationship between an unmeasured confounder and the outcome, for example a plausible residual effect size for an unmeasured continuous or binary confounder, and the relationship between an unmeasured confounder and the exposure, for example a realistic mean difference or prevalence difference for this hypothetical confounder between exposure groups. Building on the methods put forth by Cornfield et al. (1959), Bross (1966), Schlesselman (1978), Rosenbaum & Rubin (1983), Lin et al. (1998), Lash et al. (2009), Rosenbaum (1986), Cinelli & Hazlett (2020), VanderWeele & Ding (2017), and Ding & VanderWeele (2016), we can use these quantities to assess how an unmeasured confounder may tip our result to insignificance.
Learning and inference over dynamic Bayesian networks of arbitrary Markovian order. Extends some of the functionality offered by the bnlearn package to learn the networks from data and perform exact inference. It offers three structure learning algorithms for dynamic Bayesian networks: Trabelsi G. (2013) <doi:10.1007/978-3-642-41398-8_34>, Santos F.P. and Maciel C.D. (2014) <doi:10.1109/BRC.2014.6880957>, Quesada D., Bielza C. and Larrañaga P. (2021) <doi:10.1007/978-3-030-86271-8_14>. It also offers the possibility to perform forecasts of arbitrary length. A tool for visualizing the structure of the net is also provided via the visNetwork package. Further detailed information and examples can be found in our Journal of Statistical Software paper Quesada D., Larrañaga P. and Bielza C. (2025) <doi:10.18637/jss.v115.i06>.
This package provides the probability density function (PDF), cumulative distribution function (CDF), the first-order and second-order partial derivatives of the PDF, and a fitting function for the diffusion decision model (DDM; e.g., Ratcliff & McKoon, 2008, <doi:10.1162/neco.2008.12-06-420>) with across-trial variability in the drift rate. Because the PDF, its partial derivatives, and the CDF of the DDM all contain an infinite sum, they need to be approximated. fddm implements all published approximations (Navarro & Fuss, 2009, <doi:10.1016/j.jmp.2009.02.003>; Gondan, Blurton, & Kesselmeier, 2014, <doi:10.1016/j.jmp.2014.05.002>; Blurton, Kesselmeier, & Gondan, 2017, <doi:10.1016/j.jmp.2016.11.003>; Hartmann & Klauer, 2021, <doi:10.1016/j.jmp.2021.102550>) plus new approximations. All approximations are implemented purely in C++, providing faster speed than existing packages.
This package provides a framework in which a variety of latent Markov models, including hidden Markov models, hidden semi-Markov models, state-space models and continuous-time variants, can be formulated and estimated by directly maximising the likelihood function using the so-called forward algorithm. Applied researchers often need custom models that standard software does not easily support. Writing tailored R code offers flexibility but suffers from slow estimation speeds. We address these issues by providing easy-to-use functions (written in C++ for speed) for common tasks like the forward algorithm. These functions can be combined into custom models in a Lego-type approach, offering up to 10-20 times faster estimation via standard numerical optimisers. To aid in building fully custom likelihood functions, several vignettes are included that show how to simulate data from and estimate all the above model classes.
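To make the role of the forward algorithm concrete, here is a minimal Python sketch of the scaled forward recursion for a hidden Markov model log-likelihood. It is an illustration under stated assumptions, not the package's implementation: the function name and the argument names (`delta` for the initial distribution, `Gamma` for the transition matrix, `allprobs` for the state-dependent densities) are chosen for this example, and the two-state Gaussian model and data are made up.

```python
import math

def hmm_forward_loglik(delta, Gamma, allprobs):
    """Log-likelihood of an HMM via the scaled forward algorithm.

    delta    : initial state distribution, length N
    Gamma    : N x N transition probability matrix (rows sum to 1)
    allprobs : T x N list, allprobs[t][j] = density of observation t in state j
    """
    n = len(delta)
    # Initialise with delta weighted by the first observation's densities.
    phi = [delta[j] * allprobs[0][j] for j in range(n)]
    total = sum(phi)
    loglik = math.log(total)
    phi = [p / total for p in phi]  # rescale to avoid numerical underflow
    for t in range(1, len(allprobs)):
        # Propagate through Gamma, then weight by the state-dependent densities.
        phi = [sum(phi[i] * Gamma[i][j] for i in range(n)) * allprobs[t][j]
               for j in range(n)]
        total = sum(phi)
        loglik += math.log(total)
        phi = [p / total for p in phi]
    return loglik

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical two-state Gaussian HMM and a short made-up observation series.
obs = [0.1, -0.3, 2.2, 1.9, 0.4]
delta = [0.5, 0.5]
Gamma = [[0.9, 0.1], [0.2, 0.8]]
means = [0.0, 2.0]
allprobs = [[normal_pdf(x, m, 1.0) for m in means] for x in obs]
loglik = hmm_forward_loglik(delta, Gamma, allprobs)
```

In a direct-maximisation workflow, a function like this would be wrapped in a negative log-likelihood that maps unconstrained parameters to `delta`, `Gamma` and `allprobs` and is then passed to a numerical optimiser.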
This package provides functions to fit point process models using the Palm likelihood. First proposed by Tanaka, Ogata, and Stoyan (2008) <DOI:10.1002/bimj.200610339>, maximisation of the Palm likelihood can provide computationally efficient parameter estimation for point process models in situations where the full likelihood is intractable. This package is chiefly focused on Neyman-Scott point processes, but can also fit the void processes proposed by Jones-Todd et al. (2019) <DOI:10.1002/sim.8046>. The development of this package was motivated by the analysis of capture-recapture surveys on which individuals cannot be identified; the data from such surveys can conceptually be seen as a clustered point process (Stevenson, Borchers, and Fewster, 2019 <DOI:10.1111/biom.12983>). As such, some of the functions in this package are specifically for the estimation of cetacean density from two-camera aerial surveys.
This package contains functions for statistical data analysis based on spatially-clustered techniques. The package allows estimating the spatially-clustered spatial regression models presented in Cerqueti, Maranzano & Mattera (2024), "Spatially-clustered spatial autoregressive models with application to agricultural market concentration in Europe", arXiv preprint 2407.15874 <doi:10.48550/arXiv.2407.15874>. Specifically, the current release allows the estimation of the spatially-clustered linear regression model (SCLM), the spatially-clustered spatial autoregressive model (SCSAR), the spatially-clustered spatial Durbin model (SCSEM), and the spatially-clustered linear regression model with spatially-lagged exogenous covariates (SCSLX). From release 0.0.2, the library contains functions to estimate spatial clustering based on Adjacent Matrix K-Means (AMKM) as described in Zhou, Liu & Zhu (2019), "Weighted adjacent matrix for K-means clustering", Multimedia Tools and Applications, 78 (23) <doi:10.1007/s11042-019-08009-x>.
An advanced version of package 's2dverification'. Intended for seasonal to decadal (s2d) climate forecast verification, but also applicable to other types of forecasts or general climate analysis. This package is specifically designed for comparing experimental and observational datasets. It provides functionality for data retrieval, post-processing, skill score computation against observations, and visualization. Compared to 's2dverification', 's2dv' is more compatible with the package 'startR', able to use multiple cores for computation and handle multi-dimensional arrays with a higher flexibility. The Climate Data Operators (CDO) version used in development is 1.9.8. Implements methods described in Wilks (2011) <doi:10.1016/B978-0-12-385022-5.00008-7>, DelSole and Tippett (2016) <doi:10.1175/MWR-D-15-0218.1>, Kharin et al. (2012) <doi:10.1029/2012GL052647>, Doblas-Reyes et al. (2003) <doi:10.1007/s00382-003-0350-4>.
Flexibly implements Integral Projection Models using a mathematical(ish) syntax. This package will not help with the vital rate modeling process, but will help convert those regression models into an IPM. ipmr handles density dependence and environmental stochasticity, with a couple of options for implementing the latter. In addition, provides functions to avoid unintentional eviction of individuals from models. Additionally, provides model diagnostic tools, plotting functionality, stochastic/deterministic simulations, and analysis tools. Integral projection models are described in depth by Easterling et al. (2000) <doi:10.1890/0012-9658(2000)081[0694:SSSAAN]2.0.CO;2>, Merow et al. (2013) <doi:10.1111/2041-210X.12146>, Rees et al. (2014) <doi:10.1111/1365-2656.12178>, and Metcalf et al. (2015) <doi:10.1111/2041-210X.12405>. Williams et al. (2012) <doi:10.1890/11-2147.1> discuss the problem of unintentional eviction.
Implementation of a transfer learning framework employing distribution mapping based domain transfer. Uses the renowned concept of histogram matching (see Gonzalez and Fittes (1977) <doi:10.1016/0094-114X(77)90062-3>, Gonzalez and Woods (2008) <isbn:9780131687288>) and extends it to include distribution measures like kernel density estimates (KDE; see Wand and Jones (1995) <isbn:978-0-412-55270-0>, Jones et al. (1996) <doi:10.2307/2291420>). In the typical application scenario, one can use the underlying sample distributions (histogram or KDE) to generate a map between two distinct but related domains to transfer the target data to the source domain and utilize the available source data for better predictive modeling design. Suitable for the case where a one-to-one sample matching is not possible, thus one needs to transform the underlying data distribution to utilize the more available data for modeling.
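The idea of mapping one sample distribution onto another can be sketched in a few lines of Python. This is a simplified illustration, not the package's method or API: it uses plain empirical CDFs rather than binned histograms or kernel density estimates, and the function name `match_distribution` is hypothetical. Each target value is sent to the source quantile that sits at the same cumulative probability.

```python
from bisect import bisect_right

def match_distribution(target, source):
    """Map each target value to the source value at the same empirical quantile.

    Uses the empirical CDF of the target sample and the inverse empirical CDF
    of the source sample; no one-to-one pairing of samples is required.
    """
    t_sorted, s_sorted = sorted(target), sorted(source)
    n, m = len(t_sorted), len(s_sorted)
    mapped = []
    for x in target:
        k = bisect_right(t_sorted, x)      # number of target values <= x
        idx = max(0, -(-k * m // n) - 1)   # ceil(k * m / n) - 1, in integer arithmetic
        mapped.append(s_sorted[idx])
    return mapped

# Made-up samples of different sizes from two related domains.
target = [1.0, 2.0, 3.0, 4.0]
source = [10, 20, 30, 40, 50, 60, 70, 80]
print(match_distribution(target, source))  # [20, 40, 60, 80]
```

The mapping is monotone, so the ordering of the target data is preserved while its values take on the distribution of the source sample. A KDE-based variant would replace the step-function CDFs with smooth ones.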
This is a small, lightweight package that lets users investigate the distribution of genotypes in genotype-by-sequencing (GBS) data where they expect (by and large) Hardy-Weinberg equilibrium, in order to assess rates of genotyping errors and the dependence of those rates on read depth. It implements a Markov chain Monte Carlo (MCMC) sampler using Rcpp to compute a Bayesian estimate of what we call the heterozygote miscall rate for restriction-associated digest (RAD) sequencing data and other types of reduced representation GBS data. It also provides functions to generate plots of expected and observed genotype frequencies. Some background on these topics can be found in a recent paper "Recent advances in conservation and population genomics data analysis" by Hendricks et al. (2018) <doi:10.1111/eva.12659>, and another paper describing the MCMC approach is in preparation with Gordon Luikart and Thierry Gosselin.
Hypothesis tests and sure independence screening (SIS) procedures based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance <doi:10.1080/01621459.2018.1543600>, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data in metric spaces, e.g., shape, directional, compositional and symmetric positive definite matrix data. The ball divergence and ball covariance based distribution-free tests are implemented to detect distribution differences and association in metric spaces <doi:10.18637/jss.v097.i06>. Furthermore, several generic non-parametric feature selection procedures based on ball correlation, BCor-SIS and all of its variants, are implemented to tackle the challenge in the context of ultra high dimensional data. A fast implementation for large-scale multiple K-sample testing with ball divergence <doi:10.1002/gepi.22423> is supported, which is particularly helpful for genome-wide association studies.
Multidimensional scaling (MDS) functions for various tasks that are beyond the beta stage and way past the alpha stage. Currently, options are available for weights, restrictions, classical scaling or principal coordinate analysis, transformations (linear, power, Box-Cox, spline, ordinal), outlier mitigation (rdop), out-of-sample estimation (predict), negative dissimilarities, fast and faster executions with low memory footprints, penalized restrictions, cross-validation-based penalty selection, supplementary variable estimation (explain), additive constant estimation, mixed measurement level distance calculation, restricted classical scaling, etc. More will come in the future. References. Busing (2024) "A Simple Population Size Estimator for Local Minima Applied to Multidimensional Scaling". Manuscript submitted for publication. Busing (2025) "Node Localization by Multidimensional Scaling with Iterative Majorization". Manuscript submitted for publication. Busing (2025) "Faster Multidimensional Scaling". Manuscript in preparation. Barroso and Busing (2025) "e-RDOP, Relative Density-Based Outlier Probabilities, Extended to Proximity Mapping". Manuscript submitted for publication.
An end-to-end toolkit for land use and land cover classification using big Earth observation data. Builds satellite image data cubes from cloud collections. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Enables merging of multi-source imagery (SAR, optical, DEM). Includes functions for quality assessment of training samples using self-organized maps and to reduce training samples imbalance. Provides machine learning algorithms including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolution neural networks, and temporal attention encoders. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference. Enables best practices for estimating area and assessing accuracy of land change. Includes object-based spatio-temporal segmentation for space-time OBIA. Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
This package provides a Bayesian latent space model for complex networks, either weighted or unweighted. Given an observed input graph, the estimates for the latent coordinates of the nodes are obtained through a Bayesian MCMC algorithm. The overall likelihood of the graph depends on a fundamental probability equation, which is defined so that ties are more likely to exist between nodes whose latent space coordinates are close. The package is mainly based on the model by Hoff, Raftery and Handcock (2002) <doi:10.1198/016214502388618906> and contains some extra features (e.g., removal of the Procrustean step, weights implemented as coefficients of the latent distances, 3D plots). The original code related to the above model was retrieved from <https://www.stat.washington.edu/people/pdhoff/Code/hoff_raftery_handcock_2002_jasa/>. Users can inspect the MCMC simulation, create and customize insightful graphical representations or apply clustering techniques.
Mapping, spatial analysis, and statistical modeling of microdata from sources such as the Demographic and Health Surveys <https://www.dhsprogram.com/> and Integrated Public Use Microdata Series <https://www.ipums.org/>. It can also be extended to other datasets. The package supports spatial correlation index construction and visualization, along with empirical Bayes approximation of regression coefficients in a multistage setup. The main functionality is repeated regression: for example, if we have to run regression for n groups, the group ID should be vertically composed into the variable for the parameter `location_var`. It can perform various kinds of regression, such as Generalized Regression Models, logit, probit, and more. Additionally, it can incorporate interaction effects. The key benefit of the package is its ability to store the regression results performed repeatedly on a dataset by the group ID, along with respective p-values and map those estimates.
Entropy weighted k-means (ewkm) by Liping Jing, Michael K. Ng and Joshua Zhexue Huang (2007) <doi:10.1109/TKDE.2007.1048> is a weighted subspace clustering algorithm that is well suited to very high dimensional data. Weights are calculated as the importance of a variable with regard to cluster membership. The two-level variable weighting clustering algorithm tw-k-means (twkm) by Xiaojun Chen, Xiaofei Xu, Joshua Zhexue Huang and Yunming Ye (2013) <doi:10.1109/TKDE.2011.262> introduces two types of weights, the weights on individual variables and the weights on variable groups, and they are calculated during the clustering process. The feature group weighted k-means (fgkm) by Xiaojun Chen, Yunming Ye, Xiaofei Xu and Joshua Zhexue Huang (2012) <doi:10.1016/j.patcog.2011.06.004> extends this concept by grouping features and weighting the group in addition to weighting individual features.
This package implements the Single Transferable Vote (STV) electoral system, with clear explanatory graphics. The core function stv() uses Meek's method, the purest expression of the simple principles of STV, but which requires electronic counting. It can handle votes expressing equal preferences for subsets of the candidates. A function stv.wig() implementing the Weighted Inclusive Gregory method, as used in Scottish council elections, is also provided, and with the same options, as described in the manual. The required vote data format is as an R list: a function pref.data() is provided to transform some commonly used data formats into this format. References for methodology: Hill, Wichmann and Woodall (1987) <doi:10.1093/comjnl/30.3.277>, Hill, David (2006) <https://www.votingmatters.org.uk/ISSUE22/I22P2.pdf>, Mollison, Denis (2023) <arXiv:2303.15310>, (see also the package manual pref_pkg_manual.pdf).
This package provides functions that allow you to generate and compare power spectral density (PSD) plots given time series data. The Fast Fourier Transform (FFT) is used to take time series data, analyze the oscillations, and then output the frequencies of these oscillations in the form of a PSD plot. Thus, given a time series, the dominant frequencies in the time series can be identified. Additional functions in this package allow the dominant frequencies of multiple groups of time series to be compared with each other. To see example usage with the main functions of this package, please visit this site: <https://yhhc2.github.io/psdr/articles/Introduction.html>. The mathematical operations used to generate the PSDs are described in these sites: <https://www.mathworks.com/help/matlab/ref/fft.html> and <https://www.mathworks.com/help/signal/ug/power-spectral-density-estimates-using-fft.html>.
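As a rough illustration of how a PSD estimate and a dominant frequency can be obtained from a time series, the following Python sketch computes a one-sided periodogram with a small radix-2 FFT. It is not the package's code: the function names are hypothetical, the input length is assumed to be a power of two, and the test signal (5 Hz and 12 Hz sinusoids sampled at 64 Hz) is made up. The scaling by 1 / (fs * N) and the doubling of the interior bins follow the usual FFT-based periodogram.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def periodogram(signal, fs):
    """One-sided PSD estimate (power per Hz) of a real signal sampled at fs Hz."""
    n = len(signal)
    spectrum = fft(signal)
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    psd = [abs(spectrum[k]) ** 2 / (fs * n) for k in range(n // 2 + 1)]
    for k in range(1, n // 2):
        psd[k] *= 2.0  # fold negative frequencies in; DC and Nyquist are not doubled
    return freqs, psd

# Made-up signal: a strong 5 Hz component plus a weaker 12 Hz component.
fs, n = 64.0, 256
signal = [math.sin(2 * math.pi * 5 * t / fs) + 0.5 * math.sin(2 * math.pi * 12 * t / fs)
          for t in range(n)]
freqs, psd = periodogram(signal, fs)
dominant = freqs[max(range(len(psd)), key=psd.__getitem__)]
print(dominant)  # 5.0
```

Comparing groups of time series then amounts to computing `dominant` (or the whole `psd`) per series and contrasting the resulting values between groups.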