Integrating a stratified structure of the population into a sampling design can considerably reduce the variance of the Horvitz-Thompson estimator. This package provides several methods for selecting a balanced sample in a stratified population. For more details see Raphaël Jauslin, Esther Eustache and Yves Tillé (2021) <doi:10.1007/s42081-021-00134-y>. The package also provides a method based on optimal transport and balanced sampling, see Raphaël Jauslin and Yves Tillé <doi:10.1016/j.jspi.2022.12.003>.
Given a time series or pseudo-time series of gene expression data, we might wish to know: Do the changes in gene expression in these data exhibit directionality? Are there turning points in this directionality? Do different subsets of the data move in different directions? This package uses spherical geometry to probe these sorts of questions. In particular, if we are looking at (say) the first n dimensions of the PCA of gene expression, directionality can be detected as the clustering of points on the (n-1)-dimensional sphere.
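The idea of detecting directionality as clustering on a sphere can be illustrated with a minimal sketch. It assumes that the points on the sphere are the normalized steps between successive samples and that clustering is summarized by the mean resultant length; both choices, the function names, and the sample coordinates are illustrative assumptions, not the package's documented method.

```python
import math

def unit_directions(points):
    """Project the steps between successive points onto the unit sphere."""
    directions = []
    for a, b in zip(points, points[1:]):
        step = [y - x for x, y in zip(a, b)]
        norm = math.sqrt(sum(s * s for s in step))
        if norm > 0:  # zero-length steps have no direction, so skip them
            directions.append([s / norm for s in step])
    return directions

def mean_resultant_length(directions):
    """Length of the average unit vector: near 1 means tightly clustered
    directions, near 0 means directions that cancel each other out."""
    n = len(directions)
    dim = len(directions[0])
    mean = [sum(d[k] for d in directions) / n for k in range(dim)]
    return math.sqrt(sum(m * m for m in mean))

# Hypothetical coordinates in the first 3 PCA dimensions (so directions lie on a 2-sphere).
steady = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.1), (3.0, 0.1, 0.0)]
zigzag = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

steady_score = mean_resultant_length(unit_directions(steady))
zigzag_score = mean_resultant_length(unit_directions(zigzag))
print(steady_score, zigzag_score)
```

A trajectory that keeps moving the same way scores close to 1, while one that reverses direction scores much lower.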
Priority-ElasticNet extends the Priority-LASSO method (Klau et al. (2018) <doi:10.1186/s12859-018-2344-6>) by incorporating the ElasticNet penalty, allowing for both L1 and L2 regularization. This approach fits successive ElasticNet models for several blocks of (omics) data with different priorities, using the predicted values from each block as an offset for the subsequent block. It also offers robust options to handle block-wise missingness in multi-omics data, improving the flexibility and applicability of the model in the presence of incomplete datasets.
This package provides a statistical disclosure control tool to protect frequency tables in cases where small values are sensitive. The function PLSrounding() performs small count rounding of necessary inner cells so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. This is equivalent to changing micro data since frequencies of unique combinations are changed. Thus, additivity and consistency are guaranteed. The methodology is described in Langsrud and Heldal (2018) <https://www.researchgate.net/publication/327768398_An_Algorithm_for_Small_Count_Rounding_of_Tabular_Data>.
Creation and selection of PARAllel FACtor Analysis (PARAFAC) models of longitudinal microbiome data. You can import your own data with our import functions or use one of the example datasets to create your own PARAFAC models. Selection of the optimal number of components can be done using assessModelQuality() and assessModelStability(). The selected model can then be plotted using plotPARAFACmodel(). The Parallel Factor Analysis method was originally described by Carroll and Chang (1970) <doi:10.1007/BF02310791> and Harshman (1970) <https://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf>.
This package provides functionality for creating and evaluating acceptance sampling plans. Acceptance sampling is a methodology commonly used in quality control and improvement. International standards of acceptance sampling provide sampling plans for specific circumstances. The aim of this package is to provide an easy-to-use interface to visualize single, double or multiple sampling plans. In addition, methods are provided to enable the user to assess sampling plans against pre-specified levels of performance, as measured by the probability of acceptance for a given level of quality in the lot.
A common problem faced by journal reviewers and authors is the question of whether the results of a replication study are consistent with the original published study. One solution to this problem is to examine the effect size from the original study and generate the range of effect sizes that could reasonably be obtained (due to random sampling) in a replication attempt (i.e., calculate a prediction interval). This package has functions that calculate the prediction interval for the correlation (i.e., r), standardized mean difference (i.e., d-value), and mean.
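To make the notion of a prediction interval for a replication correlation concrete, here is a minimal sketch based on the Fisher z transformation. This is a textbook approximation chosen for illustration and is not necessarily the method the package implements; the function name and the sample values are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_prediction_interval(r, n_original, n_replication, level=0.95):
    """Approximate range of replication correlations expected from sampling error alone.

    Assumption: both sample correlations are approximately normal on the
    Fisher z scale, with variances 1/(n - 3) that add for their difference.
    """
    z = math.atanh(r)
    se = math.sqrt(1.0 / (n_original - 3) + 1.0 / (n_replication - 3))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lower, upper = correlation_prediction_interval(r=0.35, n_original=100, n_replication=100)
wide_lower, wide_upper = correlation_prediction_interval(r=0.35, n_original=100, n_replication=30)
print(lower, upper)
```

A replication correlation that falls inside the interval is consistent with the original result under this approximation. A smaller replication sample widens the interval because its sampling error is larger.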
The function performs a parallel analysis using simulated polychoric correlation matrices. The nth percentile of the eigenvalue distribution obtained from both the randomly generated and the real data polychoric correlation matrices is returned. A plot comparing the two types of eigenvalues (real and simulated) helps determine the number of real eigenvalues that outperform random data. The function is based on the idea that if real data are non-normal and the polychoric correlation matrix is needed to perform a factor analysis, then the parallel analysis method used to choose a non-random number of factors should also be based on randomly generated polychoric correlation matrices and not on Pearson correlation matrices. Random data sets are simulated assuming either a uniform or a multinomial distribution, or via the bootstrap method of resampling (i.e., random permutations of cases). Multigroup parallel analysis is also available for random (uniform and multinomial distribution, with or without difficulty factor) and bootstrap methods. An option to choose between default and full output is available, as well as a parameter to print fit statistics (Chi-squared, TLI, RMSEA, RMR and BIC) for the factor solutions indicated by the parallel analysis. Weighted correlation matrices may also be considered for parallel analysis.
Package xid is a globally unique id generator suited for web scale. Features:
size: 12 bytes (96 bits), smaller than UUID, larger than snowflake
base32 hex encoded by default (20 chars when transported as printable string, still sortable)
no configuration needed: you don't need to set a unique machine and/or data center id
k-ordered
embedded time with 1 second precision
unicity guaranteed for 16,777,216 (24 bits) unique ids per second and per host/process
lock-free (i.e.: unlike UUIDv1 and v2)
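The size, encoding and ordering properties listed above can be illustrated with a minimal sketch. It assumes a 12-byte layout of a 4-byte timestamp in seconds, a 3-byte machine identifier, a 2-byte process id and a 3-byte counter; the function names and sample values are hypothetical, and this is not the xid library's own code.

```python
import struct

ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # base32 hex alphabet, lowercase

def pack_id(timestamp, machine, pid, counter):
    """Pack the four fields into 12 bytes, big-endian so byte order follows time order."""
    if len(machine) != 3:
        raise ValueError("machine identifier must be exactly 3 bytes")
    return (
        struct.pack(">I", timestamp & 0xFFFFFFFF)   # 4 bytes: seconds since the epoch
        + machine                                   # 3 bytes: host identifier
        + struct.pack(">H", pid & 0xFFFF)           # 2 bytes: process id
        + (counter & 0xFFFFFF).to_bytes(3, "big")   # 3 bytes: 24-bit counter
    )

def encode(raw):
    """Encode 12 bytes (96 bits) as 20 base32 hex characters, without padding."""
    value = int.from_bytes(raw, "big") << 4  # pad 96 bits up to 100 = 20 * 5 bits
    return "".join(ALPHABET[(value >> shift) & 0x1F] for shift in range(95, -1, -5))

machine = b"\x01\x02\x03"  # hypothetical 3-byte host identifier
raw = pack_id(1_700_000_000, machine, 4242, 7)
earlier = encode(raw)
later = encode(pack_id(1_700_000_001, machine, 4242, 0))
print(earlier, later)
```

Because the timestamp occupies the leading bytes and the alphabet is in ascending character order, the encoded strings sort in the same order as the ids were created, to within the one-second precision. The 3-byte counter is what allows 16,777,216 distinct ids per second and per process.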
Functional differences between the cerebral hemispheres are a fundamental characteristic of the human brain. Researchers interested in studying these differences often infer underlying hemispheric dominance for a certain function (e.g., language) from laterality indices calculated from observed performance or brain activation measures. However, any inference from observed measures to latent (unobserved) classes has to consider the prior probability of class membership in the population. The provided functions implement a Bayesian model for predicting hemispheric dominance from observed laterality indices (Sorensen and Westerhausen, Laterality: Asymmetries of Body, Brain and Cognition, 2020, <doi:10.1080/1357650X.2020.1769124>).
This package implements the Quantile-on-Quantile (QQ) regression methodology developed by Sim and Zhou (2015) <doi:10.1016/j.jbankfin.2015.01.013>. QQ regression estimates the effect that quantiles of one variable have on quantiles of another, capturing the dependence between distributions. The package provides functions for QQ regression estimation, 3D surface visualization with MATLAB-style color schemes ('Jet', 'Viridis', 'Plasma'), heatmaps, contour plots, and quantile correlation analysis. It uses quantreg for quantile regression and plotly for interactive visualizations. It is particularly useful for examining relationships between financial variables, oil prices, and stock returns under different market conditions.
This package provides a toolkit for stratified medicine, subgroup identification, and precision medicine. Current tools include (1) filtering models (reduce covariate space), (2) patient-level estimate models (counterfactual patient-level quantities, such as the conditional average treatment effect), (3) subgroup identification models (find subsets of patients with similar treatment effects), and (4) treatment effect estimation and inference (for the overall population and discovered subgroups). These tools can be customized and are directly used in PRISM (patient response identifiers for stratified medicine; Jemielita and Mehrotra 2019 <arXiv:1912.03337>). This package is in beta and will be continually updated.
This package implements the methodology introduced in Capezza, Lepore, and Paynabar (2025) <doi:10.1080/00401706.2025.2561744> for process monitoring with limited labeling resources. The package provides functions to (i) simulate data streams with true latent states and multivariate Gaussian observations as done in the paper, (ii) fit partially hidden Markov models (pHMMs) using a constrained Baum-Welch algorithm with partial labels, and (iii) perform stream-based active learning that balances exploration and exploitation to decide whether to request labels in real time. The methodology is particularly suited for statistical process monitoring in industrial applications where labeling is costly.
This package provides functions for deep learning estimation of Conditional Average Treatment Effects (CATEs) from meta-learner models and Population Average Treatment Effects on the Treated (PATT) in settings with treatment noncompliance using reticulate, TensorFlow and Keras3. Functions in the package also implement the conformal prediction framework, which enables computation and illustration of conformal prediction (CP) intervals for estimated individual treatment effects (ITEs) from meta-learner models. Additional functions in the package permit users to estimate the meta-learner CATEs and the PATT in settings with treatment noncompliance using weighted ensemble learning via the super learner approach and R neural networks.
This package provides functions and examples for the weak and strong density asymmetry measures in the articles: "A measure of asymmetry", Patil, Patil and Bagkavos (2012) <doi:10.1007/s00362-011-0401-6> and "A measure of asymmetry based on a new necessary and sufficient condition for symmetry", Patil, Bagkavos and Wood (2014) <doi:10.1007/s13171-013-0034-z>. The measures provided here are useful for quantifying the asymmetry of the shape of a density of a random variable. The package facilitates implementation of the measures which are applicable in a variety of fields including e.g. probability theory, statistics and economics.
Weighted descriptive statistics is the discipline of quantitatively describing the main features of real-valued fuzzy data, which are usually obtained from a fuzzy population. One can summarize this special kind of fuzzy data numerically or graphically using this package. To interpret properties of one or several sets of real-valued fuzzy data, numerical summaries are available through the weighted statistics provided in this package, such as the mean, variance, covariance and correlation coefficient. Graphical interpretation is available through the weighted histogram and weighted scatter plot, which describe properties of a real-valued fuzzy data set.
You can easily add an advanced cohort-building component to your analytical dashboard or simple Shiny app. Then you can instantly start building cohorts using multiple filters of different types, filtering datasets, and filtering steps. Filters can be complex and data-specific, and together with multiple filtering steps you can use complex filtering rules. The cohort-building sidebar panel allows you to easily work with filters and to add and remove filtering steps. It helps you handle missing values during filtering, and provides instant filtering feedback with filter feedback plots. The GUI panel is not only compatible with native shiny bookmarking, but also provides reproducible R code.
Conduct multiple quantitative trait loci (QTL) mapping under the framework of random-QTL-effect linear mixed model. First, each position on the genome is scanned in order to obtain a curve of the negative logarithm of the P-value against genome position. Then, all the peaks on each effect (additive or dominant) curve are viewed as potential QTL, all the effects of the potential QTL are included in a multi-QTL model, their effects are estimated by empirical Bayes in doubled haploid populations or by adaptive lasso in F2 populations, and true QTL are identified by likelihood ratio test. See Wen et al. (2018) <doi:10.1093/bib/bby058>.
Implementation of the exact, normal approximation, and simulation-based methods for computing the probability mass function (pmf) and cumulative distribution function (cdf) of the Poisson-Multinomial distribution, together with a random number generator for the distribution. The exact method is based on the multi-dimensional fast Fourier transform (FFT) of the characteristic function of the Poisson-Multinomial distribution. The normal approximation method uses a multivariate normal distribution to approximate the pmf of the distribution based on the central limit theorem. The simulation method is based on the law of large numbers. Details about the methods are available in Lin, Wang, and Hong (2022) <DOI:10.1007/s00180-022-01299-0>.
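As a small illustration of the distribution itself, the sketch below treats the Poisson-Multinomial as the vector of category counts from independent trials, each with its own category probabilities. The exact pmf here is computed by brute-force convolution over trials, which is a deliberately simple stand-in and not the FFT-based method described above; the simulation estimate follows the law-of-large-numbers idea. All function names and the probability matrix are hypothetical.

```python
import random
from collections import defaultdict

def exact_pmf(prob_matrix):
    """Exact pmf by adding one trial at a time (brute-force convolution, not FFT)."""
    m = len(prob_matrix[0])
    dist = {(0,) * m: 1.0}  # before any trial, all counts are zero
    for row in prob_matrix:
        updated = defaultdict(float)
        for counts, p in dist.items():
            for j, pj in enumerate(row):
                if pj > 0:
                    bumped = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                    updated[bumped] += p * pj
        dist = dict(updated)
    return dist

def simulated_pmf(prob_matrix, draws, seed=0):
    """Estimate the pmf as relative frequencies over repeated random draws."""
    rng = random.Random(seed)
    m = len(prob_matrix[0])
    tally = defaultdict(int)
    for _ in range(draws):
        counts = [0] * m
        for row in prob_matrix:
            j = rng.choices(range(m), weights=row)[0]
            counts[j] += 1
        tally[tuple(counts)] += 1
    return {k: v / draws for k, v in tally.items()}

# Hypothetical example: 3 trials, 3 categories, one probability row per trial.
P = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
exact = exact_pmf(P)
approx = simulated_pmf(P, draws=20000)
max_gap = max(abs(exact[k] - approx.get(k, 0.0)) for k in exact)
print(max_gap)
```

The brute-force approach enumerates every reachable count vector, so it is only practical for small problems; that cost is the motivation for faster exact methods and for approximations.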
The effects of the site may severely bias the accuracy of a multisite machine-learning model, even if the analysts removed them when fitting the model in the training set and applying the model in the test set (Solanes et al., Neuroimage 2023, 265:119800). This simple R package estimates the accuracy of a multisite machine-learning model unbiasedly, as described in Solanes et al. (Psychiatry Research: Neuroimaging 2021, 314:111313). It currently supports the estimation of sensitivity, specificity, balanced accuracy (for binary or multinomial variables), the area under the curve, correlation, mean squared error, and hazard ratio for binomial, multinomial, gaussian, and survival (time-to-event) outcomes.
The aim of spatial downscaling is to increase the spatial resolution of gridded geospatial input data. This package contains two deep learning based spatial downscaling methods, super-resolution deep residual network (SRDRN) (Wang et al., 2021 <doi:10.1029/2020WR029308>) and UNet (Ronneberger et al., 2015 <doi:10.1007/978-3-319-24574-4_28>), along with a statistical baseline method, bias correction and spatial disaggregation (Wood et al., 2004 <doi:10.1023/B:CLIM.0000013685.99609.9e>). The SRDRN and UNet methods are implemented to optionally account for cyclical temporal patterns in the case of spatio-temporal data. For more details of the methods, see Sipilä et al. (2025) <doi:10.48550/arXiv.2512.13753>.
Biodiversity areas, especially primary forest, serve a multitude of functions for the local economy, the regional functionality of ecosystems, and the global health of our planet. Recently, adverse changes in human land use practices and climatic responses to increased greenhouse gas emissions have put these biodiversity areas under a variety of threats. The present package helps to analyse a number of biodiversity indicators based on freely available geographical datasets. It supports computationally efficient routines that allow the analysis of potentially global biodiversity portfolios. The primary use case of the package is to support evidence-based reporting of an organization's effort to protect biodiversity areas under threat and to identify regions where intervention is most needed.
Weather indices are formed from weather variables in this package. Users can input any number of weather variables recorded over any number of weeks; the package places no restriction on either. The details of the method can be found in (i) 'Joint effects of weather variables on rice yields' by R. Agrawal, R. C. Jain and M. P. Jha in Mausam, vol. 34, pp. 189-194, 1983, <doi:10.54302/mausam.v34i2.2392>, and (ii) 'Improved weather indices based Bayesian regression model for forecasting crop yield' by M. Yeasin, K. N. Singh, A. Lama and B. Gurung in Mausam, vol. 72, pp. 879-886, 2021, <doi:10.54302/mausam.v72i4.670>.
An implementation of a non-parametric statistical model using a parallelised Monte Carlo sampling scheme. The method implemented in this package allows non-parametric inference to be regularized for small sample sizes, while also being more accurate than approximations such as variational Bayes. The concentration parameter is an effective sample size parameter, determining the faith we have in the model versus the data. When the concentration is low, the samples are close to the exact Bayesian logistic regression method; when the concentration is high, the samples are close to the simplified variational Bayes logistic regression. The method is described in full in the paper Lyddon, Walker, and Holmes (2018), "Nonparametric learning from Bayesian models with randomized objective functions" <arXiv:1806.11544>.