The function performs a parallel analysis using simulated polychoric correlation matrices. The nth percentile of the eigenvalue distribution obtained from both the randomly generated and the real-data polychoric correlation matrices is returned. A plot comparing the two types of eigenvalues (real and simulated) helps determine the number of real eigenvalues that outperform random data. The function is based on the idea that if real data are non-normal and the polychoric correlation matrix is needed to perform a factor analysis, then the parallel analysis method used to choose a non-random number of factors should also be based on randomly generated polychoric correlation matrices and not on Pearson correlation matrices. Random data sets are simulated assuming either a uniform or a multinomial distribution, or via the bootstrap method of resampling (i.e., random permutations of cases). Multigroup parallel analysis is also available for the random (uniform and multinomial distribution, with or without difficulty factor) and bootstrap methods. An option to choose between default and full output is available, as is a parameter to print fit statistics (Chi-squared, TLI, RMSEA, RMR and BIC) for the factor solutions indicated by the parallel analysis. Weighted correlation matrices may also be considered for PA.
Package xid is a globally unique id generator suited for web scale. Features:
size: 12 bytes (96 bits), smaller than UUID, larger than snowflake
base32 hex encoded by default (20 chars when transported as printable string, still sortable)
no configuration needed: you don't need to set a unique machine and/or data center id
k-ordered
embedded time with 1 second precision
unicity guaranteed for 16,777,216 (24 bits) unique ids per second and per host/process
lock-free (i.e.: unlike UUIDv1 and v2)
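The size, time precision, and encoding properties listed above can be illustrated with a minimal Python sketch. It is not the xid package's API: the function names `pack_id` and `encode_id` are hypothetical, and the 4/3/2/3-byte split (timestamp, host, process, counter) is an assumption chosen to match the stated 12 bytes, 1-second time precision, and 24-bit counter.

```python
import base64
import struct

# Map the standard base32 alphabet onto the lowercase "base32 hex" alphabet,
# whose characters are in ascending ASCII order (this keeps ids sortable).
_STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
_HEX = "0123456789abcdefghijklmnopqrstuv"
_TO_HEX = str.maketrans(_STD, _HEX)


def pack_id(timestamp, host, pid, counter):
    """Build a 12-byte id. The field layout is an assumption:
    4-byte big-endian seconds, 3-byte host id, 2-byte pid, 3-byte counter."""
    if len(host) != 3:
        raise ValueError("host must be exactly 3 bytes")
    return (
        struct.pack(">I", timestamp & 0xFFFFFFFF)   # seconds since epoch
        + host                                      # host identifier
        + struct.pack(">H", pid & 0xFFFF)           # process identifier
        + (counter & 0xFFFFFF).to_bytes(3, "big")   # 24-bit counter
    )


def encode_id(raw):
    """Encode 12 raw bytes as a 20-character base32 hex string."""
    text = base64.b32encode(raw).decode("ascii").rstrip("=")
    return text.translate(_TO_HEX)
```

Because the timestamp occupies the leading bytes and the alphabet is in ascending order, ids created in later seconds compare greater as strings, which is what makes them k-ordered in this sketch.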
Functional differences between the cerebral hemispheres are a fundamental characteristic of the human brain. Researchers interested in studying these differences often infer underlying hemispheric dominance for a certain function (e.g., language) from laterality indices calculated from observed performance or brain activation measures. However, any inference from observed measures to latent (unobserved) classes has to consider the prior probability of class membership in the population. The provided functions implement a Bayesian model for predicting hemispheric dominance from observed laterality indices (Sorensen and Westerhausen, Laterality: Asymmetries of Body, Brain and Cognition, 2020, <doi:10.1080/1357650X.2020.1769124>).
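The role of the prior can be sketched with Bayes' rule in a few lines of Python. This is an illustration only, not the package's model: the Gaussian class-conditional likelihoods, the function names, and every numeric value below are hypothetical assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(laterality_index, classes):
    """Posterior class probabilities for one observed laterality index.

    classes maps a class name to (prior, mean, sd); the Gaussian form of
    the likelihood is an assumption made for this sketch.
    """
    joint = {
        name: prior * normal_pdf(laterality_index, mean, sd)
        for name, (prior, mean, sd) in classes.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Hypothetical population values, for illustration only.
CLASSES = {
    "left": (0.9, 20.0, 15.0),    # prior, mean index, standard deviation
    "right": (0.1, -20.0, 15.0),
}
```

With these made-up values, an index of 0 is equally likely under both classes, so `posterior(0.0, CLASSES)` simply returns the priors: the observation alone cannot decide, and the population base rate does.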
This package provides a toolkit for stratified medicine, subgroup identification, and precision medicine. Current tools include (1) filtering models (reduce covariate space), (2) patient-level estimate models (counterfactual patient-level quantities, such as the conditional average treatment effect), (3) subgroup identification models (find subsets of patients with similar treatment effects), and (4) treatment effect estimation and inference (for the overall population and discovered subgroups). These tools can be customized and are directly used in PRISM (patient response identifiers for stratified medicine; Jemielita and Mehrotra 2019 <arXiv:1912.03337>). This package is in beta and will be continually updated.
This package implements the methodology introduced in Capezza, Lepore, and Paynabar (2025) <doi:10.1080/00401706.2025.2561744> for process monitoring with limited labeling resources. The package provides functions to (i) simulate data streams with true latent states and multivariate Gaussian observations as done in the paper, (ii) fit partially hidden Markov models (pHMMs) using a constrained Baum-Welch algorithm with partial labels, and (iii) perform stream-based active learning that balances exploration and exploitation to decide whether to request labels in real time. The methodology is particularly suited for statistical process monitoring in industrial applications where labeling is costly.
This package provides functions for deep learning estimation of Conditional Average Treatment Effects (CATEs) from meta-learner models and Population Average Treatment Effects on the Treated (PATT) in settings with treatment noncompliance using reticulate, TensorFlow and Keras3. Functions in the package also implement the conformal prediction framework, which enables computation and illustration of conformal prediction (CP) intervals for estimated individual treatment effects (ITEs) from meta-learner models. Additional functions in the package permit users to estimate the meta-learner CATEs and the PATT in settings with treatment noncompliance using weighted ensemble learning via the super learner approach and R neural networks.
This package provides functions and examples for the weak and strong density asymmetry measures in the articles: "A measure of asymmetry", Patil, Patil and Bagkavos (2012) <doi:10.1007/s00362-011-0401-6> and "A measure of asymmetry based on a new necessary and sufficient condition for symmetry", Patil, Bagkavos and Wood (2014) <doi:10.1007/s13171-013-0034-z>. The measures provided here are useful for quantifying the asymmetry of the shape of a density of a random variable. The package facilitates implementation of the measures which are applicable in a variety of fields including e.g. probability theory, statistics and economics.
Weighted descriptive statistics is the discipline of quantitatively describing the main features of real-valued fuzzy data, which are usually obtained from a fuzzy population. This package lets one summarize this special kind of fuzzy data numerically or graphically. To interpret properties of one or several sets of real-valued fuzzy data, numerical summaries are available through the weighted statistics provided in this package, such as the mean, variance, covariance and correlation coefficient. Graphical interpretation is available through the weighted histogram and weighted scatter plot, which describe properties of a real-valued fuzzy data set.
You can easily add an advanced cohort-building component to your analytical dashboard or simple Shiny app. Then you can instantly start building cohorts using multiple filters of different types, filtering datasets, and filtering steps. Filters can be complex and data-specific, and together with multiple filtering steps you can use complex filtering rules. The cohort-building sidebar panel allows you to easily work with filters and to add and remove filtering steps. It helps you handle missing values during filtering, and provides instant filtering feedback with filter feedback plots. The GUI panel is not only compatible with native Shiny bookmarking, but also provides reproducible R code.
Conduct multiple quantitative trait loci (QTL) mapping under the framework of a random-QTL-effect linear mixed model. First, each position on the genome is scanned in order to obtain a negative logarithm P-value curve against genome position. Then, all the peaks on each effect (additive or dominant) curve are viewed as potential QTL, all the effects of the potential QTL are included in a multi-QTL model, their effects are estimated by empirical Bayes in a doubled haploid population or by adaptive lasso in an F2 population, and true QTL are identified by a likelihood ratio test. See Wen et al. (2018) <doi:10.1093/bib/bby058>.
Implementation of the exact, normal approximation, and simulation-based methods for computing the probability mass function (pmf) and cumulative distribution function (cdf) of the Poisson-Multinomial distribution, together with a random number generator for the distribution. The exact method is based on the multi-dimensional fast Fourier transform (FFT) of the characteristic function of the Poisson-Multinomial distribution. The normal approximation method uses a multivariate normal distribution to approximate the pmf of the distribution based on the central limit theorem. The simulation method is based on the law of large numbers. Details about the methods are available in Lin, Wang, and Hong (2022) <DOI:10.1007/s00180-022-01299-0>.
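The simulation-based idea can be sketched in plain Python: draw many realizations of the Poisson-Multinomial count vector and use relative frequencies as pmf estimates, which converge by the law of large numbers. This is a minimal sketch, not the package's implementation; the function names `draw_counts` and `simulate_pmf` are hypothetical.

```python
import random
from collections import Counter


def draw_counts(prob_matrix, rng):
    """Draw one Poisson-Multinomial count vector.

    prob_matrix has one row per independent trial; each row holds that
    trial's category probabilities.
    """
    counts = [0] * len(prob_matrix[0])
    for row in prob_matrix:
        # Pick a category for this trial according to its own probabilities.
        category = rng.choices(range(len(row)), weights=row)[0]
        counts[category] += 1
    return tuple(counts)


def simulate_pmf(prob_matrix, n_sims, seed=0):
    """Estimate the pmf by the relative frequency of each simulated outcome."""
    rng = random.Random(seed)
    tally = Counter(draw_counts(prob_matrix, rng) for _ in range(n_sims))
    return {outcome: n / n_sims for outcome, n in tally.items()}
```

For example, `simulate_pmf([[0.5, 0.5], [0.2, 0.8]], 10000)` returns estimated probabilities for the outcomes `(2, 0)`, `(1, 1)` and `(0, 2)`; increasing `n_sims` tightens the estimates at the cost of run time.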
The effects of the site may severely bias the accuracy of a multisite machine-learning model, even if the analysts removed them when fitting the model in the training set and applying the model in the test set (Solanes et al., Neuroimage 2023, 265:119800). This simple R package estimates the accuracy of a multisite machine-learning model unbiasedly, as described in Solanes et al. (Psychiatry Research: Neuroimaging 2021, 314:111313). It currently supports the estimation of sensitivity, specificity, balanced accuracy (for binary or multinomial variables), the area under the curve, correlation, mean squared error, and hazard ratio for binomial, multinomial, gaussian, and survival (time-to-event) outcomes.
Biodiversity areas, especially primary forest, serve a multitude of functions for the local economy, the regional functionality of ecosystems, and the global health of our planet. Recently, adverse changes in human land use practices and climatic responses to increased greenhouse gas emissions have put these biodiversity areas under a variety of threats. The present package helps to analyse a number of biodiversity indicators based on freely available geographical datasets. It supports computationally efficient routines that allow the analysis of potentially global biodiversity portfolios. The primary use case of the package is to support evidence-based reporting of an organization's effort to protect biodiversity areas under threat and to identify regions where intervention is most urgently needed.
Weather indices are formed from weather variables in this package. Users can input any number of weather variables recorded over any number of weeks; the package places no restriction on either. Details of the method can be found in (i) 'Joint effects of weather variables on rice yields' by R. Agrawal, R. C. Jain and M. P. Jha in Mausam, vol. 34, pp. 189-194, 1983, <doi:10.54302/mausam.v34i2.2392>, and (ii) 'Improved weather indices based Bayesian regression model for forecasting crop yield' by M. Yeasin, K. N. Singh, A. Lama and B. Gurung in Mausam, vol. 72, pp. 879-886, 2021, <doi:10.54302/mausam.v72i4.670>.
An implementation of a non-parametric statistical model using a parallelised Monte Carlo sampling scheme. The method implemented in this package allows non-parametric inference to be regularized for small sample sizes, while also being more accurate than approximations such as variational Bayes. The concentration parameter is an effective sample size parameter, determining the faith we have in the model versus the data. When the concentration is low, the samples are close to the exact Bayesian logistic regression method; when the concentration is high, the samples are close to the simplified variational Bayes logistic regression. The method is described in full in the paper Lyddon, Walker, and Holmes (2018), "Nonparametric learning from Bayesian models with randomized objective functions" <arXiv:1806.11544>.
Exploring time series for signal detection. It is specifically designed to detect possible outbreaks using infectious disease surveillance data at the European Union / European Economic Area or country level. Automatic detection tools used are presented in the paper "Monitoring count time series in R: aberration detection in public health surveillance", by Salmon (2016) <doi:10.18637/jss.v070.i10>. The package includes: - Signal Detection tool, an interactive shiny application in which the user can import external data and perform basic signal detection analyses; - An automated report in HTML format, presenting the results of the time series analysis in tables and graphs. This report can also be stratified by population characteristics (see Population variable). This project was funded by the European Centre for Disease Prevention and Control.
Case-based reasoning is a problem-solving methodology that involves solving a new problem by referring to the solution of a similar problem in a large set of previously solved problems. The key aspect of Case Based Reasoning is to determine the problem that "most closely" matches the new problem at hand. This is achieved by defining a family of distance functions and using these distance functions as parameters for local averaging regression estimates of the final result. The optimal distance function is chosen based on a specific error measure used in regression estimation. This approach allows for efficient problem-solving by leveraging past experiences and adapting solutions from similar cases. The underlying concept is inspired by the work of Dippon J. (2002) <doi:10.1016/S0167-9473(02)00058-0>.
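The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's method: the distance family is taken to be weighted Euclidean distances, the local averaging estimate is a k-nearest-neighbour mean, the error measure is leave-one-out squared error, and all function names are hypothetical.

```python
def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; each weight vector defines one family member."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


def knn_predict(query, cases, outcomes, weights, k):
    """Local averaging estimate: mean outcome of the k closest stored cases."""
    ranked = sorted(
        range(len(cases)),
        key=lambda i: weighted_distance(query, cases[i], weights),
    )
    nearest = ranked[:k]
    return sum(outcomes[i] for i in nearest) / len(nearest)


def loo_error(cases, outcomes, weights, k):
    """Leave-one-out mean squared error for one distance function."""
    total = 0.0
    for i in range(len(cases)):
        rest_cases = cases[:i] + cases[i + 1:]
        rest_outcomes = outcomes[:i] + outcomes[i + 1:]
        pred = knn_predict(cases[i], rest_cases, rest_outcomes, weights, k)
        total += (pred - outcomes[i]) ** 2
    return total / len(cases)


def choose_weights(cases, outcomes, candidates, k):
    """Pick the candidate weight vector with the smallest leave-one-out error."""
    return min(candidates, key=lambda w: loo_error(cases, outcomes, w, k))
```

In this sketch, a candidate that puts weight on an informative feature yields a lower leave-one-out error than one that weights a noisy feature, so `choose_weights` selects it.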
Causal moderated mediation analysis using the methods proposed by Qin and Wang (2023) <doi:10.3758/s13428-023-02095-4>. Causal moderated mediation analysis is crucial for investigating how, for whom, and where a treatment is effective by assessing the heterogeneity of mediation mechanism across individuals and contexts. This package enables researchers to estimate and test the conditional and moderated mediation effects, assess their sensitivity to unmeasured pre-treatment confounding, and visualize the results. The package is built based on the quasi-Bayesian Monte Carlo method, because it has relatively better performance at small sample sizes, and its running speed is the fastest. The package is applicable to a treatment of any scale, a binary or continuous mediator, a binary or continuous outcome, and one or more moderators of any scale.
This package implements a bootstrap-based heterogeneity test for standardized mean differences (d), Fisher-transformed Pearson's correlations (r), and natural-logarithm-transformed odds ratios (or) in meta-analysis studies. Depending on the presence of moderators, this Monte Carlo based test can be implemented in the random- or mixed-effects model. This package uses the rma() function from the R package metafor to obtain parameter estimates and likelihoods, so installation of the R package metafor is required. This approach refers to the studies of Anscombe (1956) <doi:10.2307/2332926>, Haldane (1940) <doi:10.2307/2332614>, Hedges (1981) <doi:10.3102/10769986006002107>, Hedges & Olkin (1985, ISBN:978-0123363800), Silagy, Lancaster, Stead, Mant, & Fowler (2004) <doi:10.1002/14651858.CD000146.pub2>, Viechtbauer (2010) <doi:10.18637/jss.v036.i03>, and Zuckerman (1994, ISBN:978-0521432009).
We provide comprehensive draft data for major professional sports leagues, including the National Football League (NFL), National Basketball Association (NBA), and National Hockey League (NHL). It offers access to both historical and current draft data, allowing for detailed analysis and research on player biases and player performance. The package is useful for sports fans and researchers interested in identifying biases and trends within scouting reports. Created by web scraping data from leading websites that cover professional sports player scouting reports, the package allows users to filter and summarize data for analytical purposes. For further details on the methods used, please refer to Wickham (2022) "rvest: Easily Harvest (Scrape) Web Pages" <https://CRAN.R-project.org/package=rvest> and Harrison (2023) "RSelenium: R Bindings for Selenium WebDriver" <https://CRAN.R-project.org/package=RSelenium>.
This package provides functions for imputing missing item responses for dichotomous and polytomous test and assessment data. It offers missing-data imputation methods that are suitable for test and assessment data, including: listwise (LW) deletion (see De Ayala et al. 2001 <doi:10.1111/j.1745-3984.2001.tb01124.x>), treating as incorrect (IN, see Lord, 1974 <doi:10.1111/j.1745-3984.1974.tb00996.x>; Mislevy & Wu, 1996 <doi:10.1002/j.2333-8504.1996.tb01708.x>; Pohl et al., 2014 <doi:10.1177/0013164413504926>), person mean imputation (PM), item mean imputation (IM), two-way (TW) and response function (RF) imputation (see Sijtsma & van der Ark, 2003 <doi:10.1207/s15327906mbr3804_4>), logistic regression (LR) imputation, predictive mean matching (PMM), and expectation-maximization (EM) imputation (see Finch, 2008 <doi:10.1111/j.1745-3984.2008.00062.x>).
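Three of the simpler methods named above can be sketched in plain Python. This is an illustration, not the package's code: the function name `impute` is hypothetical, missing responses are represented by `None`, two-way imputation is taken as person mean plus item mean minus overall mean, and the imputed values are left unrounded. The sketch assumes every person and every item has at least one observed response.

```python
def _mean(values):
    return sum(values) / len(values)


def impute(data, method="TW"):
    """Fill None entries of a persons-by-items score matrix.

    method: "PM" (person mean), "IM" (item mean), or "TW" (two-way,
    assumed here to be person mean + item mean - overall mean).
    Observed responses are returned unchanged.
    """
    n_items = len(data[0])
    overall = _mean([x for row in data for x in row if x is not None])
    person = [_mean([x for x in row if x is not None]) for row in data]
    item = [
        _mean([row[j] for row in data if row[j] is not None])
        for j in range(n_items)
    ]

    def fill(i, j):
        if method == "PM":
            return person[i]
        if method == "IM":
            return item[j]
        if method == "TW":
            return person[i] + item[j] - overall
        raise ValueError("unknown method: " + method)

    return [
        [x if x is not None else fill(i, j) for j, x in enumerate(row)]
        for i, row in enumerate(data)
    ]
```

For dichotomous items the filled values would still need to be rounded or randomly assigned to a valid score category; that step is omitted here.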
Projections are common dimensionality reduction methods, which represent high-dimensional data in a two-dimensional space. However, when restricting the output space to two dimensions, which results in a two-dimensional scatter plot (projection) of the data, low-dimensional similarities do not represent high-dimensional distances coercively [Thrun, 2018] <DOI:10.1007/978-3-658-20540-9>. This could lead to a misleading interpretation of the underlying structures [Thrun, 2018]. By means of the 3D topographic map, the generalized Umatrix is able to depict errors of these two-dimensional scatter plots. The package is derived from the book of Thrun, M.C.: "Projection Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>, and the main algorithm, called simplified self-organizing map for dimensionality reduction methods, is published in <DOI:10.1016/j.mex.2020.101093>.
Asciidoctor PDF is an extension for Asciidoctor that converts AsciiDoc documents to Portable Document Format (PDF) using the Prawn PDF library. It has features such as:
Direct AsciiDoc to PDF conversion
Configuration-driven theme (style and layout)
Scalable Vector Graphics (SVG) support
PDF document outline (i.e., bookmarks)
Table of contents page(s)
Document metadata (title, authors, subject, keywords, etc.)
Internal cross reference links
Syntax highlighting with Rouge, Pygments, or CodeRay
Page numbering
Customizable running content (header and footer)
“Keep together” blocks (i.e., page breaks avoided in certain block content)
Orphaned section titles avoided
Autofit verbatim blocks (as permitted by base_font_size_min setting)
Table border settings honored
Font-based icons
Custom TrueType (TTF) fonts
Double-sided printing mode (margins alternate on recto and verso pages)
This package provides a fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2024) <doi:10.1007/s11749-024-00942-w> and Papastamoulis and Milienos (2024b) <doi:10.48550/arXiv.2409.10221>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of distributions. In both cases, user-defined families of distributions are allowed under some specific requirements. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution.