This package provides a tool for computing probabilities and other quantities that are relevant in selecting performance criteria for discrete trial training. The main function, miebl(), computes Bayesian and frequentist probabilities and bounds for each of n possible performance criterion choices when attempting to determine a student's true mastery level by counting their number of successful attempts at displaying learning among n trials. The reporting function miebl_re() takes output from miebl() and prepares it into a brief report for a specific criterion. miebl_cp() combines 2 to 5 distributions of true mastery level given performance criterion in one plot for comparison. Ramos (2025) <doi:10.1007/s40617-025-01058-9>.
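As a rough illustration of the frequentist side of such a calculation, the sketch below computes the probability that a student with a given true mastery level meets a criterion of at least c successes in n trials under a plain binomial model. It is written in Python for illustration only and does not call the package; the function name prob_meets_criterion and its parameters are hypothetical, and miebl() may define and compute its quantities differently.

```python
from math import comb

def prob_meets_criterion(n, c, mastery):
    """P(at least c successes in n independent trials) when each trial
    succeeds with probability `mastery` (hypothetical helper)."""
    return sum(comb(n, k) * mastery**k * (1 - mastery)**(n - k)
               for k in range(c, n + 1))

# Probability of meeting each possible criterion out of n = 10 trials
# for a student whose true mastery level is assumed to be 0.8.
for c in range(1, 11):
    print(c, round(prob_meets_criterion(10, c, 0.8), 4))
```

Scanning the criterion c in this way shows how quickly a stricter criterion lowers the chance that a student at a given mastery level passes.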
The classical two-sample t-test works well for normally distributed data or data with a large sample size. The tcfu() and tt() tests implemented in this package provide better type-I-error control with more accurate power when testing the equality of two-sample means for skewed populations having unequal variances. These tests are especially useful when the sample sizes are moderate. tcfu() uses the Cornish-Fisher expansion to achieve a better approximation to the true percentiles. tt() provides transformations of Welch's t-statistic so that the sampling distribution becomes more symmetric. For more technical details, please refer to Zhang (2019) <http://hdl.handle.net/2097/40235>.
Consolidates and calculates different sets of time-series features from multiple R and Python packages including Rcatch22 Henderson, T. (2021) <doi:10.5281/zenodo.5546815>, feasts O'Hara-Wild, M., Hyndman, R., and Wang, E. (2021) <https://CRAN.R-project.org/package=feasts>, tsfeatures Hyndman, R., Kang, Y., Montero-Manso, P., Talagala, T., Wang, E., Yang, Y., and O'Hara-Wild, M. (2020) <https://CRAN.R-project.org/package=tsfeatures>, tsfresh Christ, M., Braun, N., Neuffer, J., and Kempa-Liehr A.W. (2018) <doi:10.1016/j.neucom.2018.03.067>, TSFEL Barandas, M., et al. (2020) <doi:10.1016/j.softx.2020.100456>, and Kats Facebook Infrastructure Data Science (2021) <https://facebookresearch.github.io/Kats/>.
This package provides functions for simulating Markov chains using the Barker proposal to compute Markov chain Monte Carlo (MCMC) estimates of expectations with respect to a target distribution on a real-valued vector space. The Barker proposal, described in Livingstone and Zanella (2022) <doi:10.1111/rssb.12482>, is a gradient-based MCMC algorithm inspired by the Barker accept-reject rule. It combines the robustness of simpler MCMC schemes, such as random-walk Metropolis, with the efficiency of gradient-based methods, such as the Metropolis adjusted Langevin algorithm. The key function provided by the package is sample_chain(), which allows sampling a Markov chain with a specified target distribution as its stationary distribution. The chain is sampled by generating proposals and accepting or rejecting them using a Metropolis-Hastings acceptance rule. During an initial warm-up stage, the parameters of the proposal distribution can be adapted, with adapters available both to tune the scale of the proposals by coercing the average acceptance rate to a target value and to tune the shape of the proposals to match covariance estimates under the target distribution. As well as the default Barker proposal, the package also provides implementations of alternative proposal distributions, such as (Gaussian) random walk and Langevin proposals. Optionally, if BridgeStan's R interface <https://roualdes.github.io/bridgestan/latest/languages/r.html>, available on GitHub <https://github.com/roualdes/bridgestan>, is installed, then BridgeStan can be used to specify the target distribution to sample from.
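The proposal-then-accept scheme described above can be sketched in a few lines. The following Python code is a minimal illustration of a Barker proposal with a Metropolis-Hastings correction and a fixed, non-adapted scale, assuming Gaussian increments whose signs are flipped coordinate-wise according to the gradient. It is not the package's implementation or API: barker_chain, log1pexp, and all parameter names are hypothetical, and the standard normal target is chosen only for the example.

```python
import math
import random

def log1pexp(a):
    """Numerically stable log(1 + exp(a))."""
    return a + math.log1p(math.exp(-a)) if a > 0 else math.log1p(math.exp(a))

def barker_chain(log_density, grad_log_density, x0, n_steps, scale, rng):
    """Sample a chain using a Barker proposal (hypothetical helper)."""
    x = list(x0)
    lp, g = log_density(x), grad_log_density(x)
    samples = []
    for _ in range(n_steps):
        # Draw Gaussian increments, then keep each sign with probability
        # 1 / (1 + exp(-z_i * g_i)) and flip it otherwise.
        z = [rng.gauss(0.0, scale) for _ in x]
        step = [zi if rng.random() < math.exp(-log1pexp(-zi * gi)) else -zi
                for zi, gi in zip(z, g)]
        y = [xi + si for xi, si in zip(x, step)]
        lp_y, g_y = log_density(y), grad_log_density(y)
        # Metropolis-Hastings log ratio for this proposal.
        log_ratio = lp_y - lp + sum(
            log1pexp(-si * gxi) - log1pexp(si * gyi)
            for si, gxi, gyi in zip(step, g, g_y))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, lp, g = y, lp_y, g_y
        samples.append(list(x))
    return samples

# Example target (an assumption for illustration): 2D standard normal.
def log_density(x):
    return -0.5 * sum(xi * xi for xi in x)

def grad_log_density(x):
    return [-xi for xi in x]

samples = barker_chain(log_density, grad_log_density, [0.0, 0.0],
                       n_steps=20000, scale=1.0, rng=random.Random(0))
mean_first = sum(s[0] for s in samples) / len(samples)
print("estimated mean of first coordinate:", round(mean_first, 3))
```

A warm-up stage that adapts the scale toward a target acceptance rate would wrap this loop; it is omitted here to keep the sketch short.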
This package provides various basis expansions for flexible regression modeling, including random Fourier features (Rahimi & Recht, 2007) <https://proceedings.neurips.cc/paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf>, exact kernel / Gaussian process feature maps, Bayesian Additive Regression Trees (BART) (Chipman et al., 2010) <doi:10.1214/09-AOAS285> prior features, and a helpful interface for n-way interactions. The provided functions may be used within any modeling formula, allowing the use of kernel methods and other basis expansions in modeling functions that do not otherwise support them. Along with the basis expansions, a number of kernel functions are also provided, which support kernel arithmetic to form new kernels. Basic ridge regression functionality is included as well.
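To make the idea of a random Fourier feature expansion concrete, here is a small Python sketch that follows the construction in Rahimi and Recht (2007) for a Gaussian (RBF) kernel. It is an illustration only and does not reflect the package's interface; make_rff, its arguments, and the chosen length scale are hypothetical.

```python
import math
import random

def make_rff(dim, n_features, lengthscale, rng):
    """Return a feature map whose inner products approximate the RBF kernel
    exp(-||x - y||^2 / (2 * lengthscale^2)) (hypothetical helper)."""
    # Frequencies are drawn from the kernel's spectral density, phases uniformly.
    ws = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
          for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    c = math.sqrt(2.0 / n_features)

    def features(x):
        return [c * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return features

phi = make_rff(dim=2, n_features=5000, lengthscale=1.0, rng=random.Random(1))
x, y = [0.0, 0.5], [1.0, -0.5]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
print("approximate kernel:", round(approx, 3), "exact kernel:", round(exact, 3))
```

Because the expansion turns a kernel into ordinary columns of features, it can be dropped into any linear modeling routine, which is the motivation the description gives for using basis expansions inside a model formula.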
This package is a comprehensive tool for the analysis of India Meteorological Department (IMD) NetCDF rainfall data, specifically designed to process high-resolution daily gridded rainfall datasets. It provides four key functions to process IMD NetCDF rainfall data and create rasters for various temporal scales, including annual, seasonal, monthly, and weekly rainfall. For method details, see Malik, A. (2019) <doi:10.1007/s12517-019-4454-5>. It supports different aggregation methods, such as sum, min, max, mean, and standard deviation. These functions are designed for spatio-temporal analysis of rainfall patterns, trend analysis, geostatistical modeling of rainfall variability, and identification of rainfall anomalies and extreme events, and their output can serve as an input for hydrological and agricultural models.
This package provides a d-statistic, which tests the null hypothesis of no treatment effect in a matched, nonrandomized study of the effects caused by treatments. A d-statistic focuses on subsets of matched pairs that demonstrate insensitivity to unmeasured bias in such an observational study, correcting for double-use of the data by conditional inference. This conditional inference can, in favorable circumstances, substantially increase the power of a sensitivity analysis (Rosenbaum (2010) <doi:10.1007/978-1-4419-1213-8_14>). There are two examples, one concerning unemployment from Lalive et al. (2006) <doi:10.1111/j.1467-937X.2006.00406.x>, the other concerning smoking and periodontal disease from Rosenbaum (2017) <doi:10.1214/17-STS621>.
The functions compute the double-entry intraclass correlation, which is an index of profile similarity (Furr, 2010; McCrae, 2008). The double-entry intraclass correlation is a more precise index of the agreement of two empirically observed profiles than the often-used intraclass correlation (McCrae, 2008). Profiles comprising correlations are automatically transformed according to the Fisher z-transformation before the double-entry intraclass correlation is calculated. If the profiles comprise scores such as sum scores from various personality scales, it is recommended to standardize each individual score prior to computation of the double-entry intraclass correlation (McCrae, 2008). See Furr (2010) <doi:10.1080/00223890903379134> or McCrae (2008) <doi:10.1080/00223890701845104> for details.
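A compact way to see what the double-entry intraclass correlation measures is to compute it as the Pearson correlation between the two profiles after each pair of scores has been entered twice, once in each order. The Python sketch below does this and includes the optional Fisher z-transformation for profiles made of correlations. It is an illustration only; the function names double_entry_icc and pearson and the fisher_z flag are hypothetical and are not the package's functions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def double_entry_icc(x, y, fisher_z=False):
    """Double-entry intraclass correlation of two profiles (hypothetical helper).

    Set fisher_z=True when the profile elements are correlations, so that
    they are z-transformed (atanh) before the index is computed.
    """
    if fisher_z:
        x = [math.atanh(v) for v in x]
        y = [math.atanh(v) for v in y]
    # Enter every pair twice: (x_i, y_i) and (y_i, x_i).
    return pearson(list(x) + list(y), list(y) + list(x))

# Two profiles with identical shape but different elevation: the ordinary
# Pearson correlation is 1, while the double-entry index is clearly lower.
print(round(pearson([1, 2, 3], [2, 3, 4]), 4))
print(round(double_entry_icc([1, 2, 3], [2, 3, 4]), 4))
```

The example shows why the index is described as more precise for agreement: unlike the plain correlation, it is sensitive to differences in profile elevation and scatter, not only to shape.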
Map image classification efficacy (MICE) adjusts the accuracy rate relative to a random classification baseline (Shao et al. (2021) <doi:10.1109/ACCESS.2021.3116526> and Tang et al. (2024) <doi:10.1109/TGRS.2024.3446950>). Only the proportions from the reference labels are considered, as opposed to the proportions from the reference and predictions, as is the case for the Kappa statistic. This package offers means to calculate MICE and adjusted versions of class-level user's accuracy (i.e., precision) and producer's accuracy (i.e., recall) and F1-scores. Class-level metrics are aggregated using macro-averaging. Functions are also made available to estimate confidence intervals using bootstrapping and statistically compare two classification results.
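The adjustment described above can be sketched as follows, assuming the overall form (accuracy - baseline) / (1 - baseline) with the random baseline taken as the sum of squared class proportions in the reference labels. This Python code is an illustration under that assumption, not the package's implementation; the function name mice and its arguments are hypothetical, and the class-level and bootstrap features are not shown.

```python
from collections import Counter

def mice(reference, predicted):
    """Overall map image classification efficacy (hypothetical helper).

    Assumes the random baseline is the sum of squared reference class
    proportions, so only the reference labels define the baseline.
    """
    n = len(reference)
    accuracy = sum(r == p for r, p in zip(reference, predicted)) / n
    baseline = sum((count / n) ** 2 for count in Counter(reference).values())
    return (accuracy - baseline) / (1.0 - baseline)

reference = ["forest", "forest", "forest", "water", "urban", "urban"]
predicted = ["forest", "forest", "water", "water", "urban", "forest"]
print(round(mice(reference, predicted), 4))
```

A value of 1 corresponds to a perfect classification and a value of 0 to accuracy no better than the assumed random baseline.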
This package implements various methods for eliciting a probability distribution for a single parameter from an expert or a group of experts. The expert provides a small number of probability judgements, corresponding to points on his or her cumulative distribution function. A range of parametric distributions can then be fitted and displayed, with feedback provided in the form of fitted probabilities and percentiles. For multiple experts, a weighted linear pool can be calculated. Also includes functions for eliciting beliefs about population distributions; eliciting multivariate distributions using a Gaussian copula; eliciting a Dirichlet distribution; eliciting distributions for variance parameters in a random effects meta-analysis model; survival extrapolation. R Shiny apps for most of the methods are included.
This package performs binary classification via Group Method of Data Handling (GMDH)-type neural network algorithms. Two main algorithms are available, in the GMDH() and dceGMDH() functions. GMDH() performs classification via the GMDH algorithm for a binary response and returns important variables. dceGMDH() performs classification via the diverse classifiers ensemble based on GMDH (dce-GMDH) algorithm. The package also produces a well-formatted table of descriptives for a binary response. Moreover, it produces a confusion matrix, its related statistics, and scatter plots (2D and 3D) with classification labels of binary classes to assess the prediction performance. All GMDH2 functions are designed for a binary response (Dag et al., 2019, <https://download.atlantis-press.com/article/125911202.pdf>).
Pleiotropy-informed significance analysis of genome-wide association studies (GWAS) with surrogate functional false discovery rates (sfFDR). The sfFDR framework adapts the fFDR to leverage informative data from multiple sets of GWAS summary statistics to increase power in a study while accommodating linkage disequilibrium. sfFDR provides estimates of key FDR quantities in a significance analysis such as the functional local FDR and q-value, and uses these estimates to derive a functional p-value for type I error rate control and a functional local Bayes factor for post-GWAS analyses (e.g., fine mapping and colocalization). The sfFDR framework is described in Bass and Wallace (2024) <doi:10.1101/2024.09.24.24314276>.
This package provides a set of tools to perform Ecological Niche Modeling with presence-absence data. It includes algorithms for data partitioning, model fitting, calibration, evaluation, selection, and prediction. Other functions help to explore signals of ecological niche using univariate and multivariate analyses, and model features such as variable response curves and variable importance. Unique characteristics of this package are the ability to exclude models with concave quadratic responses, and the option to clamp model predictions to specific variables. These tools are implemented following principles proposed in Cobos et al. (2022) <doi:10.17161/bi.v17i.15985>, Cobos et al. (2019) <doi:10.7717/peerj.6281>, and Peterson et al. (2008) <doi:10.1016/j.ecolmodel.2007.11.008>.
The accurate annotation of genes and Quantitative Trait Loci (QTLs) located within candidate markers and/or regions (haplotypes, windows, CNVs, etc.) is a crucial step in the most common genomic analyses performed in livestock, such as Genome-Wide Association Studies or transcriptomics. The Genomic Annotation in Livestock for positional candidate LOci (GALLO) is an R package designed to provide an intuitive and straightforward environment to annotate positional candidate genes and QTLs from high-throughput genetic studies in livestock. Moreover, GALLO allows the graphical visualization of gene and QTL annotation results, data comparison among different grouping factors (e.g., methods, breeds, tissues, statistical models, studies, etc.), and QTL enrichment in different livestock species including cattle, pigs, sheep, and chicken, among others.
Evaluates land suitability for the production of different crops. The package is based on the Food and Agriculture Organization (FAO) and the International Rice Research Institute (IRRI) methodology for land evaluation. Development of ALUES is inspired by a similar tool for land evaluation, the Land Use Suitability Evaluation Tool (LUSET). The package uses a fuzzy logic approach to evaluate the land suitability of a particular area based on inputs such as rainfall, temperature, topography, and soil properties. The membership functions used for fuzzy modeling are the following: Triangular, Trapezoidal and Gaussian. The methods for computing the overall suitability of a particular area are also included, and these are the Minimum, Maximum and Average. Finally, ALUES is a highly optimized library with core algorithms written in C++.
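For readers unfamiliar with fuzzy membership functions, the Python sketch below shows textbook triangular, trapezoidal, and Gaussian membership functions and the three ways of combining per-factor scores named above. It is a generic illustration and not ALUES code: the function names, the parameterization, and the example thresholds are all hypothetical, and the package's own definitions may differ.

```python
import math

def triangular(x, low, peak, high):
    """Membership rises linearly from low to peak, then falls to high."""
    if x <= low or x >= high:
        return 0.0
    return (x - low) / (peak - low) if x <= peak else (high - x) / (high - peak)

def trapezoidal(x, low, left, right, high):
    """Membership is 1 on [left, right] and falls linearly to 0 outside."""
    if x <= low or x >= high:
        return 0.0
    if x < left:
        return (x - low) / (left - low)
    if x > right:
        return (high - x) / (high - right)
    return 1.0

def gaussian(x, center, spread):
    """Bell-shaped membership centered on the optimal value."""
    return math.exp(-((x - center) ** 2) / (2.0 * spread ** 2))

def overall(scores, method="minimum"):
    """Combine per-factor suitability scores (minimum, maximum or average)."""
    if method == "minimum":
        return min(scores)
    if method == "maximum":
        return max(scores)
    return sum(scores) / len(scores)

# Hypothetical factor values and thresholds for a single land unit.
scores = [
    triangular(24.0, 15.0, 27.0, 35.0),           # temperature, degrees C
    trapezoidal(1400.0, 800.0, 1200.0, 1800.0, 2500.0),  # rainfall, mm
    gaussian(6.1, 6.5, 0.8),                      # soil pH
]
print([round(s, 3) for s in scores], round(overall(scores), 3))
```

Taking the minimum treats the most limiting factor as decisive, while the average lets a strong factor compensate for a weak one.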
This package provides a comprehensive suite of tools for managing, processing, and analyzing data from the IFCB. I R FlowCytobot (iRfcb) supports quality control, geospatial analysis, and preparation of IFCB data for publication in databases like <https://www.gbif.org>, <https://www.obis.org>, <https://emodnet.ec.europa.eu/en>, <https://shark.smhi.se/>, and <https://www.ecotaxa.org>. The package integrates with the MATLAB ifcb-analysis tool, which is described in Sosik and Olson (2007) <doi:10.4319/lom.2007.5.204>, and provides features for working with raw, manually classified, and machine learning-classified image datasets. Key functionalities include image extraction, particle size distribution analysis, taxonomic data handling, and biomass concentration calculations, essential for plankton research.
This package provides a method that estimates an IV-optimal individualized treatment rule. An individualized treatment rule is said to be IV-optimal if it minimizes the maximum risk with respect to the putative IV and the set of IV identification assumptions. Please refer to <arXiv:2002.02579> for more details on the methodology and some theory underpinning the method. Function IV-PILE() uses functions in the package locClass. Package locClass can be accessed and installed from the R-Forge repository via the following link: <https://r-forge.r-project.org/projects/locclass/>. Alternatively, one can install the package by entering the following in R: install.packages("locClass", repos="http://R-Forge.R-project.org").
This package provides a Modern and Flexible Neo4J Driver, allowing you to query data on a Neo4J server and handle the results in R. It's modern in the sense it provides a driver that can be easily integrated in a data analysis workflow, especially by providing an API working smoothly with other data analysis and graph packages. It's flexible in the way it returns the results, by trying to stay as close as possible to the way Neo4J returns data. That way, you have the control over the way you will compute the results. At the same time, the result is not too complex, so that the "heavy lifting" of data wrangling is not left to the user.
Performs unconditional exact tests and power calculations for 2x2 contingency tables. For comparing two independent proportions, performs Barnard's test (1945) using the original CSM test (Barnard (1947)), using Fisher's p-value referred to as Boschloo's test (1970), or using a Z-statistic (Suissa and Shuster (1985)). For comparing two binary proportions, performs an unconditional exact test using McNemar's Z-statistic (Berger and Sidik (2003)), using McNemar's Z-statistic with continuity correction, or using the CSM test. Calculates confidence intervals for the difference in proportions.
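The idea behind an unconditional exact test for two independent proportions can be sketched as follows: order all possible tables by a test statistic, sum the binomial probabilities of the tables at least as extreme as the observed one, and maximize that sum over the unknown common success probability. The Python code below does this for a two-sided pooled Z-statistic with a simple grid search over the nuisance parameter. It is an illustration only; the function names and the grid size are hypothetical, and the package's own algorithms and options are not reproduced here.

```python
from math import comb, sqrt

def z_pooled(x1, n1, x2, n2):
    """Pooled-variance Z-statistic for the difference of two proportions."""
    p = (x1 + x2) / (n1 + n2)
    var = p * (1 - p) * (1 / n1 + 1 / n2)
    return 0.0 if var == 0 else (x1 / n1 - x2 / n2) / sqrt(var)

def unconditional_exact_p(x1, n1, x2, n2, grid=1000):
    """Two-sided unconditional exact p-value (hypothetical helper).

    The supremum over the nuisance parameter is approximated on a grid
    of `grid - 1` equally spaced values in (0, 1).
    """
    z_obs = abs(z_pooled(x1, n1, x2, n2))
    # Tables at least as extreme as the observed one.
    extreme = [(a, b) for a in range(n1 + 1) for b in range(n2 + 1)
               if abs(z_pooled(a, n1, b, n2)) >= z_obs - 1e-12]
    best = 0.0
    for i in range(1, grid):
        p = i / grid
        total = sum(comb(n1, a) * p**a * (1 - p)**(n1 - a)
                    * comb(n2, b) * p**b * (1 - p)**(n2 - b)
                    for a, b in extreme)
        best = max(best, total)
    return best

# 7 successes out of 7 in one group versus 0 out of 7 in the other.
print(unconditional_exact_p(7, 7, 0, 7))
```

Unlike Fisher's exact test, this approach does not condition on the observed column totals, which is why the nuisance parameter has to be maximized out.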
This package provides an interface to the MATLAB toolbox Flexible Statistical Data Analysis (FSDA), which is a comprehensive and computationally efficient software package for robust statistics in regression, multivariate and categorical data analysis. The current R version implements tools for regression (forward search, S- and MM-estimation, least trimmed squares (LTS) and least median of squares (LMS)), for multivariate analysis (forward search, S- and MM-estimation), for cluster analysis and cluster-wise regression. The distinctive feature of our package is the possibility of monitoring the statistics of interest as a function of breakdown point, efficiency or subset size, depending on the estimator. This is accompanied by a rich set of graphical features, such as dynamic brushing and linking, which are particularly useful for exploratory data analysis.
This package provides a model that gives researchers a powerful tool for the classification and study of native corn by aiding in the identification of racial complexes, which are fundamental to Mexico's agriculture and culture. This package has been developed based on data collected by "Proyecto Global de Maíces Nativos México", which has conducted exhaustive surveys across the country to document the qualitative and quantitative characteristics of different types of native maize. The trained model uses a robust and diverse dataset, enabling it to achieve 80% accuracy in classifying maize racial complexes. The characteristics included in the analysis comprise geographic location, grain and cob colors, as well as various physical measurements, such as lengths and widths.
The age is estimated by calculating the Dirichlet Normal Energy (DNE) on the whole auricular surface and the apex of the auricular surface. It involves several estimation methods: principal component discriminant analysis (PCQDA), principal component logistic regression analysis (PCLR), principal component regression analysis with Southeast Asian (A_PCR), and principal component regression analysis with multipopulation (M_PCR). The package is created with the data from the Louis Lopes Collection in Lisbon, the 21st Century Identified Human Remains Collection in Coimbra, the CAL Milano Cemetery Skeletal Collection in Milan, and the skeletal collection at Khon Kaen University (KKU) Human Skeletal Research Centre (HSRC), housed in the Department of Anatomy in the Faculty of Medicine at KKU in Khon Kaen.
Fits Stable Isotope Mixing Models (SIMMs) and is meant as a longer-term replacement for the previous widely used package SIAR. SIMMs are used to infer dietary proportions of organisms consuming various food sources from observations on the stable isotope values taken from the organisms' tissue samples. However, SIMMs can also be used in other scenarios, such as in sediment mixing or the composition of fatty acids. The main functions are simmr_load() and simmr_mcmc(). The two vignettes contain a quick start and a full listing of all the features. The methods used are detailed in the papers Parnell et al. (2010) <doi:10.1371/journal.pone.0009672> and Parnell et al. (2013) <doi:10.1002/env.2221>.
Run mixed-effects models that include weights at every level. The WeMix package fits a weighted mixed model, also known as a multilevel, mixed, or hierarchical linear model (HLM). The weights could be inverse selection probabilities, such as those developed for an education survey where schools are sampled probabilistically, and then students inside of those schools are sampled probabilistically. Although mixed-effects models are already available in R, WeMix is unique in implementing methods for mixed models using weights at multiple levels. Both linear and logit models are supported. Models may have up to three levels. Random effects are estimated using the PIRLS algorithm from lme4pureR (Walker and Bates (2013) <https://github.com/lme4/lme4pureR>).