This package provides tools for building Rescorla-Wagner models for two-alternative forced choice tasks, commonly employed in psychological research. Most concepts and ideas in this package are drawn from Sutton and Barto (2018) <ISBN:9780262039246>. The package allows for the intuitive definition of RL models using simple if-else statements, and the three basic models built into the package follow Niv et al. (2012) <doi:10.1523/JNEUROSCI.5498-10.2012>. Our approach to constructing and evaluating these computational models is informed by the guidelines proposed in Wilson & Collins (2019) <doi:10.7554/eLife.49547>. Example datasets included with the package are sourced from Mason et al. (2024) <doi:10.3758/s13423-023-02415-x>.
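A minimal sketch of the Rescorla-Wagner update rule underlying such models (the textbook formula from Sutton and Barto (2018), not this package's API; all function and argument names here are illustrative):

    # Rescorla-Wagner value update: V[chosen] <- V[chosen] + alpha * (reward - V[chosen])
    rw_update <- function(V, chosen, reward, alpha = 0.1) {
      V[chosen] <- V[chosen] + alpha * (reward - V[chosen])
      V
    }

    # Softmax choice rule often paired with the update (beta: inverse temperature)
    softmax_choice <- function(V, beta = 3) {
      p <- exp(beta * V) / sum(exp(beta * V))
      sample(seq_along(V), size = 1, prob = p)
    }

    V <- c(0, 0)                           # initial values for the two options
    choice <- softmax_choice(V)            # pick an option
    V <- rw_update(V, choice, reward = 1)  # update after a rewarded trial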
This package provides tools for developing and validating prediction models, estimating patients' expected survival, and visualizing the results graphically. Most of the implemented methods are based on penalized regressions, such as the lasso (Tibshirani R (1996)), the elastic net (Zou H et al. (2005) <doi:10.1111/j.1467-9868.2005.00503.x>), the adaptive lasso (Zou H (2006) <doi:10.1198/016214506000000735>), stability selection (Meinshausen N et al. (2010) <doi:10.1111/j.1467-9868.2010.00740.x>), extensions of the lasso (Ternes N et al. (2016) <doi:10.1002/sim.6927>), and methods for the interaction setting (Ternes N et al. (2016) <doi:10.1002/bimj.201500234>), among others. A function for generating simulated survival data sets is also provided.
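As a hedged illustration of the penalized-regression idea (using the separate glmnet package, not this package's own interface):

    library(glmnet)

    set.seed(1)
    x <- matrix(rnorm(100 * 20), 100, 20)                       # 20 candidate covariates
    y <- cbind(time = rexp(100), status = rbinom(100, 1, 0.7))  # simulated survival outcome

    # Lasso-penalized Cox regression (alpha = 1); alpha between 0 and 1 gives the elastic net
    fit <- cv.glmnet(x, y, family = "cox", alpha = 1)
    coef(fit, s = "lambda.min")  # non-zero rows are the selected covariates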
This package provides methods for estimation and hypothesis testing of proportions in group testing designs: methods for estimating a proportion in a single population (assuming sensitivity and specificity equal to 1 in designs with equal group sizes), as well as hypothesis tests and functions for experimental design for this situation. For estimating one proportion or the difference of proportions, a number of confidence interval methods are included that can handle varying pool sizes. Further, regression methods are implemented for both simple pooling and matrix pooling designs. The package also covers identification of positive items in group testing designs: optimal testing configurations can be found for hierarchical and array-based algorithms, and operating characteristics can be calculated for testing configurations across a wide variety of situations.
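For intuition, the classical estimator of a single proportion from equal-sized pools (assuming perfect sensitivity and specificity, as the single-population methods above do) is a one-liner; this is a textbook sketch, not this package's API:

    # n pools of size s each; x pools test positive.
    # A pool is negative only if all s members are negative, so
    # P(pool positive) = 1 - (1 - p)^s, which yields the MLE below.
    pool_mle <- function(x, n, s) 1 - (1 - x / n)^(1 / s)

    pool_mle(x = 4, n = 50, s = 10)  # about 0.0083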
Given a likelihood provided by the user, this package applies it to a given matrix dataset in order to find change points in the data that maximize the sum of the likelihoods of all the segments. The package provides several algorithms with different time complexities and trade-offs in assumptions, so the user can choose the one best suited to the problem at hand. The implementation of the segmentation algorithms is based on de Castro and Leonardi (2018) <arXiv:1501.01756>. The Berlin weather sample dataset was provided by Deutscher Wetterdienst <https://dwd.de/>. All references can be found in the Acknowledgments section of the package's repository via the URL below.
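A naive sketch of the exact dynamic-programming idea (a Gaussian mean-shift log-likelihood stands in for the user-supplied one, and a per-segment penalty discourages spurious change points; this is not this package's implementation):

    segment_dp <- function(x, penalty = 2) {
      loglik <- function(i, j) {      # score rows i..j as one segment
        seg <- x[i:j]
        sum(dnorm(seg, mean(seg), 1, log = TRUE))
      }
      n    <- length(x)
      best <- c(0, rep(-Inf, n))      # best[j + 1]: top score for x[1..j]
      cut  <- integer(n)              # cut[j]: end of previous segment at optimum
      for (j in 1:n) {
        for (i in 1:j) {
          s <- best[i] + loglik(i, j) - penalty
          if (s > best[j + 1]) { best[j + 1] <- s; cut[j] <- i - 1 }
        }
      }
      cps <- integer(0); j <- n       # backtrack the optimal change points
      while (j > 0) { cps <- c(cut[j], cps); j <- cut[j] }
      cps[-1]
    }

    set.seed(1)
    segment_dp(c(rnorm(60), rnorm(60, mean = 5)))  # recovers a cut near 60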
Statistical procedures to perform stability analysis in plant breeding and to identify stable genotypes under diverse environments. It is possible to calculate the coefficient of homeostaticity by Khangildin et al. (1979), variance of specific adaptive ability by Kilchevsky & Khotyleva (1989), weighted homeostaticity index by Martynov (1990), steadiness of stability index by Udachin (1990), superiority measure by Lin & Binns (1988) <doi:10.4141/cjps88-018>, regression on environmental index by Eberhart & Russell (1966) <doi:10.2135/cropsci1966.0011183X000600010011x>, Tai's (1971) stability parameters <doi:10.2135/cropsci1971.0011183X001100020006x>, stability variance by Shukla (1972) <doi:10.1038/hdy.1972.87>, ecovalence by Wricke (1962), nonparametric stability parameters by Nassar & Huehn (1987) <doi:10.2307/2531947>, and Francis & Kannenberg's parameters of stability (1978) <doi:10.4141/cjps78-157>.
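For example, Wricke's (1962) ecovalence, one of the simplest of these measures, sums each genotype's squared genotype-by-environment interaction terms (a small base-R sketch, not this package's interface; yield is a genotypes-by-environments matrix):

    ecovalence <- function(yield) {
      gx <- rowMeans(yield)   # genotype means
      ex <- colMeans(yield)   # environment means
      mu <- mean(yield)       # grand mean
      int <- yield - outer(gx, ex, "+") + mu  # interaction terms
      rowSums(int^2)          # W_i: lower values indicate more stable genotypes
    }

    set.seed(1)
    ecovalence(matrix(rnorm(12, 10), nrow = 3))  # 3 genotypes x 4 environments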
This package provides a tool for matching ICD-10 codes to corresponding Clinical Classifications Software Refined (CCSR) codes. The main function, CCSRfind(), identifies each CCSR code that applies to an individual given their diagnosis codes. It also provides a summary of the CCSR codes matched to a dataset. The package contains 3 datasets: DXCCSR (mapping of ICD-10 codes to CCSR codes), Legend (conversion of DXCCSR to a CCSRfind-usable format for CCSR codes with at most 1000 ICD-10 diagnosis codes), and LegendExtend (conversion of DXCCSR to a CCSRfind-usable format for CCSR codes with more than 1000 ICD-10 diagnosis codes). The disc() function applies grepl() (from base R) to multiple columns and is used internally by CCSRfind().
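The idea behind disc() can be sketched in base R: flag rows where any of several diagnosis-code columns matches a pattern (illustrative code only, not the package's implementation; names are hypothetical):

    # TRUE for rows where any listed diagnosis column starts with a given ICD-10 stem
    any_dx_match <- function(df, cols, pattern) {
      Reduce(`|`, lapply(df[cols], function(col) grepl(pattern, col)))
    }

    dx <- data.frame(dx1 = c("I10", "E119"), dx2 = c("N183", "I10"))
    any_dx_match(dx, c("dx1", "dx2"), "^I10")  # TRUE TRUE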
Corset plots are a visualization technique used strictly to visualize repeated measures at two time points (such as pre- and post-data). The distribution of measurements is visualized at each time point, whilst the trajectories of individual change are visualized by connecting the pre- and post-values linearly. These lines can be coloured to represent the magnitude of change or another user-defined value. This method of visualization is ideal for showing the heterogeneity of data, including differences by sub-groups. The package relies on ggplot2, allowing for easy integration so that users can customize their visualizations as required. Users can create corset plots from data in either wide or long format using the functions gg_corset() or gg_corset_elongated(), respectively.
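The two accepted data layouts can be sketched as follows (column names are illustrative; per the description above, gg_corset() would take the wide layout and gg_corset_elongated() the long one):

    # Wide format: one row per subject, one column per time point
    wide <- data.frame(id = 1:3, pre = c(10, 14, 9), post = c(12, 11, 15))

    # Long format: one row per subject-by-time observation
    library(tidyr)
    long <- pivot_longer(wide, c(pre, post), names_to = "time", values_to = "score")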
Estimation of the generalized beta distribution of the second kind (GB2) and related models using grouped data in the form of income shares. The GB2 family is a general class of distributions that provides an accurate fit to income data. GB2group includes functions to estimate the GB2, Singh-Maddala, Dagum, Beta 2, Lognormal and Fisk distributions. GB2group deploys two different econometric strategies to estimate these parametric distributions: the equally weighted minimum distance (EWMD) estimator and the optimally weighted minimum distance (OMD) estimator. Asymptotic standard errors are reported for the OMD estimates, while standard errors of the EWMD estimates are obtained by Monte Carlo simulation. See Jorda et al. (2018) <arXiv:1808.09831> for a detailed description of the estimation procedure.
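For reference, the GB2 density that underlies the whole family can be written directly in R (a textbook formula, not this package's interface; several of the related models arise as special cases, e.g. Singh-Maddala at p = 1, Dagum at q = 1, Fisk at p = q = 1):

    # GB2 density with shape parameters a, p, q and scale b (y > 0)
    dgb2 <- function(y, a, b, p, q) {
      a * y^(a * p - 1) / (b^(a * p) * beta(p, q) * (1 + (y / b)^a)^(p + q))
    }

    dgb2(1, a = 2, b = 1, p = 1, q = 1)  # Fisk (log-logistic) special case: 0.5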
Computes statistical hypothesis tests for loadings in principal component analysis (PCA) (Yamamoto, H. et al. (2014) <doi:10.1186/1471-2105-15-51>), orthogonal smoothed PCA (OS-PCA) (Yamamoto, H. et al. (2021) <doi:10.3390/metabo11030149>), one-sided kernel PCA (Yamamoto, H. (2023) <doi:10.51094/jxiv.262>), partial least squares (PLS) and PLS discriminant analysis (PLS-DA) (Yamamoto, H. et al. (2009) <doi:10.1016/j.chemolab.2009.05.006>), PLS with rank order of groups (PLS-ROG) (Yamamoto, H. (2017) <doi:10.1002/cem.2883>), regularized canonical correlation analysis discriminant analysis (RCCA-DA) (Yamamoto, H. et al. (2008) <doi:10.1016/j.bej.2007.12.009>), and multiset PLS and PLS-ROG (Yamamoto, H. (2022) <doi:10.1101/2022.08.30.505949>).
This package performs the O2PLS data integration method for two datasets, yielding joint and data-specific parts for each dataset. The algorithm automatically switches to a memory-efficient approach to fit O2PLS to high-dimensional data. The package provides a rigorous cross-validation method, as well as a faster alternative, for selecting the number of components, along with functions to report proportions of explained variation and to construct plots of the results. See the software article by el Bouhaddani et al. (2018) <doi:10.1186/s12859-018-2371-3>, and Trygg and Wold (2003) <doi:10.1002/cem.775>. The package also performs Sparse Group (Penalized) O2PLS, see Gu et al. (2020) <doi:10.1186/s12859-021-03958-3>, with cross-validation for the degree of sparsity.
The hybrid model is a highly effective forecasting approach that integrates decomposition techniques with machine learning to enhance time series prediction accuracy. Each decomposition technique breaks down a time series into multiple intrinsic mode functions (IMFs), which are then individually modeled and forecasted using machine learning algorithms. The final forecast is obtained by aggregating the predictions of all IMFs, producing an ensemble output for the time series. The performance of the developed models is evaluated using international monthly maize price data, assessed through metrics such as root mean squared error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). For method details, see Choudhary, K. et al. (2023) <https://ssca.org.in/media/14_SA44052022_R3_SA_21032023_Girish_Jha_FINAL_Finally.pdf>.
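The decompose-forecast-recombine logic can be sketched as follows; decompose_fn is a caller-supplied stand-in for whichever decomposition (EMD, EEMD, VMD, etc.) is used, and auto.arima() from the separate forecast package stands in for the machine-learning learner, so none of these names are this package's API:

    library(forecast)

    # Decompose a series, forecast each IMF separately, then aggregate.
    hybrid_forecast <- function(y, h, decompose_fn) {
      imfs <- decompose_fn(y)  # list of IMFs (plus residue) that sum back to y
      fc <- lapply(imfs, function(imf) forecast(auto.arima(imf), h = h)$mean)
      Reduce(`+`, fc)          # ensemble output for the original series
    }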
This package performs robust multiple testing for means in the presence of known and unknown latent factors, as presented in Fan et al. (2019), "FarmTest: Factor-Adjusted Robust Multiple Testing With Approximate False Discovery Control" <doi:10.1080/01621459.2018.1527700>. It implements a series of adaptive Huber methods combined with fast data-driven tuning schemes, proposed in Ke et al. (2019), "User-Friendly Covariance Estimation for Heavy-Tailed Distributions" <doi:10.1214/19-STS711>, to estimate model parameters and construct test statistics that are robust against heavy-tailed and/or asymmetric error distributions. Extensions to two-sample simultaneous mean comparison problems are also included. As by-products, the package contains functions that compute adaptive Huber mean, covariance and regression estimators that are of independent interest.
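The Huber mean at the core of these methods can be sketched with base R alone (the tuning parameter tau is fixed here for simplicity; the package's adaptive schemes choose it from the data):

    # Huber loss: quadratic near zero, linear in the tails
    huber_loss <- function(r, tau) {
      ifelse(abs(r) <= tau, r^2 / 2, tau * abs(r) - tau^2 / 2)
    }

    huber_mean <- function(x, tau = 1.345 * mad(x)) {
      optimize(function(m) sum(huber_loss(x - m, tau)), range(x))$minimum
    }

    set.seed(1)
    huber_mean(c(rnorm(100), 50))  # barely moved by the outlier, unlike mean()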
A novel method to include studies with Non-statistically Significant Unreported Effects (NSUEs) in a meta-analysis without bias. First, the method calculates the interval within which the unreported effects (e.g., t-values) must lie, given the threshold of statistical significance used in each study. Afterward, it uses maximum likelihood techniques to impute the expected effect size of each study with NSUEs, accounting for between-study heterogeneity and potential covariates. Multiple imputations of the NSUEs are then randomly created based on the expected value, variance, and statistical significance bounds. Finally, it conducts a restricted maximum likelihood random-effects meta-analysis separately for each set of imputations and derives combined estimates from these meta-analyses. Please read the reference in metansue for details of the procedure.
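The first step can be made concrete: if a study reports only "not significant" from a two-sided t-test at alpha = 0.05, its unreported t-value must lie within the bounds below (a sketch of the bound calculation, not this package's API):

    # Bounds for an unreported, non-significant two-sided t-statistic
    nsue_bounds <- function(df, alpha = 0.05) {
      q <- qt(1 - alpha / 2, df)
      c(lower = -q, upper = q)
    }

    nsue_bounds(df = 28)  # roughly (-2.05, 2.05)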
This package provides a collection of white noise hypothesis tests for functional time series and related visualizations. These include tests based on the norms of autocovariance operators that are built under both strong and weak white noise assumptions. Additionally, tests based on the spectral density operator and on principal component dimension reduction are included, which are built under strong white noise assumptions. The package also provides goodness-of-fit tests for functional autoregressive models of order 1. These methods are described in Kokoszka et al. (2017) <doi:10.1016/j.jmva.2017.08.004>, Characiejus and Rice (2019) <doi:10.1016/j.ecosta.2019.01.003>, Gabrys and Kokoszka (2007) <doi:10.1198/016214507000001111>, and Kim et al. (2023) <doi:10.1214/23-SS143>, respectively.
The goal of blocking is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via the 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance, including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, and Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
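Shingling can be sketched in a few lines of base R: a string is broken into overlapping character n-grams that serve as blocking keys (illustrative only; the package's own shingle generation may differ in details):

    # Overlapping character n-grams ("shingles") of a string
    shingles <- function(x, n = 2) {
      x <- tolower(x)
      substring(x, 1:(nchar(x) - n + 1), n:nchar(x))
    }

    shingles("Smith")  # "sm" "mi" "it" "th"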
Simulates age-at-onset traits associated with a segregating major gene in family data obtained from population-based, clinic-based, or multi-stage designs. Appropriate ascertainment correction is utilized to estimate age-dependent penetrance functions either parametrically from the fitted model or nonparametrically from the data. The Expectation-Maximization algorithm can infer missing genotypes and carrier probabilities estimated from families' genotype and phenotype information or from a fitted model. Plot functions include pedigrees of simulated families and predicted penetrance curves based on specified parameter values. For more information see Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) "FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs", Journal of Statistical Software 97(7), 1-30.
Visualization of decision rules for binary classification and Receiver Operating Characteristic (ROC) curve estimation under different generalizations proposed in the literature: (i) making the classification subsets flexible to cover scenarios where both extremes of the marker are associated with a higher risk of being positive, by considering two thresholds (gROC() function); (ii) transforming the marker by a proper function in an attempt to improve classification performance (hROC() function); and (iii) for multivariate markers, considering a proper transformation to a univariate space, seeking to maximize the TPR for each FPR and thus the resulting AUC (multiROC() function). The classification regions behind each point of the ROC curve can be displayed in either static graphics (plot_buildROC(), plot_regions() or plot_funregions() functions) or videos (movieROC() function).
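The generalization behind gROC() can be illustrated directly: with two thresholds, a subject is classified positive when the marker falls in either tail (a conceptual sketch, not this package's code):

    # Two-threshold rule: positive if marker <= t1 or marker >= t2
    tpr_fpr_two_sided <- function(marker, status, t1, t2) {
      pred <- marker <= t1 | marker >= t2
      c(TPR = mean(pred[status == 1]), FPR = mean(pred[status == 0]))
    }

    set.seed(1)
    marker <- c(rnorm(50, 0, 3), rnorm(50, 0, 1))  # both tails carry higher risk
    status <- rep(c(1, 0), each = 50)
    tpr_fpr_two_sided(marker, status, t1 = -2, t2 = 2)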
Collision risk models for avian fauna (seabirds and migratory birds) at offshore wind farms. The base deterministic model is derived from Band (2012) <https://tethys.pnnl.gov/publications/using-collision-risk-model-assess-bird-collision-risks-offshore-wind-farms>. This was further expanded by Masden (2015) <doi:10.7489/1659-1>, and the code used here is heavily derived from that work, with input from Dr A. Cook at the British Trust for Ornithology. These collision risk models are useful for marine ornithologists working in the offshore wind industry, particularly in UK waters. However, many of the species included in the stochastic collision risk models can also be found in the North Atlantic waters of the United States and Canada, so the models could be applied there as well.
This package offers extensive tools for phylogenetic analysis. It focuses on phylogenetic comparative biology but also includes methods for visualizing, analyzing, manipulating, reading, writing, inferring, and simulating phylogenetic trees and comparative data. Functions for comparative biology include ancestral state reconstruction, model fitting, and simulation of phylogenies and trait data. A broad range of plotting methods includes mapping trait evolution on trees, projecting trees into phenotype space or onto geographic maps, and visualizing correlated speciation between trees. Examples of the tree manipulations and analyses valuable for phylogenetic research include computing consensus trees, simulating trees and data under various models, and attaching species or clades to a tree either randomly or non-randomly.
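As a small taste of the tree utilities, a majority-rule consensus tree can be computed with the companion ape package, on which this package builds (one way to do it; this package offers related consensus and plotting tools of its own):

    library(ape)

    set.seed(1)
    trees <- rmtree(10, 8)             # 10 random trees on 8 tips
    cons  <- consensus(trees, p = 0.5) # majority-rule consensus tree
    plot(cons)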
This package implements functions that calculate upper prediction bounds on the false discovery proportion (FDP) in the list of discoveries returned by competition-based setups, implementing Ebadi et al. (2022) <arXiv:2302.11837>. Such setups include target-decoy competition (TDC) in computational mass spectrometry and the knockoff construction in linear regression (this package typically uses the terminology of TDC). Included are the standardized (TDC-SB) and uniform (TDC-UB) bounds on TDC's FDP, as well as simultaneous standardized and uniform bands. The package requires pre-computed Monte Carlo statistics available at <https://github.com/uni-Arya/fdpbandsdata>. This data can be downloaded by running devtools::install_github("uni-Arya/fdpbandsdata") in R and restarting R after installation. The size of this data is roughly 81 MB.
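For orientation, the quantity these bands bound is the FDP behind the usual target-decoy point estimate, sketched here (standard TDC bookkeeping, not this package's functions):

    # Standard TDC point estimate of the FDR among targets scoring above a cutoff
    tdc_fdr_hat <- function(target_scores, decoy_scores, cutoff) {
      targets <- sum(target_scores >= cutoff)
      decoys  <- sum(decoy_scores >= cutoff)
      (decoys + 1) / max(targets, 1)  # +1 correction on the decoy count
    }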
Estimation of DIFferential COexpressed NETworks using diverse and user-defined metrics. The package serves three purposes related to the estimation of differential coexpression. First, it estimates differential coexpression, where coexpression is measured, by default, by Spearman correlation; this requires a metric to compare two correlation distributions. The package includes 6 such metrics, some of which need a threshold, and a new metric can also be specified as a user function with specific parameters (see difconet.run). Significance is estimated by permutation. Second, it generates datasets with controlled differential correlation, by either adding noise or adding a specific correlation structure. Third, it displays the results of differential correlation analyses. Please see <http://bioinformatica.mty.itesm.mx/difconet> for further information.
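The core comparison can be sketched in base R: estimate the coexpression of a gene pair in each condition with Spearman correlation, then score the difference (one simple metric; as noted above, the package offers 6 and accepts user-defined ones):

    # Absolute difference of Spearman correlations between two conditions
    diff_coexpr <- function(x1, y1, x2, y2) {
      abs(cor(x1, y1, method = "spearman") - cor(x2, y2, method = "spearman"))
    }

    set.seed(1)
    x <- rnorm(30)
    diff_coexpr(x, x + rnorm(30, sd = 0.2),  # correlated in condition 1
                rnorm(30), rnorm(30))        # uncorrelated in condition 2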
Generate reports that enable quick visual review of temporal shifts in record-level data. Time series plots showing aggregated values are automatically created for each data field (column) depending on its contents (e.g., min/max/mean values for numeric data, the number of distinct values for categorical data), as well as overviews for missing values, non-conformant values, and duplicated rows. The resulting reports are shareable and can contribute to forming a transparent record of the entire analysis process. The package is designed with Electronic Health Records in mind, but can be used for any type of record-level temporal data (i.e. tabular data where each row represents a single "event", one column contains the "event date", and other columns contain any associated values for the event).
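The kind of per-field aggregation that feeds one of these time series panels can be sketched with base R (illustrative only; the package automates this for every field and renders the full report):

    events <- data.frame(
      date  = as.Date("2024-01-01") + sample(0:30, 200, replace = TRUE),
      value = rnorm(200)
    )

    # Daily mean of a numeric field, as would feed one time series panel
    aggregate(value ~ date, data = events, FUN = mean)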
Identification of causal effects from arbitrary observational and experimental probability distributions via do-calculus and standard probability manipulations, using the search-based algorithm of Tikka, Hyttinen and Karvanen (2021) <doi:10.18637/jss.v099.i05>. Allows for the presence of mechanisms related to selection bias (Bareinboim and Tian, 2015) <doi:10.1609/aaai.v29i1.9679>, transportability (Bareinboim and Pearl, 2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, missing data (Mohan, Pearl, and Tian, 2013) <http://ftp.cs.ucla.edu/pub/stat_ser/r410.pdf>, and arbitrary combinations of these. Also supports identification in the presence of context-specific independence (CSI) relations through labeled directed acyclic graphs (LDAGs). For details on CSIs, see Corander et al. (2019) <doi:10.1016/j.apal.2019.04.004>.
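A minimal backdoor-adjustment query in the interface described in Tikka, Hyttinen and Karvanen (2021), where the input distributions, query, and graph are given as character strings (treat this as a sketch of that interface):

    library(dosearch)

    data  <- "p(x, y, z)"           # available observational distribution
    query <- "p(y | do(x))"        # causal effect to identify
    graph <- "
      x -> y
      z -> x
      z -> y
    "
    dosearch(data, query, graph)   # returns the adjustment formula over z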
The hybrid model is a promising forecasting approach that combines decomposition and deep learning techniques to improve the accuracy of time series forecasting. Each decomposition technique decomposes a time series into a set of intrinsic mode functions (IMFs), and the obtained IMFs are modelled and forecasted separately using deep learning models. Finally, the forecasts of all IMFs are combined to provide an ensemble output for the time series. The prediction ability of the developed models is evaluated using international monthly price series of maize in terms of evaluation criteria such as root mean squared error, mean absolute percentage error, and mean absolute error. For method details, see Choudhary, K. et al. (2023) <https://ssca.org.in/media/14_SA44052022_R3_SA_21032023_Girish_Jha_FINAL_Finally.pdf>.