Error-driven learning (based on the Widrow & Hoff (1960) <https://isl.stanford.edu/~widrow/papers/c1960adaptiveswitching.pdf> learning rule, and essentially the same as the Rescorla-Wagner learning equations (Rescorla & Wagner, 1972, ISBN: 0390718017), which are also at the core of Naive Discriminative Learning (Baayen et al., 2011, <doi:10.1037/a0023851>)) can be used to explain bottom-up human learning (Hoppe et al., <doi:10.31234/osf.io/py5kd>), but is also at the core of artificial neural network applications in the form of the Delta rule. This package provides a set of functions for building small-scale simulations to investigate the dynamics of error-driven learning and its interaction with the structure of the input. For modeling error-driven learning using the Rescorla-Wagner equations, the package ndl (Baayen et al., 2011, <doi:10.1037/a0023851>) is available on CRAN at <https://cran.r-project.org/package=ndl>. However, that package currently only allows tracing of a cue-outcome combination, rather than returning the learned networks. To fill this gap, we implemented a new package with a few functions that facilitate inspection of the networks in small error-driven learning simulations. Note that our functions are not optimized for training large data sets (no parallel processing), as they are intended for small-scale simulations and course examples. (Consider the Python implementation pyndl <https://pyndl.readthedocs.io/en/latest/> for that purpose.)
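For illustration, a minimal hand-rolled sketch of the delta-rule (Rescorla-Wagner) update in R, not the package's own API; the cue/outcome coding and the learning rate are assumptions chosen for the example:

  # Delta-rule / Rescorla-Wagner update for two cues predicting one outcome.
  cues   <- c("plural", "stem")             # cues present on every trial
  lambda <- 1                               # outcome present on every trial
  alpha  <- 0.1                             # learning rate (illustrative)
  W      <- setNames(c(0, 0), cues)         # initial cue-to-outcome weights
  for (trial in 1:100) {
    activation <- sum(W[cues])              # summed prediction of the outcome
    error      <- lambda - activation       # prediction error
    W[cues]    <- W[cues] + alpha * error   # update weights of the present cues
  }
  W                                         # the summed weights approach lambda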
An implementation of hypothesis testing in an extended Rasch modeling framework, including sample size planning procedures and power computations. Provides four statistical tests, i.e., the gradient test (GR), likelihood ratio test (LR), Rao score or Lagrange multiplier test (RS), and Wald test, for testing a number of hypotheses referring to the Rasch model (RM), linear logistic test model (LLTM), rating scale model (RSM), and partial credit model (PCM). Three types of functions for power and sample size computations are provided. Firstly, functions to compute the sample size given a user-specified (predetermined) deviation from the hypothesis to be tested, the level alpha, and the power of the test. Secondly, functions to evaluate the power of the tests given a user-specified (predetermined) deviation from the hypothesis to be tested, the level alpha of the test, and the sample size. Thirdly, functions to evaluate the so-called post hoc power of the tests, i.e., the power of the tests given the observed deviation of the data from the hypothesis to be tested and a user-specified level alpha of the test. Power and sample size computations are based on a Monte Carlo simulation approach, which is computationally very efficient. The variance of the random error in computing power and sample size arising from the simulation approach is analytically derived using the delta method. See Draxler, C., & Alexandrowicz, R. W. (2015), <doi:10.1007/s11336-015-9472-y>.
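A hedged sketch of the general Monte Carlo approach to power computation (shown here for a simple binomial test, not the package's Rasch-specific functions); the sample size, deviation, and number of replications are illustrative:

  set.seed(1)
  alpha  <- 0.05
  n      <- 200                               # sample size under evaluation
  p0     <- 0.5                               # hypothesised value
  p1     <- 0.6                               # user-specified deviation from the hypothesis
  reject <- replicate(5000, {
    x <- rbinom(1, n, p1)                     # simulate data under the deviation
    binom.test(x, n, p = p0)$p.value < alpha  # record whether the test rejects
  })
  power_hat <- mean(reject)                   # simulated power
  sqrt(power_hat * (1 - power_hat) / 5000)    # Monte Carlo standard error of the power estimate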
The King's Health Questionnaire (KHQ) is a disease-specific, self-administered questionnaire designed specifically to assess the impact of Urinary Incontinence (UI) on Quality of Life. The questionnaire was developed by Kelleher and collaborators (1997) <doi:10.1111/j.1471-0528.1997.tb11006.x>. It is a simple, acceptable and reliable measure to use in the clinical setting and a research tool that is useful in evaluating UI treatment outcomes. The KHQ five dimensions (KHQ5D) is a condition-specific preference-based measure developed by Brazier and collaborators (2008) <doi:10.1177/0272989X07301820>. Although not as popular as the SF6D <doi:10.1016/S0895-4356(98)00103-6> and EQ-5D <https://euroqol.org/>, the KHQ5D measures health-related quality of life (HRQoL) specifically for UI, not general conditions like the other two instruments mentioned. The KHQ5D can be used in the clinical and economic evaluation of health care. The subject self-rates their health in terms of five dimensions: Role Limitation (RL), Physical Limitations (PL), Social Limitations (SL), Emotions (E), and Sleep (S). Frequently the states on these five dimensions are converted to a single utility index using country-specific value sets, which can be used in the clinical and economic evaluation of health care as well as in population health surveys. This package provides methods to calculate scores for each dimension of the KHQ, convert KHQ item scores to KHQ5D scores, and calculate the utility index of the KHQ5D.
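As a purely hypothetical illustration of the scoring idea (the decrements below are placeholders and NOT a published country-specific value set; four response levels per dimension are assumed for the example):

  profile <- c(RL = 2, PL = 1, SL = 1, E = 3, S = 2)   # self-rated levels on the five dimensions
  decrement <- list(                                   # placeholder utility decrements per level
    RL = c(0, 0.05, 0.10, 0.15),
    PL = c(0, 0.04, 0.08, 0.12),
    SL = c(0, 0.06, 0.12, 0.18),
    E  = c(0, 0.03, 0.06, 0.09),
    S  = c(0, 0.05, 0.10, 0.15)
  )
  1 - sum(mapply(function(dim, lev) decrement[[dim]][lev],
                 names(profile), profile))             # utility index for this health state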
Multivariate Time Series (MTS) is a general package for analyzing multivariate linear time series and estimating multivariate volatility models. It also handles factor models, constrained factor models, asymptotic principal component analysis commonly used in finance and econometrics, and principal volatility component analysis. (a) For multivariate linear time series analysis, the package performs model specification, estimation, model checking, and prediction for many widely used models, including vector AR models, vector MA models, vector ARMA models, seasonal vector ARMA models, VAR models with exogenous variables, multivariate regression models with time series errors, augmented VAR models, and error-correction VAR models for co-integrated time series. For model specification, the package performs structural specification to overcome the difficulties of identifiability of VARMA models. The methods used for structural specification include Kronecker indices and Scalar Component Models. (b) For multivariate volatility modeling, the MTS package handles several commonly used models, including multivariate exponentially weighted moving-average volatility, Cholesky decomposition volatility models, dynamic conditional correlation (DCC) models, copula-based volatility models, and low-dimensional BEKK models. The package also considers multiple tests for conditional heteroscedasticity, including rank-based statistics. (c) Finally, the MTS package also performs forecasting using diffusion index, transfer function analysis, Bayesian estimation of VAR models, and multivariate time series analysis with missing values. Users can also use the package to simulate VARMA models, to compute impulse response functions of a fitted VARMA model, and to calculate theoretical cross-covariance matrices of a given VARMA model.
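A minimal sketch assuming the VARMAsim(), VAR() and VARpred() interfaces of the package (argument usage is abbreviated; see the documentation for details):

  library(MTS)
  set.seed(1)
  phi <- matrix(c(0.5, 0.1, 0.2, 0.4), 2, 2)   # stationary VAR(1) coefficient matrix
  sim <- VARMAsim(300, arlags = 1, phi = phi, sigma = diag(2))
  fit <- VAR(sim$series, p = 1)                # estimate a bivariate VAR(1)
  VARpred(fit, h = 4)                          # 4-step-ahead forecasts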
To estimate ecological stochasticity in community assembly. Understanding the community assembly mechanisms controlling biodiversity patterns is a central issue in ecology. Although it is generally accepted that both deterministic and stochastic processes play important roles in community assembly, quantifying their relative importance is challenging. The new index, normalized stochasticity ratio (NST), is designed to estimate ecological stochasticity, i.e. the relative importance of stochastic processes, in community assembly. With functions in this package, NST can be calculated based on different similarity metrics and/or different null model algorithms, as well as some previous indexes, e.g. the previous Stochasticity Ratio (ST), Standard Effect Size (SES), and modified Raup-Crick metrics (RC). Functions for permutational tests and bootstrapping analysis are also included. The previous ST is published by Zhou et al (2014) <doi:10.1073/pnas.1324044111>. NST is modified from ST by considering two alternative situations and normalizing the index to range from 0 to 1 (Ning et al 2019) <doi:10.1073/pnas.1904623116>. A modified version, MST, is a special case of NST, used in some recent or upcoming publications, e.g. Liang et al (2020) <doi:10.1016/j.soilbio.2020.108023>. SES is calculated as described in Kraft et al (2011) <doi:10.1126/science.1208584>. RC is calculated as reported by Chase et al (2011) <doi:10.1890/ES10-00117.1> and Stegen et al (2013) <doi:10.1038/ismej.2013.93>. Version 3 added NST based on phylogenetic beta diversity, used by Ning et al (2020) <doi:10.1038/s41467-020-18560-z>.
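A hedged sketch assuming the tNST() interface (argument names and output components may differ; consult the package help). Here comm is a sample-by-taxa count matrix and group assigns samples to treatments:

  library(NST)
  set.seed(1)
  comm  <- matrix(rpois(20 * 30, 3), nrow = 20,
                  dimnames = list(paste0("S", 1:20), paste0("OTU", 1:30)))
  group <- data.frame(grp = rep(c("A", "B"), each = 10), row.names = rownames(comm))
  res   <- tNST(comm = comm, group = group, dist.method = "jaccard",
                null.model = "PF", rand = 100)
  res$index.grp   # NST and related indexes per group (assumed output component)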
Aids in identifying the Koeppen-Geiger (KG) climatic zone for a given location. The Koeppen-Geiger climate zones were first published in 1884 as a system to classify regions of the earth by their relative heat and humidity through the year, for the benefit of human health, plants, agriculture, and other human activity [1]. This climate zone classification system, applicable to all of the earth's surface, has continued to be developed by scientists up to the present day. Recently one of us (FZ) has published updated, higher-accuracy KG climate zone definitions [2]. In this package we use these updated high-resolution maps as the data source [3]. We provide functions that return the KG climate zone for a given longitude and latitude, or for a given United States zip code. In addition, the CZUncertainty() function will check nearby climate zones to determine whether the given location is close to a climate zone boundary. An interactive shiny app is also provided to determine the KG climate zone for a given longitude and latitude, or United States zip code. Digital data, as well as animated maps showing the shift of the climate zones, are provided on the following website <http://koeppen-geiger.vu-wien.ac.at>. This work was supported by the DOE-EERE SunShot award DE-EE-0007140. [1] W. Koeppen, (2011) <doi:10.1127/0941-2948/2011/105>. [2] F. Rubel and M. Kottek, (2010) <doi:10.1127/0941-2948/2010/0430>. [3] F. Rubel, K. Brugger, K. Haslinger, and I. Auer, (2016) <doi:10.1127/metz/2016/0816>.
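A hedged usage sketch assuming the RoundCoordinates() and LookupCZ() helpers alongside CZUncertainty() (the expected column names follow the package examples but may differ; the site and coordinates are illustrative):

  library(kgc)
  site <- data.frame(Site = "Vienna", Longitude = 16.37, Latitude = 48.21)
  site <- data.frame(site,
                     rndCoord.lon = RoundCoordinates(site$Longitude),
                     rndCoord.lat = RoundCoordinates(site$Latitude))
  LookupCZ(site)        # KG climate zone for the given longitude and latitude
  CZUncertainty(site)   # nearby zones if the site lies close to a zone boundary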
Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner's g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the mixture of g-priors from Liang et al (2008) <DOI:10.1198/016214507000001337> for linear models, or mixtures of g-priors from Li and Clyde (2019) <DOI:10.1080/01621459.2018.1469992> in generalized linear models. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using sampling without replacement or an efficient MCMC algorithm that samples models using a tree structure of the model space as an efficient hash table. See Clyde, Ghosh and Littman (2010) <DOI:10.1198/jcgs.2010.09049> for details on the sampling algorithms. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used to enforce sampling models that are full rank. The user may force variables to always be included in addition to imposing constraints that higher order interactions are included only if their parents are included in the model. This material is based upon work supported by the National Science Foundation under Division of Mathematical Sciences grant 1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
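A minimal sketch assuming the bas.lm() interface for Bayesian model averaging in a linear model (the data set and prior choices are illustrative):

  library(BAS)
  fit <- bas.lm(mpg ~ ., data = mtcars,
                prior = "ZS-null",        # Zellner-Siow Cauchy prior on coefficients
                modelprior = uniform(),   # uniform prior over all models
                method = "BAS")           # sampling models without replacement
  summary(fit)                            # posterior inclusion probabilities and top models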
Implementations of two empirical versions of the kernel partial correlation (KPC) coefficient and the associated variable selection algorithms. KPC is a measure of the strength of conditional association between Y and Z given X, with X, Y, Z being random variables taking values in general topological spaces. As the name suggests, KPC is defined in terms of kernels on reproducing kernel Hilbert spaces (RKHSs). The population KPC is a deterministic number between 0 and 1; it is 0 if and only if Y is conditionally independent of Z given X, and it is 1 if and only if Y is a measurable function of Z and X. One empirical KPC estimator is based on geometric graphs, such as K-nearest neighbor graphs and minimum spanning trees, and is consistent under very weak conditions. The other empirical estimator, defined using conditional mean embeddings (CMEs) as used in the RKHS literature, is also consistent under suitable conditions. Using KPC, a stepwise forward variable selection algorithm KFOCI (using the graph-based estimator of KPC) is provided, as well as a similar stepwise forward selection algorithm based on the RKHS-based estimator. For more details on KPC, its empirical estimators and its application to variable selection, see Huang, Z., N. Deb, and B. Sen (2022), 'Kernel partial correlation coefficient - a measure of conditional dependence' (URL listed below). When X is empty, KPC measures the unconditional dependence between Y and Z, which has been described in Deb, N., P. Ghosal, and B. Sen (2020), 'Measuring association on topological spaces using kernels and geometric graphs' <arXiv:2010.01768>, and it is implemented in the functions KMAc() and Klin() in this package. The latter can be computed in near linear time.
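A hedged usage sketch of the functions named above, assuming KFOCI() and KMAc() accept a response and a predictor matrix with sensible kernel defaults (argument names and defaults may differ; see the package help):

  library(KPC)
  set.seed(1)
  X <- matrix(rnorm(200 * 3), 200, 3)
  Y <- matrix(X[, 1] + rnorm(200), ncol = 1)   # Y depends on the first predictor only
  KFOCI(Y, X)                                  # stepwise forward selection via graph-based KPC
  KMAc(Y, X[, 1, drop = FALSE])                # unconditional kernel dependence between Y and X1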
Implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional analysis of variance, functional general linear model, n-sample test of correspondence of distribution functions), for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot) and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction). See Myllymäki and Mrkvička (2024) <doi:10.18637/jss.v111.i03>, Myllymäki et al. (2017) <doi:10.1111/rssb.12172>, Mrkvička and Myllymäki (2023) <doi:10.1007/s11222-023-10275-7>, Mrkvička et al. (2016) <doi:10.1016/j.spasta.2016.04.005>, Mrkvička et al. (2017) <doi:10.1007/s11222-016-9683-9>, Mrkvička et al. (2020) <doi:10.14736/kyb-2020-3-0432>, Mrkvička et al. (2021) <doi:10.1007/s11009-019-09756-y>, Myllymäki et al. (2021) <doi:10.1016/j.spasta.2020.100436>, Mrkvička et al. (2022) <doi:10.1002/sim.9236>, Dai et al. (2022) <doi:10.5772/intechopen.100124>, Dvořák and Mrkvička (2022) <doi:10.1007/s00180-021-01134-y>, Mrkvička et al. (2023) <doi:10.48550/arXiv.2309.04746>, and Konstantinou et al. (2024) <doi:10.1007/s00180-024-01569-z>.
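A minimal sketch assuming the create_curve_set() and global_envelope_test() interfaces (the observed curve and the null-model simulations are synthetic):

  library(GET)
  set.seed(1)
  r    <- seq(0, 1, length.out = 50)                  # argument values of the functions
  obs  <- sin(2 * pi * r) + rnorm(50, sd = 0.2)       # observed function
  sims <- replicate(999, rnorm(50, sd = 0.2))         # simulations under the null model
  cset <- create_curve_set(list(r = r, obs = obs, sim_m = sims))
  res  <- global_envelope_test(cset, type = "erl")    # 95% global (ERL) envelope
  plot(res)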
In phase I clinical trials, the primary objective is to ascertain the maximum tolerated dose (MTD) corresponding to a specified target toxicity rate. The subsequent phase II trials are designed to examine the potential efficacy of the drug based on the MTD obtained from the phase I trials, with the aim of identifying the optimal biological dose (OBD). The CFO package facilitates the implementation of dose-finding trials by utilizing calibration-free odds type (CFO-type) designs. Specifically, it encompasses the calibration-free odds (CFO) (Jin and Yin (2022) <doi:10.1177/09622802221079353>), randomized CFO (rCFO), precision CFO (pCFO), two-dimensional CFO (2dCFO) (Wang et al. (2023) <doi:10.3389/fonc.2023.1294258>), time-to-event CFO (TITE-CFO) (Jin and Yin (2023) <doi:10.1002/pst.2304>), fractional CFO (fCFO), accumulative CFO (aCFO), TITE-aCFO, and f-aCFO (Fang and Yin (2024) <doi:10.1002/sim.10127>) designs. It supports phase I/II trials for the CFO design and only phase I trials for the other CFO-type designs. The CFO package accommodates diverse CFO-type designs, allowing users to tailor the approach based on factors such as dose information inclusion, handling of late-onset toxicity, and the nature of the target drug (single-drug or drug-combination). The functionalities embedded in the CFO package include the determination of the dose level for the next cohort, the selection of the MTD for a real trial, and the execution of single or multiple simulations to obtain operating characteristics. Moreover, these functions are equipped with early stopping and dose elimination rules to address safety considerations. Users have the flexibility to choose different distributions, thresholds, and cohort sizes among others for their specific needs. The output of the CFO package can be summary statistics as well as various plots for better visualization. An interactive web application for CFO is available at the provided URL.
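A hedged sketch of determining the dose for the next cohort; the function name CFO.next() and the meaning of its arguments are assumptions made for illustration, so consult the package help for the exact interface:

  library(CFO)
  target <- 0.3              # target toxicity rate
  cys    <- c(0, 1, 0)       # toxicities observed at the left, current and right doses (illustrative)
  cns    <- c(3, 6, 0)       # patients treated at those doses (illustrative)
  CFO.next(target = target, cys = cys, cns = cns, currdose = 2)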
Analysis of species count data in ecology often requires normalization to an identical sample size. Rarefying (random subsampling without replacement), which is a popular method for normalization, has been widely criticized for its poor reproducibility and potential distortion of the community structure. In the context of microbiome count data, researchers explicitly advised against the use of rarefying. An alternative to rarefying is scaling with ranked subsampling (SRS). SRS consists of two steps. In the first step, the total counts for all OTUs (operational taxonomic units) or species in each sample are divided by a scaling factor chosen in such a way that the sum of the scaled counts Cscaled equals Cmin, the fixed total count chosen for normalization. In the second step, the non-integer Cscaled values are converted into integers by an algorithm that we dub ranked subsampling. The Cscaled value for each OTU or species is split into the integer part Cint (Cint = floor(Cscaled)) and the fractional part Cfrac (Cfrac = Cscaled - Cint). Since the sum of Cint is smaller than or equal to Cmin, an additional deltaC = Cmin - sum(Cint) counts have to be added to the library to reach the total count of Cmin. This is achieved as follows. OTUs are ranked in the descending order of their Cfrac values. Beginning with the OTU of the highest rank, a single count per OTU is added to the normalized library until the total number of added counts reaches deltaC and the sum of all counts in the normalized library equals Cmin. When the lowest Cfrac involved in picking the deltaC counts is shared by several OTUs, the OTUs used for adding a single count to the library are selected in the order of their Cint values. This selection minimizes the effect of normalization on the relative frequencies of OTUs. OTUs with identical Cfrac as well as Cint are sampled randomly without replacement. See Beule & Karlovsky (2020) <doi:10.7717/peerj.9593> for details.
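A hand-rolled sketch of the procedure described above for a single sample (illustrative only, not the package's own function):

  srs_one_sample <- function(counts, Cmin) {
    scaled <- counts * Cmin / sum(counts)    # step 1: scale counts so that they sum to Cmin
    Cint   <- floor(scaled)                  # integer parts
    Cfrac  <- scaled - Cint                  # fractional parts
    deltaC <- Cmin - sum(Cint)               # counts still to be added
    # step 2: rank OTUs by decreasing Cfrac, ties broken by Cint, then randomly
    ord <- order(-Cfrac, -Cint, sample(seq_along(counts)))
    add <- integer(length(counts))
    add[ord[seq_len(deltaC)]] <- 1           # one extra count for the top-ranked OTUs
    Cint + add
  }
  set.seed(1)
  srs_one_sample(c(OTU1 = 120, OTU2 = 30, OTU3 = 7, OTU4 = 3), Cmin = 50)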
This package performs statistical estimation and inference-related computations by accessing and executing modified versions of Fortran subroutines originally published in the Association for Computing Machinery (ACM) journal Transactions on Mathematical Software (TOMS) by Bunch, Gay and Welsch (1993) <doi:10.1145/151271.151279>. The acronym BGW (from the authors' last names) will be used when making reference to technical content (e.g., algorithm, methodology) that originally appeared in ACM TOMS. A key feature of BGW is that it exploits the special structure of statistical estimation problems within a trust-region-based optimization approach to produce an estimation algorithm that is much more effective than the usual practice of using optimization methods and codes originally developed for general optimization. The bgw package bundles R wrapper (and related) functions with modified Fortran source code so that it can be compiled and linked in the R environment for fast execution. This version implements a function ('bgw_mle.R') that performs maximum likelihood estimation (MLE) for a user-provided model object that computes probabilities (a.k.a. probability densities). The original motivation for producing this package was to provide fast, efficient, and reliable MLE for discrete choice models that can be called from the Apollo choice modelling R package (see <http://www.apollochoicemodelling.com>). Starting with the release of Apollo 3.0, BGW is the default estimation package. However, estimation can also be performed using BGW in a stand-alone fashion without using Apollo (as shown in simple examples included in the package). Note also that BGW capabilities are not limited to MLE, and future extension to other estimators (e.g., nonlinear least squares, generalized method of moments, etc.) is possible. The Fortran code included in bgw was modified by one of the original BGW authors (Bunch) under his rights as confirmed by direct consultation with the ACM Intellectual Property and Rights Manager. See <https://authors.acm.org/author-resources/author-rights>. The main requirement is clear citation of the original publication (see above).
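A hedged sketch of a stand-alone call, assuming bgw_mle() takes a user function returning per-observation likelihood values and a vector of starting values (the argument names are assumptions; see the package examples for the exact interface):

  library(bgw)
  set.seed(1)
  y <- rbinom(200, 1, plogis(0.5))        # toy data: Bernoulli outcomes
  calcR <- function(beta) {               # per-observation probabilities as a function of beta
    p <- plogis(beta[1])
    ifelse(y == 1, p, 1 - p)
  }
  fit <- bgw_mle(calcR = calcR, betaStart = c(intercept = 0))
  fit                                     # inspect the returned estimates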
Computes various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer et al. (2006) <doi:10.1007/s11524-006-9094-x> and (2) based on Andrews et al. (2020) <doi:10.1080/17445647.2020.1750066> and Slotman et al. (2022) <doi:10.1016/j.dib.2022.108002> who use variables chosen by Roux and Mair (2010) <doi:10.1111/j.1749-6632.2009.05333.x>. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also (1) compute the spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011) <doi:10.1016/j.sste.2011.06.002>, (2) compute the spatial Educational Isolation Index (EI) based on Bravo et al. (2021) <doi:10.3390/ijerph18179384>, (3) compute the aspatial Index of Concentration at the Extremes (ICE) based on Feldman et al. (2015) <doi:10.1136/jech-2015-205728> and Krieger et al. (2016) <doi:10.2105/AJPH.2015.302955>, (4) compute the aspatial racial/ethnic Dissimilarity Index based on Duncan & Duncan (1955) <doi:10.2307/2088328>, (5) compute the aspatial income or racial/ethnic Atkinson Index based on Atkinson (1970) <doi:10.1016/0022-0531(70)90039-6>, (6) compute the aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) <doi:10.2307/2574118>, (7) compute the aspatial racial/ethnic Correlation Ratio based on Bell (1954) <doi:10.2307/2574118> and White (1986) <doi:10.2307/3644339>, (8) compute the aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) <doi:10.2307/2084686> and Sudano et al. (2013) <doi:10.1016/j.healthplace.2012.09.015>, and (9) compute the aspatial racial/ethnic Local Exposure and Isolation metric based on Bemanian & Beyer (2017) <doi:10.1158/1055-9965.EPI-16-0926>. Also using data from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial Gini Index based on Gini (1921) <doi:10.2307/2223319>.
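A hedged sketch assuming the messer() interface for the Messer et al. (2006) NDI (downloading ACS-5 data requires a U.S. Census Bureau API key via tidycensus; the state, year, and output component are illustrative assumptions):

  library(ndi)
  # tidycensus::census_api_key("YOUR_KEY_HERE")   # needed to download ACS-5 data
  ndi2020 <- messer(state = "GA", year = 2020)    # census-tract NDI for Georgia
  head(ndi2020$ndi)                               # tract-level scores (assumed output component)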
Import SGF (Smart Game File) into R.
This package facilitates RNA secondary structure plotting.
Simulate random matrices and ensembles and compute their eigenvalue spectra and dispersions.
Estimating repeatability (intra-class correlation) from Gaussian, binary, proportion and Poisson data.
This is a sudoku game package with a shiny application for playing.
rTRM identifies transcriptional regulatory modules (TRMs) from protein-protein interaction networks.
Fast and efficient computation of rolling and expanding statistics for time-series data.
Creation, manipulation, and simulation of linear Gaussian Bayesian networks from text files, and more.
Floating Percentile Model with additional functions for optimizing inputs and evaluating outputs and assumptions.