Allows the user to estimate transition probabilities for migratory animals between any two phases of the annual cycle, using a variety of different data types. Also quantifies the strength of migratory connectivity (MC), a standardized metric of the extent to which populations co-occur between two phases of the annual cycle. Includes functions to estimate MC and the more traditional metric of migratory connectivity strength (Mantel correlation), incorporating uncertainty from multiple sources of sampling error. For cross-species comparisons, methods are provided to estimate differences in migratory connectivity strength, incorporating uncertainty. See Cohen et al. (2018) <doi:10.1111/2041-210X.12916>, Cohen et al. (2019) <doi:10.1111/ecog.03974>, and Roberts et al. (2023) <doi:10.1002/eap.2788> for details on some of these methods.
Facilitates some of the analyses performed in studies of behavioral economic discounting. The package supports scoring of the 27-Item Monetary Choice Questionnaire (see Kaplan et al., 2016; <doi:10.1007/s40614-016-0070-9>); calculating k values (Mazur's simple hyperbolic and exponential) using nonlinear regression; calculating various Area Under the Curve (AUC) measures; plotting regression curves for both fit-to-group and two-stage approaches; checking for unsystematic discounting (Johnson & Bickel, 2008; <doi:10.1037/1064-1297.16.3.264>); and scoring of the minute discounting task (see Koffarnus & Bickel, 2014; <doi:10.1037/a0035973>) using the Qualtrics 5-trial discounting template (see the Qualtrics Minute Discounting User Guide; <doi:10.13140/RG.2.2.26495.79527>), which is also available as a .qsf file in this package.
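As an illustration of the nonlinear-regression step, here is a minimal base-R sketch (not this package's interface; the delays and indifference points are invented) fitting Mazur's simple hyperbolic model V = 1/(1 + kD):

    # Hypothetical indifference points (fractions of the delayed amount)
    d <- c(1, 7, 30, 90, 180, 365)              # delays in days
    v <- c(0.95, 0.85, 0.60, 0.45, 0.30, 0.20)  # observed indifference points
    # Fit V = 1 / (1 + k * D) by nonlinear regression
    fit <- nls(v ~ 1 / (1 + k * d), start = list(k = 0.01))
    coef(fit)["k"]                              # estimated discounting rate k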
Supports the quantitative analysis of binary welfare-based decision-making processes using Monte Carlo simulations. Decision support is given on two levels: (i) the actual decision level is to choose between two alternatives under probabilistic uncertainty; this package calculates the optimal decision based on maximizing expected welfare. (ii) The meta-decision level is to allocate resources to reduce the uncertainty in the underlying decision problem, i.e., to increase the current information in order to improve the actual decision-making process. This problem is addressed using Value of Information Analysis. The Expected Value of Information for arbitrary prospective estimates can be calculated, as well as the Individual Expected Value of Perfect Information. The probabilistic calculations are done via Monte Carlo simulations, and this Monte Carlo functionality can be used on its own.
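A minimal base-R sketch of the underlying idea (distributions and numbers are invented for illustration; the package's interface differs): the decision picks the alternative with the higher expected welfare, and the Expected Value of Perfect Information is the expected gain from deciding after uncertainty is resolved:

    set.seed(1)
    n <- 1e5
    welfare_a <- rnorm(n, mean = 100, sd = 40)  # simulated welfare, alternative A
    welfare_b <- rnorm(n, mean =  90, sd = 10)  # simulated welfare, alternative B
    best_now     <- max(mean(welfare_a), mean(welfare_b))  # decide now, under uncertainty
    best_perfect <- mean(pmax(welfare_a, welfare_b))       # decide with perfect information
    best_perfect - best_now                                # EVPI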
The graphicx package provides a useful keyword, viewport, which allows one to show just a part of an image. However, one needs to supply the actual coordinates of the viewport window. Sometimes it is useful to have relative coordinates, as fractions of the natural size. For example, one may want to print a large image on a spread, putting one half on a verso page and the other half on the next recto page. For this one needs a viewport occupying exactly one half of the file's bounding box, whatever the actual width of the image may be. This package adds a new keyword, rviewport, to the graphicx package, specifying a relative viewport for graphics inclusion: a window defined by the given fractions of the natural width and height of the image.
This package provides a function for a distribution-free control chart, based on the change point model, for multivariate statistical process control. The main constituent of the chart is the energy test, which focuses on the discrepancy between the empirical characteristic functions of two random vectors. This control chart stands out in four respects. First, it is distribution-free, requiring no knowledge of the underlying random processes. Second, it can monitor mean and variance simultaneously. Third, it is devised for multivariate time series, which is more practical in real-data applications. Fourth, it is designed for online detection (Phase II), which is central for real-time surveillance of streaming data. For more information please refer to O. Okhrin and Y.F. Xu (2017) <https://github.com/YafeiXu/working_paper/raw/master/CPM102.pdf>.
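For intuition, one common form of the two-sample energy statistic at the heart of such a chart can be sketched in a few lines of base R (illustrative only, not the package's internal implementation):

    energy_stat <- function(x, y) {
      # x, y: numeric matrices, one multivariate observation per row
      n <- nrow(x)
      d <- as.matrix(dist(rbind(x, y)))        # all pairwise Euclidean distances
      dxy <- mean(d[seq_len(n), -seq_len(n)])  # average between-sample distance
      2 * dxy - mean(dist(x)) - mean(dist(y))  # large values signal a change
    }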
Implementation of a cross-validation method for testing the forecasting accuracy of several multi-population mortality models. The family covers several multi-population mortality models proposed in the actuarial and demographic literature. The package includes functions for fitting and forecasting the mortality rates of several populations, as well as functions for testing the forecasting accuracy of different multi-population models. References: Atance, D., Debon, A., and Navarro, E. (2020) <doi:10.3390/math8091550>; Bergmeir, C. & Benitez, J.M. (2012) <doi:10.1016/j.ins.2011.12.028>; Debon, A., Montes, F., & Martinez-Ruiz, F. (2011) <doi:10.1007/s13385-011-0043-z>; Lee, R.D. & Carter, L.R. (1992) <doi:10.1080/01621459.1992.10475265>; Russolillo, M., Giordano, G., & Haberman, S. (2011) <doi:10.1080/03461231003611933>; Santolino, M. (2023) <doi:10.3390/risks11100170>.
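As a reference point for the single-population building block, the Lee-Carter (1992) model log m(x,t) = a_x + b_x k_t can be fit by a rank-one SVD; a minimal base-R sketch (not the package's own fitting routine, which covers the multi-population extensions):

    fit_lee_carter <- function(log_mx) {      # log death rates: ages x years
      a <- rowMeans(log_mx)                   # age pattern a_x
      s <- svd(log_mx - a)                    # rank-one approximation of residuals
      b <- s$u[, 1] / sum(s$u[, 1])           # b_x, normalised to sum to one
      k <- s$d[1] * s$v[, 1] * sum(s$u[, 1])  # k_t, rescaled to match
      list(a = a, b = b, k = k)
    }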
Surrounds the usual sample variance of a univariate numeric sample with a confidence interval for the population variance. So far, this has been done only under the assumption that the underlying distribution is normal. Under the hood, this package implements the unique minimum-variance unbiased estimator of the variance of the sample variance, in a formula that is equivalent to estimating the kurtosis and the square of the population variance in an unbiased way and combining them according to the classical formula into an estimator of the variance of the sample variance. Both the sample variance and the estimator of its variance are U-statistics; by the theory of U-statistics, the resulting estimator is unique. See Fuchs, Krautenbacher (2016) <doi:10.1080/15598608.2016.1158675> and the references therein for an overview of unbiased estimation of variances of U-statistics.
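The classical formula referred to above is, for a sample of size n from a distribution with fourth central moment \mu_4 and variance \sigma^2,

    \[
    \operatorname{Var}(S^2) \;=\; \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right),
    \]

and since the right-hand side is linear in \mu_4 and \sigma^4, plugging in unbiased estimators of those two quantities yields the unbiased estimator of Var(S^2) described.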
An ensemble meta-prediction framework to integrate multiple regression models into a current study; see Gu, T., Taylor, J.M.G. and Mukherjee, B. (2020) <arXiv:2010.09971>. It provides a meta-analysis framework along with two weighted estimators as the ensemble of empirical Bayes estimators, which combine the estimates from the different external models. The proposed framework is flexible and robust in that (i) it is capable of incorporating external models that use a slightly different set of covariates; (ii) it is able to identify the most relevant external information and diminish the influence of information that is less compatible with the internal data; and (iii) it balances the bias-variance trade-off while preserving most of the efficiency gain. The proposed estimators are more efficient than the naive analysis of the internal data and other naive combinations of external estimators.
Extensive global and small-area estimation procedures for multiphase forest inventories under the design-based Monte Carlo approach are provided. The implementation has been published in the Journal of Statistical Software (<doi:10.18637/jss.v097.i04>) and includes estimators for simple and cluster sampling published by Daniel Mandallaz in 2007 (<doi:10.1201/9781584889779>), 2013 (<doi:10.1139/cjfr-2012-0381>, <doi:10.1139/cjfr-2013-0181>, <doi:10.1139/cjfr-2013-0449>, <doi:10.3929/ethz-a-009990020>) and 2016 (<doi:10.3929/ethz-a-010579388>). It provides point estimates, their external- and design-based variances and confidence intervals, as well as a set of functions to analyze and visualize the produced estimates. The procedures have also been optimized for the use of remote sensing data as auxiliary information, as demonstrated in 2018 by Hill et al. (<doi:10.3390/rs10071052>).
It is known that current false discovery rate (FDR) procedures can be very conservative when applied to multiple testing in the discrete paradigm, where p-values (and test statistics) have discrete and heterogeneous null distributions. This package implements more powerful weighted or adaptive FDR procedures for FDR control and estimation in the discrete paradigm. The package takes in the original data set, rather than just the p-values, in order to carry out the adjustments for discreteness and heterogeneity of the p-value distributions. It implements methods for two types of test statistics and their p-values: (a) the binomial test of whether two independent Poisson distributions have the same mean, and (b) Fisher's exact test of whether the conditional distribution equals the marginal distribution for two binomial distributions, or of whether two independent binomial distributions have the same probability of success.
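Both tests are available in base R; a small sketch of the two p-value types the package adjusts (the counts are made up):

    x <- 3; y <- 10                     # two independent Poisson counts
    # (a) conditional on the total, X ~ Binomial(x + y, 1/2) under equal means
    binom.test(x, x + y, p = 0.5)
    # (b) Fisher's exact test on a 2x2 table from two binomial samples
    fisher.test(matrix(c(3, 10, 12, 5), nrow = 2))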
Two methods for performing an equivalence test for the means of two (test and reference) normal distributions are implemented. The null hypothesis of the equivalence test is that the absolute difference between the two means is greater than or equal to the equivalence margin, and the alternative is that the absolute difference is less than the margin. Given that the margin is often difficult to obtain a priori, it is assumed to be a constant multiple of the standard deviation of the reference distribution. The first method assumes a fixed margin, a constant multiple of the estimated standard deviation of the reference data, whose variability is ignored. The second method takes the margin variability into account. In addition, some tools to summarize and illustrate the data and test results are included to facilitate the evaluation of the data and interpretation of the results.
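A minimal sketch of the first (fixed-margin) method, assuming a two-one-sided-tests construction; the function name and the margin multiple delta are illustrative, and the package's exact procedure may differ:

    equiv_fixed_margin <- function(test, ref, delta = 1.5, alpha = 0.05) {
      margin <- delta * sd(ref)  # estimated margin; its variability is ignored
      p_lo <- t.test(test, ref, mu = -margin, alternative = "greater")$p.value
      p_hi <- t.test(test, ref, mu =  margin, alternative = "less")$p.value
      max(p_lo, p_hi) < alpha    # conclude equivalence if both one-sided tests reject
    }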
Automation of the item selection processes for Rasch scales by means of exhaustive search for suitable Rasch models (dichotomous, partial credit, rating scale) in a list of item combinations. The item combinations to test can be either all possible combinations, or they can be defined by several rules (forced inclusion of specific items, exclusion of combinations, minimum/maximum items of a subset of items). Tests for model fit and item fit include ordering of the thresholds, item fit indices, likelihood ratio test, Martin-Löf test, Wald-like test, person-item distribution, person separation index, principal components of Rasch residuals, empirical representation of all raw scores, and Rasch trees for detecting differential item functioning. The tests, their ordering and their parameters can be defined by the user. For parameter estimation and model tests, functions of the packages eRm, psychotools, or pairwise can be used.
This package provides a versatile toolkit for analyzing and visualizing DEXi (Decision EXpert for education) decision trees, facilitating multi-criteria decision analysis directly within R. Users can read .dxi files, manipulate decision trees, and evaluate various scenarios. It supports sensitivity analysis through Monte Carlo simulations, one-at-a-time approaches, and variance-based methods, helping to discern the impact of input variations. Additionally, it includes functionalities for generating sampling plans and an array of visualization options for decision trees and analysis results. A distinctive feature is the synoptic table plot, aiding in the efficient comparison of scenarios. Whether for in-depth decision modeling or sensitivity analysis, this package stands as a comprehensive solution. Definition of sensitivity analyses available in Carpani, Bergez and Monod (2012) <doi:10.1016/j.envsoft.2011.10.002> and detailed description of the package soon available in Alaphilippe, Allart, Carpani, Cavan, Monod and Bergez (submitted to Software Impacts).
Monetary valuation of wood in German forests (stumpage values), including estimations of harvest quantities, wood revenues, and harvest costs. The functions are sensitive to tree species, mean diameter of the harvested trees, stand quality, and logging method. The functions include estimations for the consequences of disturbances on revenues and costs. The underlying assortment tables are taken from Offer and Staupendahl (2018) with corresponding functions for salable and skidded volume derived in Fuchs et al. (2023). Wood revenue and harvest cost functions were taken from v. Bodelschwingh (2018). The consequences of disturbances refer to Dieter (2001), Moellmann and Moehring (2017), and Fuchs et al. (2022a, 2022b). For the full references see documentation of the functions, package README, and Fuchs et al. (2023). Apart from Dieter (2001) and Moellmann and Moehring (2017), all functions and factors are based on data from HessenForst, the forest administration of the Federal State of Hesse in Germany.
This package provides a framework to infer causality on a pair of time series of real numbers based on variable-lag Granger causality and transfer entropy. Typically, Granger causality and transfer entropy assume a fixed and constant time delay between cause and effect. For a non-stationary time series, however, this assumption does not hold. For example, consider two time series of the velocities of person A and person B, where B follows A. At some point, B stops to tie his shoes, then runs to catch up with A. The fixed-lag assumption is not true in this case. We propose a framework that allows variable lags between cause and effect in Granger causality and transfer entropy, enabling them to deal with variable-lag non-stationary time series. Please see Chainarong Amornbunchornvej, Elena Zheleva, and Tanya Berger-Wolf (2021) <doi:10.1145/3441452> when referring to this package in publications.
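For contrast with the variable-lag approach, the classical fixed-lag Granger test (here at lag 1) is a nested-model F-test; a base-R sketch:

    granger_lag1 <- function(x, y) {          # does x Granger-cause y?
      n  <- length(y)
      y0 <- y[-1]; y1 <- y[-n]; x1 <- x[-n]   # align the series at lag 1
      restricted   <- lm(y0 ~ y1)             # y's own past only
      unrestricted <- lm(y0 ~ y1 + x1)        # add x's past
      anova(restricted, unrestricted)         # F-test of the lagged-x term
    }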
Randomly splits data into testing and training sets. Then uses stepwise selection to fit numerous multiple regression models on the training data and tests them on the test data. For each model, plots are returned comparing the Akaike Information Criterion (AIC), the Pearson correlation coefficient (r) between the predicted and actual values, the Mean Absolute Error (MAE), and the R-squared among the models. Each model is ranked relative to the other models by these model evaluation metrics (i.e., AIC, r, MAE, and R-squared), and the model with the best mean ranking among the metrics is returned. Model evaluation metric weights for AIC, r, MAE, and R-squared are taken in as the arguments aic_wt, r_wt, mae_wt, and r_squ_wt, respectively. They are equally weighted by default but may be adjusted relative to each other if the user prefers one or more metrics to the others (Field, A., 2013, ISBN:978-1-4462-4918-5).
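A rough base-R sketch of the split/fit/evaluate workflow described above (this only illustrates the steps; the package's own functions and weighting arguments such as aic_wt are not shown):

    set.seed(1)
    idx   <- sample(nrow(mtcars), floor(0.7 * nrow(mtcars)))
    train <- mtcars[idx, ]; test <- mtcars[-idx, ]
    model <- step(lm(mpg ~ ., data = train), trace = 0)  # stepwise selection
    pred  <- predict(model, newdata = test)              # test-set predictions
    c(AIC = AIC(model),
      r   = cor(pred, test$mpg),
      MAE = mean(abs(pred - test$mpg)))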
Hyvärinen's score matching (Hyvärinen, 2005) <https://jmlr.org/papers/v6/hyvarinen05a.html> is a useful estimation technique when the normalising constant for a probability distribution is difficult to compute. This package implements score matching estimators using automatic differentiation in the CppAD library <https://github.com/coin-or/CppAD> and is designed for quickly implementing score matching estimators for new models. Also available is general robustification (Windham, 1995) <https://www.jstor.org/stable/2346159>. Already in the package are estimators for directional distributions (Mardia, Kent and Laha, 2016) <doi:10.48550/arXiv.1604.08470> and the flexible Polynomially-Tilted Pairwise Interaction model for compositional data. The latter estimators perform well when there are zeros in the compositions (Scealy and Wood, 2023) <doi:10.1080/01621459.2021.2016422>, even many zeros (Scealy, Hingee, Kent, and Wood, 2024) <doi:10.1007/s11222-024-10412-w>. A partial interface to CppAD's ADFun objects is also available.
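Stated loosely, for a density p_\theta on R^d the score matching objective involves only derivatives of \log p_\theta, so the intractable normalising constant drops out:

    \[
    J(\theta) \;=\; \mathbb{E}_{x}\!\left[\sum_{i=1}^{d}\Bigl(\partial_i^2 \log p_\theta(x)
      + \tfrac{1}{2}\bigl(\partial_i \log p_\theta(x)\bigr)^2\Bigr)\right]
    \]

(Hyvärinen, 2005, up to an additive constant not depending on \theta; boundary conditions and weighting for constrained supports, as needed for compositional data, are handled by the methods cited above).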
This package contains functions to compute and plot confidence distributions, confidence densities, p-value functions and s-value (surprisal) functions for several commonly used estimates. Instead of calculating just one p-value and one confidence interval, p-value functions display p-values and confidence intervals for many levels, thereby allowing one to gauge the compatibility of several parameter values with the data. These methods are discussed by Infanger D, Schmidt-Trucksäss A. (2019) <doi:10.1002/sim.8293>; Poole C. (1987) <doi:10.2105/AJPH.77.2.195>; Schweder T, Hjort NL. (2002) <doi:10.1111/1467-9469.00285>; Bender R, Berg G, Zeeb H. (2005) <doi:10.1002/bimj.200410104>; Singh K, Xie M, Strawderman WE. (2007) <doi:10.1214/074921707000000102>; Rothman KJ, Greenland S, Lash TL. (2008, ISBN:9781451190052); Amrhein V, Trafimow D, Greenland S. (2019) <doi:10.1080/00031305.2018.1543137>; Greenland S. (2019) <doi:10.1080/00031305.2018.1529625>; and Rafi Z, Greenland S. (2020) <doi:10.1186/s12874-020-01105-9>.
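A minimal base-R illustration of a p-value function for a normally distributed estimate (est and se are invented numbers, not package defaults):

    est <- 0.4; se <- 0.2
    mu  <- seq(-0.4, 1.2, length.out = 400)  # candidate parameter values
    p   <- 2 * pnorm(-abs(est - mu) / se)    # two-sided p-value at each value
    plot(mu, p, type = "l", xlab = "parameter value", ylab = "p-value")
    abline(h = 0.05, lty = 2)                # values above the line form the 95% CI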
This package provides a toolbox to train a single-sample classifier that uses in-sample feature relationships, represented as ordered pairs of the form feature1 < feature2 (e.g., gene1 < gene2). Two options are provided. The first is based on the switchBox package, which uses the top-scoring pairs (TSP) algorithm. The second is a novel implementation based on the random forest (RF) algorithm. For simple problems we recommend the one-vs-rest TSP option for its simplicity and ease of interpretation; for complex problems, RF performs better. Both approaches filter the features first, then combine the filtered features into a list of all possible rules (i.e., rule1: feature1 < feature2; rule2: feature1 < feature3; etc.). This list of rules is then filtered, and the most important and informative rules are kept. The informative rules are assembled into a one-vs-rest model or an RF model. A detailed description is provided with each function in this package to explain the filtration and training methodology of each approach. Reference: Marzouka & Eriksson (2021) <doi:10.1093/bioinformatics/btab088>.
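To make the rule construction concrete, a base-R sketch that turns an expression matrix into the binary feature1 < feature2 rules described above (illustrative only; the package adds rule filtering and model fitting on top):

    expr  <- matrix(rnorm(5 * 20), nrow = 5,
                    dimnames = list(paste0("gene", 1:5), NULL))  # genes x samples
    pairs <- t(combn(rownames(expr), 2))                         # all feature pairs
    rules <- apply(pairs, 1, function(p) expr[p[1], ] < expr[p[2], ])
    colnames(rules) <- paste(pairs[, 1], "<", pairs[, 2])
    dim(rules)  # samples x rules, ready for filtering and training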
Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches to each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.
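For comparison, the naive O(N*M) strategy described above is a loop over keywords in base R; the package replaces this with a single trie-based pass:

    naive_search <- function(keywords, text) {
      # one full scan of the text per keyword: O(N*M)
      lapply(setNames(keywords, keywords),
             function(k) gregexpr(k, text, fixed = TRUE)[[1]])
    }
    naive_search(c("he", "she", "his", "hers"), "ushers")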
Generates feature matrix outputs from R object inputs using a variety of expansion functions. The generated feature matrices have applications as inputs for a variety of machine learning algorithms. The expansion functions are based on coercing the input to a matrix, treating the columns as features and converting individual columns or combinations into blocks of columns. Currently these include expansion of columns by efficient sparse embedding by vectors of lags, quadratic expansion into squares and unique products, powers by vectors of degree, vectors of orthogonal polynomial functions, and block random affine projection transformations (RAPTs). The transformations are magrittr- and cbind-friendly, and can be used in a building block fashion. For instance, taking the cos() of the output of the RAPT transformation generates a stationary kernel expansion via Bochner's theorem, and this expansion can then be cbind-ed with other features. Additionally, there are utilities for replacing features, removing rows with NAs, creating matrix samples of a given distribution, a simple wrapper for LASSO with CV, a Freeman-Tukey transform, generalizations of the outer function, matrix size-preserving discrete difference by row, plotting, etc.
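As an example of the building-block style, the cos()-of-RAPT construction mentioned above amounts to random Fourier features; a base-R sketch under the assumption of Gaussian projection directions:

    set.seed(1)
    X <- matrix(rnorm(100 * 3), nrow = 100)  # 100 rows, 3 input features
    p <- 32                                  # number of random projections
    W <- matrix(rnorm(3 * p), nrow = 3)      # random affine projection: directions
    b <- runif(p, 0, 2 * pi)                 #                       ... and offsets
    Z <- sqrt(2 / p) * cos(sweep(X %*% W, 2, b, "+"))  # stationary-kernel feature block
    # Z %*% t(Z) approximates a Gaussian kernel matrix (Bochner's theorem)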
Airport problems, introduced by Littlechild and Owen (1973) <https://www.jstor.org/stable/2629727>, are cost allocation problems where agents share the cost of a facility (or service) based on their ordered needs. Valid allocations must satisfy no-subsidy constraints, meaning that no group of agents contributes more than the highest cost of its members (i.e., no agent is allowed to subsidize another). A rule is a mechanism that selects an allocation vector for a given problem. This package computes several rules proposed in the literature, including both standard rules and their variants, such as weighted versions, rules for clones, and rules based on the agents' hierarchy order. These rules can be applied to various problems of interest, including the allocation of liabilities and the maintenance of irrigation systems, among others. Moreover, the package provides functions for graphical representation, enabling users to visually compare the outcomes produced by each rule, or to display the no-subsidy set. In addition, it includes four datasets illustrating different applications and examples of airport problems. For a more detailed explanation of all concepts, see Thomson (2024) <doi:10.1016/j.mathsocsci.2024.03.007>.
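As one concrete example of a rule, the sequential equal contributions (Shapley) rule shares each cost increment equally among the agents that need it; a base-R sketch (the package's own interface is richer):

    sec_rule <- function(cost) {          # cost: nondecreasing agent costs
      n   <- length(cost)
      inc <- diff(c(0, cost))             # increments c_j - c_{j-1}
      cumsum(inc / (n - seq_len(n) + 1))  # agent i shares every increment up to i
    }
    sec_rule(c(10, 30, 60))               # 3.33 13.33 43.33; sums to 60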
Implementation of three methods based on the diversity forest (DF) algorithm (Hornung, 2022, <doi:10.1007/s42979-021-00920-1>), a split-finding approach that enables complex split procedures in random forests. The package includes: 1. Interaction forests (IFs) (Hornung & Boulesteix, 2022, <doi:10.1016/j.csda.2022.107460>): Model quantitative and qualitative interaction effects using bivariable splitting. They come with the Effect Importance Measure (EIM), which can be used to identify variable pairs that have well-interpretable quantitative and qualitative interaction effects with high predictive relevance. 2. Two random forest-based variable importance measures (VIMs) for multi-class outcomes: the class-focused VIM, which ranks covariates by their ability to distinguish individual outcome classes from the others, and the discriminatory VIM, which measures overall covariate influence irrespective of class-specific relevance. 3. The basic form of diversity forests that uses conventional univariable, binary splitting (Hornung, 2022). Except for the multi-class VIMs, all methods support categorical, metric, and survival outcomes. The package includes visualization tools for interpreting the identified covariate effects. Built as a fork of the ranger R package (main author: Marvin N. Wright), which implements random forests using an efficient C++ implementation.
The aim of most plant breeding programmes is the simultaneous improvement of several characters, so an objective method involving simultaneous selection for several attributes becomes necessary. It has been recognised that the most rapid improvement in economic value is expected from selection applied simultaneously to all the characters that determine the economic value of a plant, with weights assigned to each character according to its economic importance, heritability, and correlations with other characters. Selection for economic value is therefore a complex matter. The component characters are combined into an index in such a way that, when selection is applied to the index as if the index were the character to be improved, the most rapid improvement of economic value is expected. Such an index was first proposed by Smith (1937 <doi:10.1111/j.1469-1809.1936.tb02143.x>) based on Fisher's (1936 <doi:10.1111/j.1469-1809.1936.tb02137.x>) "discriminant function" (Dabholkar, 1999 <https://books.google.co.in/books?id=mlFtumAXQ0oC&lpg=PA4&ots=Xgxp1qLuxS&dq=elements%20of%20biometrical%20genetics&lr&pg=PP1#v=onepage&q&f=false>). In this package, the selection index is calculated based on the Smith (1937) selection index method.
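For reference, the Smith (1937) index has the familiar Smith-Hazel form

    \[
    I \;=\; b^{\top} x, \qquad b \;=\; P^{-1} G\, w,
    \]

where x is the vector of phenotypic values, P the phenotypic variance-covariance matrix, G the genotypic variance-covariance matrix, and w the vector of economic weights.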