Analysis of complex ANOVA models with any combination of orthogonal/nested and fixed/random factors, as described by Underwood (1997). There are two restrictions: (i) data must be balanced; (ii) fixed nested factors are not allowed. Homogeneity of variances is checked using Cochran's C test and a posteriori comparisons of means are done using the Student-Newman-Keuls (SNK) procedure. For those terms with no denominator in the F-ratio calculation, pooled mean squares and quasi F-ratios are provided. Magnitudes of effects are assessed through components of variation.
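A worked base-R illustration of this style of analysis (a minimal sketch, not this package's own interface): in a balanced design with a fixed factor A and a random factor B nested within A, A is tested against the mean square of B(A) rather than the residual, and Cochran's C is the largest cell variance divided by the sum of all cell variances.

# Balanced design: fixed factor A (3 levels), random factor B nested in A
# (4 levels per A level), n = 5 replicates per cell -- simulated data.
set.seed(1)
d <- expand.grid(rep = 1:5, B = factor(1:4), A = factor(1:3))
d$y <- rnorm(nrow(d), mean = as.numeric(d$A))

fit <- aov(y ~ A + A:B, data = d)            # A:B is the nested term B(A)
ms  <- summary(fit)[[1]][["Mean Sq"]]
df  <- summary(fit)[[1]][["Df"]]
F_A <- ms[1] / ms[2]                          # test A against MS of B(A), not the residual MS
p_A <- pf(F_A, df[1], df[2], lower.tail = FALSE)

# Cochran's C: largest within-cell variance relative to the sum of all cell variances
cell_var <- aggregate(y ~ A + B, data = d, FUN = var)$y
C <- max(cell_var) / sum(cell_var)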
First, functions for the univariate Poisson dispersion index (DI) for count data and the univariate exponential variation index (VI) for nonnegative continuous data are provided. Next, functions for further univariate indexes are given, such as the binomial dispersion index (DIb), the negative binomial dispersion index (DInb) and the inverse Gaussian variation index (VIiG). Finally, multivariate versions of these functions are computed, namely the generalized dispersion index (GDI) with its marginal version (MDI) and the generalized variation index (GVI) with its marginal version (MVI).
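A minimal base-R sketch of the two basic indexes using their standard definitions (not necessarily the exact estimators implemented in the package): DI is the sample variance over the sample mean for count data, and VI is the sample variance over the squared sample mean for nonnegative continuous data; each is close to 1 under the Poisson and exponential models, respectively.

set.seed(42)
counts <- rpois(200, lambda = 3)        # equidispersed count data
times  <- rexp(200, rate = 0.5)         # exponential-type continuous data

DI <- var(counts) / mean(counts)        # Poisson dispersion index: ~1 under a Poisson model
VI <- var(times)  / mean(times)^2       # exponential variation index: ~1 under an exponential model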
This package provides probability density functions and sampling algorithms for three key distributions from the General Unimodal Distribution (GUD) family: the Flexible Gumbel (FG) distribution, the Double Two-Piece (DTP) Student-t distribution, and the Two-Piece Scale (TPSC) Student-t distribution. Additionally, this package includes a function for Bayesian linear modal regression, leveraging these three distributions for model fitting. The details of the Bayesian modal regression model based on the GUD family can be found in Liu, Huang, and Bai (2024) <doi:10.1016/j.csda.2024.108012>.
Metadata about populations and data about samples from the 1000 Genomes Project, including the 2,504 samples sequenced for the Phase 3 release and the expanded collection of 3,202 samples with 602 additional trios. The data is described in Auton et al. (2015) <doi:10.1038/nature15393> and Byrska-Bishop et al. (2022) <doi:10.1016/j.cell.2022.08.004>, and raw data is available at <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/>. See Turner (2022) <doi:10.48550/arXiv.2210.00539> for more details.
Computes Logistic Knowledge Tracing ('LKT'), which is a general method for tracking human learning in an educational software system. Please see Pavlik, Eglington, and Harrel-Williams (2021) <https://ieeexplore.ieee.org/document/9616435>. LKT is a method to compute features of student data that are used as predictors of subsequent performance. LKT allows great flexibility in the choice of predictive components and features computed for these predictive components. The system is built on top of 'LiblineaR', which enables extremely fast solutions compared to base glm() in R.
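A rough sketch of the underlying idea using only base glm() (an illustration of feature-based logistic knowledge tracing, not the package's own interface; the variable names are hypothetical): performance on each trial is modeled as a logistic function of features computed from the student's history, such as the count of prior practice opportunities.

# Toy student-by-trial data: success probability grows with prior practice.
set.seed(7)
n <- 500
prior_practice  <- rpois(n, lambda = 4)                 # feature computed from the learning history
item_difficulty <- rnorm(n)
p <- plogis(-0.5 + 0.3 * prior_practice - 0.8 * item_difficulty)
correct <- rbinom(n, size = 1, prob = p)

# Logistic regression of performance on the computed features.
fit <- glm(correct ~ prior_practice + item_difficulty, family = binomial())
summary(fit)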
This package provides methods for fast access to large ASCII files. Currently the following file formats are supported: comma separated format (CSV) and fixed width format. It is assumed that the files are too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed as one would an ordinary data.frame. The 'LaF' vignette gives an overview of the functionality provided.
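A usage sketch of blockwise access; the function and argument names shown (laf_open_csv(), begin(), next_block(), column_types, column_names) are assumptions to be checked against the 'LaF' vignette, and the file name is hypothetical.

library(LaF)

# Open a large CSV without loading it into memory; column types must be declared
# (assumes the file has no header row, otherwise it would need to be skipped).
laf <- laf_open_csv("big_file.csv",
                    column_types = c("integer", "double", "string"),
                    column_names = c("id", "value", "label"))

laf[1:10, ]                              # data.frame-like random access

# Blockwise processing: accumulate a running sum of one column.
begin(laf)
total <- 0
repeat {
  block <- next_block(laf, nrows = 5000)
  if (nrow(block) == 0) break
  total <- total + sum(block$value)
}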
This package implements the Classification Based on Association Rules (CBA) algorithm for association rule classification. The package, also described in Hahsler et al. (2019) <doi:10.32614/RJ-2019-048>, contains several convenience methods that allow CBA parameters (minimum confidence, minimum support) to be set automatically, and it also natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the arules package. To further decrease the size of the CBA models produced by the arc package, postprocessing by the 'qCBA' package is suggested.
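The two building blocks mentioned above, pre-discretization and class-constrained rule mining with arules, can be sketched as follows (this illustrates the ingredients only, not the arc fitting interface itself).

library(arules)

data(iris)
df <- iris
df[1:4] <- lapply(df[1:4], discretize, breaks = 3)      # pre-discretize numeric attributes
trans <- as(df, "transactions")

# Mine class association rules with the class attribute constrained to the right-hand side.
rules <- apriori(trans,
                 parameter  = list(support = 0.05, confidence = 0.8),
                 appearance = list(rhs = grep("^Species=", itemLabels(trans), value = TRUE),
                                   default = "lhs"))
inspect(head(sort(rules, by = "confidence")))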
The fxl Charting package is used to prepare and design single case design figures that are typically prepared in spreadsheet software. With 'fxl', there is no need to leave the R environment to prepare these figures, and many of the conventions unique to single case experimental designs can be reproduced without manually constructing plot features (e.g., drawing annotations across plots). Support is provided for various plotting arrangements (e.g., multiple baseline), annotations (e.g., brackets, arrows), and output formats (e.g., svg, rasters).
Factor analysis implementation for multiple data sources, i.e., for groups of variables. The whole data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The model, group factor analysis (GFA), is inferred with Gibbs sampling; it was originally presented by Virtanen et al. (2012) and extended in Klami et al. (2015) <DOI:10.1109/TNNLS.2014.2376974> and Bunte et al. (2016) <DOI:10.1093/bioinformatics/btw207>; for details, see the citation info.
Implementation of Narrowest Significance Pursuit, a general and flexible methodology for automatically detecting localised regions in data sequences which each must contain a change-point (understood as an abrupt change in the parameters of an underlying linear model), at a prescribed global significance level. Narrowest Significance Pursuit works with a wide range of distributional assumptions on the errors, and yields exact desired finite-sample coverage probabilities, regardless of the form or number of the covariates. For details, see P. Fryzlewicz (2021) <https://stats.lse.ac.uk/fryzlewicz/nsp/nsp.pdf>.
Implementation of optimistic optimization methods for global optimization of deterministic or stochastic functions. The algorithms feature guarantees of convergence to a global optimum. They require minimal assumptions on the (only local) smoothness, where the smoothness parameter does not need to be known. They are expected to be useful for the most difficult functions when we have no information on smoothness and the gradients are unknown or do not exist. Due to the weak assumptions, however, they are typically effective only in small dimensions, for example, for hyperparameter tuning.
Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem(), and the functions cv.oem() and xval.oem() are for cross-validation, the latter being an accelerated cross-validation function for linear models. The big.oem() function allows for out-of-memory fitting. A description of the underlying methods and the code interface is given in Huling and Chien (2022) <doi:10.18637/jss.v104.i06>.
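A minimal usage sketch of the functions named above on simulated tall data; the argument names (x, y, penalty) are assumptions based on this description and should be checked against the package documentation.

library(oem)

# Simulated tall data: many observations, moderate number of predictors.
set.seed(123)
n <- 1e5; p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:5] %*% rep(1, 5)) + rnorm(n)

fit  <- oem(x = x, y = y, penalty = "lasso")           # main fitting function
cvft <- cv.oem(x = x, y = y, penalty = "lasso")        # standard cross-validation
xvft <- xval.oem(x = x, y = y, penalty = "lasso")      # accelerated CV for linear models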
Estimation for longitudinal data following outcome-dependent sampling using the sequential offsetted regression technique. Includes support for binary, count, and continuous data. The first regression is a logistic regression, which uses a known ratio (the probability of being sampled given that the subject/observation was referred divided by the probability of being sampled given that the subject/observation was not referred) as an offset to estimate the probability of being referred given outcome and covariates. The second regression uses this estimated probability to calculate the mean population response given covariates.
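The mechanics of the first-stage regression can be sketched with base glm() alone (a toy illustration of entering a known log sampling ratio as an offset, not the package's estimation routine; all variable names and the sampling probabilities are hypothetical).

set.seed(6)
n <- 1000
x <- rnorm(n)                                    # covariate
y <- rbinom(n, 1, plogis(0.5 * x))               # binary outcome
referred <- rbinom(n, 1, plogis(-1 + y + 0.3 * x))

# Known sampling ratio: P(sampled | referred) / P(sampled | not referred)
off <- log(0.8 / 0.1)

# First-stage logistic regression of referral on outcome and covariates,
# with the log sampling ratio entered as an offset.
fit1  <- glm(referred ~ y + x, family = binomial(), offset = rep(off, n))
p_ref <- predict(fit1, type = "response")        # estimated P(referred | outcome, covariates)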
Smart Adaptive Recommendations (SAR) is the name of a fast, scalable, adaptive algorithm for personalized recommendations based on user transactions and item descriptions. It produces easily explainable/interpretable recommendations and handles "cold item" and "semi-cold user" scenarios. This package provides two implementations of 'SAR': a standalone implementation, and an interface to a web service in Microsoft's Azure cloud: <https://github.com/Microsoft/Product-Recommendations/blob/master/doc/sar.md>. The former allows fast and easy experimentation, and the latter provides robust scalability and extra features for production use.
This package provides functions to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.
Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters are provided which appear in high-dimensional approximately sparse models. This includes functions for fitting heteroscedasticity-robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. See Chernozhukov, Hansen, and Spindler (2016) <arXiv:1603.01700>.
Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models which use factors with many levels as pure control variables. See Gaure (2013) <doi:10.1016/j.csda.2013.03.024>. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multi-way clustered standard errors, as well as limited mobility bias correction (Gaure 2014 <doi:10.1002/sta4.68>). Since version 3.0, it provides dedicated functions to estimate Poisson models.
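A small from-scratch illustration of the core idea in plain base R (not the package interface): sweeping out a factor with many levels by within-group demeaning gives the same slope as including the factor's dummies explicitly in lm(), but without ever forming the dummy matrix.

set.seed(2)
n  <- 5000
id <- factor(sample(1000, n, replace = TRUE))    # factor with many levels
x  <- rnorm(n)
y  <- 2 * x + rnorm(1000)[id] + rnorm(n)         # outcome with group fixed effects

# OLS with explicit dummies (slow and memory-hungry when the factor is large):
b_dummy <- coef(lm(y ~ x + id))["x"]

# Transform the factor away by demeaning within groups, then run a small OLS:
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- coef(lm(y_w ~ x_w - 1))

all.equal(unname(b_dummy), unname(b_within))     # identical slope estimate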
Likelihood-based estimation of conditional transformation models via the most likely transformation approach described in Hothorn et al. (2018) <DOI:10.1111/sjos.12291> and Hothorn (2020) <DOI:10.18637/jss.v092.i01>. Shift-scale (Siegfried et al, 2023, <DOI:10.1080/00031305.2023.2203177>) and multivariate (Klein et al, 2022, <DOI:10.1111/sjos.12501>) transformation models are part of this package. A package vignette is available from <DOI:10.32614/CRAN.package.mlt.docreg> and more convenient user interfaces to many models from <DOI:10.32614/CRAN.package.tram>.
The Moving Epidemic Method, created by T Vega and JE Lozano (2012, 2015) <doi:10.1111/j.1750-2659.2012.00422.x>, <doi:10.1111/irv.12330>, allows the weekly assessment of the epidemic and intensity status to help in routine respiratory infections surveillance in health systems. Allows the comparison of different epidemic indicators, timing and shape with past epidemics and across different regions or countries with different surveillance systems. Also, it gives a measure of the performance of the method in terms of sensitivity and specificity of the alert week.
This package provides a computationally efficient way of fitting weighted linear fixed effects estimators for causal inference with various weighting schemes. Weighted linear fixed effects estimators can be used to estimate the average treatment effects under different identification strategies. This includes stratified randomized experiments, matching and stratification for observational studies, first differencing, and difference-in-differences. The package implements methods described in Imai and Kim (2017) "When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?", available at <https://imai.fas.harvard.edu/research/FEmatch.html>.
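As a point of reference for one of the identification strategies listed above, here is an unweighted two-way fixed effects (difference-in-differences) estimate in plain base R on a toy panel; this is only a baseline sketch and does not use the package's weighting schemes or interface.

# Toy panel: units observed over time, half of them treated after period 3.
set.seed(3)
units <- 50; periods <- 6
d <- expand.grid(unit = factor(1:units), time = factor(1:periods))
treated_unit <- as.integer(d$unit) <= 25
post <- as.integer(d$time) >= 4
d$D <- as.integer(treated_unit & post)           # treatment indicator
d$y <- rnorm(units)[d$unit] + rnorm(periods)[d$time] + 1.5 * d$D + rnorm(nrow(d))

# Two-way fixed effects (difference-in-differences) estimate of the treatment effect:
coef(lm(y ~ D + unit + time, data = d))["D"]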
SNM is a modeling strategy especially designed for normalizing high-throughput genomic data. The underlying premise of the approach is that the observed data are a function of what we refer to as study-specific variables. These variables are either biological variables that represent the target of the statistical analysis, or adjustment variables that represent factors arising from the experimental or biological setting from which the data are drawn. The SNM approach aims to simultaneously model all study-specific variables in order to more accurately characterize the biological or clinical variables of interest.
This package provides different approaches for selecting the threshold in generalized Pareto distributions. Most of them are based on minimizing the AMSE criterion or at least on reducing the bias of the assumed GPD model. Others are heuristically motivated by searching for stable sample paths, i.e., a nearly constant region of the tail index estimator with respect to k, the number of observations in the tail. The third class is motivated by graphical inspection. In addition, a sequential testing procedure for GPD goodness-of-fit tests is also implemented here.
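The "stable sample path" heuristic can be sketched in plain base R (this is a from-scratch illustration, not one of the package's selection routines): plot a tail index estimator, here the Hill estimator, against the number of tail observations k and look for a region where it is nearly constant.

set.seed(4)
x <- sort(abs(rt(2000, df = 3)), decreasing = TRUE)   # heavy-tailed sample, descending order

# Hill estimator of the tail index for each number of upper order statistics k.
hill <- sapply(2:500, function(k) {
  mean(log(x[1:k])) - log(x[k + 1])
})

plot(2:500, 1 / hill, type = "l", xlab = "k (tail observations)",
     ylab = "tail index estimate", main = "Hill plot: look for a stable region")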
Set of functions to calculate Benthic Biotic Indices from composition data, obtained either from morphotaxonomic inventories or from sequencing data. Based on reference ecological weights publicly available for a set of commonly used marine biotic indices, such as AMBI (A Marine Biotic Index, Borja et al., 2000) <doi:10.1016/S0025-326X(00)00061-8>, NSI (Norwegian Sensitivity Index) and ISI (Indicator Species Index) (Rygg 2013, <ISBN:978-82-577-6210-0>). It provides the ecological quality status of the samples based on each BBI as well as the normalized Ecological Quality Ratio.
Estimate different types of cluster-robust standard errors (CR0, CR1, CR2) with degrees of freedom adjustments. Standard errors are computed based on Liang and Zeger (1986) <doi:10.1093/biomet/73.1.13> and Bell and McCaffrey (2002) <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2002002/article/9058-eng.pdf?st=NxMjN1YZ>. Functions used in Huang and Li <doi:10.3758/s13428-021-01627-0>, Huang, Wiedermann, and Zhang <doi:10.1080/00273171.2022.2077290>, and Huang, Zhang, and Li (forthcoming: Journal of Research on Educational Effectiveness).
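A compact from-scratch sketch of the simplest estimators mentioned above, CR0 and one common CR1 small-sample correction, written in base R rather than with the package's own functions.

set.seed(5)
G <- 30                                          # number of clusters
d <- data.frame(g = rep(1:G, each = 20), x = rnorm(600))
d$y <- 0.5 * d$x + rnorm(G)[d$g] + rnorm(600)    # cluster-level noise induces within-cluster correlation

fit <- lm(y ~ x, data = d)
X <- model.matrix(fit); e <- resid(fit)
XtX_inv <- solve(crossprod(X))

# Meat of the sandwich: sum over clusters of X_g' e_g e_g' X_g
meat <- Reduce(`+`, lapply(split(seq_len(nrow(X)), d$g), function(idx) {
  u <- crossprod(X[idx, , drop = FALSE], e[idx])
  tcrossprod(u)
}))

V_cr0 <- XtX_inv %*% meat %*% XtX_inv                   # CR0
n <- nrow(X); k <- ncol(X)
V_cr1 <- (G / (G - 1)) * ((n - 1) / (n - k)) * V_cr0    # CR1 small-sample correction
sqrt(diag(V_cr1))                                       # cluster-robust standard errors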