Computationally efficient tools for high-dimensional predictive modeling (regression and classification). SAM is short for sparse additive modeling, and adopts the computationally efficient basis spline technique. We solve the optimization problems by various computational algorithms including the block coordinate descent algorithm, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by warm-start and active-set tricks.
Implementation of the methodologies described in 1) Alexander Petersen, Xi Liu and Afshin A. Divani (2021) <doi:10.1214/20-aos1971>, including global F tests, partial F tests, intrinsic Wasserstein-infinity bands and Wasserstein density bands, and 2) Chao Zhang, Piotr Kokoszka and Alexander Petersen (2022) <doi:10.1111/jtsa.12590>, including estimation, prediction, and inference of the Wasserstein autoregressive models.
This package provides visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book "Visualizing Categorical Data" by Michael Friendly and is now the main support package for a new book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer (2015).
This package provides a general test for conditional independence in supervised learning algorithms as proposed by Watson & Wright (2021) <doi:10.1007/s10994-021-06030-6>. Implements a conditional variable importance measure which can be applied to any supervised learning algorithm and loss function. Provides statistical inference procedures without parametric assumptions and applies equally well to continuous and categorical predictors and outcomes.
Uses the CMS application programming interface <https://dnav.cms.gov/api/healthdata> to provide users databases containing yearly Medicare reimbursement rates in the United States. Data can be acquired for the entire United States or only for specific localities. Currently, support is only provided for the Medicare Physician Fee Schedule, but support will be expanded for other CMS databases in future versions.
Discrete event simulation (DES) involves modeling of systems having discrete, i.e. abrupt, state changes. For instance, when a job arrives at a queue, the queue length abruptly increases by 1. This package is an R implementation of the event-oriented approach to DES; see the tutorial in Matloff (2008) <http://heather.cs.ucdavis.edu/~matloff/156/PLN/DESimIntro.pdf>.
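As a rough illustration of the event-oriented approach described above, the following Python sketch simulates a single-server FIFO queue by repeatedly popping the earliest event from an event list. It is not the package's R interface; the function name `simulate_queue`, the fixed `service_time`, and the rule that arrivals are processed before departures at equal times are all assumptions made for this example.

```python
import heapq


def simulate_queue(arrival_times, service_time):
    """Event-oriented simulation of a single-server FIFO queue.

    Returns a list of (time, jobs_in_system) pairs, one per processed event.
    All names and the constant service time are illustrative assumptions.
    """
    # Event list: (time, tie_breaker, kind). Arrivals (0) sort before
    # departures (1) when two events share the same time stamp.
    events = [(t, 0, "arrival") for t in arrival_times]
    heapq.heapify(events)

    jobs_in_system = 0  # includes the job currently in service
    history = []

    while events:
        now, _, kind = heapq.heappop(events)
        if kind == "arrival":
            jobs_in_system += 1  # abrupt state change: +1
            if jobs_in_system == 1:
                # Server was idle, so service starts immediately.
                heapq.heappush(events, (now + service_time, 1, "departure"))
        else:
            jobs_in_system -= 1  # abrupt state change: -1
            if jobs_in_system > 0:
                # Next waiting job enters service.
                heapq.heappush(events, (now + service_time, 1, "departure"))
        history.append((now, jobs_in_system))
    return history


if __name__ == "__main__":
    for time, n in simulate_queue([0.0, 1.0, 2.0], service_time=1.5):
        print(f"t={time:.1f}  jobs in system={n}")
```

The state only changes when an event is processed, so the simulation clock jumps from one event time to the next instead of advancing in fixed steps.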
Various algorithms related to linguistic fuzzy logic: mining for linguistic fuzzy association rules, composition of fuzzy relations, performing perception-based logical deduction (PbLD), and forecasting time-series using fuzzy rule-based ensemble (FRBE). The package also contains basic fuzzy-related algebraic functions capable of handling missing values in different styles (Bochvar, Sobocinski, Kleene etc.), computation of Sugeno integrals and fuzzy transform.
Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, mde provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.
This package provides utilities for estimation for the multivariate inverse Gaussian distribution of Minami (2003) <doi:10.1081/STA-120025379>, including random vector generation and explicit estimators of the location vector and scale matrix. The package implements kernel density estimators discussed in Belzile, Desgagnes, Genest and Ouimet (2024) <doi:10.48550/arXiv.2209.04757> for smoothing multivariate data on half-spaces.
Multivariate tests, estimates and methods based on the identity score, spatial sign score and spatial rank score are provided. The methods include one and c-sample problems, shape estimation and testing, linear regression and principal components. The methodology is described in Oja (2010) <doi:10.1007/978-1-4419-0468-3> and Nordhausen and Oja (2011) <doi:10.18637/jss.v043.i05>.
Consider a data matrix of n individuals with p variates. The objective general index (OGI) is a general index that combines the p variates into a univariate index in order to rank the n individuals. The OGI is always positively correlated with each of the variates. More details can be found in Sei (2016) <doi:10.1016/j.jmva.2016.02.005>.
This package provides an infrastructure for efficient processing of large-scale genetic and phenotypic data including core functions for: 1) fitting linear mixed models, 2) constructing marker-based genomic relationship matrices, 3) estimating genetic parameters (heritability and correlation), 4) performing genomic prediction and genetic risk profiling, and 5) single or multi-marker association analyses. Rohde et al. (2019) <doi:10.1101/503631>.
Estimate and plot wavelet quantile correlations (Kumar and Padakandla, 2022) between two time series. Wavelet quantile correlation is used to capture the dependency between two time series across quantiles and different frequencies. This method is useful in identifying potential hedges and safe-haven instruments for investment purposes. See Kumar and Padakandla (2022) <doi:10.1016/j.frl.2022.102707> for further details.
This package provides tools for reading, parsing and visualizing simulation data stored in xvg/xpm file formats (commonly generated by GROMACS molecular dynamics software). Streamlines post-processing and analysis of molecular dynamics ('MD') simulation outputs, enabling efficient exploration of molecular stability and conformational changes. Supports import of trajectory metrics ('RMSD', energy, temperature) and creation of publication-ready visualizations through integration with ggplot2.
Rygel is a home media solution (UPnP AV MediaServer and MediaRenderer) for GNOME that allows you to easily share audio, video, and pictures, and to control a media player on your home network.
Rygel achieves interoperability with other devices by trying to conform to the strict requirements of DLNA and by converting media on-the-fly to formats that client devices can handle.
This package provides a tidy framework for automatic knowledge classification and visualization. Currently, the core functionality of the framework is mainly supported by modularity-based clustering (community detection) in keyword co-occurrence network, and focuses on co-word analysis of bibliometric research. However, the designed functions in akc are general, and could be extended to solve other tasks in text mining as well.
It provides the density, distribution function, quantile function, random number generator, likelihood function, moments and maximum likelihood estimators for a given sample, all for the three-parameter Asymmetric Laplace Distribution defined in Koenker and Machado (1999). This is a special case of the skewed family of distributions available in Galarza et al. (2017) <doi:10.1002/sta4.140>, useful for quantile regression.
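For orientation, the three-parameter density referred to above can be written with the quantile-regression check function. The Python sketch below is not the package's R interface; the function names `check_loss` and `dald` and the argument names `mu` (location), `sigma` (scale) and `p` (skewness, between 0 and 1) are assumptions chosen for this example.

```python
import math


def check_loss(u, p):
    """Quantile-regression check function: rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (1.0 if u < 0 else 0.0))


def dald(y, mu=0.0, sigma=1.0, p=0.5):
    """Density of the three-parameter asymmetric Laplace distribution.

    f(y) = p * (1 - p) / sigma * exp(-rho_p((y - mu) / sigma))
    Names and defaults here are illustrative assumptions.
    """
    if sigma <= 0 or not 0.0 < p < 1.0:
        raise ValueError("require sigma > 0 and 0 < p < 1")
    return p * (1.0 - p) / sigma * math.exp(-check_loss((y - mu) / sigma, p))


if __name__ == "__main__":
    # The density peaks at the location parameter mu.
    print(dald(0.0, mu=0.0, sigma=1.0, p=0.5))
```

Maximizing the likelihood built from this density with respect to `mu` is equivalent to minimizing the summed check loss, which is why the distribution is convenient for quantile regression.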
Splits data into Gaussian type clusters using the Cross-Entropy Clustering ('CEC') method. This method allows for the simultaneous use of various types of Gaussian mixture models, for performing the reduction of unnecessary clusters, and for discovering new clusters by splitting them. CEC is based on the work of Spurek, P. and Tabor, J. (2014) <doi:10.1016/j.patcog.2014.03.006>.
This package provides functions for computing the density and the log-likelihood function of closed-skew normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.
This package provides a flexible framework for calculating Elo ratings and resulting rankings of any two-team-per-matchup system (chess, sports leagues, Go, etc.). This implementation is capable of evaluating a variety of matchups, Elo rating updates, and win probabilities, all based on the basic Elo rating system. It also includes methods to benchmark performance, including logistic regression and Markov chain models.
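The basic Elo rating system mentioned above can be sketched in a few lines of Python. This is not the package's R interface; the function names, the K-factor of 20 and the 400-point scale are conventional values assumed for this example.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Win probability of A against B under the basic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a, rating_b, score_a, k=20.0):
    """Return updated (rating_a, rating_b) after one matchup.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    The K-factor and scale are illustrative assumptions.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    a, b = update_elo(1500.0, 1500.0, score_a=1.0)
    print(a, b)
```

An upset moves ratings more than an expected result, because the update is proportional to the gap between the actual score and the predicted win probability.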
Compute maximum likelihood estimators of parameters in a Gaussian factor model using the matrix-free methodology described in Dai et al. (2020) <doi:10.1080/10618600.2019.1704296>. In contrast to the factanal() function from the stats package, fad() can handle high-dimensional datasets where the number of variables exceeds the sample size, and is also substantially faster than the EM algorithms.
This package implements the generalized integration model, which integrates individual-level data and summary statistics under a generalized linear model framework. It supports continuous and binary outcomes, modeled by linear and logistic regression respectively. For binary outcomes, data can be sampled in prospective cohort studies or case-control studies. Described in Zhang et al. (2020) <doi:10.1093/biomet/asaa014>.
Implementation of Tyler, Critchley, Duembgen and Oja's (JRSS B, 2009, <doi:10.1111/j.1467-9868.2009.00706.x>) and Oja, Sirkia and Eriksson's (AJS, 2006, <https://www.ajs.or.at/index.php/ajs/article/view/vol35,%20no2%263%20-%207>) method of two different scatter matrices to obtain an invariant coordinate system or independent components, depending on the underlying assumptions.
This package provides methods and tools for model selection and multi-model inference (Burnham and Anderson (2002) <doi:10.1007/b97636>, among others). SUR (for parameter estimation), logit/probit (for binary classification), and VARMA (for time-series forecasting) are implemented. Evaluations are both in-sample and out-of-sample. It is designed to be efficient in terms of CPU usage and memory consumption.