Enhances data validation and management with a set of functions that read rules from a CSV or Excel file and apply them to a dataset. Funded by the National Renewable Energy Laboratory and Possibility Lab, maintained by the Moore Institute for Plastic Pollution Research.
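The rule-file idea can be sketched in a few lines. This is only an illustration: the rule schema (`name`, `column`, `check`, `value`), the check names, and the function names are all hypothetical and are not taken from the package, whose actual rule format is not described here.

```python
import csv
import io

# Hypothetical rule file: one rule per line (schema invented for this sketch).
RULES_CSV = """name,column,check,value
age_nonnegative,age,min,0
age_plausible,age,max,120
id_present,id,not_empty,
"""

# Hypothetical check vocabulary; each check returns True when the cell passes.
CHECKS = {
    "min": lambda cell, value: cell != "" and float(cell) >= float(value),
    "max": lambda cell, value: cell != "" and float(cell) <= float(value),
    "not_empty": lambda cell, value: cell.strip() != "",
}


def load_rules(text):
    """Parse the rule file into a list of dicts, one per rule."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows, rules):
    """Return (row_index, rule_name) for every rule a row fails."""
    failures = []
    for i, row in enumerate(rows):
        for rule in rules:
            cell = row.get(rule["column"], "")
            if not CHECKS[rule["check"]](cell, rule["value"]):
                failures.append((i, rule["name"]))
    return failures


rows = [
    {"id": "a1", "age": "34"},
    {"id": "", "age": "-2"},
    {"id": "c3", "age": "150"},
]
failures = validate(rows, load_rules(RULES_CSV))
```

Keeping the rules in a separate file means the checks can be edited without touching the code that applies them.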
Access and analyze Official Development Assistance (ODA) data using the OECD API <https://gitlab.algobank.oecd.org/public-documentation/dotstat-migration/-/raw/main/OECD_Data_API_documentation.pdf>. ODA data includes sovereign-level aid data such as key aggregates (DAC1), geographical distributions (DAC2A), project-level data (CRS), and multilateral contributions (Multisystem).
Improving graphics by ameliorating order effects, using Eulerian tours and Hamiltonian decompositions of graphs. References for the methods presented here are C.B. Hurley and R.W. Oldford (2010) <doi:10.1198/jcgs.2010.09136> and C.B. Hurley and R.W. Oldford (2011) <doi:10.1007/s00180-011-0229-5>.
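To illustrate why Eulerian tours help with order effects: on a complete graph whose vertices are the items to display, an Eulerian circuit traverses every edge once, so every pair of items appears side by side exactly once in the resulting sequence. The sketch below uses Hierholzer's algorithm on the complete graph with an odd number of vertices. It is a generic textbook construction, not the package's implementation, and the function name is an assumption.

```python
def eulerian_circuit_complete(n):
    """Eulerian circuit on the complete graph K_n via Hierholzer's algorithm.

    K_n has an Eulerian circuit only when every vertex has even degree,
    that is, when n is odd.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    # For each vertex, the neighbours whose connecting edge is still unused.
    unused = {v: set(range(n)) - {v} for v in range(n)}
    stack, circuit = [0], []
    while stack:
        v = stack[-1]
        if unused[v]:
            w = unused[v].pop()      # follow an unused edge v-w
            unused[w].remove(v)      # mark it used from the other end too
            stack.append(w)
        else:
            circuit.append(stack.pop())  # dead end: vertex joins the circuit
    return circuit[::-1]


tour = eulerian_circuit_complete(5)
# Unordered pairs of neighbours along the tour.
adjacent_pairs = [frozenset(p) for p in zip(tour, tour[1:])]
```

For five items the tour has eleven entries, and its ten consecutive pairs are exactly the ten possible pairs of items.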
Gene-based association tests using the actual impurity reduction (AIR) variable importance. The function aggregates AIR importance measures from a group of SNPs or probes and outputs a p-value for each gene. The procedure builds upon the method described in <doi:10.1093/Bioinformatics/Bty373> and will be published soon.
An implementation of a computationally efficient method to fit large-scale interaction models based on the reluctant interaction selection principle. The method and its properties are described in greater depth in Yu, G., Bien, J., and Tibshirani, R.J. (2019) "Reluctant interaction modeling", which is available at <arXiv:1907.08414>.
Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) <doi:10.18637/jss.v063.i15>.
Function for the computation of fractal dimension based on mass of soil particle size distribution by Tyler & Wheatcraft (1992) <doi:10.2136/sssaj1992.03615995005600020005x>. It also provides functions for calculation of mean weight and geometric mean diameter of particle size distribution by Perfect et al. (1992) <doi:10.2136/sssaj1992.03615995005600050012x>.
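The quantities named above can be sketched as follows, assuming the commonly quoted forms: the mass-based model M(r < R) / M_T = (R / R_max)^(3 - D), so that D is 3 minus the log-log slope, the mean weight diameter as the mass-weighted mean of class diameters, and the geometric mean diameter as the exponential of the mass-weighted mean log diameter. Whether the package uses exactly these forms, and all function names below, are assumptions.

```python
import math


def fractal_dimension(upper_sizes, mass_fractions):
    """Mass-based fractal dimension: D = 3 - slope of
    log(cumulative mass fraction) against log(R / R_max)."""
    pairs = sorted(zip(upper_sizes, mass_fractions))
    total = sum(m for _, m in pairs)
    r_max = pairs[-1][0]
    xs, ys, cumulative = [], [], 0.0
    for size, mass in pairs:
        cumulative += mass
        if cumulative > 0:  # log is undefined for empty leading classes
            xs.append(math.log(size / r_max))
            ys.append(math.log(cumulative / total))
    # Ordinary least-squares slope.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return 3.0 - sxy / sxx


def mean_weight_diameter(mean_diameters, mass_fractions):
    """Mass-weighted arithmetic mean of the class mean diameters."""
    total = sum(mass_fractions)
    return sum(d * w for d, w in zip(mean_diameters, mass_fractions)) / total


def geometric_mean_diameter(mean_diameters, mass_fractions):
    """Exponential of the mass-weighted mean of log diameters."""
    total = sum(mass_fractions)
    weighted_logs = sum(w * math.log(d) for d, w in zip(mean_diameters, mass_fractions))
    return math.exp(weighted_logs / total)
```

Sizes and diameters must share one unit (for example mm); the mass fractions need not sum to one because each function normalises by their total.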
Does prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach. Superpc is especially useful for high-dimensional data when the number of features p dominates the number of samples n (p >> n paradigm), as generated, for instance, by high-throughput technologies.
Get sun position, sunlight phases (times for sunrise, sunset, dusk, etc.), moon position and lunar phase for the given location and time. Most calculations are based on the formulas given in Astronomy Answers articles about position of the sun and the planets: <https://www.aa.quae.nl/en/reken/zonpositie.html>.
This package provides a set of measures of dissimilarity between time series to perform time series clustering. Metrics based on raw data, on generating models and on the forecast behavior are implemented. Some additional utilities related to time series clustering are also provided, such as clustering algorithms and cluster evaluation metrics.
The goal of vetiver is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The vetiver package is extensible, with generics that can support many kinds of models.
Simulates individual-based models of agricultural pest management and the evolution of pesticide resistance. Management occurs on a spatially explicit landscape that is divided into an arbitrary number of farms that can grow one of up to 10 crops and apply one of up to 10 pesticides. Pest genomes are modelled in a way that allows for any number of pest traits with an arbitrary covariance structure that is constructed using an evolutionary algorithm in the mine_gmatrix() function. Simulations are then run using the run_farm_sim() function. This package thereby allows for highly mechanistic social-ecological models of the evolution of pesticide resistance under different types of crop rotation and pesticide application regimes.
This package provides functions to fit Gaussian linear models by maximising the residual log likelihood where the covariance structure can be written as a linear combination of known matrices. Can be used for multivariate models and random effects models. Provides an easy, straightforward way to specify random effects models, including random interactions. Code is now optimised to use Sherman-Morrison-Woodbury identities for matrix inversion in random effects models. Also adds the ability to fit models using any kernel, as well as a function to return the mean and covariance of random effects conditional on the data (best linear unbiased predictors, BLUPs). Clifford and McCullagh (2006) <https://www.r-project.org/doc/Rnews/Rnews_2006-2.pdf>.
Adds the MIxing-Data Sampling (MIDAS, Ghysels et al. (2007) <doi:10.1080/07474930600972467>) components to a variety of GARCH and MEM (Engle (2002) <doi:10.1002/jae.683>, Engle and Gallo (2006) <doi:10.1016/j.jeconom.2005.01.018>, and Amendola et al. (2024) <doi:10.1016/j.seps.2023.101764>) models, with the aim of predicting the volatility with additional low-frequency (that is, MIDAS) terms. The estimation takes place through simple functions, which provide in-sample and (if present) out-of-sample evaluations. rumidas also offers a summary tool, which synthesizes the main information of the estimated model. There is also the possibility of generating one-step-ahead and multi-step-ahead forecasts.
This package analyzes and creates plots of array CGH data. Also, it allows usage of CBS, wavelet-based smoothing, HMM, BioHMM, GLAD, CGHseg. Most computations are parallelized (either via forking or with clusters, including MPI and sockets clusters) and use ff for storing data.
This package contains the functions to find the gene expression modules that represent the drivers of Kauffman's attractor landscape. The modules are the core attractor pathways that discriminate between different cell types or groups of interest. Each pathway has a set of synexpression groups, which show transcriptionally-coordinated changes in gene expression.
This package provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.
This package performs ratio, GC content correction and normalization of data obtained using low coverage (one read every 100-10,000 bp) high-throughput sequencing. It performs a "discrete" normalization looking for the ploidy of the genome. It will also provide tumour content if at least two ploidy states can be found.
CluMSID is a tool that aids the identification of features in untargeted LC-MS/MS analysis by the use of MS2 spectra similarity and unsupervised statistical methods. It offers functions for a complete and customisable workflow from raw data to visualisations and is interfaceable with the xcms family of preprocessing packages.
GCAT is an association test for genome wide association studies that controls for population structure under a general class of trait models. This test conditions on the trait, which makes it immune to confounding by unmodeled environmental factors. Population structure is modeled via logistic factors, which are estimated using the `lfa` package.
This package provides large-scale single-cell omics data manipulation using Genomic Data Structure (GDS) files. It combines dense and sparse matrices stored in GDS files and the Bioconductor infrastructure framework (SingleCellExperiment and DelayedArray) to provide out-of-memory data storage and large-scale manipulation using the R programming language.
The Brazilian Jurimetrics Association (ABJ in Portuguese, see <https://abj.org.br/> for more information) is a non-profit organization which aims to investigate and promote the use of statistics and probability in the study of Law and its institutions. This package has a set of datasets commonly used in our book.
This package provides a toolbox for analyzing and simulating large networks based on hierarchical exponential-family random graph models (HERGMs). 'bigergm' implements the estimation for large networks efficiently, building on the lighthergm and hergm packages. Moreover, the package contains tools for simulating networks with local dependence to assess the goodness-of-fit.
Runs hierarchical linear Bayesian models. Samples from the posterior distributions of model parameters in JAGS (Just Another Gibbs Sampler; Plummer, 2017, <http://mcmc-jags.sourceforge.net>). Computes Bayes factors for group parameters of interest with the Savage-Dickey density ratio (Wetzels, Raaijmakers, Jakab, Wagenmakers, 2009, <doi:10.3758/PBR.16.4.752>).
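The Savage-Dickey density ratio expresses the Bayes factor in favour of a point null as the posterior density at the null value divided by the prior density at the same value. The package samples posteriors with JAGS; the sketch below instead assumes a conjugate normal model with known noise standard deviation, where the posterior is available in closed form, purely to show the ratio itself. The function names and default parameters are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(data, null_value=0.0, prior_mean=0.0,
                       prior_sd=1.0, noise_sd=1.0):
    """Bayes factor for H0: theta == null_value against H1: theta free.

    Assumed model: theta ~ Normal(prior_mean, prior_sd) and
    y_i ~ Normal(theta, noise_sd) with noise_sd known.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision
                 + sum(data) / noise_sd ** 2) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    # Savage-Dickey: posterior density over prior density at the null value.
    return (normal_pdf(null_value, post_mean, post_sd)
            / normal_pdf(null_value, prior_mean, prior_sd))


bf_null_supported = savage_dickey_bf01([0.0] * 25)
bf_null_refuted = savage_dickey_bf01([2.0] * 25)
```

Values above 1 favour the null and values below 1 favour the alternative. With MCMC output the posterior density at the null value has to be estimated from the samples rather than computed exactly.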