This package provides an interface for working with large matrices stored in files, not in computer memory. It supports multiple non-character data types (double, integer, logical and raw) of various sizes (e.g. 8 and 4 byte real values). Access to parts of the matrix is done by indexing, exactly as with usual R matrices. It supports very large matrices; the package has been tested on multi-terabyte matrices. It allows for more than 2^32 rows or columns, and allows for quick addition of extra columns to a filematrix.
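To illustrate why a file-backed matrix can append columns quickly, here is a minimal Python sketch, assuming a column-major layout of 8-byte doubles. The class `FileMatrix` and its methods are hypothetical and are not the package's R interface.

```python
import os
import struct
import tempfile


class FileMatrix:
    """Hypothetical column-major matrix of 8-byte doubles stored in a file."""

    ITEM = struct.Struct("<d")  # one little-endian double

    def __init__(self, path, nrow):
        self.path = path
        self.nrow = nrow
        if not os.path.exists(path):
            open(path, "wb").close()  # start with an empty backing file

    @property
    def ncol(self):
        # The column count follows from the file size alone.
        return os.path.getsize(self.path) // (self.nrow * self.ITEM.size)

    def append_column(self, values):
        if len(values) != self.nrow:
            raise ValueError("column length must equal nrow")
        # Column-major storage: a new column is simply appended to the file.
        with open(self.path, "ab") as fh:
            fh.write(struct.pack(f"<{self.nrow}d", *values))

    def get(self, row, col):
        if not (0 <= row < self.nrow and 0 <= col < self.ncol):
            raise IndexError("index out of range")
        offset = (col * self.nrow + row) * self.ITEM.size
        with open(self.path, "rb") as fh:
            fh.seek(offset)  # read one element without loading the matrix
            return self.ITEM.unpack(fh.read(self.ITEM.size))[0]


with tempfile.TemporaryDirectory() as tmp:
    fm = FileMatrix(os.path.join(tmp, "m.bin"), nrow=3)
    fm.append_column([1.0, 2.0, 3.0])
    fm.append_column([4.0, 5.0, 6.0])
    shape = (fm.nrow, fm.ncol)
    value = fm.get(1, 1)
```

Because whole columns are contiguous on disk, adding a column never requires rewriting existing data, whereas adding a row would.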
Multimodal distributions can be modelled as a mixture of components. The model is derived using the Pareto Density Estimation (PDE) for an estimation of the pdf. PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using the theorem of Bayes. Verification of the model is possible by QQ plot, Chi-squared test and Kolmogorov-Smirnov test. The package is based on the publication of Ultsch, A., Thrun, M.C., Hansen-Goos, O., Lotsch, J. (2015) <DOI:10.3390/ijms161025897>.
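As a rough illustration of how Bayes' theorem yields class limits in a mixture model, the Python sketch below computes posterior probabilities for a two-component Gaussian mixture and finds the boundary by bisection. The weights, means and standard deviations are made-up values and the function names are hypothetical; the package itself estimates the density with PDE, which is not reproduced here.

```python
import math


def gauss_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior(x, weights, means, sds):
    """Posterior probability of each component at x (Bayes' theorem)."""
    joint = [w * gauss_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def class_limit(weights, means, sds, lo, hi, tol=1e-9):
    """Bisection for the x where the first component's posterior equals 0.5.

    Assumes the posterior crosses 0.5 exactly once inside [lo, hi].
    """
    def f(x):
        return posterior(x, weights, means, sds)[0] - 0.5

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Illustrative (made-up) mixture: equal weights and equal spreads.
WEIGHTS = [0.5, 0.5]
MEANS = [0.0, 4.0]
SDS = [1.0, 1.0]
limit = class_limit(WEIGHTS, MEANS, SDS, lo=0.0, hi=4.0)
```

With equal weights and equal standard deviations, the limit falls midway between the two means; unequal weights shift it toward the less probable component.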
This package provides a collection of R functions implementing published analytic solutions of the one-dimensional Boussinesq equation (groundwater). In particular, the function "beq.lin()" is the analytic solution of the linearized form of the Boussinesq equation between two different head-based (Dirichlet) boundary conditions; "beq.song" is the non-linear power-series analytic solution for the motion of a wetting front over a dry bedrock (Song et al., 2007; see the complete reference in the function documentation). Bug reports, comments, questions and collaboration of any kind are warmly welcomed.
This package provides Peruvian agricultural production data from the Ministry of Agriculture of Peru (MINAGRI). The first version includes 6 crops: rice, quinoa, potato, sweet potato, tomato and wheat; all of them across 24 departments. The data were originally in Excel files, which have been transformed and assembled using tidy data principles, i.e. each variable is in a column, each observation is a row and each value is in a cell. The variables are sowing and harvest area per crop, yield, production and price per plot, recorded yearly from 2004 to 2014.
Several functions are available for calculating the most widely used effect sizes (ES), along with their variances, confidence intervals and p-values. The output includes ES's of d (mean difference), g (unbiased estimate of d), r (correlation coefficient), z (Fisher's z), and OR (odds ratio and log odds ratio). In addition, NNT (number needed to treat), U3, CLES (Common Language Effect Size) and Cliff's Delta are computed. This package uses recommended formulas as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
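For orientation, the two most basic effect sizes can be sketched in Python as follows, assuming the common textbook definitions (d as the mean difference divided by the pooled standard deviation, and g as d multiplied by the usual small-sample correction 1 - 3/(4·df - 1)). The function names are hypothetical, and the package's exact formulas may differ in detail.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def hedges_g(d, n1, n2):
    """Approximate small-sample bias correction applied to d."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))


# Made-up example data.
treatment = [2.0, 4.0, 6.0]
control = [1.0, 3.0, 5.0]
d = cohens_d(treatment, control)
g = hedges_g(d, len(treatment), len(control))
```

The correction always shrinks d toward zero, and it matters most for small samples like the one above.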
The purpose of forecastML is to simplify the process of multi-step-ahead forecasting with standard machine learning algorithms. forecastML supports lagged, dynamic, static, and grouping features for modeling single and grouped numeric or factor/sequence time series. In addition, simple wrapper functions are used to support model-building with most R packages. This approach to forecasting is inspired by Bergmeir, Hyndman, and Koo's (2018) paper "A note on the validity of cross-validation for evaluating autoregressive time series prediction" <doi:10.1016/j.csda.2017.11.003>.
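The idea of lagged features for multi-step-ahead forecasting can be sketched in Python as below. This is a minimal illustration, assuming a direct forecasting setup in which a model for horizon h may only use lags of at least h (more recent values are unknown at forecast time). The function `lagged_rows` is hypothetical and not part of forecastML.

```python
def lagged_rows(series, lags, horizon):
    """Build (features, target) rows for a direct forecast at one horizon.

    Only lags >= horizon are kept, because values closer to the target
    would not yet be observed when the forecast is made.
    """
    usable = [k for k in lags if k >= horizon]
    if not usable:
        raise ValueError("no lag is at least as long as the horizon")
    rows = []
    for t in range(max(usable), len(series)):
        features = [series[t - k] for k in usable]
        rows.append((features, series[t]))
    return rows


# Made-up series; lag 1 is dropped for a 2-step-ahead model.
SERIES = [10, 20, 30, 40, 50]
rows = lagged_rows(SERIES, lags=[1, 2, 3], horizon=2)
```

Each row can then be passed to any regression learner, with one such dataset built per forecast horizon.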
The main functions in this package are with_cache() and cached_read(). The former is a simple way to cache an R object into a file on disk, using cachem. The latter is a wrapper around any standard read function, but caches both the output and the file list info. If the input file list info hasn't changed, the cache is used; otherwise, the original files are re-read. This can save time if the original operation requires reading from many files, and/or involves lots of processing.
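The caching logic described above can be sketched in Python as follows. This is an analogue under stated assumptions, not the package's implementation: "file list info" is taken to mean path, size and modification time, the cache is a pickle file, and the Python function named `cached_read` here is hypothetical.

```python
import os
import pickle
import tempfile


def file_info(paths):
    """Path, size and modification time for each input file."""
    return [(p, os.path.getsize(p), os.path.getmtime(p)) for p in paths]


def cached_read(paths, read_fn, cache_path):
    """Read files with read_fn, reusing a cached result if inputs are unchanged."""
    info = file_info(paths)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            cached = pickle.load(fh)
        if cached["info"] == info:
            return cached["value"]  # inputs unchanged: skip re-reading
    value = [read_fn(p) for p in paths]
    with open(cache_path, "wb") as fh:
        pickle.dump({"info": info, "value": value}, fh)
    return value


calls = []  # records every real read, to show the cache being used


def read_text(path):
    calls.append(path)
    with open(path) as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    with open(src, "w") as fh:
        fh.write("hello")
    cache = os.path.join(tmp, "cache.pkl")
    first = cached_read([src], read_text, cache)
    second = cached_read([src], read_text, cache)  # served from the cache
```

Comparing file metadata instead of file contents keeps the check cheap, at the cost of missing edits that preserve both size and modification time.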
Citrus is a computational technique developed for the analysis of high dimensional cytometry data sets. This package extracts, statistically analyzes, and visualizes marker expression from citrus data. This code was used to generate data for Figures 3 and 4 in the forthcoming manuscript: Throm et al. "Identification of Enhanced Interferon-Gamma Signaling in Polyarticular Juvenile Idiopathic Arthritis with Mass Cytometry", JCI-Insight. For more information on Citrus, please see: Bruggner et al. (2014) <doi:10.1073/pnas.1408792111>. To download the citrus package, please see <https://github.com/nolanlab/citrus>.
This package performs genomic prediction of hybrid performance using eight GS methods including GBLUP, BayesB, RKHS, PLS, LASSO, Elastic net, XGBoost and LightGBM. GBLUP: genomic best linear unbiased prediction, RKHS: reproducing kernel Hilbert space, PLS: partial least squares regression, LASSO: least absolute shrinkage and selection operator, XGBoost: extreme gradient boosting, LightGBM: light gradient boosting machine. It also provides fast cross-validation and mating design scheme for training population (Xu S et al (2016) <doi:10.1111/tpj.13242>; Xu S (2017) <doi:10.1534/g3.116.038059>).
This package provides a systematic bioinformatics tool to develop a new pathway-based gene panel for tumor mutational burden (TMB) assessment (pathway-based tumor mutational burden, PTMB), using somatic mutation files in an efficient manner from either The Cancer Genome Atlas sources or any in-house studies, as long as the data are in mutation annotation file (MAF) format. In addition, we develop a multiple machine learning method that uses the samples' PTMB profiles to identify cancer-specific dysfunction pathways, which can serve as prognostic and predictive biomarkers for cancer immunotherapy.
Computes the minimum sample size required for the development of a new multivariable prediction model using the criteria proposed by Riley et al. (2018) <doi:10.1002/sim.7992>. pmsampsize can be used to calculate the minimum sample size for the development of models with continuous, binary or survival (time-to-event) outcomes. Riley et al. (2018) <doi:10.1002/sim.7992> lay out a series of criteria the sample size should meet. These aim to minimise overfitting and to ensure precise estimation of key parameters in the prediction model.
This is an implementation of the algorithm described in Section 3 of Hosszejni and Frühwirth-Schnatter (2022) <doi:10.48550/arXiv.2211.00671>. The algorithm is used to verify that the counting rule CR(r,1) holds for the sparsity pattern of the transpose of a factor loading matrix. As detailed in Section 2 of the same paper, if CR(r,1) holds, then the idiosyncratic variances are generically identified. If CR(r,1) does not hold, then we do not know whether the idiosyncratic variances are identified or not.
Infrastructure and functions that can be used for integrating Stan (Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>) code into standalone R packages, which in turn use the CmdStan engine, often accessed through CmdStanR. Details are given in Stan Development Team (2025) <https://mc-stan.org/cmdstanr/>. Using CmdStanR and pre-written Stan code can make package installation easy. Using staninside offers a way to cache user-compiled Stan models in user-specified directories, reducing the need to recompile the same model multiple times.
The standard index of DNA methylation (beta) is computed from methylated and unmethylated signal intensities. Betas calculated from raw signal intensities perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. This package provides 15 flavours of betas and three performance metrics, with methods for objects produced by the methylumi and minfi packages.
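As a small illustration of how beta is derived from the two signal intensities, the Python sketch below uses the widely used form beta = M / (M + U + offset). The default offset of 100 is an assumption taken from common practice for Illumina arrays, and the function name is hypothetical; the package's 15 flavours differ in how M and U are normalized before this step.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation index in [0, 1) from methylated (M) and unmethylated (U) signal.

    The offset stabilizes the ratio when both intensities are low;
    100 is an assumed conventional default, not a value from the source.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("signal intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Made-up intensities for a single probe.
beta = beta_value(methylated=300.0, unmethylated=100.0)
```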
Rolling and expanding window approaches to assessing abundance based early warning signals, non-equilibrium resilience measures, and machine learning. See Dakos et al. (2012) <doi:10.1371/journal.pone.0041010>, Deb et al. (2022) <doi:10.1098/rsos.211475>, Drake and Griffen (2010) <doi:10.1038/nature09389>, Ushio et al. (2018) <doi:10.1038/nature25504> and Weinans et al. (2021) <doi:10.1038/s41598-021-87839-y> for methodological details. Graphical presentations of the outputs are also provided for clear and publishable figures. Visit the EWSmethods website for more information and tutorials.
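The difference between rolling and expanding windows can be sketched in Python as follows, using variance as an example indicator. The function names, the choice of variance and the data are illustrative assumptions; the package's own indicators and defaults are not reproduced here.

```python
from statistics import variance


def rolling_stat(series, window, stat=variance):
    """Apply stat to each fixed-width window sliding along the series."""
    return [stat(series[i:i + window]) for i in range(len(series) - window + 1)]


def expanding_stat(series, min_obs, stat=variance):
    """Apply stat to a window that starts at the first point and keeps growing."""
    return [stat(series[:end]) for end in range(min_obs, len(series) + 1)]


# Made-up abundance series.
ABUNDANCE = [1.0, 2.0, 3.0, 4.0]
rolling = rolling_stat(ABUNDANCE, window=2)
expanding = expanding_stat(ABUNDANCE, min_obs=2)
```

A rolling window reacts only to recent data, while an expanding window accumulates the whole history, so the two can show different trends on the same series.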
Calculates fundamental IO matrices (Leontief, Wassily W. (1951) <doi:10.1038/scientificamerican1051-15>); within period analysis via various rankings and coefficients (Sonis and Hewings (2006) <doi:10.1080/09535319200000013>, Blair and Miller (2009) <ISBN:978-0-521-73902-3>, Antras et al (2012) <doi:10.3386/w17819>, Hummels, Ishii, and Yi (2001) <doi:10.1016/S0022-1996(00)00093-3>); across period analysis with impact analysis (Dietzenbacher, van der Linden, and Steenge (2006) <doi:10.1080/09535319300000017>, Sonis, Hewings, and Guo (2006) <doi:10.1080/09535319600000002>); and a variety of table operators.
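The central IO matrix, the Leontief inverse L = (I - A)^-1 for a technical coefficients matrix A, can be sketched in plain Python as below. This is a minimal Gauss-Jordan illustration with made-up coefficients and a hypothetical function name, not the package's implementation.

```python
def leontief_inverse(coeffs):
    """Invert (I - A) by Gauss-Jordan elimination with partial pivoting."""
    n = len(coeffs)
    # Augmented matrix [I - A | I].
    aug = [
        [(1.0 if i == j else 0.0) - coeffs[i][j] for j in range(n)]
        + [1.0 if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("(I - A) is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Made-up two-sector technical coefficients.
A = [[0.2, 0.3], [0.4, 0.1]]
L = leontief_inverse(A)
```

Multiplying L by a final demand vector gives the total output each sector must produce to satisfy that demand, including all intermediate inputs.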
Life and Fertility Tables are appropriate to study the dynamics of arthropod populations. This package provides utilities for constructing Life Tables and Fertility Tables, related demographic parameters, and some simple graphs of interest. It also offers functions to transform the obtained data into a known format for better manipulation. This document is based on the article by Maia, Luiz, and Campanhola "Statistical Inference on Associated Fertility Life Table Parameters Using Jackknife Technique Computational Aspects" (April 2000, Journal of Economic Entomology, Volume 93, Issue 2) <doi:10.1603/0022-0493-93.2.511>.
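To make the demographic parameters concrete, the Python sketch below computes three standard fertility life table quantities from age x, survivorship lx and fecundity mx: the net reproductive rate R0 = sum(lx·mx), the mean generation time T = sum(x·lx·mx) / R0, and the common approximation ln(R0) / T for the intrinsic rate of increase. The function name and data are made up; the package may instead solve the Euler-Lotka equation exactly, and its jackknife inference is not shown.

```python
import math


def life_table_parameters(ages, survivorship, fecundity):
    """Return (R0, T, approximate rm) from age-specific lx and mx schedules."""
    r0 = sum(lx * mx for lx, mx in zip(survivorship, fecundity))
    if r0 <= 0:
        raise ValueError("net reproductive rate must be positive")
    gen_time = sum(x * lx * mx for x, lx, mx in zip(ages, survivorship, fecundity)) / r0
    rm_approx = math.log(r0) / gen_time  # approximation, not the exact root
    return r0, gen_time, rm_approx


# Made-up cohort: three age classes.
AGES = [1, 2, 3]
LX = [1.0, 0.5, 0.25]
MX = [0.0, 4.0, 8.0]
r0, gen_time, rm_approx = life_table_parameters(AGES, LX, MX)
```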
Extended tools for analyzing telemetry data using generalized hidden Markov models. Features of momentuHMM (pronounced "momentum") include data pre-processing and visualization, fitting HMMs to location and auxiliary biotelemetry or environmental data, biased and correlated random walk movement models, hierarchical HMMs, multiple imputation for incorporating location measurement error and missing data, user-specified design matrices and constraints for covariate modelling of parameters, random effects, decoding of the state process, visualization of fitted models, model checking and selection, and simulation. See McClintock and Michelot (2018) <doi:10.1111/2041-210X.12995>.
This is the very popular mine sweeper game! The game requires you to find the tiles that contain mines using clues from unmasking neighboring tiles. Each tile that does not contain a mine shows the number of mines in its adjacent tiles. If you unmask all tiles that do not contain mines, you win the game; if you unmask any tile that contains a mine, you lose the game. For further game instructions, please run `help(run_game)` and check details. This game runs in X11-compatible devices with `grDevices::x11()`.
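The clue shown on each safe tile, the number of mines among its up to eight neighbors, can be sketched in Python as follows. The function name and the grid are hypothetical and unrelated to the package's internals.

```python
def neighbor_mine_counts(mines):
    """For each safe tile, count mines in adjacent tiles; mined tiles map to None."""
    rows, cols = len(mines), len(mines[0])
    counts = []
    for r in range(rows):
        row_counts = []
        for c in range(cols):
            if mines[r][c]:
                row_counts.append(None)  # a mine shows no clue
                continue
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if (dr or dc) and inside and mines[rr][cc]:
                        total += 1
            row_counts.append(total)
        counts.append(row_counts)
    return counts


# Made-up 3x3 board with a single mine in the center.
GRID = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
counts = neighbor_mine_counts(GRID)
```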
This package implements a Bayesian phase I repeated measurement design that accounts for multidimensional toxicity endpoints and longitudinal efficacy measures from multiple treatment cycles. The package provides flags to fit a variety of model-based phase I designs, including 1-stage models with or without individualized dose modification, 3-stage models with or without individualized dose modification, etc. Functions are provided to recommend dosage selection based on the data collected in the available patient cohorts and to simulate trial characteristics given design parameters. See Yin et al. (2017) <doi:10.1002/sim.7134>.
Temporal disaggregation methods are used to disaggregate and interpolate a low frequency time series to a higher frequency series, where either the sum, the mean, the first or the last value of the resulting high frequency series is consistent with the low frequency series. Temporal disaggregation can be performed with or without one or more high frequency indicator series. Contains the methods of Chow-Lin, Santos-Silva-Cardoso, Fernandez, Litterman, Denton and Denton-Cholette, summarized in Sax and Steiner (2013) <doi:10.32614/RJ-2013-028>. Supports most R time series classes.
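The consistency requirement can be illustrated with a deliberately naive Python sketch: each low frequency value is distributed over its high frequency periods in proportion to an indicator series, so the high frequency values sum to the low frequency total. This pro rata distribution is not one of the methods named above (Chow-Lin, Denton, etc.), which additionally smooth the result or model the relationship statistically; the function name and data are made up.

```python
def prorata_disaggregate(low_freq, indicator, ratio):
    """Split each low frequency total across `ratio` periods, proportional to the indicator."""
    if len(indicator) != len(low_freq) * ratio:
        raise ValueError("indicator length must equal len(low_freq) * ratio")
    high_freq = []
    for i, total in enumerate(low_freq):
        block = indicator[i * ratio:(i + 1) * ratio]
        block_sum = sum(block)
        if block_sum == 0:
            raise ValueError("indicator sums to zero within a low frequency period")
        high_freq.extend(total * value / block_sum for value in block)
    return high_freq


# Made-up annual totals and quarterly indicator.
ANNUAL = [100.0, 200.0]
QUARTERLY_INDICATOR = [1, 1, 1, 1, 1, 3, 3, 1]
quarterly = prorata_disaggregate(ANNUAL, QUARTERLY_INDICATOR, ratio=4)
```

A known weakness of this naive approach is the artificial jump it can create between the last period of one year and the first of the next, which is what the smoothing methods are designed to avoid.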
Covered uses modern Ruby features to generate comprehensive coverage, including support for templates which are compiled into Ruby. It has the following features:
Incremental coverage -- if you run your full test suite, and then run a subset, it will still report the correct coverage - so you can incrementally work on improving coverage.
Integration with RSpec, Minitest, Travis & Coveralls - no need to configure anything - out of the box support for these platforms.
It supports coverage of views -- templates compiled to Ruby code can be tracked for coverage reporting.
Calculates distances from point locations to features. The usual approach for, e.g., resource selection function analyses is to generate a complete distance-to-features surface, then sample it with your observed and random points. Since these raster based approaches can be pretty costly with large areas, and often lead to memory issues in R, the distanceto package opts to compute these distances using efficient, vector based approaches. As a helper, there's a decidedly low-res raster based approach for visually inspecting your region's distance surface. But the workhorse is distance_to.
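The vector based idea, computing the distance from each point directly to the feature geometry instead of sampling a raster, can be sketched in Python as below. This is a planar illustration under stated assumptions: features are line segments, coordinates are projected, and the function names are hypothetical and not the package's distance_to.

```python
import math


def point_segment_distance(point, start, end):
    """Shortest planar distance from a point to a line segment."""
    (px, py), (ax, ay), (bx, by) = point, start, end
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project the point onto the segment's line, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_features(point, segments):
    """Distance from a point to the nearest of several segments."""
    return min(point_segment_distance(point, a, b) for a, b in segments)


# Made-up features: two line segments.
SEGMENTS = [((-1.0, 0.0), (1.0, 0.0)), ((5.0, 5.0), (6.0, 5.0))]
above = distance_to_features((0.0, 1.0), SEGMENTS)
beside = distance_to_features((3.0, 0.0), SEGMENTS)
```

The cost grows with the number of points and features rather than with the area of the study region, which is why no full distance surface has to be held in memory.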
Provides estimation and data generation tools for some new multivariate frailty models. This version includes the gamma, inverse Gaussian, weighted Lindley, Birnbaum-Saunders, truncated normal, mixture of inverse Gaussian, mixture of Birnbaum-Saunders and generalized exponential as the distribution for the frailty terms. For the baseline model, a parametric approach based on the exponential, Weibull and piecewise exponential distributions is considered, as well as a semiparametric approach. For details, see Gallardo and Bourguignon (2025) <doi:10.1002/bimj.70044> and Gallardo et al. (2024) <doi:10.1007/s11222-024-10458-w>.