This package provides a set of streamlined functions that allow easy generation of linear regression diagnostic plots necessarily for checking linear model assumptions. This package is meant for easy scheming of linear regression diagnostics, while preserving merits of "The Grammar of Graphics" as implemented in ggplot2'. See the ggplot2 website for more information regarding the specific capability of graphics.
Analyses species distribution models and evaluates their performance. It includes functions for variation partitioning, extracting variable importance, computing several metrics of model discrimination and calibration performance, optimizing prediction thresholds based on a number of criteria, performing multivariate environmental similarity surface (MESS) analysis, and displaying various analytical plots. Initially described in Barbosa et al. (2013) <doi:10.1111/ddi.12100>.
Maximum a posteriori (MAP) estimation for topic models (i.e., Latent Dirichlet Allocation) in text analysis, as described in Taddy (2012) On estimation and selection for topic models'. Previous versions of this code were included as part of the textir package. If you want to take advantage of openmp parallelization, uncomment the relevant flags in src/MAKEVARS before compiling.
This package provides a way to estimate and test marginal mediation effects for zero-inflated compositional mediators. Estimates of Natural Indirect Effect (NIE), Natural Direct Effect (NDE) of each taxon, as well as their standard errors and confident intervals, were provided as outputs. Zeros will not be imputed during analysis. See Wu et al. (2022) <doi:10.3390/genes13061049>.
Fits community site occupancy models to environmental DNA metabarcoding data collected using spatially-replicated survey design. Model fitting results can be used to evaluate and compare the effectiveness of species detection to find an efficient survey design. Reference: Fukaya et al. (2022) <doi:10.1111/2041-210X.13732>, Fukaya and Hasebe (2025) <doi:10.1101/2025.01.09.632116>.
Anomaly detection in dynamic, temporal networks. The package oddnet uses a feature-based method to identify anomalies. First, it computes many features for each network. Then it models the features using time series methods. Using time series residuals it detects anomalies. This way, the temporal dependencies are accounted for when identifying anomalies (Kandanaarachchi, Hyndman 2022) <arXiv:2210.07407>
.
Simulation of recurrent event data for non-constant baseline hazard in the total time model with risk-free intervals and possibly a competing event. Possibility to cut the data to an interim data set. Data can be plotted. Details about the method can be found in Jahn-Eimermacher, A. et al. (2015) <doi:10.1186/s12874-015-0005-2>.
This package provides methods for representations (i.e. dimensionality reduction, preprocessing, feature extraction) of time series to help more accurate and effective time series data mining. Non-data adaptive, data adaptive, model-based and data dictated (clipped) representation methods are implemented. Also various normalisation methods (min-max, z-score, Box-Cox, Yeo-Johnson), and forecasting accuracy measures are implemented.
This package provides a model building aid for nonlinear mixed-effects (population) model analysis using NONMEM, facilitating data set checkout, exploration and visualization, model diagnostics, candidate covariate identification and model comparison. The methods are described in Keizer et al. (2013) <doi:10.1038/psp.2013.24>, and Jonsson et al. (1999) <doi:10.1016/s0169-2607(98)00067-4>.
MEDIPS was developed for analyzing data derived from methylated DNA immunoprecipitation (MeDIP
) experiments followed by sequencing (MeDIP-seq
). However, MEDIPS provides functionalities for the analysis of any kind of quantitative sequencing data (e.g. ChIP-seq
, MBD-seq, CMS-seq and others) including calculation of differential coverage between groups of samples and saturation and correlation analysis.
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. The package provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of ggplot2
and tidy data.
This package provides the exponential integrals E_1(x)
, E_2(x)
, E_n(x)
and Ei(x)
, and the incomplete gamma function G(a, x)
defined for negative values of its first argument. The package also gives easy access to the underlying C routines through an API; see the package vignette for details.
This package provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.
This method identifies topological domains in genomes from Hi-C sequence data. The authors published an implementation of their method as an R script. This package originates from those original TopDom
R scripts and provides help pages adopted from the original TopDom
PDF documentation. It also provides a small number of bug fixes to the original code.
Helps users in quickly visualizing risk-of-bias assessments performed as part of a systematic review. It allows users to create weighted bar-plots of the distribution of risk-of-bias judgments within each bias domain, in addition to traffic-light plots of the specific domain-level judgments for each study. The resulting figures are of publication quality and are formatted according the risk-of-bias assessment tool use to perform the assessments. Currently, the supported tools are ROB2.0 (for randomized controlled trials; Sterne et al (2019) <doi:10.1136/bmj.l4898>), ROBINS-I (for non-randomised studies of interventions; Sterne et al (2016) <doi:10.1136/bmj.i4919>), and QUADAS-2 (for diagnostic accuracy studies; Whiting et al (2011) <doi:10.7326/0003-4819-155-8-201110180-00009>).
Data exploration and prediction with focus on high dimensional data and chemometrics. The package was initially designed about partial least squares regression and discrimination models and variants, in particular locally weighted PLS models (LWPLS). Then, it has been expanded to many other methods for analyzing high dimensional data. The name rchemo comes from the fact that the package is orientated to chemometrics, but most of the provided methods are fully generic to other domains. Functions such as transform()
, predict()
, coef()
and summary()
are available. Tuning the predictive models is facilitated by generic functions gridscore()
(validation dataset) and gridcv()
(cross-validation). Faster versions are also available for models based on latent variables (LVs) (gridscorelv()
and gridcvlv()
) and ridge regularization (gridscorelb()
and gridcvlb()
).
Allows the user to learn Bayesian networks from datasets containing thousands of variables. It focuses on score-based learning, mainly the BIC and the BDeu score functions. It provides state-of-the-art algorithms for the following tasks: (1) parent set identification - Mauro Scanagatta (2015) <http://papers.nips.cc/paper/5803-learning-bayesian-networks-with-thousands-of-variables>; (2) general structure optimization - Mauro Scanagatta (2018) <doi:10.1007/s10994-018-5701-9>, Mauro Scanagatta (2018) <http://proceedings.mlr.press/v73/scanagatta17a.html>; (3) bounded treewidth structure optimization - Mauro Scanagatta (2016) <http://papers.nips.cc/paper/6232-learning-treewidth-bounded-bayesian-networks-with-thousands-of-variables>; (4) structure learning on incomplete data sets - Mauro Scanagatta (2018) <doi:10.1016/j.ijar.2018.02.004>. Distributed under the LGPL-3 by IDSIA.
The Resource Description Framework, or RDF is a widely used data representation model that forms the cornerstone of the Semantic Web. RDF represents data as a graph rather than the familiar data table or rectangle of relational databases. The rdflib package provides a friendly and concise user interface for performing common tasks on RDF data, such as reading, writing and converting between the various serializations of RDF data, including rdfxml', turtle', nquads', ntriples', and json-ld'; creating new RDF graphs, and performing graph queries using SPARQL'. This package wraps the low level redland R package which provides direct bindings to the redland C library. Additionally, the package supports the newer and more developer friendly JSON-LD format through the jsonld package. The package interface takes inspiration from the Python rdflib library.
This package provides a complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS
include tools for joint analysis of methlyation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.
Compare dissolution profiles with confidence interval of similarity factor f2 using bootstrap methodology as described in the literature, such as Efron and Tibshirani (1993, ISBN:9780412042317), Davison and Hinkley (1997, ISBN:9780521573917), and Shah et al. (1998) <doi:10.1023/A:1011976615750>. The package can also be used to simulate dissolution profiles based on mathematical modelling and multivariate normal distribution.
Survey systems and other third-party data sources commonly use non-standard representations of logical values when it comes to qualitative data - "Yes", "No" and "N/A", say. batman is a package designed to seamlessly convert these into logicals. It is highly localised, and contains equivalents to boolean values in languages including German, French, Spanish, Italian, Turkish, Chinese and Polish.
This package contains functions that can determine whether a time series is second-order stationary or not (and hence evidence for locally stationarity). Given two non-stationary series (i.e. locally stationary series) this package can then discover time-varying linear combinations that are second-order stationary. Cardinali, A. and Nason, G.P. (2013) <doi:10.18637/jss.v055.i01>.
Matrix-variate covariance estimation via the Kronecker-core decomposition. Computes the Kronecker and core covariance matrices corresponding to an arbitrary covariance matrix, and provides an empirical Bayes covariance estimator that adaptively shrinks towards the space of separable covariance matrices. For details, see Hoff, McCormack
and Zhang (2022) <arXiv:2207.12484>
"Core Shrinkage Covariance Estimation for Matrix-variate data".
This package provides a function that quickly computes the fine structure isotope patterns of a set of chemical formulas to a given degree of accuracy (up to the limit set by errors in floating point arithmetic). A data-set comprising the masses and isotopic abundances of individual elements is also provided and calculation of isotopic gross structures is also supported.