Wavelet decomposition is well suited to modelling noisy time series data. Here, wavelet decomposition using the "haar" algorithm is combined with K-Nearest Neighbour (KNN) to formulate a hybrid Wavelet KNN model for time series forecasting, as proposed by Anjoy and Paul (2017) <DOI:10.1007/s00521-017-3289-9>.
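As a rough illustration of the decomposition step mentioned above, the following Python sketch performs one level of the Haar discrete wavelet transform and inverts it. It is not the package's implementation: the function names `haar_step` and `haar_inverse` are hypothetical, the input is assumed to have even length, and the package may use a different transform variant or number of levels.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input assumed)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums give the smooth (approximation) part.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences give the rough (detail) part.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original signal from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


series = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_step(series)
rebuilt = haar_inverse(approx, detail)
```

In a hybrid scheme of the kind described, each component would then be forecast separately and the forecasts combined; that step is omitted here.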
Extremely fast hashing of R objects using xxHash. R objects are hashed via the standard serialization mechanism in R. Raw byte vectors and strings can be handled directly for compatibility with hashes created on other systems. This implementation is a wrapper around the xxHash C library, which is available from <https://github.com/Cyan4973/xxHash>.
The MSstats package provides tools for preprocessing, summarization and differential analysis of mass spectrometry (MS) proteomics data. Some recent MS protocols produce quantitative data sets that are larger than memory, which MSstats functions cannot process. The MSstatsBig package provides additional converter functions that enable processing of larger-than-memory data sets.
HDCytoData contains a set of high-dimensional cytometry benchmark datasets. These datasets are formatted into SummarizedExperiment and flowSet Bioconductor object formats, including all required metadata. Row metadata includes sample IDs, group IDs, patient IDs, reference cell population or cluster labels and labels identifying spiked-in cells. Column metadata includes channel names, protein marker names, and protein marker classes.
The package is usable with Affymetrix GeneChip short oligonucleotide arrays, and it can be adapted or extended to other platforms. It is able to modify or replace the grouping of probes in the probe sets. Also, the package contains simple functions to read R connections in the FASTA format and it can create an alternative mapping from sequences.
This package implements multitaper spectral estimation techniques using prolate spheroidal sequences (Slepians) and sine tapers for time series analysis. It includes an adaptive weighted multitaper spectral estimate, a coherence estimate, Thomson's Harmonic F-test, and complex demodulation. The Slepian sequences are generated efficiently using a tridiagonal matrix solution, and jackknifed confidence intervals are available for most estimates.
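To make the sine-taper idea concrete, here is a minimal Python sketch of an equally weighted multitaper estimate. It is an illustration only: the function names `sine_tapers` and `multitaper_spectrum` are hypothetical, the estimate uses a plain DFT with no adaptive weighting, and the normalisation is an assumption that may differ from the package's.

```python
import cmath
import math


def sine_tapers(n, k):
    """Return the first k orthonormal sine tapers of length n."""
    scale = math.sqrt(2.0 / (n + 1))
    return [
        [scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1)) for t in range(n)]
        for j in range(k)
    ]


def multitaper_spectrum(x, k=3):
    """Average the periodograms of k sine-tapered copies of x (equal weights)."""
    n = len(x)
    tapers = sine_tapers(n, k)
    spectrum = []
    for f in range(n // 2 + 1):  # non-negative Fourier frequencies only
        total = 0.0
        for taper in tapers:
            coeff = sum(
                taper[t] * x[t] * cmath.exp(-2j * math.pi * f * t / n)
                for t in range(n)
            )
            total += abs(coeff) ** 2
        spectrum.append(total / k)
    return spectrum


# A cosine at Fourier bin 8 of a 64-point record.
signal = [math.cos(2.0 * math.pi * 8 * t / 64) for t in range(64)]
estimate = multitaper_spectrum(signal, k=3)
```

Averaging over several orthogonal tapers trades some frequency resolution for lower variance, which is why the peak is spread over a few neighbouring bins rather than a single one.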
R's default conflict management system gives the most recently loaded package precedence. This can make it hard to detect conflicts, particularly when they arise because a package update creates ambiguity that did not previously exist. The conflicted package takes a different approach, making every conflict an error and forcing you to choose which function to use.
To smoothly animate the transformation of polygons and paths, many aspects need to be taken into account, such as differing numbers of control points, a changing center of rotation, etc. The transformr package provides an extensive framework for manipulating the shapes of polygons and paths and can be seen as the spatial brother to the tweenr package.
Random Jungle is an implementation of Random Forests, a powerful machine learning method. It is designed to analyse high-dimensional data. In genetics, it can be used for analysing big Genome Wide Association (GWA) data. Its most interesting features are variable selection, missing value imputation, classifier creation, generalization error estimation and sample proximities between pairs of cases.
Mixedpower uses pilot data and a linear mixed model fitted with lme4 to simulate new data sets. Power is computed separately for every effect in the model output as the ratio of significant simulations to all simulations. More conservative simulations, as a protection against bias in the pilot data, are available, as are methods for plotting the results.
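The power calculation described above (significant simulations divided by all simulations, per effect) can be sketched in Python as follows. This is a hedged illustration, not the package's code: the function name `simulated_power`, the use of p-values, and the `alpha` threshold are assumptions, and the package may apply a different significance criterion.

```python
def simulated_power(p_values_by_effect, alpha=0.05):
    """Share of simulations in which each effect was significant.

    p_values_by_effect maps an effect name to one p-value per simulated
    data set (hypothetical input format). alpha is an assumed threshold.
    """
    power = {}
    for effect, p_values in p_values_by_effect.items():
        significant = sum(1 for p in p_values if p < alpha)
        power[effect] = significant / len(p_values)
    return power


# Four simulated data sets, two effects (made-up illustrative p-values).
example = {
    "condition": [0.01, 0.20, 0.03, 0.50],
    "time": [0.001, 0.002, 0.04, 0.03],
}
result = simulated_power(example)
```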
This package provides a tool that contains trained deep learning models for predicting effector proteins. deepredeff has been trained to identify effector proteins using a set of known experimentally validated effectors from either bacteria, fungi, or oomycetes. Documentation is available via several vignettes and in the paper by Kristianingsih and MacLean (2020) <doi:10.1101/2020.07.08.193250>.
This package implements several algorithms for bundling edges in networks, as well as flow and metro map layouts. These include force-directed edge bundling <doi:10.1111/j.1467-8659.2009.01450.x>, a flow algorithm based on Steiner trees <doi:10.1080/15230406.2018.1437359> and a multicriteria optimization method for metro map layouts <doi:10.1109/TVCG.2010.24>.
This package provides a collection of functions that help build features based on external data, which is useful for data scientists in day-to-day work. Many functions create features using parallel computation. Since the details of parallel computation are hidden under the hood, the user need not worry about creating clusters and shutting them down.
R lists, especially nested lists, can be very difficult to visualize or represent. Sometimes str() is not enough, so this suite of htmlwidgets is designed to help see, understand, and maybe even modify your R lists. The function reactjson() requires the package reactR, which can be installed from CRAN or <https://github.com/timelyportfolio/reactR>.
Efficient calculation of pseudo-ranks and (pseudo)-rank based test statistics. In case of equal sample sizes, pseudo-ranks and mid-ranks are equal. When used for inference, mid-ranks may lead to paradoxical results. Pseudo-ranks are in general not affected by such a problem. See Happ et al. (2020, <doi:10.18637/jss.v095.c01>) for details.
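The relationship between pseudo-ranks and mid-ranks can be illustrated with a short Python sketch. It assumes the usual definitions: a mid-rank is 1/2 + N times the sample-size-weighted mean of the groups' normalized empirical distribution functions, while a pseudo-rank uses the unweighted mean over groups. The function names are hypothetical, and this direct quadratic-time computation is for illustration only, not the efficient algorithm the package provides.

```python
def _count(u):
    """Normalized counting function: 0 below zero, 1/2 at zero, 1 above."""
    if u < 0:
        return 0.0
    if u == 0:
        return 0.5
    return 1.0


def pseudo_ranks(groups):
    """Pseudo-ranks: 1/2 + N * unweighted mean of the group-wise ECDFs."""
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    result = []
    for group in groups:
        ranks = []
        for x in group:
            # Evaluate each group's normalized ECDF at x, then average them
            # with equal weight per group (not per observation).
            g_hat = sum(
                sum(_count(x - y) for y in other) / len(other) for other in groups
            ) / n_groups
            ranks.append(0.5 + n_total * g_hat)
        result.append(ranks)
    return result


def mid_ranks(groups):
    """Mid-ranks computed from the pooled sample, for comparison."""
    pooled = [y for g in groups for y in g]
    return [[0.5 + sum(_count(x - y) for y in pooled) for x in g] for g in groups]


balanced = [[1, 2], [3, 4]]       # equal sample sizes: both rankings agree
unbalanced = [[1], [2, 3, 4]]     # unequal sample sizes: they differ
```

With the unbalanced input, the single observation in the first group gets mid-rank 1 but pseudo-rank 1.5, because the pseudo-rank does not let the larger group dominate the reference distribution.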
For biparental, three-way and four-way crosses, Identity by Descent (IBD) probabilities can be calculated using Hidden Markov Models and inheritance vectors following Lander and Green (<https://www.jstor.org/stable/29713>) and Huang (<doi:10.1073/pnas.1100465108>). One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.
This package provides bindings to Tree-sitter, an incremental parsing system for programming tools. Tree-sitter builds concrete syntax trees for source files of any language, and can efficiently update those syntax trees as the source file is edited. It also includes a robust error recovery system that provides useful parse results even in the presence of syntax errors.
This package provides a Shiny application for visualization, exploration, comparison, and filtering of CRISPR screens analyzed with MAGeCK RRA or MLE. Features include interactive plots with on-click labeling, full customization of plot aesthetics, data upload and/or download, and much more. Quickly and easily explore your CRISPR screen results and generate publication-quality figures in seconds.
This package provides functionality for performing divergence analysis as presented in Dinalankara et al., "Digitizing omics profiles by divergence from a baseline", PNAS 2018. This allows the user to simplify high-dimensional omics data into a binary or ternary format which encapsulates how the data diverge from a specified baseline group with the same univariate or multivariate features.
This package estimates missing values in DNA methylation data. The methyLImp method is based on linear regression, since methylation levels show a high degree of inter-sample correlation. The implementation is parallelised over chromosomes, since probes on different chromosomes are usually independent. A mini-batch approach is available to reduce the runtime when the number of samples is large.
Estimate group aggregates, where one can set user-defined conditions that each group of records must satisfy to be suitable for aggregation. If a group of records is not suitable, it is expanded using a collapsing scheme defined by the user. A paper on this package was published in the Journal of Statistical Software <doi:10.18637/jss.v112.i04>.
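The idea of expanding a group along a collapsing scheme until it satisfies a condition can be sketched in Python as below. Everything here is illustrative: the function name `aggregate_with_collapsing`, the record layout, the minimum-count condition and the use of the mean are assumptions, not the package's interface.

```python
from statistics import mean


def aggregate_with_collapsing(records, levels, value_field, min_count=3):
    """Aggregate per finest-level group, widening the group when it is too small.

    levels lists grouping-field tuples from finest to coarsest; every level
    must use a subset of the fields of the first one. A group is accepted at
    the first level where it holds at least min_count records (an assumed
    condition). Groups that never qualify are left out of the result.
    """
    finest = levels[0]
    targets = sorted({tuple(r[f] for f in finest) for r in records})
    results = {}
    for target in targets:
        wanted = dict(zip(finest, target))
        for level in levels:
            # Pool all records that agree with the target on this level's fields.
            pool = [
                r[value_field]
                for r in records
                if all(r[f] == wanted[f] for f in level)
            ]
            if len(pool) >= min_count:
                results[target] = {"level": level, "n": len(pool), "mean": mean(pool)}
                break
    return results


records = [
    {"region": "A", "sector": "x", "turnover": 1.0},
    {"region": "A", "sector": "x", "turnover": 2.0},
    {"region": "A", "sector": "x", "turnover": 3.0},
    {"region": "B", "sector": "x", "turnover": 10.0},
]
# Collapse from region-by-sector, to sector only, to the whole data set.
scheme = [("region", "sector"), ("sector",), ()]
summary = aggregate_with_collapsing(records, scheme, "turnover")
```

In this example the group (A, x) is aggregated at the finest level, while (B, x) has only one record and is therefore aggregated over all of sector x.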
This package contains a range of functions covering the present development of the distributional method for the dichotomisation of continuous outcomes. The method provides estimates with standard error of a comparison of proportions (difference, odds ratio and risk ratio) derived, with similar precision, from a comparison of means. See the URL below or <arXiv:1809.03279> for more information.
This package implements the G-Formula method for causal inference with time-varying treatments and confounders using Bayesian multiple imputation methods, as described by Bartlett et al. (2025) <doi:10.1177/09622802251316971>. It creates multiple synthetic imputed datasets under treatment regimes of interest using the mice package. These can then be analysed using rules developed for analysing multiple synthetic datasets.
Support for geostatistical analysis of multivariate data, in particular data with restrictions, e.g. positive amounts, compositions, distributional data, microstructural data, etc. It includes descriptive analysis and modelling for such data, both from a two-point Gaussian perspective and multipoint perspective. The methods mainly follow Tolosana-Delgado, Mueller and van den Boogaart (2018) <doi:10.1007/s11004-018-9769-3>.