This package provides tools to estimate the genome size of polyploid species from k-mer frequencies. It includes functions to process k-mer frequency data and to estimate genome size by fitting the k-mer frequencies with a normal distribution model. It handles complex polyploid genomes and offers several options for customizing the estimation. The underlying method, findGSE, is described in Sun et al. (2018) <doi:10.1093/bioinformatics/btx637>.
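As a rough illustration of the idea behind k-mer-based estimation, the sketch below divides the total number of k-mers by the depth of the main peak in a k-mer frequency histogram. It is a deliberate simplification and not the findGSE fitting procedure: no distribution is fitted, and the multiple peaks of a polyploid genome are not modelled. The function name `estimate_genome_size`, the `min_depth` error cutoff, and the toy histogram are assumptions made for this example.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Naive estimate: total k-mers divided by the depth of the main peak.

    histogram maps depth (how often a k-mer was seen) to the number of
    distinct k-mers observed at that depth. Depths below min_depth are
    treated as sequencing errors and ignored (a hypothetical cutoff).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers, taken as the sequencing depth.
    peak_depth = max(usable, key=usable.get)
    # Total number of k-mer occurrences among the usable depths.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram: low-depth error k-mers plus one peak centred on depth 20.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
genome_size = estimate_genome_size(toy_histogram)
```

Real histograms need a model for the error k-mers and for each ploidy-related peak, which is where a fitted distribution comes in.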
This package provides a minimal set of routines to calculate the Grantham distance <doi:10.1126/science.185.4154.862>. The Grantham distance attempts to provide a proxy for the evolutionary distance between two amino acids based on three key chemical properties: composition, polarity and molecular volume. In turn, evolutionary distance is used as a proxy for the impact of missense mutations. The higher the distance, the more deleterious the substitution is expected to be.
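The distance combines the three properties in a weighted Euclidean form. The sketch below is a minimal Python illustration, not the package's own interface: the names `PROPERTIES` and `grantham_distance` are hypothetical, only four amino acids are included, and the weights and property values are quoted as they are commonly reproduced from Grantham (1974), so they should be checked against the paper before use.

```python
import math

# Weights for composition, polarity and volume, and the overall scaling
# constant, as commonly quoted from Grantham (1974); verify before use.
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399
SCALE = 50.723

# (composition, polarity, molecular volume) for a small subset of amino acids.
PROPERTIES = {
    "Leu": (0.0, 4.9, 111.0),
    "Ile": (0.0, 5.2, 111.0),
    "Ser": (1.42, 9.2, 32.0),
    "Arg": (0.65, 10.5, 124.0),
}


def grantham_distance(a, b):
    """Weighted Euclidean distance between the property vectors of a and b."""
    c1, p1, v1 = PROPERTIES[a]
    c2, p2, v2 = PROPERTIES[b]
    return SCALE * math.sqrt(
        ALPHA * (c1 - c2) ** 2 + BETA * (p1 - p2) ** 2 + GAMMA * (v1 - v2) ** 2
    )


conservative = grantham_distance("Leu", "Ile")  # chemically similar pair
radical = grantham_distance("Ser", "Arg")       # chemically dissimilar pair
```

With these values the Leu/Ile pair scores far lower than Ser/Arg, matching the expectation that the second substitution is the more disruptive one.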
Set of tools for reading, writing and transforming spatial and seasonal data, model selection and specific statistical tests for ecologists. It includes functions to interpolate regular positions of points between landmarks, to discretize polylines into regular point positions, to link distant observations to points and to convert a bounding box into a spatial object. It also provides miscellaneous functions for field ecologists, such as spatial statistics, inference on diversity indexes, and writing data frames that contain Chinese characters.
Estimates the population average controlled difference for a given outcome between levels of a binary treatment, exposure, or other group membership variable of interest for clustered, stratified survey samples where sample selection depends on the comparison group. Provides three methods for estimation, namely outcome modeling and two factorizations of inverse probability weighting. Under stronger assumptions, these methods estimate the causal population average treatment effect. See Salerno et al. (2024) <doi:10.48550/arXiv.2406.19597>.
This package provides functions and utilities to perform statistical analyses in the Six Sigma way. Through the DMAIC cycle (Define, Measure, Analyze, Improve, Control), you can manage several quality management studies: Gage R&R, capability analysis, control charts, loss function analysis, etc. Data frames used in the books "Six Sigma with R" [ISBN 978-1-4614-3652-2] and "Quality Control with R" [ISBN 978-3-319-24046-6] are also included in the package.
Maximum likelihood estimation of the parameters of matrix and 3rd-order tensor normal distributions with unstructured factor variance-covariance matrices (two procedures), and unbiased modified likelihood ratio testing of simple and double separability for variance-covariance structures (two procedures). References: Dutilleul P. (1999) <doi:10.1080/00949659908811970>, Manceur AM, Dutilleul P. (2013) <doi:10.1016/j.cam.2012.09.017>, and Manceur AM, Dutilleul P. (2013) <doi:10.1016/j.spl.2012.10.020>.
Users can build and test customized quantitative trading strategies. Some quantitative trading strategies are already implemented, e.g. various moving-average filters with trend-following approaches. The implemented class, called "Strategy", gives users access to several methods to analyze performance figures, produce plots and backtest the strategies. Custom strategies can also be added, and a generic template is available. Custom strategies must follow a defined input and output format so that they can be called from the Strategy constructor.
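To make the moving-average, trend-following idea concrete, the following sketch holds an asset while a fast moving average is above a slow one and computes the resulting cumulative return. It is a generic Python illustration under stated assumptions, not the package's "Strategy" class: the function names, the window lengths and the price series are all hypothetical.

```python
def moving_average(prices, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def crossover_signals(prices, fast=2, slow=3):
    """1 = hold the asset (fast average above slow average), 0 = stay out."""
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    return [
        1 if f is not None and s is not None and f > s else 0
        for f, s in zip(fast_ma, slow_ma)
    ]


def backtest(prices, signals):
    """Cumulative return, applying each signal to the next period's price change."""
    equity = 1.0
    for t in range(1, len(prices)):
        if signals[t - 1] == 1:
            equity *= prices[t] / prices[t - 1]
    return equity - 1.0


prices = [10, 11, 12, 13, 12, 11, 10]
signals = crossover_signals(prices)
total_return = backtest(prices, signals)
```

Applying the signal from period t-1 to the return of period t avoids look-ahead bias, since the position is decided only on information available before the price change.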
Package to predict protein-protein interaction (PPI) networks in target organisms for which only little information about PPIs is available. Path2PPI predicts PPI networks based on sets of proteins, which can belong to a certain pathway, from well-established model organisms. It helps to combine and transfer information about a certain pathway or biological process from several reference organisms to one target organism. Path2PPI depends only on the sequence similarity of the involved proteins.
Large data files can be difficult to work with in R, where data generally resides in memory. This package encourages a style of programming where data is streamed from disk into R via a "producer" and through a series of "consumers" that typically reduce the original data to a manageable size. The package provides useful Producer and Consumer stream components for operations such as data input, sampling, indexing, and transformation; see package?Streamer for details.
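The producer and consumer pattern can be sketched with Python generators, where each stage pulls one chunk at a time from the stage before it. This illustrates the programming style only and is not the package's Producer or Consumer classes; the function names, the chunk size and the in-memory stand-in for a file are assumptions.

```python
import io


def producer(handle, chunk_lines=2):
    """Yield successive chunks of lines so the whole input is never held at once."""
    chunk = []
    for line in handle:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == chunk_lines:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter chunk
        yield chunk


def parse_consumer(chunks):
    """Convert each chunk of text lines to numbers."""
    for chunk in chunks:
        yield [float(x) for x in chunk]


def sum_consumer(chunks):
    """Reduce each numeric chunk to a single value."""
    for chunk in chunks:
        yield sum(chunk)


# io.StringIO stands in for a large file opened from disk.
source = io.StringIO("1\n2\n3\n4\n5\n")
stream = sum_consumer(parse_consumer(producer(source)))
chunk_sums = list(stream)
```

Nothing is read until the final stage is iterated, so memory use is bounded by the chunk size rather than the file size.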
Uniquorn enables users to identify cancer cell lines. Cancer cell line misidentification and cross-contamination represent a significant challenge for cancer researchers. Identification is therefore vital and, in this package, is based on the locations (loci) of somatic and germline mutations or variations. The input format is vcf or vcf.gz, and each file has to contain a single cancer cell line sample (i.e. a single member/genotype/gt column in the vcf file).
This package finds SNV/indel differences between two closely related bam files by pairwise comparison at each base position across the genome region of interest. The difference is inferred by Fisher test and Euclidean distance, whose inputs are the base counts (A, T, G, C) at a given position and the read counts for indels that span no less than 2 bp on both sides of the indel region.
This package contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library. All models return coda mcmc objects that can then be summarized using the coda package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
Skinfold measurement is one of the most popular and practical methods for estimating percent body fat. Body composition is a term that describes the relative proportions of fat, bone, and muscle mass in the human body. Following the collection of skinfold measurements, regression analysis (a statistical procedure used to predict a dependent variable based on one or more independent or predictor variables) is used to estimate total percent body fat in humans. See <doi:10.4324/9780203868744>.
This package provides functions to get and download city bike data from the website and API service of each city bike service in Norway. The package aims to reduce time spent on getting Norwegian city bike data, and lower barriers to start analyzing it. The data is retrieved from Oslo City Bike, Bergen City Bike, and Trondheim City Bike. The data is made available under NLOD 2.0 <https://data.norge.no/nlod/en/2.0>.
This package performs parallel analysis (Timmerman & Lorenzo-Seva, 2011 <doi:10.1037/a0023353>) and the hull method (Lorenzo-Seva, Timmerman, & Kiers, 2011 <doi:10.1080/00273171.2011.564527>) for assessing the dimensionality of a set of variables using minimum rank factor analysis (see ten Berge & Kiers, 1991 <doi:10.1007/BF02294464> for more information). The package also includes the option to compute minimum rank factor analysis by itself, as well as the greater lower bound calculation.
Easy-to-use, very fast implementation of various functional bases that can easily be used together with other packages. A functional basis is a collection of basis functions $\phi_1, \ldots, \phi_n$ that can represent a smooth function, i.e. $f(t) = \sum_{k=1}^{n} c_k \phi_k(t)$. First- and second-order derivatives are also included; these are exact derivatives, with no approximations applied. As of version 1.1, this package includes B-splines, Fourier bases and polynomials.
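The basis expansion and its exact derivative can be illustrated with a monomial basis, where $\phi_k(t) = t^k$ and the derivative of each basis function is known in closed form. The sketch below is a plain Python illustration with hypothetical function names, not the package's interface; it indexes the basis from 0 for convenience.

```python
def polynomial_basis(t, n):
    """Values of the monomial basis 1, t, ..., t^(n-1) at the point t."""
    return [t ** k for k in range(n)]


def polynomial_basis_deriv(t, n):
    """Exact first derivatives of the monomial basis at t: 0, 1, 2t, ..."""
    return [0.0 if k == 0 else k * t ** (k - 1) for k in range(n)]


def evaluate(coefs, basis_values):
    """Linear combination sum_k c_k * phi_k(t) for precomputed basis values."""
    return sum(c * phi for c, phi in zip(coefs, basis_values))


coefs = [1.0, 2.0, 3.0]  # f(t) = 1 + 2t + 3t^2
f_at_2 = evaluate(coefs, polynomial_basis(2.0, 3))
df_at_2 = evaluate(coefs, polynomial_basis_deriv(2.0, 3))
```

Because the same coefficients are reused with the differentiated basis, the derivative of the fitted function is exact rather than a finite-difference approximation.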
This package provides complete, detailed preprocessing of two-dimensional gas chromatogram (GCxGC) samples, including baseline correction, smoothing, peak detection, and peak alignment. Also provided are some analysis functions, such as finding extracted ion chromatograms, finding mass spectral data, targeted analysis, and nontargeted analysis with either the National Institute of Standards and Technology Mass Spectral Library or with the mass data. There are also several visualization methods provided for each step of the preprocessing and analysis.
Reads data collected from wearable accelerometers as used in sleep and physical activity research. Currently supported file formats are binary data from GENEActiv <https://activinsights.com/>, the .bin format from GENEA devices (not for sale), and the .cwa format from Axivity <https://axivity.com>. It also has functions for reading text files with epoch-level aggregates from Actical, Fitbit, Actiwatch, ActiGraph, and PhilipsHealthBand. Primarily designed to complement the R package GGIR <https://CRAN.R-project.org/package=GGIR>.
Facilitates frequentist and Bayesian meta-analysis of diagnosis and prognosis research studies. It includes functions to summarize multiple estimates of prediction model discrimination and calibration performance (Debray et al., 2019) <doi:10.1177/0962280218785504>. It also includes functions to evaluate funnel plot asymmetry (Debray et al., 2018) <doi:10.1002/jrsm.1266>. Finally, the package provides functions for developing multivariable prediction models from datasets with clustering (de Jong et al., 2021) <doi:10.1002/sim.8981>.
This package provides a declarative language for specifying multilevel models, solving for population parameters based on specified variance-explained effect size measures, generating data, and conducting power analyses to determine sample size recommendations. The specification allows for any number of within-cluster effects, between-cluster effects, covariate effects at either level, and random coefficients. Moreover, the models do not assume orthogonal effects, and predictors can correlate at either level and accommodate models with multiple interaction effects.
Makes it possible to create an internally consistent repository consisting of selected packages from CRAN-like repositories. The user specifies a set of desired packages, and miniCRAN recursively reads the dependency tree for these packages, then downloads only this subset. The user can then install packages from this repository directly, rather than from CRAN. This is useful in production settings, e.g. a server behind a firewall, or remote locations with slow (or no) Internet access.
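The recursive dependency step amounts to computing a transitive closure over a package index. The sketch below shows that idea in Python on a made-up index; it is not miniCRAN's interface, and the function name `dependency_closure` and all package names are hypothetical.

```python
from collections import deque


def dependency_closure(wanted, depends_on):
    """Return the wanted packages plus everything they depend on, transitively."""
    selected = set()
    queue = deque(wanted)
    while queue:
        pkg = queue.popleft()
        if pkg in selected:
            continue  # already handled; also guards against dependency cycles
        selected.add(pkg)
        queue.extend(depends_on.get(pkg, []))
    return sorted(selected)


# Made-up index mapping each package to its direct dependencies.
toy_index = {
    "plotkit": ["colors", "grid2"],
    "colors": ["utilsx"],
    "grid2": [],
    "utilsx": [],
    "unrelated": ["colors"],
}
subset = dependency_closure(["plotkit"], toy_index)
```

Only packages reachable from the requested set are selected, so `unrelated` is left out even though it shares a dependency with `plotkit`.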
This package implements the HSROC (hierarchical summary receiver operating characteristic) model developed by Ma, Lian, Chu, Ibrahim, and Chen (2018) <doi:10.1093/biostatistics/kxx025> and the hierarchical model developed by Lian, Hodges, and Chu (2019) <doi:10.1080/01621459.2018.1476239> for meta-analysis of 1-5 diagnostic tests, so that multiple tests can be compared simultaneously within a missing data framework. The package evaluates the accuracy of multiple diagnostic tests and also gives a graphical representation of the results.
Speeds up the process of loading raw data from MBA (Multiplex Bead Assay) examinations, performs quality control checks, and automatically normalises the data, preparing it for more advanced, downstream tasks. The main objective of the package is to create a simple environment for a user who does not necessarily have experience with the R language. The package is developed within the project of the same name, PvSTATEM, which is an international project aiming for malaria elimination.
Fits a wide variety of multivariate spatio-temporal models with simultaneous and lagged interactions among variables (including vector autoregressive spatio-temporal ('VAST') dynamics) for areal, continuous, or network spatial domains. It includes time-variable, space-variable, and space-time-variable interactions using dynamic structural equation models ('DSEM') as an expressive interface, and the mgcv package to specify splines via the formula interface. See Thorson et al. (2024) <doi:10.48550/arXiv.2401.10193> for more details.