This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method is based on comparing detected regions to a pooled null distribution and can be applied even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.
This package provides a one-to-one mapping from gene to "best" probe set for four Affymetrix human gene expression microarrays: hgu95av2, hgu133a, hgu133plus2, and u133x3p. On Affymetrix gene expression microarrays, a single gene may be measured by multiple probe sets. This can present a mild conundrum when attempting to evaluate a gene "signature" that is defined by gene names rather than by specific probe sets. This package also includes the pre-calculated probe set quality scores that were used to define the mapping.
This package provides a general routine, envMU, which allows estimation of the M envelope of span(U) given root-n consistent estimators of M and U. The routine envMU does not presume a model. This package implements response envelopes, partial response envelopes, envelopes in the predictor space, heteroscedastic envelopes, simultaneous envelopes, scaled response envelopes, scaled envelopes in the predictor space, groupwise envelopes, weighted envelopes, envelopes in logistic regression, envelopes in Poisson regression, envelopes in function-on-function linear regression, envelope-based Partial Partial Least Squares, envelopes with non-constant error covariance, envelopes with t-distributed errors, reduced rank envelopes and reduced rank envelopes with non-constant error covariance. For each of these model-based routines except weighted envelopes, the package provides inference tools including bootstrap, cross validation, estimation and prediction, and hypothesis testing on coefficients. Tools for selection of dimension include AIC, BIC and likelihood ratio testing. Background is available at Cook, R. D., Forzani, L. and Su, Z. (2016) <doi:10.1016/j.jmva.2016.05.006>. Optimization is based on a clockwise coordinate descent algorithm.
Diagnostic tools based on two-way ANOVA and median-polish residual plots for bicluster output obtained from the packages "biclust" by Kaiser et al. (2008), "isa2" by Csardi et al. (2010) and "fabia" by Hochreiter et al. (2010). It also provides visualization tools for bicluster output and the corresponding non-bicluster rows or columns. In addition, it extends the idea of Kaiser et al. (2008) of extracting bicluster output in a text format by adding two bicluster methods from the fabia and isa2 R packages.
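As a rough illustration of what a median-polish residual is, the following Python sketch sweeps row and column medians out of a small table. It is not the package's R code: the function name `median_polish_residuals`, the fixed iteration count, and the requirement of a non-empty rectangular table are assumptions made for this example only.

```python
from statistics import median
from typing import List


def median_polish_residuals(table: List[List[float]], n_iter: int = 10) -> List[List[float]]:
    """Return residuals after alternately removing row and column medians.

    Hypothetical helper for illustration; assumes a non-empty rectangular table.
    """
    resid = [list(row) for row in table]  # work on a copy
    n_cols = len(resid[0])
    for _ in range(n_iter):
        # Sweep out the median of each row.
        for row in resid:
            row_med = median(row)
            for j in range(n_cols):
                row[j] -= row_med
        # Sweep out the median of each column.
        for j in range(n_cols):
            col_med = median(row[j] for row in resid)
            for row in resid:
                row[j] -= col_med
    return resid
```

In a purely additive table every residual is zero, so a cell that keeps a large residual after polishing stands out from the row-plus-column pattern, which is the kind of structure residual plots are meant to reveal.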
This package provides a systematic biology tool developed to identify cell infiltration via an Individualized Cell crosstalk network. CITMIC first constructs a weighted cell crosstalk network by integrating Cell-target interaction information, biological process data from the Gene Ontology (GO) database, and gene transcriptomic data in a specific sample, and then uses a network propagation algorithm on the network to identify cell infiltration for the sample. Ultimately, cell infiltration in the patient dataset is obtained by normalizing the centrality scores of the cells.
Plot confidence intervals from the objects of statistical tests such as t.test(), var.test(), cor.test(), prop.test() and fisher.test() (htest class), Tukey test [TukeyHSD()], Dunnett test [glht() in the multcomp package], logistic regression [glm()], and Tukey or Games-Howell test [posthocTGH() in the userfriendlyscience package]. Users can set the styles of lines and points. The package also contains a function to calculate odds ratios and their confidence intervals from the result of a logistic regression.
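For readers unfamiliar with the last step, an odds ratio is the exponential of a logistic regression coefficient. The Python sketch below assumes a Wald-type interval (estimate plus or minus z times the standard error on the log-odds scale); the package itself may compute its intervals differently, and the function name `odds_ratio_ci` is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(coef: float, std_err: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for one coefficient.

    Assumes a Wald interval on the log-odds scale; z defaults to the
    normal quantile for a 95% interval.
    """
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return math.exp(coef), lower, upper
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the odds ratio and never crosses zero.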
This package provides the mathematical model described by "Serostatus Testing & Dengue Vaccine Cost-Benefit Thresholds" in <doi:10.1098/rsif.2019.0234>. Using the functions in the package, that analysis can be repeated using sample life histories, either synthesized from local seroprevalence data using other functions in this package (as in the manuscript) or from some other source. The package provides a vignette which walks through the analysis in the publication, as well as a function to generate a project skeleton for such an analysis.
Wrappers for functions in the gRain package to emulate some RHugin functionality, allowing the building of Bayesian networks consisting of discrete chance nodes incrementally, through adding nodes, edges and conditional probability tables, the setting of evidence, either hard (boolean) or soft (likelihoods), querying marginal probabilities and normalizing constants, and generating sets of high-probability configurations. Computations will typically not be as fast as they are with RHugin, but this package should help users without access to Hugin to run code written for RHugin.
Flexible and robust estimation and inference of generalised autoregressive conditional heteroscedasticity (GARCH) models with covariates ('X') based on the results by Francq and Thieu (2018) <doi:10.1017/S0266466617000512>. Coefficients can straightforwardly be set to zero by omission, and quasi maximum likelihood methods ensure estimates are generally consistent and inference valid, even when the standardised innovations are non-normal and/or dependent over time. See <https://journal.r-project.org/archive/2021/RJ-2021-057/RJ-2021-057.pdf> for an overview of the package.
Iterator for generating permutations and combinations. They can be drawn with or without replacement, and with distinct or non-distinct items (multiset). The generated sequences are in lexicographical order (dictionary order). The algorithms to generate permutations and combinations are memory efficient. These iterative algorithms enable users to process all sequences without holding all results in memory at the same time. The algorithms are written in C/C++ for faster performance. Note: iterpc is no longer being maintained. Users are advised to switch to arrangements.
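The idea of iterating in lexicographical order without storing every result can be sketched in Python with the classic next-permutation step. This is an illustration only, not the package's C/C++ implementation; the generator name `lex_permutations` is hypothetical, and items are assumed to be mutually comparable.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def lex_permutations(items: Sequence[T]) -> Iterator[List[T]]:
    """Yield the distinct permutations of a multiset in lexicographical order.

    Only the current permutation is kept in memory.
    """
    current = sorted(items)
    n = len(current)
    while True:
        yield list(current)
        # Find the rightmost index i with current[i] < current[i + 1].
        i = n - 2
        while i >= 0 and current[i] >= current[i + 1]:
            i -= 1
        if i < 0:
            return  # current is the last permutation
        # Find the rightmost element strictly greater than current[i] and swap.
        j = n - 1
        while current[j] <= current[i]:
            j -= 1
        current[i], current[j] = current[j], current[i]
        # Reverse the suffix so it becomes the smallest possible ordering.
        current[i + 1:] = reversed(current[i + 1:])
```

A caller can loop over `lex_permutations([1, 2, 2])` and handle one arrangement at a time; repeated items produce no duplicate permutations because the comparisons treat equal items as interchangeable.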
Color schemes ready for each type of data (qualitative, diverging or sequential), with colors that are distinct for all people, including color-blind readers. This package provides an implementation of Paul Tol (2018) and Fabio Crameri (2018) <doi:10.5194/gmd-11-2541-2018> color schemes for use with graphics or ggplot2. It provides tools to simulate color-blindness and to test how well the colors of any palette are identifiable. Several scientific thematic schemes (geologic timescale, land cover, FAO soils, etc.) are also implemented.
This package provides a collection of matrix functions for teaching and learning matrix linear algebra as used in multivariate statistical methods. These functions are mainly for tutorial purposes in learning matrix algebra ideas using R. In some cases, functions are provided for concepts available elsewhere in R, but where the function call or name is not obvious. In other cases, functions are provided to show or demonstrate an algorithm. In addition, a collection of functions is provided for drawing vector diagrams in 2D and 3D.
Values below the limit of detection (LOD) are a problem in several fields of science, and there are numerous approaches for replacing the missing data. We present a new mathematical solution for maximum likelihood estimation that allows us to estimate the true values of the mean and standard deviation for normal distributions and is significantly faster than previous implementations. The article with the details was submitted to JSS and is currently available at <https://www2.arnes.si/~tverbo/LOD/Verbovsek_Sega_2_Manuscript.pdf>.
Incorporates a Bayesian monotonic single-index mixed-effect model with a multivariate skew-t likelihood, specifically designed to handle survey weights adjustments. Features include a simulation program and an associated Gibbs sampler for model estimation. The single-index function is constrained to be monotonic increasing, utilizing a customized Gaussian process prior for precise estimation. The model assumes random effects follow a canonical skew-t distribution, while residuals are represented by a multivariate Student-t distribution. Offers robust Bayesian adjustments to integrate survey weight information effectively.
This package provides a collection of techniques for correcting statistical models for sample selection bias. In particular, it offers the resampling-based methods "stochastic inverse-probability oversampling" and "parametric inverse-probability bagging", which generate synthetic observations for correcting classifiers for biased samples resulting from stratified random sampling. For further information, see the article Krautenbacher, Theis, and Fuchs (2017) <doi:10.1155/2017/7847531>. The methods may also be used for other purposes where weighting and generation of new observations are needed.
Spectral and Average Autocorrelation Zero Distance Density ('sazed') is a method for estimating the season length of a seasonal time series. sazed is aimed at practitioners, as it employs only domain-agnostic preprocessing and does not depend on parameter tuning or empirical constants. The computation of sazed relies on the efficient autocorrelation computation methods suggested by Thibauld Nion (2012, URL: <https://etudes.tibonihoo.net/literate_musing/autocorrelations.html>) and by Bob Carpenter (2012, URL: <https://lingpipe-blog.com/2012/06/08/autocorrelation-fft-kiss-eigen/>).
The goal of trainR is to provide a simple interface to the National Rail Enquiries (NRE) systems. There are a few data feeds available; the simplest of them is Darwin, which provides real-time arrival and departure predictions, platform numbers, delay estimates, schedule changes and cancellations. Other data feeds, such as Historic Service Performance (HSP), provide historical data and much more. trainR simplifies data retrieval so that users can focus on their analyses. For more details visit <https://www.nationalrail.co.uk/46391.aspx>.
Calculates total survey error (TSE) for a survey under multiple, different weighting schemes, using both scale-dependent and scale-independent metrics. The package works directly from the data set, with no hand calculations required: just upload a properly structured data set (see TESTWGT and its documentation), properly input column names (see the function documentation), and run your functions. For more on TSE, see: Weisberg, Herbert (2005, ISBN:0-226-89128-3); Biemer, Paul (2010) <doi:10.1093/poq/nfq058>; Biemer, Paul et al. (2017, ISBN:9781119041672); etc.
This package provides an algorithm to detect and characterize disturbances (start and end dates, intensity) that can occur at different hierarchical levels by studying the dynamics of longitudinal observations at the unit level and group level based on Nadaraya-Watson smoothing curves, as well as a shiny app for visualizing the observations and the detected disturbances. Finally, the package provides a data frame mimicking a pig farming system subjected to disturbances simulated according to Le et al. (2022) <doi:10.1016/j.animal.2022.100496>.
ILoReg is a tool for identification of cell populations from scRNA-seq data. In particular, ILoReg is useful for finding cell populations with subtle transcriptomic differences. The method utilizes a self-supervised learning method, called Iterative Clustering Projection (ICP), to find cluster probabilities, which are used in noise reduction prior to PCA and the subsequent hierarchical clustering and t-SNE steps. Additionally, functions for differential expression analysis to find gene markers for the populations and gene expression visualization are provided.
This package provides a method for building linear models for high-dimensional designed data. limpca applies a GLM (General Linear Model) version of ASCA and APCA to analyse multivariate sample profiles generated by an experimental design. ASCA/APCA provide powerful visualization tools for multivariate structures in the space of each effect of the statistical model linked to the experimental design and, unlike MANOVA, can deal with multivariate datasets having more variables than observations. The method can handle unbalanced designs.
We design algorithms with linear time complexity with respect to the dimension for three commonly studied correlation structures (exchangeable, decaying-product and K-dependent), and extend the algorithms to generate binary data with general non-negative correlation matrices with quadratic time complexity. Jiang, W., Song, S., Hou, L. and Zhao, H. "A set of efficient methods to generate high-dimensional binary data with specified correlation structures." The American Statistician. See <doi:10.1080/00031305.2020.1816213> for a detailed presentation of the method.
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with MatchIt, WeightIt, MatchThem, twang, Matching, optmatch, CBPS, ebal, cem, sbw, and designmatch for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.
It offers comprehensive tools for the analysis of functional time series data, focusing on white noise hypothesis testing and goodness-of-fit evaluations, alongside functions for simulating data and advanced visualization techniques, such as 3D rainbow plots. These methods are described in Kokoszka, Rice, and Shang (2017) <doi:10.1016/j.jmva.2017.08.004>, Yeh, Rice, and Dubin (2023) <doi:10.1214/23-EJS2112>, Kim, Kokoszka, and Rice (2023) <doi:10.1214/23-ss143>, and Rice, Wirjanto, and Zhao (2020) <doi:10.1111/jtsa.12532>.