mist (Methylation Inference for Single-cell along Trajectory) is a hierarchical Bayesian framework for modeling DNA methylation trajectories and performing differential methylation (DM) analysis in single-cell DNA methylation (scDNAm) data. It estimates developmental-stage-specific variations, identifies genomic features with drastic changes along pseudotime, and, for two phenotypic groups, detects features with distinct temporal methylation patterns. mist uses Gibbs sampling to estimate parameters for temporal changes and stage-specific variations.
This package provides a inferential analysis method for detecting differentially expressed CpG sites in MeDIP-seq data. It uses statistical framework and EM algorithm, to identify differentially expressed CpG sites. The methods on this package are described in the article Methylation-level Inferences and Detection of Differential Methylation with Medip-seq Data by Yan Zhou, Jiadi Zhu, Mingtao Zhao, Baoxue Zhang, Chunfu Jiang and Xiyan Yang (2018, pending publication).
Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.
This package provides an approach which is based on the methodology of the Burden of Communicable Diseases in Europe (BCoDE) and can be used for large and small samples such as individual countries. The Burden of Healthcare-Associated Infections (BHAI) is estimated in disability-adjusted life years, number of infections as well as number of deaths per year. Results can be visualized with various plotting functions and exported into tables.
This package provides correlation-based penalty estimators for both linear and logistic regression models by implementing a new regularization method that incorporates correlation structures within the data. This method encourages a grouping effect where strongly correlated predictors tend to be in or out of the model together. See Tutz and Ulbricht (2009) <doi:10.1007/s11222-008-9088-5> and Algamal and Lee (2015) <doi:10.1016/j.eswa.2015.08.016>.
This package performs classical age-depth modelling of dated sediment deposits - prior to applying more sophisticated techniques such as Bayesian age-depth modelling. Any radiocarbon dated depths are calibrated. Age-depth models are constructed by sampling repeatedly from the dated levels, each time drawing age-depth curves. Model types include linear interpolation, linear or polynomial regression, and a range of splines. See Blaauw (2010) <doi:10.1016/j.quageo.2010.01.002>.
This package implements parametric (Direct) regression methods for modeling cumulative incidence functions (CIFs) in the presence of competing risks. Methods include the direct Gompertz-based approach and generalized regression models as described in Jeong and Fine (2006) <doi:10.1111/j.1467-9876.2006.00532.x> and Jeong and Fine (2007) <doi:10.1093/biostatistics/kxj040>. The package facilitates maximum likelihood estimation, variance computation, with applications to clinical trials and survival analysis.
Create high-performance clinical reporting tables (TLGs) from ADaM-like inputs. The package provides a consistent, programmatic API to generate common tables such as demographics, adverse event incidence, and laboratory summaries, using data.table for fast aggregation over large populations. Functions support flexible target-variable selection, stratification by treatment, and customizable summary statistics, and return tidy, machine-readable results ready to render with downstream table/formatting packages in analysis pipelines.
Training and prediction functions are provided for the Extreme Learning Machine algorithm (ELM). The ELM use a Single Hidden Layer Feedforward Neural Network (SLFN) with random generated weights and no gradient-based backpropagation. The training time is very short and the online version allows to update the model using small chunk of the training set at each iteration. The only parameter to tune is the hidden layer size and the learning function.
This package provides tools to perform fuzzy formal concept analysis, presented in Wille (1982) <doi:10.1007/978-3-642-01815-2_23> and in Ganter and Obiedkov (2016) <doi:10.1007/978-3-662-49291-8>. It provides functions to load and save a formal context, extract its concept lattice and implications. In addition, one can use the implications to compute semantic closures of fuzzy sets and, thus, build recommendation systems.
The Forecast Linear Augmented Projection (flap) method reduces forecast variance by adjusting the forecasts of multivariate time series to be consistent with the forecasts of linear combinations (components) of the series by projecting all forecasts onto the space where the linear constraints are satisfied. The forecast variance can be reduced monotonically by including more components. For a given number of components, the flap method achieves maximum forecast variance reduction among linear projections.
Fitting and analyzing a Joint Trait Distribution Model. The Joint Trait Distribution Model is implemented in the Bayesian framework using conjugate priors and posteriors, thus guaranteeing fast inference. In particular the package computes joint probabilities and multivariate confidence intervals, and enables the investigation of how they depend on the environment through partial response curves. The method implemented by the package is described in Poggiato et al. (2023) <doi:10.1111/geb.13706>.
This package provides functions to fit quantile regression models for hierarchical data (2-level nested designs) as described in Geraci and Bottai (2014, Statistics and Computing) <doi:10.1007/s11222-013-9381-9>. A vignette is given in Geraci (2014, Journal of Statistical Software) <doi:10.18637/jss.v057.i13> and included in the package documents. The packages also provides functions to fit quantile models for independent data and for count responses.
This package performs maximum likelihood estimation for finite mixture models for families including Normal, Weibull, Gamma and Lognormal by using EM algorithm, together with Newton-Raphson algorithm or bisection method when necessary. It also conducts mixture model selection by using information criteria or bootstrap likelihood ratio test. The data used for mixture model fitting can be raw data or binned data. The model fitting process is accelerated by using R package Rcpp'.
This package provides a complement to all editions of *Modern Data Science with R* (ISBN: 978-0367191498, publisher URL: <https://www.routledge.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/p/book/9780367191498>). This package contains data and code to complete exercises and reproduce examples from the text. It also facilitates connections to the SQL database server used in the book. All editions of the book are supported by this package.
This package provides an interface to the NoSQL database CouchDB (<http://couchdb.apache.org>). Methods are provided for managing databases within CouchDB', including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local CouchDB instance, or a remote CouchDB databases such as Cloudant'. Documents can be inserted directly from vectors, lists, data.frames, and JSON'. Targeted at CouchDB v2 or greater.
This package provides a fast, consistent tool for plotting and facilitating the analysis of stratigraphic and sedimentological data. Taking advantage of the flexible plotting tools available in R, SDAR uses stratigraphic and sedimentological data to produce detailed graphic logs for outcrop sections and borehole logs. These logs can include multiple features (e.g., bed thickness, lithology, samples, sedimentary structures, colors, fossil content, bioturbation index, gamma ray logs) (Johnson, 1992, <ISSN 0037-0738>).
Returns a data frame with the names of the input data points and hex colors (or CIELab coordinates). Data can be mapped to colors for use in data visualization. It optimally maps data points into a polygon that represents the CIELab colour space. Since Euclidean distance approximates relative perceptual differences in CIELab color space, the result is a color encoding that aims to capture much of the structure of the original data.
Mappable vector library provides convenient way to access large datasets. Use all of your data at once, with few limits. Memory mapped data can be shared between multiple R processes. Access speed depends on storage medium, so solid state drive is recommended, preferably with PCI Express (or M.2 nvme) interface or a fast network file system. The data is memory mapped into R and then accessed using usual R list and array subscription operators. Convenience functions are provided for merging, grouping and indexing large vectors and data.frames. The layout of underlying MVL files is optimized for large datasets. The vectors are stored to guarantee alignment for vector intrinsics after memory map. The package is built on top of libMVL, which can be used as a standalone C library. libMVL has simple C API making it easy to interchange datasets with outside programs. Large MVL datasets are distributed via Academic Torrents <https://academictorrents.com/collection/mvl-datasets>.
The SCDE package implements a set of statistical methods for analyzing single-cell RNA-seq data. SCDE fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The SCDE package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells.
This package offers an implementation of the Abnormal blood profile score (ABPS). The ABPS is a part of the Athlete biological passport program of the World anti-doping agency, which combines several blood parameters into a single score in order to detect blood doping. The package also contains functions to calculate other scores used in anti-doping programs, such as the ratio of hemoglobin to reticulocytes (OFF-score), as well as example data.
This package implements maximum likelihood methods for evaluating the durability of vaccine efficacy in a randomized, placebo-controlled clinical trial with staggered enrollment of participants and potential crossover of placebo recipients before the end of the trial. Lin, D. Y., Zeng, D., and Gilbert, P. B. (2021) <doi:10.1093/cid/ciab226> and Lin, D. Y., Gu, Y., Zeng, D., Janes, H. E., and Gilbert, P. B. (2021) <doi:10.1093/cid/ciab630>.
The models of probability density functions are Gaussian or exponential distributions with polynomial correction terms. Using a maximum likelihood method, dsdp computes parameters of Gaussian or exponential distributions together with degrees of polynomials by a grid search, and coefficient of polynomials by a variant of semidefinite programming. It adopts Akaike Information Criterion for model selection. See a vignette for a tutorial and more on our Github repository <https://github.com/tsuchiya-lab/dsdp/>.
Returns the noncentrality parameter of the noncentral F distribution if probability of type I and type II error, degrees of freedom of the numerator and the denominator are given. It may be useful for computing minimal detectable differences for general ANOVA models. This program is documented in the paper of A. Baharev, S. Kemeny, On the computation of the noncentral F and noncentral beta distribution; Statistics and Computing, 2008, 18 (3), 333-340.