The Futureverse is a set of packages for parallel and distributed processing with the future package at its core, cf. Bengtsson (2021) <doi:10.32614/RJ-2021-048>. This package is designed to make it easy to install and load multiple Futureverse packages in a single step. It is intended for end-users, interactive use, and R scripts. Packages must not list it as a dependency; instead, they should explicitly declare each Futureverse package they need as a dependency.
Evolutionary black-box optimization algorithms building on the bbotk package. miesmuschel offers both ready-to-use optimization algorithms and their fundamental building blocks, which can be used to manually construct specialized optimization loops. The Mixed Integer Evolution Strategies described by Li et al. (2013) <doi:10.1162/EVCO_a_00059> can be implemented, as well as the multi-objective optimization algorithm NSGA-II by Deb, Pratap, Agarwal, and Meyarivan (2002) <doi:10.1109/4235.996017>.
Useful functions for one-sample (individual-level data) Mendelian randomization and instrumental variable analyses. The package includes implementations of the Sanderson and Windmeijer (2016) <doi:10.1016/j.jeconom.2015.06.004> conditional F-statistic, the multiplicative structural mean model of Hernán and Robins (2006) <doi:10.1097/01.ede.0000222409.00878.37>, and the two-stage predictor substitution and two-stage residual inclusion estimators described by Terza et al. (2008) <doi:10.1016/j.jhealeco.2007.09.009>.
Routines for state estimation in a linear Gaussian state space model and a simple stochastic volatility model using particle filtering. Parameter inference in these models is also carried out using the particle Metropolis-Hastings algorithm, which uses the particle filter to provide an unbiased estimator of the likelihood. This package is a collection of minimal working examples of these algorithms; it is meant only for educational use and as a starting point for learning to implement them on your own.
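A minimal Python sketch of a bootstrap particle filter may help illustrate the state-estimation and likelihood-estimation step. It assumes the linear Gaussian model x_t = phi * x_{t-1} + sigma_v * v_t, y_t = x_t + sigma_e * e_t with standard normal noise and a known initial state x_0 = 0; the parameter names and the functions `simulate_lgss` and `bootstrap_particle_filter` are hypothetical and are not this package's API. The returned log-likelihood estimate is the quantity a particle Metropolis-Hastings sampler would plug into its acceptance ratio.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def simulate_lgss(T, phi, sigma_v, sigma_e, seed=1):
    """Simulate states xs and observations ys from the assumed model, starting at x_0 = 0."""
    rng = random.Random(seed)
    x = 0.0
    xs, ys = [], []
    for _ in range(T):
        x = phi * x + rng.gauss(0.0, sigma_v)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, sigma_e))
    return xs, ys


def bootstrap_particle_filter(ys, phi, sigma_v, sigma_e, n_particles=500, seed=0, x0=0.0):
    """Return filtered state means and an estimate of log p(y_1:T)."""
    rng = random.Random(seed)
    # Assumption: the initial state x_0 is known, so all particles start there.
    particles = [x0] * n_particles
    log_lik = 0.0
    filtered_means = []
    for y in ys:
        # 1. Propagate: sample x_t ~ p(x_t | x_{t-1}) for every particle.
        particles = [phi * x + rng.gauss(0.0, sigma_v) for x in particles]
        # 2. Weight: w_t proportional to p(y_t | x_t), computed in log space for stability.
        log_w = [normal_logpdf(y, x, sigma_e) for x in particles]
        max_lw = max(log_w)
        w = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(w)
        # The mean unnormalised weight estimates p(y_t | y_1:t-1).
        log_lik += max_lw + math.log(total / n_particles)
        w = [wi / total for wi in w]
        filtered_means.append(sum(wi * x for wi, x in zip(w, particles)))
        # 3. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return filtered_means, log_lik
```

For example, running `bootstrap_particle_filter(ys, 0.75, 1.0, 0.1)` on data from `simulate_lgss(50, 0.75, 1.0, 0.1)` returns 50 filtered means that should closely track the simulated states, since the observation noise is small in that setting.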
This package provides a collection of functions that primarily produce graphics to aid in a Propensity Score Analysis (PSA). Functions include: cat.psa and box.psa to test balance within strata of categorical and quantitative covariates, circ.psa for a representation of the estimated effect size by stratum, loess.psa that provides a graphic and loess-based effect size estimate, and various balance functions that provide measures of the balance achieved via a PSA in a categorical covariate.
This package provides a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient, stable, and high-confidence variables from omics-based data. It uses a bagging strategy combined with either a parametric method or an inflection-point search method to determine the cut-off threshold. The package can integrate and vote on variables generated from multiple LASSO models to determine the optimal candidates. See Luo H, Zhao Q, et al. (2020) <doi:10.1126/scitranslmed.aax7533> for more details.
This package provides a class and subclasses for storing non-scalar objects in matrix entries. This is akin to a ragged array, but the raggedness is in the third dimension, much like a bumpy surface, hence the name. Of particular interest is the BumpyDataFrameMatrix, where each entry is a Bioconductor data frame. This allows us to naturally represent multivariate data in a format that is compatible with two-dimensional containers like the SummarizedExperiment and MultiAssayExperiment objects.
Function-oriented Make-like declarative pipelines for statistics and data science are supported in the targets R package. As an extension to targets, the tarchetypes package provides convenient user-side functions to make targets easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the drake R package by Will Landau (2018) <doi:10.21105/joss.00550>.
inline-c is a small crate that allows a user to write C (including C++) code inside Rust. Both environments are strictly sandboxed. The C code is transformed into a string, which is written to a temporary file. This file is then compiled into an object file, which is finally executed.
The primary goal of inline-c is to ease the testing of a C API of a Rust program (generated with cbindgen, for example).
Finds the most likely originating tissue(s) and developmental stage(s) of tissue-specific RNA sequencing data. The package identifies both pure transcriptomes and mixtures of transcriptomes. The most likely identity is found through comparisons of the sequencing data with high-throughput in situ hybridisation patterns. Typical uses are the identification of cancer cell origins, validation of cell culture strain identities, validation of single-cell transcriptomes, and validation of identity and purity of flow-sorting and dissection sequencing products.
This package implements fast Monte Carlo simulations for goodness-of-fit (GOF) tests for discrete distributions. This includes tests based on the Chi-squared statistic, the log-likelihood-ratio (G^2) statistic, the Freeman-Tukey (Hellinger-distance) statistic, the Kolmogorov-Smirnov statistic, the Cramer-von Mises statistic as described in Choulakian, Lockhart and Stephens (1994) <doi:10.2307/3315828>, and the root-mean-square statistic, see Perkins, Tygert, and Ward (2011) <doi:10.1016/j.amc.2011.03.124>.
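The Monte Carlo idea can be sketched in Python for two of the listed statistics, chi-squared and G^2: simulate many multinomial samples of the same size under the hypothesized cell probabilities and report the fraction whose statistic is at least as large as the observed one. This is a simple illustration under our own assumptions (hypothetical function names, an add-one p-value correction, plain `random.choices` sampling); it is not the package's fast implementation and does not cover the Freeman-Tukey, Kolmogorov-Smirnov, Cramer-von Mises, or root-mean-square statistics.

```python
import math
import random


def chi_squared(counts, probs):
    """Pearson chi-squared statistic for observed counts vs. cell probabilities."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def g_squared(counts, probs):
    """Log-likelihood-ratio (G^2) statistic; empty cells contribute zero."""
    n = sum(counts)
    return 2.0 * sum(c * math.log(c / (n * p)) for c, p in zip(counts, probs) if c > 0)


def monte_carlo_pvalue(counts, probs, statistic, n_sim=2000, seed=0):
    """Estimate P(statistic >= observed) under the hypothesized distribution."""
    rng = random.Random(seed)
    n = sum(counts)
    n_cells = len(probs)
    observed = statistic(counts, probs)
    exceed = 0
    for _ in range(n_sim):
        # Draw n observations from the hypothesized distribution and tabulate them.
        sim = [0] * n_cells
        for cell in rng.choices(range(n_cells), weights=probs, k=n):
            sim[cell] += 1
        if statistic(sim, probs) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_sim + 1)
```

Because the p-value is computed from simulated samples rather than an asymptotic chi-squared approximation, the same routine works for any statistic passed in, which is the main appeal of the Monte Carlo approach for small or sparse tables.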
An implementation of several functions for feature extraction in ordinal time series datasets. Specifically, some of the features proposed by Weiss (2019) <doi:10.1080/01621459.2019.1604370> can be computed. These features can be used to perform inferential tasks or to feed machine learning algorithms for ordinal time series, among others. The package also includes some interesting datasets containing financial time series. Practitioners from a broad variety of fields could benefit from the general framework provided by 'otsfeatures'.
Using Gaussian graphical models, we propose a novel approach to perform pathway analysis using gene expression. Given the structure of a graph (a pathway), we introduce two statistical tests to compare the mean and the concentration matrices between two groups. Specifically, these tests can be performed on the graph and on its connected components (cliques). The package is based on the method described in Massa M.S., Chiogna M., Romualdi C. (2010) <doi:10.1186/1752-0509-4-121>.
This package contains functions for a variational Bayesian method for sparse PCA proposed by Ning (2020) <arXiv:2102.00305>. There are two algorithms: the PX-CAVI algorithm (if the loadings matrix is assumed to be jointly row-sparse) and the batch PX-CAVI algorithm (without this assumption). The outputs of the main function, VBsparsePCA(), include the mean and covariance of the loadings matrix, the score functions, the variable selection results, and the estimated variance of the random noise.
This package provides functions to estimate a factor model using discrete and continuous proxy variables. The function dproxyme estimates a factor model of discrete proxy variables using an EM algorithm (Dempster, Laird, Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>; Hu (2008) <doi:10.1016/j.jeconom.2007.12.001>; Hu (2017) <doi:10.1016/j.jeconom.2017.06.002>). The function cproxyme estimates a linear factor model (Cunha, Heckman, and Schennach (2010) <doi:10.3982/ECTA6551>).
The fusion learning method uses a model selection algorithm to learn from multiple data sets across different experimental platforms through group penalization. The responses of interest may include a mix of discrete and continuous variables. The responses may share the same set of predictors; however, the models and parameters differ across platforms. Integrating information from different data sets can enhance the power of model selection. The package is based on Xin Gao and Raymond J. Carroll (2017) <arXiv:1610.00667v1>.
This package provides functions that support a broad range of common tasks in physical activity research, including but not limited to creation of Bland-Altman plots (<doi:10.1136/bmj.313.7049.106>), metabolic calculations such as basal metabolic rate predictions (<https://europepmc.org/article/med/4044297/reloa>), demographic calculations such as age-for-body-mass-index percentile (<https://www.cdc.gov/growthcharts/cdc_charts.htm>), and analysis of bout detection algorithm performance (<https://pubmed.ncbi.nlm.nih.gov/34258524/>).
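A Bland-Altman analysis compares two measurement methods by plotting, for each subject, the difference between the methods against their mean. The Python sketch below computes the usual summary quantities: the mean difference (bias) and the 95% limits of agreement at bias ± 1.96 standard deviations of the differences. The function name `bland_altman` and its dictionary return format are hypothetical, not this package's interface, and plotting is left out.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Bias and limits of agreement between two paired measurement series."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired series with at least two observations")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": means,  # x-axis values of a Bland-Altman plot
        "diffs": diffs,  # y-axis values of a Bland-Altman plot
        "bias": bias,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
    }
```

For example, `bland_altman([2, 4, 6], [1, 2, 3])` gives a bias of 2 and limits of agreement of about 0.04 and 3.96.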
An API wrapper around the ProPublica API <https://projects.propublica.org/api-docs/congress-api/> for U.S. Congressional Bills. Users can supply their API key, U.S. Congress, branch, and offset ranges to return a data frame of all results within those parameters. This package differs from the RPublica package in that it targets the ProPublica U.S. Congress data API, whereas RPublica targets the Nonprofit Explorer, Forensics, and Free the Files data APIs.
This package contains functions for the analysis and summary of tidal datasets. It also provides access to tidal data collected by the National Oceanic and Atmospheric Administration's Center for Operational Oceanographic Products and Services and the Permanent Service for Mean Sea Level. For detailed descriptions and application examples, see Hill, T.D. and S.C. Anisfeld (2021) <doi:10.6084/m9.figshare.14161202.v1> and Hill, T.D. and S.C. Anisfeld (2015) <doi:10.1016/j.ecss.2015.06.004>.
A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences. In order to use the program, the user submits a sequence in FASTA format. The output consists of two files: a repeat table file and an alignment file. Submitted sequences may be of arbitrary length. Repeats with pattern size in the range from 1 to 2000 bases are detected.
Programming oncology-specific Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in 'R'. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team (2021), <https://www.cdisc.org/standards/foundational/adam>). This package is an extension of the admiral package.
This package provides functions to compute distances between probability measures or any other data object that can be posed in this way, entropy measures for samples of curves, distances and depth measures for functional data, and the Generalized Mahalanobis Kernel distance for high-dimensional data. For further details about the metrics, please refer to Martos et al (2014) <doi:10.3233/IDA-140706>; Martos et al (2018) <doi:10.3390/e20010033>; Hernandez et al (2018, submitted); Martos et al (2018, submitted).
The four-gamete test is based on the infinite-sites model, which assumes that the probability of the same mutation occurring twice (recurrent or parallel mutations) and the probability of a mutation back to the original state (reverse mutations) are close to zero. Without these types of mutations, the only explanation for observing all four dilocus genotypes is recombination (Hudson and Kaplan 1985, Genetics 111:147-164). Thus, the presence of all four gametes is also called phylogenetic incompatibility.
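For biallelic sites, the test reduces to counting the distinct two-locus haplotypes observed at a pair of sites. The Python sketch below flags every pair of sites at which all four gametes appear; the function names and the encoding of haplotypes as equal-length strings (one character per site) are assumptions for illustration.

```python
from itertools import combinations


def four_gamete_violated(haplotypes, i, j):
    """True if biallelic sites i and j show all four two-locus gametes."""
    alleles_i = {h[i] for h in haplotypes}
    alleles_j = {h[j] for h in haplotypes}
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        return False  # the test as sketched here applies to biallelic sites only
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def incompatible_pairs(haplotypes):
    """All site pairs (i, j), i < j, that fail the four-gamete test."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gamete_violated(haplotypes, i, j)
    ]
```

For example, `incompatible_pairs(["000", "010", "101", "111"])` returns `[(0, 1), (1, 2)]`: sites 0 and 2 only ever show the gametes `00` and `11`, so that pair is compatible with a recombination-free history.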
This package provides a set of tools supporting more flexible heatmaps. The graphics are grid-like but are built with the old graphics system. The main function is heatmap.n2(), which is a wrapper around the various functions that construct individual parts of the heatmap, such as sidebars, picket plots, and legends. The function supports zooming and splitting, i.e., placing an unlimited number of small heatmaps underneath each other in one plot, all derived from the same data set, e.g., clustered and ordered by a supervised clustering method.