This package provides a set of functions to quantify the relationship between development rate and temperature and to build phenological models. The package comprises a set of models and estimated parameters borrowed from a literature review in ectotherms. The methods and literature review are described in Rebaudo et al. (2018) <doi:10.1111/2041-210X.12935>, Rebaudo and Rabhi (2018) <doi:10.1111/eea.12693>, and Regnier et al. (2021) <doi:10.1093/ee/nvab115>. An example can be found in Rebaudo et al. (2017) <doi:10.1007/s13355-017-0480-5>.
This package provides a comprehensive toolkit for single-cell annotation with the CellMarker2.0 database (see Xia Li, Peng Wang, Yunpeng Zhang (2023) <doi:10.1093/nar/gkac947>). Streamlines biological label assignment in single-cell RNA-seq data and facilitates transcriptomic analysis, including preparation of TCGA <https://portal.gdc.cancer.gov/> and GEO <https://www.ncbi.nlm.nih.gov/geo/> datasets, differential expression analysis and visualization of enrichment analysis results. Additional utility functions support various bioinformatics workflows. See Wei Cui (2024) <doi:10.1101/2024.09.14.609619> for more details.
The Delphi Epidata API provides real-time access to epidemiological surveillance data for influenza, COVID-19, and other diseases for the USA at various geographical resolutions, from official government sources such as the Centers for Disease Control and Prevention (CDC), as well as from Google Trends and private partners such as Facebook and Change Healthcare. It is built and maintained by the Carnegie Mellon University Delphi research group. To cite this API: David C. Farrow, Logan C. Brooks, Aaron Rumack, Ryan J. Tibshirani, Roni Rosenfeld (2015). Delphi Epidata API. <https://github.com/cmu-delphi/delphi-epidata>.
This package implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods for producing forecasts from these models and visualising them.
This package provides a collection of cancer transcriptomics gene signatures as well as a simple and tidy interface to compute single sample enrichment scores either with the original procedure or with three alternatives: the "combined z-score" of Lee et al. (2008) <doi:10.1371/journal.pcbi.1000217>, the "single sample GSEA" of Barbie et al. (2009) <doi:10.1038/nature08460> and the "singscore" of Foroutan et al. (2018) <doi:10.1186/s12859-018-2435-4>. The get_sig_info() function can be used to retrieve information about each signature implemented.
This package provides a statistical learning method that tries to find the best set of predictors and interactions between predictors for modeling binary or quantitative response data in a decision tree. Several search algorithms and ensembling techniques are implemented, allowing the method to be fine-tuned to the specific problem. Interactions with quantitative covariables can be properly taken into account by fitting local regression models. Moreover, a variable importance measure for assessing marginal and interaction effects is provided. Implements the procedures proposed by Lau et al. (2024, <doi:10.1007/s10994-023-06488-6>).
Fast binning of multiple variables using parallel processing. A summary of all binned variables is generated, which reports the information value, the entropy, an indicator of whether the variable follows a monotonic trend, and more. It supports rebinning of variables to force a monotonic trend as well as manual binning based on pre-specified cuts. The cut points of the bins are based on conditional inference trees as implemented in the partykit package. The conditional inference framework is described by Hothorn T, Hornik K, Zeileis A (2006) <doi:10.1198/106186006X133933>.
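As an illustration of two of the summary quantities mentioned above, the following is a minimal Python sketch of the textbook information value and a monotonic-trend check. It is not the package's implementation: the function names, the (good, bad) count layout, and the assumption that every bin has non-zero counts in both classes are all hypothetical choices made for this example.

```python
import math

def information_value(bins):
    """Textbook information value for a binned variable.

    bins: list of (n_good, n_bad) counts, one pair per bin.
    Assumes (for this sketch) that every count is strictly positive.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    iv = 0.0
    for g, b in bins:
        share_good = g / total_good
        share_bad = b / total_bad
        # Weight of evidence of the bin, weighted by the share difference.
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

def is_monotonic(bins):
    """True if the bad rate never decreases or never increases across bins."""
    rates = [b / (g + b) for g, b in bins]
    pairs = list(zip(rates, rates[1:]))
    return all(x <= y for x, y in pairs) or all(x >= y for x, y in pairs)

if __name__ == "__main__":
    example = [(80, 20), (50, 50), (20, 80)]
    print(information_value(example), is_monotonic(example))
```

A bin with a zero count in either class would make the logarithm undefined; real implementations usually merge or smooth such bins, which this sketch leaves out.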
Implementation of a framework for cluster analysis with selection of the final number of clusters and an optional variable selection procedure. The package is designed to integrate the results of multiple imputed datasets while accounting for the uncertainty that the imputations introduce in the final results. In addition, the package can also be used for a cluster analysis of the complete cases of a single dataset. The package also includes specific methods to summarize and plot the results. The methods are described in Basagana et al. (2013) <doi:10.1093/aje/kws289>.
This package provides access to teaching materials for various statistics courses, including R and Python programs, Shiny apps, data, and PDF/HTML documents. These materials are stored on the Internet as a ZIP file (e.g., in a GitHub repository) and can be downloaded and displayed or run locally. The content of the ZIP file is temporarily or permanently stored. By default, the package uses the GitHub repository sigbertklinke/mmstat4.data. Additionally, the package includes association_measures.R from the archived package ryouready by Mark Heckman and some auxiliary functions.
Datasets, constants, conversion factors, and utilities for MArine, Riverine, Estuarine, LAcustrine and Coastal science. The package contains among others: (1) chemical and physical constants and datasets, e.g. atomic weights, gas constants, the Earth's bathymetry; (2) conversion factors (e.g. gram to mol to liter, barometric units, temperature, salinity); (3) physical functions, e.g. to estimate concentrations of conservative substances, gas transfer and diffusion coefficients, the Coriolis force and gravity; (4) thermophysical properties of seawater, from the UNESCO polynomial or from the more recent derivation based on a Gibbs function.
biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. Examples of BioMart databases are Ensembl, COSMIC, Uniprot, HGNC, Gramene, Wormbase and dbSNP mapped to Ensembl. These major databases give biomaRt users direct access to a diverse set of data and enable a wide range of powerful online queries from gene annotation to database mining.
This package provides a pipeline toolkit for statistics and data science in R; the targets package brings function-oriented programming to Make-like declarative pipelines. It orchestrates a pipeline as a graph of dependencies, skips steps that are already up to date, runs the necessary computation with optional parallel workers, abstracts files as R objects, and provides tangible evidence that the results are reproducible given the underlying code and data. The methodology in this package borrows from GNU Make (2015, ISBN:978-9881443519) and drake (2018, <doi:10.21105/joss.00550>).
Ratpoison is a simple window manager with no fat library dependencies, no fancy graphics, no window decorations, and no rodent dependence. It is largely modelled after GNU Screen which has done wonders in the virtual terminal market.
The screen can be split into non-overlapping frames. All windows are kept maximized inside their frames to take full advantage of your precious screen real estate.
All interaction with the window manager is done through keystrokes. Ratpoison has a prefix map to minimize the key clobbering that cripples Emacs and other quality pieces of software.
Contrast trees represent a new approach for assessing the accuracy of many types of machine learning estimates that are not amenable to standard (cross) validation methods; see "Contrast trees and distribution boosting", Jerome H. Friedman (2020) <doi:10.1073/pnas.1921562117>. In situations where inaccuracies are detected, boosted contrast trees can often improve performance. Functions are provided to build such trees, as well as a special case, distribution boosting, an assumption-free method for estimating the full probability distribution of an outcome variable given any set of joint input predictor variable values.
Dependency-free, ultra fast calculation of geodesic distances. Includes the reference nanometre-accuracy geodesic distances of Karney (2013) <doi:10.1007/s00190-012-0578-z>, as used by the sf package, as well as Haversine and Vincenty distances. Default distance measure is the "Mapbox cheap ruler" which is generally more accurate than Haversine or Vincenty for distances out to a few hundred kilometres, and is considerably faster. The main function accepts one or two inputs in almost any generic rectangular form, and returns either matrices of pairwise distances, or vectors of sequential distances.
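To make the two output shapes concrete, here is a minimal Python sketch of the Haversine formula together with pairwise and sequential distance helpers. It is only an illustration of the idea, not the package's code: the function names, the (longitude, latitude) tuple layout, and the mean Earth radius constant are assumptions of this example, and it implements neither the Karney geodesic nor the "cheap ruler" measure.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in metres

def haversine(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))

def pairwise_distances(points_a, points_b=None):
    """Matrix (list of rows) of distances between every pair of points."""
    if points_b is None:
        points_b = points_a
    return [[haversine(*p, *q) for q in points_b] for p in points_a]

def sequential_distances(points):
    """Distances between consecutive points along a track."""
    return [haversine(*p, *q) for p, q in zip(points, points[1:])]

if __name__ == "__main__":
    track = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # (lon, lat) pairs
    print(sequential_distances(track))
    print(pairwise_distances(track))
```

The pairwise helper returns an n-by-m matrix, while the sequential helper returns n-1 values for n points.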
Uses an approach based on k-nearest neighbor information to sequentially detect change-points. Offers analytic approximations for false discovery control given user-specified average run length. Can be applied to any type of data (high-dimensional, non-Euclidean, etc.) as long as a reasonable similarity measure is available. See references (1) Chen, H. (2019) Sequential change-point detection based on nearest neighbors. The Annals of Statistics, 47(3):1381-1407. (2) Chu, L. and Chen, H. (2018) Sequential change-point detection for high-dimensional and non-Euclidean data <arXiv:1810.05973>.
This package provides researchers and educators with easy-to-learn, user-friendly tools for calculating key spatial statistics and for applying simple as well as advanced methods of spatial analysis to real data. These include: Local Pearson and Geographically Weighted Pearson Correlation Coefficients, Spatial Inequality Measures (Gini, Spatial Gini, LQ, Focal LQ), Spatial Autocorrelation (Global and Local Moran's I), several Geographically Weighted Regression techniques and other Spatial Analysis tools (other geographically weighted statistics). This package also contains functions for measuring the significance of each statistic calculated, mainly based on Monte Carlo simulations.
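As a small worked example of one of the statistics listed above, the following Python sketch computes the standard global Moran's I from a vector of values and a spatial weights matrix. It is not taken from the package: the function names and the simple chain-shaped neighbourhood used for the demonstration are assumptions of this example.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a square weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    var_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / var_sum)

def chain_weights(n):
    """Binary weights for n locations on a line: only neighbours are linked."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w

if __name__ == "__main__":
    w = chain_weights(4)
    print(morans_i([1.0, 2.0, 3.0, 4.0], w))    # smooth gradient: positive
    print(morans_i([1.0, -1.0, 1.0, -1.0], w))  # alternating: negative
```

Values that vary smoothly across neighbours give a positive statistic and alternating values give a negative one. Assessing significance, for instance by Monte Carlo permutation of the values, is left out of this sketch.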
Utilizes model-based clustering (unsupervised) for functional magnetic resonance imaging (fMRI) data. The developed methods (Chen and Maitra (2023) <doi:10.1002/hbm.26425>) include 2D and 3D clustering analyses (for p-values with voxel locations) and segmentation analyses (for p-values alone) for fMRI data, where p-values indicate the significance level of activation in response to the stimuli of interest. The analyses mainly identify active voxels/signals associated with normal brain behaviors. Analysis pipelines (R scripts) utilizing this package (see examples in inst/workflow/) are also implemented with high-performance techniques.
Estimates one-inflated positive Poisson (OIPP) and one-inflated zero-truncated negative binomial (OIZTNB) regression models. A suite of ancillary statistical tools is also provided, including: estimation of positive Poisson (PP) and zero-truncated negative binomial (ZTNB) models; marginal effects and their standard errors; diagnostic likelihood ratio and Wald tests; plotting; predicted counts and expected responses; and random variate generation. The models and tools, as well as four applications, are shown in Godwin, R. T. (2024). "One-inflated zero-truncated count regression models" arXiv preprint <doi:10.48550/arXiv.2402.02272>.
Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs pbdMPI to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. By default, the implementation uses the single program multiple data programming model. The code can be executed through pbdMPI and MPI implementations such as OpenMPI and MPICH. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.
This package provides a collection of tools for analyzing significance of assets, funds, and trading strategies, based on the Sharpe ratio and overfit of the same. Provides density, distribution, quantile and random generation of the Sharpe ratio distribution based on normal returns, as well as the optimal Sharpe ratio over multiple assets. Computes confidence intervals on the Sharpe and provides a test of equality of Sharpe ratios based on the Delta method. The statistical foundations of the Sharpe can be found in the author's Short Sharpe Course <doi:10.2139/ssrn.3036276>.
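For orientation, here is a minimal Python sketch of the sample Sharpe ratio with a rough confidence interval. It is not the package's method: it uses the well-known large-sample approximation for independent normal returns, in which the standard error is about sqrt((1 + SR^2 / 2) / n), and the function names and default arguments are assumptions of this example.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Sample Sharpe ratio: mean excess return over its sample std deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

def sharpe_confint(returns, risk_free=0.0, z=1.96):
    """Approximate confidence interval (assumed normal, independent returns).

    Uses the asymptotic standard error sqrt((1 + SR^2 / 2) / n); z=1.96
    corresponds to a nominal 95% interval.
    """
    n = len(returns)
    sr = sharpe_ratio(returns, risk_free)
    se = math.sqrt((1.0 + 0.5 * sr * sr) / n)
    return sr - z * se, sr + z * se

if __name__ == "__main__":
    monthly = [0.01, 0.03, -0.01, 0.02, 0.00]
    print(sharpe_ratio(monthly))
    print(sharpe_confint(monthly))
```

The ratio is expressed per observation period (monthly here); with so few observations the interval is very wide, which is the practical point of reporting it alongside the estimate.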
This package provides estimation of simultaneous bootstrap and asymptotic confidence intervals for diversity indices, namely the Shannon and the Simpson index. Several pre-specified multiple comparison types are available to choose from, and user-defined contrast matrices can also be used. In addition, simboot estimates adjusted as well as unadjusted p-values for two of the three proposed bootstrap methods. Furthermore, simboot allows for comparing biological diversities of two or more groups while simultaneously testing a user-defined selection of Hill numbers of orders q, which are considered appropriate and useful indices for measuring diversity.
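The indices named above can be written down in a few lines. The following Python sketch is an illustration only, not the package's code: the function names are hypothetical, the Simpson index is taken in its Gini-Simpson form (1 minus the sum of squared proportions), which is one of several conventions, and no bootstrap or confidence interval is computed.

```python
import math

def proportions(counts):
    """Relative abundances of the species with a non-zero count."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon(counts):
    """Shannon index H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in proportions(counts))

def simpson(counts):
    """Gini-Simpson form of the Simpson index: 1 - sum(p^2) (assumed convention)."""
    return 1.0 - sum(p * p for p in proportions(counts))

def hill_number(counts, q):
    """Hill number of order q; the q = 1 case is the limit exp(H)."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(shannon(counts))
    return sum(p ** q for p in proportions(counts)) ** (1.0 / (1.0 - q))

if __name__ == "__main__":
    community = [10, 10, 10, 10]
    print(shannon(community), simpson(community))
    print([hill_number(community, q) for q in (0, 1, 2)])
```

For a perfectly even community every Hill number equals the number of species, which makes a convenient sanity check; uneven communities give values that decrease as q grows.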
MEM, Marker Enrichment Modeling, automatically generates and displays quantitative labels for cell populations that have been identified from single-cell data. The input for MEM is a dataset that has pre-clustered or pre-gated populations with cells in rows and features in columns. Labels convey a list of measured features and each feature's level of relative enrichment in each population. MEM can be applied to a wide variety of data types and can compare MEM labels from flow cytometry, mass cytometry, single cell RNA-seq, and spectral flow cytometry using RMSD.
This package provides a collection of functions designed for analyzing deconvolution of the bulk sample(s) using an atlas of reference omic signature profiles and a user-selected model. Users are given the option to create or extend a reference atlas and also to simulate the desired size of the bulk signature profile of the reference cell types. The package includes the cell-type-specific methylation atlas and Illumina Epic B5 probe IDs that can be used in deconvolution. Additionally, we included BSmeth2Probe to make mapping WGBS data to their probe IDs easier.