This package implements an approximate string matching version of R's native match function. It can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding or between integer vectors representing generic sequences.
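As an illustration of the edit-based distances mentioned above, the following is a minimal Python sketch of the Levenshtein distance and of an approximate lookup built on it. It is not the package's code or API: the names `levenshtein`, `approx_match`, and the `max_dist` parameter are hypothetical and chosen only for this example.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep only two rows of the dynamic-programming table.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def approx_match(x: str, table: List[str], max_dist: int) -> Optional[int]:
    """Hypothetical helper: index of the closest entry in `table`
    whose distance to `x` is at most `max_dist`, or None if there is none."""
    best_index, best_dist = None, max_dist + 1
    for index, candidate in enumerate(table):
        dist = levenshtein(x, candidate)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(approx_match("leia", ["uhura", "leela"], max_dist=2))
```

The sketch uses 0-based indices and returns `None` for "no match"; an R implementation would typically use 1-based indices and `NA` instead.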
The Racket BC (``before Chez'' or ``bytecode'') implementation was the default before Racket 8.0. It uses a compiler written in C targeting architecture-independent bytecode, plus a JIT compiler on most platforms. Racket BC has a different C API than the current default runtime system, Racket CS (based on ``Chez Scheme'').
This package is the normal implementation of Racket BC with a precise garbage collector, 3M (``Moving Memory Manager'').
The fonts provide uppercase formal script letters for use as symbols in scientific and mathematical typesetting (in contrast to the informal script fonts such as that used for the calligraphic symbols in the TeX maths symbol font). The fonts are provided as Metafont source, and as derived Adobe Type 1 format. LaTeX support, for using these fonts in mathematics, is available via one of the packages calrsfs and mathrsfs.
This package implements inferential methods to compare gene lists in terms of their biological meaning as expressed in the GO. The compared gene lists are characterized by cross-tabulation frequency tables of enriched GO items. Dissimilarity between gene lists is evaluated using the Sorensen-Dice index. The fundamental guiding principle is that two gene lists are taken as similar if they share a great proportion of common enriched GO items.
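To make the dissimilarity concrete, here is a small Python sketch of the Sorensen-Dice dissimilarity between two sets of enriched GO items. It only illustrates the index itself, not the package's inferential machinery; the function name and the GO identifiers are hypothetical, and treating two empty sets as identical is an assumption made for this example.

```python
from typing import Set


def sorensen_dice_dissimilarity(items_a: Set[str], items_b: Set[str]) -> float:
    """1 - 2|A & B| / (|A| + |B|) for two sets of enriched GO items."""
    total = len(items_a) + len(items_b)
    if total == 0:
        # Assumption for this sketch: two empty lists count as identical.
        return 0.0
    shared = len(items_a & items_b)
    return 1.0 - (2.0 * shared) / total


if __name__ == "__main__":
    # Placeholder identifiers, used only to exercise the function.
    list_1 = {"GO:A", "GO:B", "GO:C"}
    list_2 = {"GO:B", "GO:C", "GO:D"}
    print(sorensen_dice_dissimilarity(list_1, list_2))
```

A value near 0 means the two lists share most of their enriched items; a value of 1 means they share none.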
This package provides a set of tools for performing graph theory analysis of brain MRI data. It works with data from a Freesurfer analysis (cortical thickness, volumes, local gyrification index, surface area), diffusion tensor tractography data (e.g., from FSL) and resting-state fMRI data (e.g., from DPABI). It contains a graphical user interface for graph visualization and data exploration, along with several functions for generating useful figures.
This package implements the count splitting methodology from Neufeld et al. (2022) <doi:10.1093/biostatistics/kxac047> and Neufeld et al. (2023) <arXiv:2307.12985>. Intended for turning a matrix of single-cell RNA sequencing counts, or similar count datasets, into independent folds that can be used for training/testing or cross validation. Assumes that the entries in the matrix are from a Poisson or a negative binomial distribution.
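For the Poisson case, the idea can be sketched with binomial thinning: each count is split into a training part drawn as Binomial(count, eps) and a test part holding the remainder, and under the Poisson assumption the two parts are independent. The Python sketch below is illustrative only; the function name `count_split` and the parameters `eps` and `seed` are assumptions of this example and not the package's interface, and the negative binomial case is not covered.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def count_split(counts: Matrix, eps: float = 0.5,
                seed: Optional[int] = None) -> Tuple[Matrix, Matrix]:
    """Split a matrix of non-negative integer counts into two folds.

    Each entry x is thinned: train ~ Binomial(x, eps), test = x - train.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    rng = random.Random(seed)
    train: Matrix = []
    test: Matrix = []
    for row in counts:
        # Draw a Binomial(x, eps) variate as a sum of x Bernoulli trials.
        train_row = [sum(rng.random() < eps for _ in range(x)) for x in row]
        train.append(train_row)
        test.append([x - t for x, t in zip(row, train_row)])
    return train, test


if __name__ == "__main__":
    counts = [[3, 0, 7], [12, 1, 0]]
    train, test = count_split(counts, eps=0.5, seed=1)
    print(train)
    print(test)
```

By construction the two folds always add up to the original matrix entry by entry, so no counts are lost or duplicated.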
Example data sets to run the example problems from causal inference textbooks. Currently, contains data sets for Huntington-Klein, Nick (2021 and 2025) "The Effect" <https://theeffectbook.net>, first and second edition, Cunningham, Scott (2021 and 2025, ISBN-13: 978-0-300-25168-5) "Causal Inference: The Mixtape", and Hernán, Miguel and James Robins (2020) "Causal Inference: What If" <https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/>.
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allows for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Generates (U,W) mixture graphs where U is a line graph graphon and W is a dense graphon. Graphons are graph limits and graphon U can be written as a sequence of positive numbers adding to 1. Graphs are sampled from U and W and joined randomly to obtain the mixture graph. Given a mixture graph, U can be inferred. Kandanaarachchi and Ong (2025) <doi:10.48550/arXiv.2505.13864>.
This package provides tools for plotting gene clusters and transcripts by importing data from GenBank, FASTA, and GFF files. It performs BLASTP and MUMmer alignments [Altschul et al. (1990) <doi:10.1016/S0022-2836(05)80360-2>; Delcher et al. (1999) <doi:10.1093/nar/27.11.2369>] and displays results on gene arrow maps. Extensive customization options are available, including legends, labels, annotations, scales, colors, tooltips, and more.
Reads annual financial reports including assets, liabilities, dividends history, stockholder composition and much more from Bovespa's DFP, FRE and FCA systems <http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm>. These are web based interfaces for all financial reports of companies traded at Bovespa. The package is specially designed for large scale data importation, keeping a tabular (long) structure for easier processing.
Computes individual causes of death and population cause-specific mortality fractions using the InSilicoVA algorithm from McCormick et al. (2016) <DOI:10.1080/01621459.2016.1152191>. It uses data derived from verbal autopsy (VA) interviews, in a format similar to the input of the widely used InterVA method. This package provides general model fitting and customization for the InSilicoVA algorithm and basic graphical visualization of the output.
This package implements differential methylation region (DMR) detection using a multistage Markov chain Monte Carlo (MCMC) algorithm based on the alpha-skew generalized normal (ASGN) distribution. Version 0.2.0 removes the Anderson-Darling test stage, improves computational efficiency of the core ASGN and multistage MCMC routines, and adds convenience functions for summarizing and visualizing detected DMRs. The methodology is based on Yang (2025) <https://www.proquest.com/docview/3218878972>.
Multiple tools are now available for inferring the personalised germ line set from an adaptive immune receptor repertoire. Output from these tools is converted to a single format and supplemented with rich data such as usage and characterisation of novel germ line alleles. This data can be particularly useful when considering the validity of novel inferences. Use of the analysis provided is described in <doi:10.3389/fimmu.2019.00435>.
Design and analysis of confirmatory adaptive clinical trials using the optimal conditional error framework according to Brannath and Bauer (2004) <doi:10.1111/j.0006-341X.2004.00221.x>. An extension to the optimal conditional error function using interim estimates as described in Brannath and Dreher (2024) <doi:10.48550/arXiv.2402.00814> and functions to ensure that the resulting conditional error function is non-increasing are also available.
Web application using shiny for the SSD (Species Sensitivity Distribution) module of the MOSAIC (MOdeling and StAtistical tools for ecotoxICology) platform. It estimates the Hazardous Concentration for x% of the species (HCx) from toxicity values that can be censored and provides various plotting options for a better understanding of the results. See our companion paper Kon Kam King et al. (2014) <doi:10.48550/arXiv.1311.5772>.
Tidal analysis of evenly spaced observed time series (time step 1 to 60 min) with or without shorter gaps using the harmonic representation of inequalities. The analysis should preferably cover an observation period of at least 19 years. For shorter periods low frequency constituents are not taken into account, in accordance with the Rayleigh-Criterion. The main objective of this package is to synthesize or predict a tidal time series.
This package provides tools for the statistical analysis of regular vine copula models, see Aas et al. (2009) <doi:10.1016/j.insmatheco.2007.02.001> and Dissman et al. (2013) <doi:10.1016/j.csda.2012.08.010>. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection and exploratory data analysis of bivariate copula models are also provided.
Group-Lasso INTERaction-NET. Fits linear pairwise-interaction models that satisfy strong hierarchy: if an interaction coefficient is estimated to be nonzero, then its two associated main effects also have nonzero estimated coefficients. Accommodates categorical variables (factors) with arbitrary numbers of levels, continuous variables, and combinations thereof. Implements the machinery described in the paper "Learning interactions via hierarchical group-lasso regularization" (JCGS 2015, Volume 24, Issue 3). Michael Lim & Trevor Hastie (2015)
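The strong hierarchy property described above can be stated as a simple check on a set of fitted coefficients. The Python sketch below only verifies the property for given coefficients; it does not fit a model, and the function name, the data layout, and the `tol` parameter are hypothetical choices for this example.

```python
from typing import Dict, Tuple


def satisfies_strong_hierarchy(main_effects: Dict[str, float],
                               interactions: Dict[Tuple[str, str], float],
                               tol: float = 0.0) -> bool:
    """Return True if every nonzero interaction has both main effects nonzero.

    A coefficient counts as nonzero when its absolute value exceeds `tol`.
    """
    for (first, second), coefficient in interactions.items():
        if abs(coefficient) <= tol:
            continue  # zero interactions impose no constraint
        for variable in (first, second):
            if abs(main_effects.get(variable, 0.0)) <= tol:
                return False
    return True


if __name__ == "__main__":
    main = {"x1": 0.8, "x2": -0.3, "x3": 0.0}
    print(satisfies_strong_hierarchy(main, {("x1", "x2"): 0.5}))
    print(satisfies_strong_hierarchy(main, {("x1", "x3"): 0.5}))
```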
Rustic is a fork of Rust mode. Compared to its predecessor, it offers the following additional features:
Flycheck integration,
Cargo popup,
multiline error parsing,
translation of ANSI control sequences through XTerm color,
asynchronous Org Babel,
custom compilation process,
rustfmt errors in a Rust compilation mode,
automatic LSP configuration with Eglot or LSP mode,
optional Rust inline documentation,
etc.
Generates DNA sequences based on Markov model techniques for matched sequences. This can be generalized to several sequences. The sequences (taxa) are then arranged in an evolutionary tree (phylogenetic tree) depicting how taxa diverge from their common ancestors. This gives the tests and estimation methods for the parameters of different models. Standard phylogenetic methods assume stationarity, homogeneity and reversibility for the Markov processes, and often impose further restrictions on the parameters.
Offers meta programming style tools to generate configurable R functions that produce HTML forms based on table input and SQL meta data. Also generates functions for collecting the parameters of those HTML forms after they are submitted. Useful for quickly generating HTML forms based on existing SQL tables. To use the resultant functions, the output files containing those functions must be read into the R environment (perhaps using base::source()).
This package implements fast, scalable optimization algorithms for fitting generalized principal components analysis (GLM-PCA) models, as described in "A Generalization of Principal Components Analysis to the Exponential Family" Collins M, Dasgupta S, Schapire RE (2002, ISBN:9780262271738), and subsequently "Feature Selection and Dimension Reduction for Single-Cell RNA-Seq Based on a Multinomial Model" Townes FW, Hicks SC, Aryee MJ, Irizarry RA (2019) <doi:10.1186/s13059-019-1861-6>.
Given a landscape resistance surface, creates minimum planar graph (Fall et al. (2007) <doi:10.1007/s10021-007-9038-7>) and grains of connectivity (Galpern et al. (2012) <doi:10.1111/j.1365-294X.2012.05677.x>) models that can be used to calculate effective distances for landscape connectivity at multiple scales. Documentation is provided by several vignettes, and a paper (Chubaty, Galpern & Doctolero (2020) <doi:10.1111/2041-210X.13350>).