Based on the large margin principle, this package implements the following feature selection methods: "IM4E" (Iterative Margin-Maximization under Max-Min Entropy Algorithm); "Immigrate" (Iterative Max-Min Entropy Margin-Maximization with Interaction Terms Algorithm); "BIM" (Boosted version of the IMMIGRATE algorithm); "Simba" (Iterative Search Margin Based Algorithm); and "LFE" (Local Feature Extraction Algorithm). The package also supports prediction with each of these feature selection methods.
This package provides a graphical user interface to apply an advanced method optimization algorithm to various sampling and analysis instruments. This includes generating experimental designs, uploading and viewing data, and performing various analyses to determine the optimal method. Details of the techniques used in this package are published in Gamble, Granger, & Mannion (2024) <doi:10.1021/acs.analchem.3c05763>.
Fit Maximum Entropy Optimality Theory models to data sets, generate the predictions made by such models for novel data, and compare the fit of different models using a variety of metrics. The package is described in Mayer, C., Tan, A., Zuraw, K. (in press) <https://sites.socsci.uci.edu/~cjmayer/papers/cmayer_et_al_maxent_ot_accepted.pdf>.
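As a rough illustration of what a Maximum Entropy Optimality Theory model predicts, the sketch below computes candidate probabilities from constraint weights and violation counts using the usual MaxEnt form, where a candidate's probability is proportional to the exponential of its negated weighted violation sum. The function name, candidate labels, and input layout are hypothetical and are not taken from the package.

```python
import math

def maxent_probabilities(violations, weights):
    """Return a probability for each candidate.

    violations: dict mapping candidate name -> list of violation counts,
                one count per constraint (hypothetical layout).
    weights:    list of non-negative constraint weights, same order.
    """
    # Penalty of a candidate = weighted sum of its constraint violations.
    penalties = {
        cand: sum(w * v for w, v in zip(weights, viols))
        for cand, viols in violations.items()
    }
    # Subtract the smallest penalty before exponentiating for stability.
    smallest = min(penalties.values())
    unnormalized = {c: math.exp(-(p - smallest)) for c, p in penalties.items()}
    total = sum(unnormalized.values())
    return {c: u / total for c, u in unnormalized.items()}

# Two candidates, two constraints with weights 2.0 and 1.0.
example_probs = maxent_probabilities(
    {"cand_a": [1, 0], "cand_b": [0, 1]},
    [2.0, 1.0],
)
```

Here `cand_b` violates only the lower-weighted constraint, so it receives the larger share of the probability mass.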
Multi-Dimensional Analysis (MDA) is an adaptation of factor analysis developed by Douglas Biber (1992) <doi:10.1007/BF00136979>. Its most common use is to describe language as it varies by genre, register, and use. This package contains functions for carrying out the calculations needed to describe and plot MDA results: dimension scores, dimension means, and factor loadings.
Implementation of the Monothetic Clustering algorithm (Chavent, 1998 <doi:10.1016/S0167-8655(98)00087-7>) on continuous data sets. The package includes several extensions: monothetic clustering on data sets with circular variables, visualizations of the results, and permutation-based and cross-validation-based tests to support the decision on the number of clusters.
Pipeline for Genome-Wide Association Study using Multi-Locus Mixed Model from Segura V, Vilhjálmsson BJ et al. (2012) <doi:10.1038/ng.2314>. The pipeline includes detection of associated SNPs with MLMM, model selection by lowest eBIC and p-value threshold, estimation of the effects of the SNPs in the selected model, and graphical functions.
The Prize-Collecting Steiner Tree problem asks for a subgraph connecting a given set of vertices with the most expensive nodes and least expensive edges. Since the problem is NP-hard, no efficient exact algorithm is known. This package provides convenient functionality for obtaining an approximate solution to this problem using a loopy belief propagation algorithm.
Matches cases to controls based on genotype principal components (PCs). To produce better results, matches are based on the weighted distance between PCs, where each weight equals the percentage of variance explained by that PC. A weighted Mahalanobis distance metric (Kidd et al. (1987) <DOI:10.1016/0031-3203(87)90066-5>) is used to determine matches.
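The idea of variance-weighted matching can be sketched as follows. This is a simplified illustration only: it uses a weighted Euclidean distance over PC scores and a greedy one-to-one assignment, both of which are assumptions made here and not the package's actual metric or matching procedure. All function names and sample identifiers are hypothetical.

```python
import math

def weighted_pc_distance(a, b, weights):
    """Weighted Euclidean distance between two vectors of PC scores."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def match_cases_to_controls(cases, controls, variance_explained):
    """Greedily pair each case with its nearest unused control.

    cases, controls:    dicts mapping sample id -> list of PC scores.
    variance_explained: percentage of variance explained by each PC.
    """
    total = sum(variance_explained)
    weights = [v / total for v in variance_explained]  # normalize to sum to 1
    available = dict(controls)
    matches = {}
    for case_id, case_pcs in cases.items():
        if not available:
            break  # more cases than controls: remaining cases stay unmatched
        best = min(
            available,
            key=lambda cid: weighted_pc_distance(case_pcs, available[cid], weights),
        )
        matches[case_id] = best
        del available[best]  # each control is used at most once
    return matches

cases = {"case1": [0.0, 0.0], "case2": [1.0, 1.0]}
controls = {"ctrl1": [0.9, 1.2], "ctrl2": [0.1, -0.1], "ctrl3": [5.0, 5.0]}
pc_matches = match_cases_to_controls(cases, controls, [60.0, 20.0])
```

Because the first PC carries three times the weight of the second, differences along it dominate the choice of match.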
Personalize drug regimens using individual pharmacokinetic (PK) and pharmacokinetic-pharmacodynamic (PK-PD) profiles. By combining therapeutic drug monitoring (TDM) data with a population model, posologyr offers accurate posterior estimates and helps compute optimal individualized dosing regimens. The empirical Bayes estimates are computed following the method described by Kang et al. (2012) <doi:10.4196/kjpp.2012.16.2.97>.
Modifies the distance matrix obtained from data with batch effects, so as to improve the performance of sample pattern detection, such as clustering, dimension reduction, and construction of networks between subjects. The method has been published in Bioinformatics (Fei et al., 2018, <doi:10.1093/bioinformatics/bty117>). Also available on GitHub <https://github.com/tengfei-emory/QuantNorm>.
This package implements Bayesian inference in accelerated failure time (AFT) models for right-censored survival times assuming a log-logistic distribution. Details of the variational Bayes algorithms, with and without shared frailty, are described in Xian et al. (2024) <doi:10.1007/s11222-023-10365-6> and Xian et al. (2024) <doi:10.48550/arXiv.2408.00177>, respectively.
An interface to access data from Substack publications via API. Users can fetch the latest or top posts, search for specific posts, or retrieve a single post by its slug. This functionality is useful for developers and researchers looking to analyze Substack content or integrate it into their applications. For more information, visit the API documentation at <https://substackapi.dev/introduction>.
Provides estimation of the mean squared prediction error of a small area predictor. In particular, the recent Simple, Unified, Monte-Carlo Assisted approach to mean squared prediction error estimation for small area predictors is implemented. Other existing methods of mean squared prediction error estimation are also provided, such as the jackknife method for the mixed logistic model.
Simplicially constrained regression models with proportions on both sides. The constraint is always that the betas are non-negative and sum to 1. References: Iverson S.J., Field C., Bowen W.D. and Blanchard W. (2004) "Quantitative Fatty Acid Signature Analysis: A New Method of Estimating Predator Diets". Ecological Monographs, 74(2): 211-235. <doi:10.1890/02-4105>.
This package provides indices and tools for directed acyclic graphs (DAGs), particularly DAG representations of intermittent streams. A detailed introduction to the package can be found in the publication: "Non-perennial stream networks as directed acyclic graphs: The R-package streamDAG" (Aho et al., 2023) <doi:10.1016/j.envsoft.2023.105775>, and in the introductory package vignette.
This package provides functions for estimating times of common ancestry and molecular clock rates of evolution using a variety of evolutionary models, parametric and nonparametric bootstrap confidence intervals, methods for detecting outlier lineages, root-to-tip regression, and a statistical test for selecting molecular clock models. For more details see Volz and Frost (2017) <doi:10.1093/ve/vex025>.
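Root-to-tip regression, one of the techniques listed above, can be illustrated with an ordinary least-squares fit of root-to-tip distance against sampling time: the slope estimates the clock rate and the x-intercept estimates the time of the root. The sketch below is a generic illustration with made-up data and hypothetical names, not the package's implementation.

```python
def root_to_tip_regression(sample_times, root_to_tip_distances):
    """Least-squares fit of distance = intercept + rate * time.

    Returns (rate, intercept, root_time), where root_time is the
    x-intercept, i.e. the time at which the fitted distance is zero.
    """
    n = len(sample_times)
    mean_t = sum(sample_times) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_times)
    sxy = sum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(sample_times, root_to_tip_distances)
    )
    if sxx == 0 or sxy == 0:
        raise ValueError("no temporal signal: cannot estimate a clock rate")
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    root_time = -intercept / rate
    return rate, intercept, root_time

# Made-up example: four tips sampled five years apart.
times = [2000.0, 2005.0, 2010.0, 2015.0]
dists = [0.010, 0.015, 0.020, 0.025]
rate, intercept, root_time = root_to_tip_regression(times, dists)
```

With these illustrative inputs the fitted rate is about 0.001 substitutions per site per year and the extrapolated root time is about 1990.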
Data structures and methods to work with web tracking data. The functions cover data preprocessing steps, enriching web tracking data with external information and methods for the analysis of digital behavior as used in several academic papers (e.g., Clemm von Hohenberg et al., 2023 <doi:10.17605/OSF.IO/M3U9P>; Stier et al., 2022 <doi:10.1017/S0003055421001222>).
This package provides an R API and htmlwidget facilitating interactive visualization of spatial single-cell data with Vitessce. The R API contains classes and functions for loading single-cell data stored in compatible on-disk formats. The htmlwidget is a wrapper around the Vitessce JavaScript library and can be used in the Viewer tab of RStudio or Shiny apps.
The ACE file format is used in genomics to store contigs from sequencing machines. This tool converts it into FASTQ format. Both formats contain the sequence characters and their corresponding quality information. Unlike the FASTQ file, the ACE file stores the quality values numerically. The conversion algorithm uses the standard Sanger formula. The package facilitates insertion into pipelines and content inspection.
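The numeric-to-character step can be sketched as below, assuming that "the standard Sanger formula" refers to the common Phred+33 encoding, in which a quality value Q is written as the ASCII character with code Q + 33. The function names are hypothetical, clamping to the range 0 to 93 is an assumption added here, and no ACE parsing is attempted.

```python
def phred_to_sanger(qualities):
    """Encode numeric Phred qualities as a Sanger (Phred+33) string."""
    # Clamp to 0..93 so every value maps to a printable ASCII character
    # (assumption made for this sketch).
    return "".join(chr(min(max(q, 0), 93) + 33) for q in qualities)

def fastq_record(name, sequence, qualities):
    """Format one four-line FASTQ record from a sequence and its qualities."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality lengths differ")
    return f"@{name}\n{sequence}\n+\n{phred_to_sanger(qualities)}\n"

record = fastq_record("contig1", "ACGT", [0, 10, 40, 93])
```

For the qualities 0, 10, 40, and 93 the encoded string is `!+I~`.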
This package provides a collection of tools to evaluate probability density functions, cumulative distribution functions, and quantile functions, and to generate random numbers, for truncated random variables. Functions are also provided to compute the expected value and variance. Q-Q plots can be produced. All the probability functions in the stats, stats4 and evd packages are automatically available for truncation.
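The general construction behind truncation can be sketched as follows: for a variable restricted to the interval [a, b], the density is the original density divided by F(b) - F(a) inside the interval and zero outside, and the distribution function is (F(x) - F(a)) / (F(b) - F(a)). The helper names below are hypothetical, and a standard normal is used only as an example; this is not the package's interface.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncate(pdf, cdf, lower, upper):
    """Return (pdf, cdf) of the distribution truncated to [lower, upper]."""
    mass = cdf(upper) - cdf(lower)  # probability kept after truncation
    if mass <= 0:
        raise ValueError("truncation interval has no probability mass")

    def truncated_pdf(x):
        return pdf(x) / mass if lower <= x <= upper else 0.0

    def truncated_cdf(x):
        if x < lower:
            return 0.0
        if x > upper:
            return 1.0
        return (cdf(x) - cdf(lower)) / mass

    return truncated_pdf, truncated_cdf

tpdf, tcdf = truncate(normal_pdf, normal_cdf, -1.0, 1.0)
```

Because the rescaling uses only the original density and distribution function, the same wrapper applies to any continuous distribution for which both are available.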
DoReMiTra is an R data package providing access to curated transcriptomic datasets related to blood radiation, with a focus on neutron, x-ray, and gamma ray studies. It is designed to facilitate radiation biology research and support data exploration and reproducibility in radiation transcriptomics. All datasets are provided as SummarizedExperiment objects, allowing seamless integration with the Bioconductor ecosystem.
The complexity of high-throughput quantitative omics experiments often leads to low replicate numbers and many missing values. We implemented a new test to simultaneously consider missing values and quantitative changes, which we combined with well-performing statistical tests for high confidence detection of differentially regulated features. The package contains functions to run the test and to visualize the results.
Pathifier is an algorithm that infers pathway deregulation scores for each tumor sample on the basis of expression data. This score is determined, in a context-specific manner, for every particular dataset and type of cancer that is being investigated. The algorithm transforms gene-level information into pathway-level information, generating a compact and biologically relevant representation of each sample.
Obtain overlapping clustering models for object-by-variable data matrices using the Additive Profile Clustering (ADPROCLUS) method. Also contains the low dimensional ADPROCLUS method for simultaneous dimension reduction and overlapping clustering. For reference see Depril, Van Mechelen, Mirkin (2008) <doi:10.1016/j.csda.2008.04.014> and Depril, Van Mechelen, Wilderjans (2012) <doi:10.1007/s00357-012-9112-5>.