Generates and predicts a set of linearly stacked Random Forest models using bootstrap sampling. Individual datasets may be heterogeneous (not all samples have full sets of features). Contains support for parallelization but the user should register their cores before running. This is an extension of the method found in Matlock (2018) <doi:10.1186/s12859-018-2060-2>.
Spatial clustering with a hidden Markov random field fitted via the EM algorithm, details of which can be found in Yi Yang (2021) <doi:10.1101/2021.06.05.447181>. It is computationally efficient, scales with increasing sample size, and can also choose the smoothness parameter and the number of clusters.
Unobserved components time series model using the linear innovations state space representation (single source of error) with choice of error distributions and option for dynamic variance. Methods for estimation using automatic differentiation, automatic model selection and ensembling, prediction, filtering, simulation and backtesting. Based on the model described in Hyndman et al (2012) <doi:10.1198/jasa.2011.tm09771>.
This package implements the TabNet model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with Coherent Hierarchical Multi-label Classification Networks by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the tidymodels ecosystem.
The BACON algorithms are methods for multivariate outlier nomination (detection) and robust linear regression by Billor, Hadi, and Velleman (2000) <doi:10.1016/S0167-9473(99)00101-2>. The extension to weighted problems is due to Beguin and Hulliger (2008) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200800110616>; see also <doi:10.21105/joss.03238>.
MEIGOR provides a comprehensive environment for performing global optimization tasks in bioinformatics and systems biology. It leverages advanced metaheuristic algorithms to efficiently search the solution space and is specifically tailored to handle the complexity and high-dimensionality of biological datasets. This package supports various optimization routines and is integrated with Bioconductor's infrastructure for a seamless analysis workflow.
An R package for multiple-group comparison to detect tissue/cell-specific marker genes among subtypes. It provides functions to compute OVESEG-test statistics, derive component weights in the mixture null distribution model and estimate p-values from weighted, aggregated permutations. The resulting posterior probabilities of the component null hypotheses can also portray all kinds of upregulation patterns among subtypes.
This package provides a robust and outlier-aware method for testing differences in cell-type proportion in single-cell data. This model can infer changes in tissue composition and heterogeneity, and can produce realistic data simulations based on any existing dataset. This model can also transfer knowledge from a large set of integrated datasets to increase accuracy further.
r128gain is a multi-platform command line tool to scan your audio files and tag them with loudness metadata (ReplayGain v2 or Opus R128 gain format), to allow playback of several tracks or albums at a similar loudness level. r128gain can also be used as a Python module from other Python projects to scan and/or tag audio files.
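To illustrate the two metadata formats mentioned above, the following minimal sketch converts an already measured integrated loudness into a gain value. It assumes the usual reference levels (-18 LUFS for ReplayGain v2, -23 LUFS for Opus R128 gain, the latter stored as a Q7.8 fixed-point integer). The function names are hypothetical and this is not r128gain's own API; measuring the loudness itself is out of scope here.

```python
# Assumed reference levels; not taken from the description above.
REPLAYGAIN_V2_REFERENCE_LUFS = -18.0
OPUS_R128_REFERENCE_LUFS = -23.0


def replaygain_v2_db(loudness_lufs):
    """Gain in dB that brings a track to the ReplayGain v2 reference level."""
    return REPLAYGAIN_V2_REFERENCE_LUFS - loudness_lufs


def opus_r128_gain_q78(loudness_lufs):
    """Gain relative to -23 LUFS, encoded as a Q7.8 integer (dB * 256)."""
    return round((OPUS_R128_REFERENCE_LUFS - loudness_lufs) * 256)


if __name__ == "__main__":
    measured = -14.0  # hypothetical integrated loudness of a loud track
    print(replaygain_v2_db(measured))
    print(opus_r128_gain_q78(measured))
```

A track louder than the reference gets a negative gain, so players that honor the tag turn it down.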
This R package lets you estimate signatures of mutational processes and their activities on mutation count data. Starting from a set of single-nucleotide variants (SNVs), it allows both estimation of the exposure of samples to predefined mutational signatures (including whether the signatures are present at all), and identification of signatures de novo from the mutation counts.
Nucleotide conversion sequencing experiments have been developed to add a temporal dimension to RNA-seq and single-cell RNA-seq. Such experiments require specialized tools for primary processing such as GRAND-SLAM, and specialized tools for downstream analyses. grandR provides a comprehensive toolbox for quality control, kinetic modeling, differential gene expression analysis and visualization of such data.
This package provides tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, and uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding.
Redshift adjusts the color temperature according to the position of the sun. A different color temperature is set during night and daytime. During twilight and early morning, the color temperature transitions smoothly from night to daytime temperature to allow your eyes to slowly adapt. At night the color temperature should be set to match the lamps in your room.
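The smooth transition described above can be sketched as a linear interpolation on the sun's elevation. The function name, the default temperatures (6500 K by day, 4500 K at night) and the elevation thresholds (-6 and 3 degrees) are illustrative assumptions, not values stated in this description.

```python
def color_temperature(elevation_deg, day_temp=6500, night_temp=4500,
                      low=-6.0, high=3.0):
    """Return a color temperature in kelvin for a given solar elevation.

    Below `low` the night temperature applies, above `high` the daytime
    temperature applies, and in between the two are blended linearly.
    All default values are assumptions made for this sketch.
    """
    if elevation_deg <= low:
        return night_temp
    if elevation_deg >= high:
        return day_temp
    # Fraction of the way from the night threshold to the day threshold.
    alpha = (elevation_deg - low) / (high - low)
    return round(night_temp + alpha * (day_temp - night_temp))


if __name__ == "__main__":
    for elevation in (-10.0, -1.5, 10.0):
        print(elevation, color_temperature(elevation))
```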
Analysis of DNA mixtures involving relatives by computation of likelihood ratios that account for dropout and drop-in, mutations, silent alleles and population substructure. This is useful in kinship cases, like non-invasive prenatal paternity testing, where deductions about individuals' relationships rely on DNA mixtures, and in criminal cases where the contributors to a mixed DNA stain may be related. Relationships are represented by pedigrees and can include kinship between more than two individuals. The main function is relMix() and its graphical user interface relMixGUI(). The implementation and method are described in Dorum et al. (2017) <doi:10.1007/s00414-016-1526-x>, Hernandis et al. (2019) <doi:10.1016/j.fsigss.2019.09.085> and Kaur et al. (2016) <doi:10.1007/s00414-015-1276-1>.
Interact with Condor from R via an SSH connection. Files are first uploaded from the user machine to the submitter machine, and the job is then submitted from the submitter machine to Condor. Functions are provided to submit, list, and download Condor jobs from R. Condor is an open source high-throughput computing software framework for distributed parallelization of computationally intensive tasks.
This package contains tools for working with data during statistical analysis, promoting flexible, intuitive, and reproducible workflows. There are functions designed for specific statistical tasks such as building a custom univariate descriptive table, computing pairwise association statistics, etc. These are built on a collection of general-purpose data manipulation tools motivated by functional programming concepts.
This package implements a kernel-based association test for copy number variation (CNV) aggregate analysis in a certain genomic region (e.g., gene set, chromosome, or genome) that is robust to within-locus and across-locus etiological heterogeneity, and bypasses the need to define a "locus" unit for CNVs. Brucker, A., et al. (2020) <doi:10.1101/666875>.
This package produces diversity estimates and species lists with associated global distribution for any vascular plant family and genus from the Plants of the World Online database <https://powo.science.kew.org/>, by interacting with the source code of each plant taxon page. It also creates global maps of species richness and graphics of species discoveries and nomenclatural changes over time.
This package contains functions for operations with fuzzy cognitive maps using t-norm and s-norm operators. T-norms and S-norms are described by Dov M. Gabbay and George Metcalfe (2007) <doi:10.1007/s00153-007-0047-1>. System indicators are described by Cox, Earl D. (1995) <isbn:1886801010>. Executable examples are provided in the "inst/examples" folder.
This Rcpp-based package implements highly efficient functions for the calculation of the Jonckheere-Terpstra statistic. It can be used for a variety of applications, including feature selection in machine learning problems, or to conduct genome-wide association studies (GWAS) with multiple quantitative phenotypes. The code leverages OpenMP directives for multi-core computing to reduce overall processing time.
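For orientation, the Jonckheere-Terpstra statistic can be written as a sum of Mann-Whitney counts over all ordered pairs of groups: for each pair, count the observations in the earlier group that are smaller than observations in the later group, with ties counted as one half. The sketch below is a naive quadratic-time illustration under that textbook definition. The function name is hypothetical, and it is not the package's optimized, OpenMP-parallel implementation.

```python
from itertools import combinations


def jonckheere_terpstra(groups):
    """Naive Jonckheere-Terpstra statistic.

    `groups` is a list of samples ordered by the hypothesized trend
    (first group expected lowest, last group expected highest).
    """
    statistic = 0.0
    # combinations() preserves input order, so `lower` always precedes `higher`.
    for lower, higher in combinations(groups, 2):
        for x in lower:
            for y in higher:
                if x < y:
                    statistic += 1.0
                elif x == y:
                    statistic += 0.5  # ties contribute one half
    return statistic


if __name__ == "__main__":
    # Perfectly ordered groups: every one of the 12 cross-group pairs counts.
    print(jonckheere_terpstra([[1, 2], [3, 4], [5, 6]]))
```

Large values of the statistic support an increasing trend across the ordered groups; a p-value would additionally need the null distribution, which this sketch does not compute.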
This package contains Rcpp and RcppEigen implementations of matrix operations useful for Gaussian process models, such as the inversion of a symmetric Toeplitz matrix, sampling from multivariate normal distributions, evaluation of the log-density of a multivariate normal vector, and Bayesian inference for latent variable Gaussian process models with elliptical slice sampling (Murray, Adams, and MacKay 2010).
Approximate frequentist inference for generalized linear mixed model analysis with expectation propagation used to circumvent the need for multivariate integration. In this version, the random effects can be of any reasonable dimension. However, only probit mixed models with one level of nesting are supported. The methodology is described in Hall, Johnstone, Ormerod, Wand and Yu (2018) <arXiv:1805.08423v1>.
An RStudio Addin for Hippie Expand (AKA Hippie Code Completion or Cyclic Expand Word). This type of completion searches for matching tokens within the user's current source editor file, regardless of file type. By searching only within the current source file, hippie offers a fast way to identify and insert completions that appear around the user's cursor.
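The idea of this completion style can be sketched in a few lines: take the partial word before the cursor, scan the same buffer for tokens that start with it, and offer the nearest ones first. The function name, the token pattern and the nearest-first ordering are assumptions made for this sketch, which does not reproduce the addin's actual code.

```python
import re


def hippie_candidates(text, cursor):
    """Return completion candidates for the partial word ending at `cursor`.

    Candidates are tokens from the same text that start with the partial
    word, de-duplicated and ordered by distance from the cursor.
    """
    match = re.search(r"\w+$", text[:cursor])
    if match is None:
        return []  # no partial word in front of the cursor
    prefix, start = match.group(0), match.start()

    tokens = [(m.start(), m.group(0)) for m in re.finditer(r"\w+", text)]
    candidates = []
    for position, token in sorted(tokens, key=lambda t: abs(t[0] - start)):
        if position == start:
            continue  # skip the word being typed
        if token.startswith(prefix) and token != prefix and token not in candidates:
            candidates.append(token)
    return candidates


if __name__ == "__main__":
    source = "data_frame <- read(data_file)\nda"
    print(hippie_candidates(source, len(source)))
```

Repeated invocations of such a command would cycle through the returned list, which is where the name "Cyclic Expand Word" comes from.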
H-index and h-alpha are bibliometric indicators. This package provides functions to simulate how these indicators may develop over time for a given set of researchers and to visualize the simulation data. The implementation is based on the STATA ado h-index and is described in more detail in Bornmann et al. (2019) <arXiv:1905.11052>.
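As a reminder of what the h-index measures, it is the largest number h such that h of a researcher's papers have at least h citations each. The sketch below computes it for a single list of citation counts. The function name is hypothetical, and the package's simulation logic and the h-alpha variant are not shown.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # prints 4
```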