Identity by Descent (IBD) distributions in pedigrees. A Hidden Markov Model is used to compute identity coefficients, simulate IBD segments, and derive the distribution of total IBD sharing and segment count across chromosomes. The methods are applied in Kruijver (2025) <doi:10.3390/genes16050492>. The probability that the total IBD sharing is zero can be computed using the method of Donnelly (1983) <doi:10.1016/0040-5809(83)90004-7>.
This package provides a tool for optimizing scales of effect when modeling ecological processes in space. Specifically, the scale parameter of a distance-weighted kernel distribution is identified for all environmental layers included in the model. Includes functions to assist in model selection, model evaluation, efficient transformation of raster surfaces using fast Fourier transformation, and projecting models. For more details see Peterman (2025) <doi:10.21203/rs.3.rs-7246115/v1>.
Routines for PLS-based genomic analyses, implementing PLS methods for classification with microarray data and prediction of transcription factor activities from combined ChIP-chip analysis. The >=1.2-1 versions include two new classification methods for microarray data: GSIM and Ridge PLS. The >=1.3 versions include a new classification method combining variable selection and compression in a logistic regression context: logit-SPLS; and an adaptive version of the sparse PLS.
This package provides tools for analysing the agreement of two or more rankings of the same items. Examples are importance rankings of predictor variables and risk predictions of subjects. Benchmarks for agreement are computed based on random permutation and bootstrap. See Ekstrøm CT, Gerds TA, Jensen AK (2018). "Sequential rank agreement methods for comparison of ranked lists." Biostatistics, 20(4), 582-598 <doi:10.1093/biostatistics/kxy017> for more information.
This package estimates chronological and gestational DNA methylation (DNAm) age as well as biological age using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.
An automated pipeline for the detection, integration and reporting of predefined features across a large number of mass spectrometry data files. It enables the real-time annotation of multiple compounds in a single file, or the parallel annotation of multiple compounds in multiple files. A graphical user interface and command line functions help the user assess the quality of annotation and update fitting parameters until a satisfactory result is obtained.
This package provides tools for Bayesian basket trial design and analysis using a novel three-component local power prior framework with global borrowing control, pairwise similarity assessment and a borrowing threshold. Supports simulation-based evaluation of operating characteristics and comparison with other methods. Applicable to both equal and unequal sample size settings in early-phase oncology trials. For more details see Zhou et al. (2023) <doi:10.48550/arXiv.2312.15352>.
Sampling from the Cholesky factorization of a Wishart random variable, sampling from the inverse Wishart distribution, sampling from the Cholesky factorization of an inverse Wishart random variable, sampling from the pseudo Wishart distribution, sampling from the generalized inverse Wishart distribution, computing densities for the Wishart and inverse Wishart distributions, and computing the multivariate gamma and digamma functions. Provides a header file so the C functions can be called directly from other programs.
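As an illustration of sampling the Cholesky factor of a Wishart random variable, the following is a minimal sketch based on the Bartlett decomposition. It is an assumption that this construction is comparable to what the package does; the function names `cholesky` and `sample_wishart_cholesky` are hypothetical and are not the package's C interface.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular Cholesky factor L of a positive-definite matrix (sigma = L L')."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def sample_wishart_cholesky(sigma, df, rng):
    """Draw C such that C C' ~ Wishart(sigma, df), using the Bartlett decomposition.

    Hypothetical helper for illustration only. Requires df > p - 1.
    """
    p = len(sigma)
    low = cholesky(sigma)
    a = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # Diagonal: square root of a chi-square draw with (df - i) degrees of
        # freedom, generated as Gamma(shape=(df - i) / 2, scale=2).
        a[i][i] = math.sqrt(rng.gammavariate((df - i) / 2.0, 2.0))
        for j in range(i):
            a[i][j] = rng.gauss(0.0, 1.0)  # strictly lower part: standard normal
    # The product of two lower-triangular matrices is lower triangular.
    return [[sum(low[i][k] * a[k][j] for k in range(j, i + 1)) for j in range(p)]
            for i in range(p)]


rng = random.Random(0)
factor = sample_wishart_cholesky([[2.0, 0.5], [0.5, 1.0]], df=5, rng=rng)
```

Because the result is lower triangular with a positive diagonal, it is itself the Cholesky factor of the Wishart draw `C C'`, so no second factorization is needed.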
Built by Hodges lab members for current and future Hodges lab members. Other individuals are welcome to use it as well. Provides useful functions that the lab uses every day to analyze various genomic datasets. Critically, only general use functions are provided; functions specific to a given technique are reserved for a separate package. As the lab grows, we expect to continue adding functions to the package to build on previous lab members' code.
Enhances mlexperiments <https://CRAN.R-project.org/package=mlexperiments> with additional machine learning ('ML') learners for survival analysis. The package provides R6-based survival learners for the following algorithms: glmnet <https://CRAN.R-project.org/package=glmnet>, ranger <https://CRAN.R-project.org/package=ranger>, xgboost <https://CRAN.R-project.org/package=xgboost>, and rpart <https://CRAN.R-project.org/package=rpart>. These can be used directly with the mlexperiments R package.
This package provides a collection of NASCAR race, driver, owner and manufacturer data across the three major NASCAR divisions: NASCAR Cup Series, NASCAR Xfinity Series, and NASCAR Craftsman Truck Series. The curated data begins with the 1949 season and extends through the end of the 2024 season. Explore race, season, or career performance for drivers, teams, and manufacturers throughout NASCAR's history. Data was sourced with permission from DriverAverages.com.
Given a dataset, the user is invited to utilize the Empirical Cumulative Distribution Function (ECDF) to interactively guess the mean and the mean deviation. Thereafter, using the quadratic curve, the user can guess the Root Mean Squared Deviation (RMSD) and visualize the standard deviation (SD). For details, see Sarkar and Rashid (2019) <doi:10.3126/njs.v3i0.25574>, Have You Seen the Standard Deviation?, Nepalese Journal of Statistics, Vol. 3, 1-10.
Leveraging (large) language models for automatic topic labeling. The main function converts a list of top terms into a label for each topic. Hence, it is complementary to any topic modeling package that produces a list of top terms for each topic. While human judgement is indispensable for topic validation (i.e., inspecting top terms and most representative documents), automatic topic labeling can be a valuable tool for researchers in various scenarios.
It performs the smoothing approach provided by penalized least squares for univariate and bivariate time series, as proposed by Guerrero (2007) and Guerrero et al. (2017). This allows the time series trend to be estimated while controlling the amount of resulting (joint) smoothness. References: Guerrero, V.M. (2007) <DOI:10.1016/j.spl.2007.03.006>. Guerrero, V.M.; Islas-Camargo, A. and Ramirez-Ramirez, L.L. (2017) <DOI:10.1080/03610926.2015.1133826>.
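To make the penalized least squares idea concrete, here is a minimal univariate sketch. It assumes a second-difference roughness penalty, so the trend minimizes sum (y_t - tau_t)^2 + lam * sum (tau_t - 2 tau_{t-1} + tau_{t-2})^2 and solves (I + lam K'K) tau = y. The penalty order, the parameter `lam`, and all function names are illustrative assumptions, not the package's interface, and the package's smoothness-percentage control is not reproduced here.

```python
def second_difference_penalty(n):
    """Return K'K, where K is the (n-2) x n second-difference matrix."""
    pen = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        coeffs = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, ci in coeffs.items():
            for j, cj in coeffs.items():
                pen[i][j] += ci * cj
    return pen


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def pls_trend(y, lam):
    """Penalized least squares trend: solve (I + lam * K'K) tau = y."""
    n = len(y)
    pen = second_difference_penalty(n)
    a = [[(1.0 if i == j else 0.0) + lam * pen[i][j] for j in range(n)]
         for i in range(n)]
    return solve(a, y)


def roughness(x):
    """Sum of squared second differences."""
    return sum((x[t] - 2 * x[t - 1] + x[t - 2]) ** 2 for t in range(2, len(x)))


series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
trend = pls_trend(series, lam=10.0)
```

A larger `lam` yields a smoother trend. A perfectly linear series has zero second differences and is returned unchanged for any `lam`.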
This package provides a set of distributions which can be used for modelling the response variables in Generalized Additive Models for Location Scale and Shape. The distributions can be continuous, discrete or mixed distributions. Extra distributions can be created by transforming any continuous distribution defined on the real line to a distribution defined on the range 0 to infinity or 0 to 1, using a log or a logit transformation, respectively.
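The transformation described above follows the usual change-of-variables rule. If Z has density f_Z on the real line, then Y = exp(Z) has density f_Z(log y) / y on (0, infinity), and Y = 1 / (1 + exp(-Z)) has density f_Z(logit(y)) / (y (1 - y)) on (0, 1). The sketch below illustrates this with a normal base density; the function names are hypothetical and do not correspond to the package's own functions.

```python
import math


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of a normal distribution on the real line."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def log_transformed_pdf(base_pdf):
    """Density of Y = exp(Z) on (0, inf), where Z has density base_pdf."""
    def pdf(y):
        if y <= 0:
            return 0.0
        return base_pdf(math.log(y)) / y  # Jacobian |dz/dy| = 1 / y
    return pdf


def logit_transformed_pdf(base_pdf):
    """Density of Y = 1 / (1 + exp(-Z)) on (0, 1), where Z has density base_pdf."""
    def pdf(y):
        if not 0 < y < 1:
            return 0.0
        # Jacobian |dz/dy| = 1 / (y * (1 - y))
        return base_pdf(math.log(y / (1 - y))) / (y * (1 - y))
    return pdf


def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation, used only to check that a density integrates to 1."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


log_normal = log_transformed_pdf(normal_pdf)
logit_normal = logit_transformed_pdf(normal_pdf)
area_unit_interval = midpoint_integral(logit_normal, 0.0, 1.0)
```

With a normal base density these are the log-normal and logit-normal distributions; any other continuous density on the real line can be passed in the same way.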
This package provides methods to compute simultaneous prediction and confidence bands for dense time series data. The implementation builds on the functional bootstrap approach proposed by Lenhoff et al. (1999) <doi:10.1016/S0966-6362(98)00043-5> and extended by Koska et al. (2023) <doi:10.1016/j.jbiomech.2023.111506> to support both independent and clustered (hierarchical) data. Includes a simple API (see band()) and an Rcpp backend for performance.
Pairwise Hamming distances are computed between the rows of a binary (0/1) matrix using highly optimized C code. The input is an integer matrix in which each row represents a binary feature vector; the output is a symmetric integer matrix of pairwise distances. Internally, rows are bit-packed into 64-bit words for fast XOR-based comparisons, with hardware-accelerated popcount operations to count differences. OpenMP parallelization ensures efficient performance for large matrices.
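The bit-packing technique can be sketched in a few lines. This is an illustration of the general idea only (pack each row into 64-bit words, XOR the words, count the set bits); it is not the package's C implementation, it does no parallelization, and the function names `pack_row` and `hamming_matrix` are hypothetical.

```python
def pack_row(bits):
    """Pack a sequence of 0/1 values into a list of 64-bit words."""
    words = []
    for start in range(0, len(bits), 64):
        word = 0
        for offset, bit in enumerate(bits[start:start + 64]):
            if bit:
                word |= 1 << offset
        words.append(word)
    return words


def hamming_matrix(rows):
    """Symmetric matrix of pairwise Hamming distances between binary rows."""
    packed = [pack_row(r) for r in rows]
    n = len(packed)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # XOR marks differing positions; counting set bits is the popcount.
            d = sum(bin(a ^ b).count("1") for a, b in zip(packed[i], packed[j]))
            dist[i][j] = dist[j][i] = d
    return dist


rows = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
distances = hamming_matrix(rows)
# distances == [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
```

Packing reduces each comparison of 64 matrix entries to one XOR and one popcount, which is where the speed-up over element-wise comparison comes from.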
Using hybrid data, this package creates a vividly colored hybrid heat map. The input is two files, which are auto-selected. The first file has three columns: the first two for pairs of species and the third for the hybrid experiment code (an integer). The second file is a list of codes and their descriptions in two columns. The output is a figure showing the hybrid heat map with a color legend.
Code to support a systems biology research program from inception through publication. The methods focus on dimension reduction approaches to detect patterns in complex, multivariate experimental data and place an emphasis on informative visualizations. The goal for this project is to create a package that will evolve over time, thereby remaining relevant and reflective of current methods and techniques. As a result, we encourage suggested additions to the package, both methodological and graphical.
Facilitates the description, transformation, exploration, and reproducibility of metabarcoding analyses. MiscMetabar is mainly built on top of the phyloseq, dada2 and targets R packages. It helps to build reproducible and robust bioinformatics pipelines in R. MiscMetabar makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.
The Open University Learning Analytics Dataset (OULAD) is available from Kuzilek et al. (2017) <doi:10.1038/sdata.2017.171>. The ouladFormat package loads, cleans and formats the OULAD for data analysis (each row of the returned data set is an individual student). The package's main function, combined_dataset(), allows the user to choose whether the returned data set includes assessment, demographics, virtual learning environment (VLE), or registration variables etc.
This package helps users create virtual species for species distribution modelling. It includes several methods for creating virtual species distribution maps. These maps can be used in Species Distribution Modelling (SDM) studies. SDM uses environmental data for sites where a species occurs to predict all the sites where the environmental conditions are suitable for the species to persist, and where it may therefore be expected to occur.
Manipulating input and output files of the STICS crop model. Files are either JavaSTICS XML files or text files used by the model's Fortran executable. The most basic functionalities are reading or writing parameter names and values in either XML or text input files, and getting data from output files. Advanced functionalities include generating XML files from XML templates and/or spreadsheets, and generating text files from XML files by using XSLT transformation.
This package contains the experimental data and a complete executable transcript (vignette) of the statistical analysis presented in the paper "Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages" by Y. Ohnishi, W. Huber, A. Tsumura, M. Kang, P. Xenopoulos, K. Kurimoto, A. K. Oles, M. J. Arauzo-Bravo, M. Saitou, A.-K. Hadjantonakis and T. Hiiragi; Nature Cell Biology (2014) 16(1): 27-37. doi: 10.1038/ncb2881.