For each string in a set of strings, determine a tag: a substring of fixed size k that is unique to that string, if it has one. If no such unique substring exists, the least frequent substring is used. If multiple unique substrings exist, the lexicographically smallest one is used. The substring of size k chosen this way is called the "UniqTag" of that string.
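The selection rule above can be sketched in a few lines. This is a minimal illustration, not the package's implementation. It assumes a k-mer's "frequency" is the number of strings in the set that contain it, and that a string shorter than k gets no tag. Identical input strings would receive the same tag here; the function name uniq_tags is hypothetical.

```python
from collections import Counter


def uniq_tags(strings, k):
    """Hypothetical sketch of the UniqTag selection rule.

    Assumption: a k-mer's frequency is the number of strings containing it.
    Each string gets its least frequent k-mer. A unique k-mer has frequency 1,
    so unique k-mers win whenever they exist, and ties are broken by taking
    the lexicographically smallest k-mer.
    """
    # Distinct k-mers of each string (a set, so each string counts once).
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in strings]
    freq = Counter(kmer for kmers in kmer_sets for kmer in kmers)
    tags = []
    for kmers in kmer_sets:
        if not kmers:
            tags.append(None)  # assumption: strings shorter than k get no tag
        else:
            tags.append(min(kmers, key=lambda km: (freq[km], km)))
    return tags


print(uniq_tags(["ABCD", "ABCE", "XBCD"], 2))  # ['AB', 'CE', 'XB']
```

"ABCD" has no unique 2-mer, so it falls back to the least frequent ones ("AB" and "CD", each in two strings) and takes the lexicographically smaller "AB".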
Comparison of variance-covariance patterns using relative principal component analysis (relative eigenanalysis), as described in Le Maitre and Mitteroecker (2019) <doi:10.1111/2041-210X.13253>. Also provides functions to compute group covariance matrices and distance matrices, and to perform proportionality tests. A worked example on the body shape of cichlid fishes is included, based on the dataset from Kerschbaumer et al. (2013) <doi:10.5061/dryad.fc02f>.
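Relative eigenanalysis of a covariance matrix A with respect to a covariance matrix B solves det(A - lambda*B) = 0, which is equivalent to finding the eigenvalues of B^-1 A. For two variables the relative eigenvalues have a closed form, sketched below for illustration only. The package itself handles general p x p matrices, and the function name here is hypothetical.

```python
import math


def relative_eigenvalues_2x2(a, b):
    """Relative eigenvalues of 2x2 symmetric A with respect to 2x2 positive-definite B.

    Expanding det(A - lam * B) = 0 gives the quadratic
    det(B) * lam**2 - t * lam + det(A) = 0, with
    t = a11*b22 + a22*b11 - 2*a12*b12. Returns the roots, largest first.
    """
    (a11, a12), (_, a22) = a
    (b11, b12), (_, b22) = b
    det_a = a11 * a22 - a12 ** 2
    det_b = b11 * b22 - b12 ** 2
    t = a11 * b22 + a22 * b11 - 2 * a12 * b12
    # The roots are real for symmetric A and positive-definite B; clamp tiny
    # negative discriminants caused by floating-point rounding.
    disc = math.sqrt(max(0.0, t * t - 4 * det_a * det_b))
    return (t + disc) / (2 * det_b), (t - disc) / (2 * det_b)


print(relative_eigenvalues_2x2([[4, 0], [0, 1]], [[1, 0], [0, 1]]))  # (4.0, 1.0)
```

A relative eigenvalue above 1 marks a direction in which A has more variance than B, and one below 1 a direction with less. When A is proportional to B (A = cB), both roots equal c.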
PanomiR is a package to detect miRNAs that target groups of pathways from gene expression data. This package provides functionality for generating pathway activity profiles, determining differentially activated pathways between user-specified conditions, determining clusters of pathways via the PCxN package, and identifying miRNAs that target clusters of pathways. These functions can be used separately or sequentially to analyze RNA-Seq data.
The package predicts whether a coiled coil sequence (amino acid sequence plus heptad register) is more likely to form a dimer or a trimer. In addition to the prediction itself, a prediction profile is computed, which shows how strongly each individual residue indicates either class. Prediction profiles can also be visualized as curves or heatmaps.
This package uses a statistical framework for rapid and accurate detection of aneuploid cells with local copy number deletion or amplification. Our method uses an EM algorithm with mixtures of Poisson distributions, incorporating cytogenetics information (e.g., regional deletion or amplification) to guide the classification (partCNV). When applicable, we further improve the accuracy by integrating a Hidden Markov Model for feature selection (partCNVH).
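The core EM idea can be illustrated with a generic two-component Poisson mixture fitted to integer counts. This is a textbook sketch, not partCNV's model: it uses no cytogenetic prior information and no Hidden Markov Model, and the function names, starting values, and toy counts are assumptions made for illustration.

```python
import math


def poisson_logpmf(x, lam):
    """Log of the Poisson probability mass function."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def em_poisson_mixture(counts, lam1=1.0, lam2=5.0, w=0.5, iters=500, tol=1e-9):
    """Fit a two-component Poisson mixture to integer counts by EM.

    Returns (lam1, lam2, w, resp): the two rates, the weight w of component 1,
    and resp[i], the posterior probability from the final E-step that
    counts[i] belongs to component 1. No safeguards are included for
    degenerate fits (a rate or weight collapsing to 0).
    """
    n = len(counts)
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, computed on the log scale for stability.
        resp = []
        for x in counts:
            l1 = math.log(w) + poisson_logpmf(x, lam1)
            l2 = math.log(1 - w) + poisson_logpmf(x, lam2)
            m = max(l1, l2)
            e1, e2 = math.exp(l1 - m), math.exp(l2 - m)
            resp.append(e1 / (e1 + e2))
        # M-step: weighted maximum-likelihood updates of weight and rates.
        r1 = sum(resp)
        new_w = r1 / n
        new_lam1 = sum(r * x for r, x in zip(resp, counts)) / r1
        new_lam2 = sum((1 - r) * x for r, x in zip(resp, counts)) / (n - r1)
        done = max(abs(new_lam1 - lam1), abs(new_lam2 - lam2), abs(new_w - w)) < tol
        lam1, lam2, w = new_lam1, new_lam2, new_w
        if done:
            break
    return lam1, lam2, w, resp


lam1, lam2, w, resp = em_poisson_mixture([0, 1, 0, 2, 1, 9, 10, 11, 8, 12])
```

Each count is then assigned to the component with the larger responsibility. In a copy-number setting, a low-rate component would loosely correspond to cells with a deletion in the region of interest.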
This package computes k-nearest-neighbor-based statistics and visualizations for flow and mass cytometry data. It gives tSNE maps "fold change" functionality and provides a data quality metric by assessing manifold overlap between fcs files expected to be the same. Other applications using this package include imputation, marker redundancy, and testing the relative information loss of lower-dimensional embeddings compared to the original manifold.
EBImage provides general purpose functionality for image processing and analysis. In the context of (high-throughput) microscopy-based cellular assays, EBImage offers tools to segment cells and extract quantitative cellular descriptors. This allows the automation of such tasks using the R programming language and facilitates the use of other tools in the R environment for signal processing, statistical modeling, machine learning and visualization with image data.
Model adsorption behavior using classical isotherms, including Langmuir, Freundlich, Brunauer–Emmett–Teller (BET), and Temkin models. The package supports parameter estimation through both linearized and non-linear fitting techniques and generates high-quality plots for model diagnostics. It is intended for environmental scientists, chemists, and researchers working on adsorption phenomena in soils, water treatment, and material sciences. Functions are compatible with base R and ggplot2 for visualization.
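As an illustration of linearized fitting, the Langmuir isotherm q = q_max*K*C / (1 + K*C) can be rearranged to C/q = C/q_max + 1/(q_max*K). Regressing C/q on C then gives q_max = 1/slope and K = slope/intercept. The sketch below is a minimal, hypothetical example of that calculation and does not use the package's functions. Linearization changes how measurement errors are weighted, which is one reason non-linear fitting is often offered alongside it.

```python
def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_langmuir_linear(c_eq, q_eq):
    """Fit q = q_max * K * C / (1 + K * C) via the linearized form
    C / q = C / q_max + 1 / (q_max * K). Returns (q_max, K)."""
    y = [c / q for c, q in zip(c_eq, q_eq)]
    slope, intercept = least_squares_line(c_eq, y)
    q_max = 1.0 / slope       # slope = 1 / q_max
    k_l = slope / intercept   # intercept = 1 / (q_max * K)
    return q_max, k_l


# Noise-free data generated with q_max = 10 and K = 0.5.
c = [1.0, 2.0, 4.0, 8.0]
q = [10 * 0.5 * ci / (1 + 0.5 * ci) for ci in c]
print(fit_langmuir_linear(c, q))  # approximately (10.0, 0.5)
```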
Makes some distributions from the C++ library Boost available in R. In addition, the normal-inverse Gaussian distribution and the generalized inverse Gaussian distribution are provided. The distributions are represented by R6 classes. The method used to sample from the generalized inverse Gaussian distribution is the one given in Luc Devroye (2012), "Random variate generation for the generalized inverse Gaussian distribution" <doi:10.1007/s11222-012-9367-z>.
This package provides functions to append confidence intervals, prediction intervals, and other quantities of interest to data frames. All appended quantities are for the response variable, after conditioning on the model and covariates. This package has a data-frame-first syntax that allows for easy piping. Currently supported models include (log-)linear, (log-)linear mixed, generalized linear models, generalized linear mixed models, and accelerated failure time models.
Account for uncertainty when working with ranks. Estimate standard errors consistently in linear regression with ranked variables. Construct confidence sets of various kinds for positions of populations in a ranking based on values of a certain feature and their estimation errors. Theory based on Mogstad, Romano, Shaikh, and Wilhelm (2023) <doi:10.1093/restud/rdad006> and Chetverikov and Wilhelm (2023) <doi:10.48550/arXiv.2310.15512>.
This package provides functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Contains data restructuring functions and functions for generating biometrically informed data for kin pairs. See [Garrison and Rodgers, 2016 <doi:10.1016/j.intell.2016.08.008>], [Sims, Trattner, and Garrison, 2024 <doi:10.3389/fpsyg.2024.1430978>] for empirical examples, and Garrison et al. for theoretical work <https://osf.io/zpdwt/>.
An easy package for scraping and processing Australian Rules Football (AFL) data. fitzRoy provides a range of functions for accessing publicly available data from AFL Tables <https://afltables.com/afl/afl_index.html>, Footy Wire <https://www.footywire.com> and The Squiggle <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
This package provides a function to assess and test for heterogeneity in the utility of a surrogate marker with respect to a baseline covariate. The main function can be used for either a continuous or discrete baseline covariate. More details will be available in the future in: Parast, L., Cai, T., Tian, L. (2021). "Testing for Heterogeneity in the Utility of a Surrogate Marker." Biometrics, In press.
In some cases you will have data in a histogram format, where you have a vector of all possible observations and a vector of how many times each observation appeared. You could expand this into a single 1D vector, but this may not be advisable if the counts are extremely large. HistDat allows for the calculation of summary statistics without the need for expanding your data.
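The idea of computing statistics without expansion can be shown with weighted sums: the mean is sum(v*c)/n, the sample variance uses the count-weighted squared deviations, and the median is found by walking the cumulative counts. This is a hypothetical sketch of the arithmetic only; the function names are not HistDat's API.

```python
def hist_mean(values, counts):
    """Mean of the data implied by (value, count) pairs, without expansion."""
    n = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / n


def hist_var(values, counts):
    """Sample variance (n - 1 denominator) of the implied data."""
    n = sum(counts)
    m = hist_mean(values, counts)
    return sum(c * (v - m) ** 2 for v, c in zip(values, counts)) / (n - 1)


def hist_median(values, counts):
    """Median of the implied data, found by walking cumulative counts."""
    pairs = sorted(zip(values, counts))
    n = sum(counts)

    def value_at(pos):
        # Value at 1-based position `pos` in the sorted, implied data.
        cum = 0
        for v, c in pairs:
            cum += c
            if cum >= pos:
                return v

    # For odd n both positions coincide; for even n they are the two middle ones.
    return (value_at((n + 1) // 2) + value_at(n // 2 + 1)) / 2


# Two trillion observations, handled without building a 2e12-element vector.
print(hist_median([0, 1], [10**12, 10**12]))  # 0.5
```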
This package provides a collection of datasets and supporting functions accompanying "Health Metrics and the Spread of Infectious Diseases" by Federica Gazzelloni (2024). It includes data for health metrics calculations, such as Disability-Adjusted Life Years (DALYs), Years of Life Lost (YLLs), and Years Lived with Disability (YLDs), as well as additional tools for analyzing and visualizing health data. Federica Gazzelloni (2024) <doi:10.5281/zenodo.10818338>.
Fit a predictive model using iteratively reweighted boosting (IRBoost) to minimize robust loss functions within the CC-family (concave-convex). This constitutes an application of iteratively reweighted convex optimization (IRCO), where convex optimization is performed using the functional descent boosting algorithm. IRBoost assigns weights to facilitate outlier identification. Applications include robust generalized linear models and robust accelerated failure time models. Wang (2025) <doi:10.6339/24-JDS1138>.
OD-means is a hierarchical adaptive k-means algorithm based on origin-destination pairs. In the first layer of the hierarchy, the clusters are separated automatically based on the variation of the within-cluster distance of each cluster until convergence. The second layer of the hierarchy corresponds to the sub-clustering of small clusters based on the distance between the origin and destination of each cluster.
This package provides programmatic access to the GitHub API with a focus on project management. Key functionality includes setting up issues and milestones from R objects or YAML configurations, querying outstanding or completed tasks, and generating progress updates in tables, charts, and RMarkdown reports. Useful for those using GitHub in personal, professional, or academic settings, with an emphasis on streamlining the workflow of data analysis projects.
Bayesian analysis of censored linear mixed-effects models that replace Gaussian assumptions with a flexible class of distributions, such as the scale mixture of normal family, and that use a damped exponential correlation structure to account for within-subject autocorrelation among irregularly observed measures. For more details, see Kelin Zhong, Fernanda L. Schumacher, Luis M. Castro, Victor H. Lachos (2025) <doi:10.1002/sim.10295>.
Quality control charts for survival outcomes. Allows users to construct the Continuous Time Generalized Rapid Response CUSUM (CGR-CUSUM) <doi:10.1093/biostatistics/kxac041>, the Biswas & Kalbfleisch (2008) <doi:10.1002/sim.3216> CUSUM, the Bernoulli CUSUM and the risk-adjusted funnel plot for survival data <doi:10.1002/sim.1970>. These procedures can be used to monitor survival processes for a change in the failure rate.
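Of the charts listed above, the Bernoulli CUSUM is the simplest to illustrate. Each binary outcome adds a log-likelihood-ratio increment (log(p1/p0) for a failure, log((1-p1)/(1-p0)) otherwise) to a running sum that is floored at zero, and the chart signals once the sum reaches a control limit h. The sketch below is a plain, non-risk-adjusted version. The function name and parameter values are hypothetical, and it does not reproduce the package's CGR-CUSUM or risk-adjusted variants.

```python
import math


def bernoulli_cusum(outcomes, p0, p1, h):
    """Plain Bernoulli CUSUM for 0/1 outcomes (1 = failure).

    p0: in-control failure probability; p1: elevated failure probability to
    detect (p1 > p0); h: control limit. Returns (chart values, index of the
    first signal or None).
    """
    w_fail = math.log(p1 / p0)              # positive: failures push the sum up
    w_ok = math.log((1 - p1) / (1 - p0))    # negative: successes pull it down
    s, chart, signal = 0.0, [], None
    for t, y in enumerate(outcomes):
        s = max(0.0, s + (w_fail if y else w_ok))
        chart.append(s)
        if signal is None and s >= h:
            signal = t
    return chart, signal


chart, signal = bernoulli_cusum([1] * 10, p0=0.1, p1=0.2, h=2.0)
print(signal)  # 2: three consecutive failures exceed h
```

The limit h trades off false alarms against detection speed; in practice it is usually chosen by simulating the in-control process.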
This package provides geomorphology-based hydrological modelling for transferring streamflow measurements from gauged to ungauged catchments. Inverse modelling makes it possible to estimate net rainfall from streamflow measurements following Boudhraâ et al. (2018) <doi:10.1080/02626667.2018.1425801>. The resulting net rainfall is then estimated on the ungauged catchments by spatial interpolation in order to finally simulate streamflow following de Lavenne et al. (2016) <doi:10.1002/2016WR018716>.
This package provides a library for creating time-based charts, like Gantt charts or timelines. Possible outputs include ggplot2 diagrams, plotly.js graphs, Highcharts.js widgets and data.frames. Results can be used in the RStudio viewer pane, in RMarkdown documents or in Shiny apps. In the interactive outputs created by vistime() and hc_vistime(), you can interact with the plot using mouse hover or zoom.
This package provides methods for the analysis of data from clinical proteomic profiling studies. The focus is on studies of human subjects, which are often observational case-control by design and have technical replicates. A method for sample size determination for planning these studies is proposed. It incorporates routines for adjusting for the expected heterogeneities and imbalances in the data and the within-sample replicate correlations.