This package uses a statistical framework for rapid and accurate detection of aneuploid cells with local copy number deletion or amplification. Our method uses an EM algorithm with mixtures of Poisson distributions while incorporating cytogenetics information (e.g., regional deletion or amplification) to guide the classification (partCNV). When applicable, we further improve the accuracy by integrating a Hidden Markov Model for feature selection (partCNVH).
This package does k-nearest neighbor based statistics and visualizations with flow and mass cytometery data. This gives tSNE maps"fold change" functionality and provides a data quality metric by assessing manifold overlap between fcs files expected to be the same. Other applications using this package include imputation, marker redundancy, and testing the relative information loss of lower dimension embeddings compared to the original manifold.
This tool takes longitudinal dataset as input and analyzes if there is significant change of the features over time (a proxy for treatments), while detects and controls for covariates simultaneously. LongDat is able to take in several data types as input, including count, proportion, binary, ordinal and continuous data. The output table contains p values, effect sizes and covariates of each feature, making the downstream analysis easy.
Model adsorption behavior using classical isotherms, including Langmuir, Freundlich, Brunauerâ Emmettâ Teller (BET), and Temkin models. The package supports parameter estimation through both linearized and non-linear fitting techniques and generates high-quality plots for model diagnostics. It is intended for environmental scientists, chemists, and researchers working on adsorption phenomena in soils, water treatment, and material sciences. Functions are compatible with base R and ggplot2 for visualization.
Make some distributions from the C++ library Boost available in R'. In addition, the normal-inverse Gaussian distribution and the generalized inverse Gaussian distribution are provided. The distributions are represented by R6 classes. The method to sample from the generalized inverse Gaussian distribution is the one given in "Random variate generation for the generalized inverse Gaussian distribution" Luc Devroye (2012) <doi:10.1007/s11222-012-9367-z>.
Account for uncertainty when working with ranks. Estimate standard errors consistently in linear regression with ranked variables. Construct confidence sets of various kinds for positions of populations in a ranking based on values of a certain feature and their estimation errors. Theory based on Mogstad, Romano, Shaikh, and Wilhelm (2023)<doi:10.1093/restud/rdad006> and Chetverikov and Wilhelm (2023) <doi:10.48550/arXiv.2310.15512>.
This package provides a flexible interface for interacting with Large Language Model ('LLM') providers including OpenAI', Groq', Anthropic', DeepSeek', DashScope', Gemini', Grok and GitHub Models'. Supports both synchronous and asynchronous chat-completion APIs, with features such as retry logic, dynamic model selection, customizable parameters, and multi-message conversation handling. Designed to streamline integration with state-of-the-art LLM services across multiple platforms.
This package provides functions to append confidence intervals, prediction intervals, and other quantities of interest to data frames. All appended quantities are for the response variable, after conditioning on the model and covariates. This package has a data frame first syntax that allows for easy piping. Currently supported models include (log-) linear, (log-) linear mixed, generalized linear models, generalized linear mixed models, and accelerated failure time models.
This package provides functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Contains data restructuring functions and functions for generating biometrically informed data for kin pairs. See [Garrison and Rodgers, 2016 <doi:10.1016/j.intell.2016.08.008>], [Sims, Trattner, and Garrison, 2024 <doi:10.3389/fpsyg.2024.1430978>] for empirical examples, and Garrison and colleagues for theoretical work <https://osf.io/zpdwt/>.
An easy package for scraping and processing Australia Rules Football (AFL) data. fitzRoy provides a range of functions for accessing publicly available data from AFL Tables <https://afltables.com/afl/afl_index.html>, Footy Wire <https://www.footywire.com> and The Squiggle <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
This package provides tools for fitting sparse generalised linear mixed models with l0 regularisation. Selects fixed and random effects under the hierarchy constraint that fixed effects must precede random effects. Uses coordinate descent and local search algorithms to rapidly deliver near-optimal estimates. Gaussian and binomial response families are currently supported. For more details see Thompson, Wand, and Wang (2025) <doi:10.48550/arXiv.2506.20425>.
This package provides a function to assess and test for heterogeneity in the utility of a surrogate marker with respect to a baseline covariate. The main function can be used for either a continuous or discrete baseline covariate. More details will be available in the future in: Parast, L., Cai, T., Tian L (2021). "Testing for Heterogeneity in the Utility of a Surrogate Marker." Biometrics, In press.
This package provides a collection of datasets and supporting functions accompanying Health Metrics and the Spread of Infectious Diseases by Federica Gazzelloni (2024). This package provides data for health metrics calculations, including Disability-Adjusted Life Years (DALYs), Years of Life Lost (YLLs), and Years Lived with Disability (YLDs), as well as additional tools for analyzing and visualizing health data. Federica Gazzelloni (2024) <doi:10.5281/zenodo.10818338>.
In some cases you will have data in a histogram format, where you have a vector of all possible observations, and a vector of how many times each observation appeared. You could expand this into a single 1D vector, but this may not be advisable if the counts are extremely large. HistDat allows for the calculation of summary statistics without the need for expanding your data.
Fit a predictive model using iteratively reweighted boosting (IRBoost) to minimize robust loss functions within the CC-family (concave-convex). This constitutes an application of iteratively reweighted convex optimization (IRCO), where convex optimization is performed using the functional descent boosting algorithm. IRBoost assigns weights to facilitate outlier identification. Applications include robust generalized linear models and robust accelerated failure time models. Wang (2025) <doi:10.6339/24-JDS1138>.
This package provides a toolbox for calculating continuous norms for psychological tests, where the norms can be age-dependent. The norms are based Generalized Additive Models for Location, Scale, and Shape (GAMLSS) for the test scores in the normative sample. The package includes functions for model selection, reliability estimation, and calculating norms, including confidence intervals. For more details, see Timmerman et al. (2021) <doi:10.1037/met0000348>.
OD-means is a hierarchical adaptive k-means algorithm based on origin-destination pairs. In the first layer of the hierarchy, the clusters are separated automatically based on the variation of the within-cluster distance of each cluster until convergence. The second layer of the hierarchy corresponds to the sub clustering process of small clusters based on the distance between the origin and destination of each cluster.
This package provides programmatic access to GitHub API with a focus on project management. Key functionality includes setting up issues and milestones from R objects or YAML configurations, querying outstanding or completed tasks, and generating progress updates in tables, charts, and RMarkdown reports. Useful for those using GitHub in personal, professional, or academic settings with an emphasis on streamlining the workflow of data analysis projects.
Quality control charts for survival outcomes. Allows users to construct the Continuous Time Generalized Rapid Response CUSUM (CGR-CUSUM) <doi:10.1093/biostatistics/kxac041>, the Biswas & Kalbfleisch (2008) <doi:10.1002/sim.3216> CUSUM, the Bernoulli CUSUM and the risk-adjusted funnel plot for survival data <doi:10.1002/sim.1970>. These procedures can be used to monitor survival processes for a change in the failure rate.
Bayesian analysis of censored linear mixed-effects models that replace Gaussian assumptions with a flexible class of distributions, such as the scale mixture of normal family distributions, considering a damped exponential correlation structure which was employed to account for within-subject autocorrelation among irregularly observed measures. For more details, see Kelin Zhong, Fernanda L. Schumacher, Luis M. Castro, Victor H. Lachos (2025) <doi:10.1002/sim.10295>.
Estimators for semi-parametric linear regression models with truncated response variables (fixed truncation point). The estimators implemented are the Symmetrically Trimmed Least Squares (STLS) estimator introduced by Powell (1986) <doi:10.2307/1914308>, the Quadratic Mode (QME) estimator introduced by Lee (1993) <doi:10.1016/0304-4076(93)90056-B>, and the Left Truncated (LT) estimator introduced by Karlsson (2006) <doi:10.1007/s00184-005-0023-x>.
This package provides a geomorphology-based hydrological modelling for transferring streamflow measurements from gauged to ungauged catchments. Inverse modelling enables to estimate net rainfall from streamflow measurements following Boudhraâ et al. (2018) <doi:10.1080/02626667.2018.1425801>. Resulting net rainfall is then estimated on the ungauged catchments by spatial interpolation in order to finally simulate streamflow following de Lavenne et al. (2016) <doi:10.1002/2016WR018716>.
This package provides a library for creating time based charts, like Gantt or timelines. Possible outputs include ggplot2 diagrams, plotly.js graphs, Highcharts.js widgets and data.frames. Results can be used in the RStudio viewer pane, in RMarkdown documents or in Shiny apps. In the interactive outputs created by vistime() and hc_vistime(), you can interact with the plot using mouse hover or zoom.
This package provides methods for the nalysis of data from clinical proteomic profiling studies. The focus is on the studies of human subjects, which are often observational case-control by design and have technical replicates. A method for sample size determination for planning these studies is proposed. It incorporates routines for adjusting for the expected heterogeneities and imbalances in the data and the within-sample replicate correlations.