Account for uncertainty when working with ranks. Estimate standard errors consistently in linear regression with ranked variables. Construct confidence sets of various kinds for positions of populations in a ranking based on values of a certain feature and their estimation errors. Theory based on Mogstad, Romano, Shaikh, and Wilhelm (2023) <doi:10.1093/restud/rdad006> and Chetverikov and Wilhelm (2023) <doi:10.48550/arXiv.2310.15512>.
This package provides functions to append confidence intervals, prediction intervals, and other quantities of interest to data frames. All appended quantities are for the response variable, after conditioning on the model and covariates. This package has a data frame first syntax that allows for easy piping. Currently supported models include (log-) linear, (log-) linear mixed, generalized linear models, generalized linear mixed models, and accelerated failure time models.
This package provides functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Contains data restructuring functions and functions for generating biometrically informed data for kin pairs. See Garrison and Rodgers (2016) <doi:10.1016/j.intell.2016.08.008> and Sims, Trattner, and Garrison (2024) <doi:10.3389/fpsyg.2024.1430978> for empirical examples, and Garrison et al. <https://osf.io/zpdwt/> for theoretical work.
An easy-to-use package for scraping and processing Australian Rules Football (AFL) data. fitzRoy provides a range of functions for accessing publicly available data from AFL Tables <https://afltables.com/afl/afl_index.html>, Footy Wire <https://www.footywire.com> and The Squiggle <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
This package provides a collection of datasets and supporting functions accompanying Health Metrics and the Spread of Infectious Diseases by Federica Gazzelloni (2024). This package provides data for health metrics calculations, including Disability-Adjusted Life Years (DALYs), Years of Life Lost (YLLs), and Years Lived with Disability (YLDs), as well as additional tools for analyzing and visualizing health data. Federica Gazzelloni (2024) <doi:10.5281/zenodo.10818338>.
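The three metrics named above are related by simple arithmetic: DALYs are the sum of YLLs and YLDs. The following is a minimal sketch of that relationship, assuming the basic undiscounted, prevalence-based definitions (YLL as deaths times remaining life expectancy, YLD as prevalent cases times a disability weight). The function names and input numbers are hypothetical and are not taken from the package.

```python
def yll(deaths, remaining_life_expectancy):
    """Years of Life Lost: deaths times remaining life expectancy at age of death."""
    return deaths * remaining_life_expectancy


def yld(prevalent_cases, disability_weight):
    """Years Lived with Disability: prevalent cases times a weight in [0, 1]."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability_weight must lie in [0, 1]")
    return prevalent_cases * disability_weight


def daly(deaths, remaining_life_expectancy, prevalent_cases, disability_weight):
    """Disability-Adjusted Life Years: YLL + YLD."""
    return yll(deaths, remaining_life_expectancy) + yld(prevalent_cases, disability_weight)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(daly(deaths=10, remaining_life_expectancy=30,
               prevalent_cases=200, disability_weight=0.1))
```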
In some cases you will have data in a histogram format, where you have a vector of all possible observations, and a vector of how many times each observation appeared. You could expand this into a single 1D vector, but this may not be advisable if the counts are extremely large. HistDat allows for the calculation of summary statistics without the need for expanding your data.
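The idea of computing summary statistics directly from values and counts can be sketched as follows. This is an illustration of the general technique in Python, not the package's interface; the function names `hist_mean`, `hist_var`, and `hist_median` are hypothetical.

```python
def hist_mean(vals, counts):
    """Mean of the data described by (value, count) pairs."""
    n = sum(counts)
    return sum(v * c for v, c in zip(vals, counts)) / n


def hist_var(vals, counts):
    """Sample variance (n - 1 denominator) without expanding the data."""
    n = sum(counts)
    m = hist_mean(vals, counts)
    return sum(c * (v - m) ** 2 for v, c in zip(vals, counts)) / (n - 1)


def hist_median(vals, counts):
    """Median found by walking the cumulative counts in value order."""
    pairs = sorted(zip(vals, counts))
    n = sum(counts)
    lo_idx, hi_idx = (n - 1) // 2, n // 2  # 0-based middle position(s)

    def value_at(idx):
        seen = 0
        for v, c in pairs:
            seen += c
            if idx < seen:
                return v

    return (value_at(lo_idx) + value_at(hi_idx)) / 2


if __name__ == "__main__":
    # Equivalent to the expanded vector [1, 1, 2, 3].
    vals, counts = [1, 2, 3], [2, 1, 1]
    print(hist_mean(vals, counts), hist_var(vals, counts), hist_median(vals, counts))
```

Each function makes a pass over the distinct values only (the median also sorts them), so the cost depends on the number of distinct observations rather than on the size of the counts.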
This package provides a function to assess and test for heterogeneity in the utility of a surrogate marker with respect to a baseline covariate. The main function can be used for either a continuous or discrete baseline covariate. More details will be available in the future in: Parast, L., Cai, T., Tian, L. (2021). "Testing for Heterogeneity in the Utility of a Surrogate Marker." Biometrics, In press.
Fit a predictive model using iteratively reweighted boosting (IRBoost) to minimize robust loss functions within the CC-family (concave-convex). This constitutes an application of iteratively reweighted convex optimization (IRCO), where convex optimization is performed using the functional descent boosting algorithm. IRBoost assigns weights to facilitate outlier identification. Applications include robust generalized linear models and robust accelerated failure time models. Wang (2025) <doi:10.6339/24-JDS1138>.
OD-means is a hierarchical adaptive k-means algorithm based on origin-destination pairs. In the first layer of the hierarchy, the clusters are separated automatically based on the variation of the within-cluster distance of each cluster until convergence. The second layer of the hierarchy corresponds to the subclustering of small clusters based on the distance between the origin and destination of each cluster.
This package provides programmatic access to the GitHub API with a focus on project management. Key functionality includes setting up issues and milestones from R objects or YAML configurations, querying outstanding or completed tasks, and generating progress updates in tables, charts, and RMarkdown reports. Useful for those using GitHub in personal, professional, or academic settings with an emphasis on streamlining the workflow of data analysis projects.
Bayesian analysis of censored linear mixed-effects models that replace Gaussian assumptions with a flexible class of distributions, such as the scale mixture of normal family distributions. A damped exponential correlation structure is used to account for within-subject autocorrelation among irregularly observed measures. For more details, see Kelin Zhong, Fernanda L. Schumacher, Luis M. Castro, Victor H. Lachos (2025) <doi:10.1002/sim.10295>.
Quality control charts for survival outcomes. Allows users to construct the Continuous Time Generalized Rapid Response CUSUM (CGR-CUSUM) <doi:10.1093/biostatistics/kxac041>, the Biswas & Kalbfleisch (2008) <doi:10.1002/sim.3216> CUSUM, the Bernoulli CUSUM and the risk-adjusted funnel plot for survival data <doi:10.1002/sim.1970>. These procedures can be used to monitor survival processes for a change in the failure rate.
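To illustrate the simplest of the charts listed above, here is a minimal sketch of a generic Bernoulli CUSUM on binary outcomes (1 = failure, 0 = no failure). It assumes a constant baseline failure probability `p0` and tests for an increase in the odds of failure by a factor `theta`. The chart adds the log-likelihood ratio of each outcome, resets at zero, and signals once it reaches the control limit `h`. This is the textbook construction, not the package's interface: the function name and the parameter values are hypothetical, and the package's survival-data versions involve follow-up time and risk adjustment that are not modelled here.

```python
from math import log


def bernoulli_cusum(outcomes, p0, theta, h):
    """Return the CUSUM path and the index of the first signal (or None).

    outcomes: iterable of 0/1, where 1 marks a failure.
    p0: in-control failure probability.
    theta: odds ratio under the out-of-control alternative (theta > 1).
    h: control limit.
    """
    # Failure probability implied by multiplying the in-control odds by theta.
    p1 = theta * p0 / (1 - p0 + theta * p0)
    w_fail = log(p1 / p0)             # log-likelihood ratio for a failure
    w_ok = log((1 - p1) / (1 - p0))   # log-likelihood ratio for a non-failure

    s, path, signal_at = 0.0, [], None
    for t, y in enumerate(outcomes):
        s = max(0.0, s + (w_fail if y else w_ok))
        path.append(s)
        if signal_at is None and s >= h:
            signal_at = t
    return path, signal_at


if __name__ == "__main__":
    # Hypothetical outcome sequence and tuning values.
    path, signal_at = bernoulli_cusum([0, 0, 1, 1, 1], p0=0.1, theta=2.0, h=1.5)
    print(path, signal_at)
```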
This package provides geomorphology-based hydrological modelling for transferring streamflow measurements from gauged to ungauged catchments. Inverse modelling makes it possible to estimate net rainfall from streamflow measurements following Boudhraâ et al. (2018) <doi:10.1080/02626667.2018.1425801>. The resulting net rainfall is then estimated on the ungauged catchments by spatial interpolation in order to finally simulate streamflow following de Lavenne et al. (2016) <doi:10.1002/2016WR018716>.
This package provides a library for creating time-based charts, like Gantt charts or timelines. Possible outputs include ggplot2 diagrams, plotly.js graphs, Highcharts.js widgets and data.frames. Results can be used in the RStudio viewer pane, in RMarkdown documents or in Shiny apps. In the interactive outputs created by vistime() and hc_vistime(), you can interact with the plot using mouse hover or zoom.
This package provides methods for the analysis of data from clinical proteomic profiling studies. The focus is on studies of human subjects, which are often observational case-control by design and have technical replicates. A method for sample size determination for planning these studies is proposed. It incorporates routines for adjusting for the expected heterogeneities and imbalances in the data and the within-sample replicate correlations.
This tool takes a longitudinal dataset as input and analyzes whether the features change significantly over time (a proxy for treatments), while simultaneously detecting and controlling for covariates. LongDat accepts several data types as input, including count, proportion, binary, ordinal and continuous data. The output table contains p-values, effect sizes and covariates of each feature, making downstream analysis easy.
This package provides a tool that improves the prediction performance of multilevel regression with post-stratification (MrP) by combining a number of machine learning methods. For information on the method, please refer to Broniecki, Wüest, Leemann (2020) "Improving Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)" in the Journal of Politics. Final pre-print version: <https://lucasleemann.files.wordpress.com/2020/07/automrp-r2pa.pdf>.
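The post-stratification step itself is a population-weighted average: each demographic cell's predicted value is weighted by the number of people in that cell, region by region. The sketch below shows only that step, assuming the cell-level predictions already exist (in MrP they come from the multilevel model or, in this package, from combined machine learning methods). The function name, the tuple layout, and the numbers are hypothetical.

```python
from collections import defaultdict


def poststratify(cells):
    """Population-weighted average of cell predictions within each region.

    cells: iterable of (region, population_count, predicted_value) tuples.
    Returns a dict mapping region -> post-stratified estimate.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for region, count, prediction in cells:
        weighted_sum[region] += count * prediction
        population[region] += count
    return {region: weighted_sum[region] / population[region]
            for region in weighted_sum}


if __name__ == "__main__":
    # Hypothetical cells: (region, people in the cell, predicted support).
    cells = [("A", 100, 0.5), ("A", 300, 0.7), ("B", 50, 0.2)]
    print(poststratify(cells))
```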
Automate downloading of meteorological and hydrological data from publicly available repositories: OGIMET (<http://ogimet.com/index.phtml.en>), University of Wyoming - atmospheric vertical profiling data (<http://weather.uwyo.edu/upperair/>), Polish Institute of Meteorology and Water Management - National Research Institute (<https://danepubliczne.imgw.pl>), and National Oceanic & Atmospheric Administration (NOAA). This package also allows for searching geographical coordinates for each observation and calculating distances to the nearest stations.
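Finding the nearest stations to a location can be sketched with a great-circle distance. The description does not say which distance formula the package uses, so the haversine formula with a spherical Earth of radius 6371 km is an assumption here, and the function names and station coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_stations(point, stations, k=1):
    """Return the k closest (name, distance_km) pairs to point.

    point: (lat, lon) in degrees.
    stations: iterable of (name, lat, lon) tuples.
    """
    lat, lon = point
    with_distance = [(name, haversine_km(lat, lon, slat, slon))
                     for name, slat, slon in stations]
    return sorted(with_distance, key=lambda pair: pair[1])[:k]


if __name__ == "__main__":
    # Hypothetical station list, for illustration only.
    stations = [("far", 40.0, 10.0), ("near", 52.2, 21.0)]
    print(nearest_stations((52.0, 21.0), stations, k=1))
```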
An extension of knitr that adds flexibility in several ways. One common source of frustration with knitr is that it assumes the directory where the source file lives should be the working directory, which is often not true. ezknitr addresses this problem by giving you complete control over where all the inputs and outputs are, and adds several other convenient features to make rendering markdown/HTML documents easier.
This package implements the AdaptiveImpute matrix completion algorithm of "Intelligent Initialization and Adaptive Thresholding for Iterative Matrix Completion" <doi:10.1080/10618600.2018.1518238> as well as the specialized variant of "Co-Factor Analysis of Citation Networks" <doi:10.1080/10618600.2024.2394464>. AdaptiveImpute is useful for embedding sparsely observed matrices, often outperforms competing matrix completion algorithms, and self-tunes its hyperparameter, making usage easy.
A logistic regression tree is a decision tree with logistic regressions at its leaves. A particular stochastic expectation maximization algorithm is used to draw a few good trees, which are then assessed via the user's criterion of choice among BIC / AIC / test set Gini. The formal development is given in a PhD chapter, see Ehrhardt (2019) <https://github.com/adimajo/manuscrit_these/releases/>.
This package provides a not-so-comprehensive list of methods for estimating a graphon, a symmetric measurable function, from one or more observed networks. For a detailed introduction to graphons and popular estimation techniques, see the paper by Orbanz, P. and Roy, D.M. (2014) <doi:10.1109/TPAMI.2014.2334607>. It also contains several auxiliary functions for generating sample networks using various network models and graphons.
This package provides a compilation of functions to create visually appealing and information-rich plots of meta-analytic data using ggplot2. Currently supports forest plots, funnel plots, and many of their variants, such as rainforest plots, thick forest plots, additional evidence contour funnel plots, and sunset funnel plots. In addition, functionalities for visual inference with the funnel plot in the context of meta-analysis are provided.
An open-source implementation of latent variable methods and multivariate modeling tools. The focus is on exploratory analyses using dimensionality reduction methods including low dimensional embedding, classical multivariate statistical tools, and tools for enhanced interpretation of machine learning methods (i.e. intelligible models to provide important information for end-users). Target domains include extension to dedicated applications e.g. for manufacturing process modeling, spectroscopic analyses, and data mining.