Routines to handle family data with a pedigree object. The initial purpose was to create correlation structures that describe family relationships such as kinship and identity-by-descent, which can be used to model family data in mixed effects models, such as in the coxme function. Also includes a tool for pedigree drawing that is focused on producing compact layouts without intervention. Recent additions include utilities to trim the pedigree object by various criteria and to compute kinship for the X chromosome.
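The kinship coefficients mentioned above can be illustrated with the classic recursive algorithm for autosomal kinship. This is a minimal Python sketch, not the package's implementation. The function name `kinship_matrix` and the requirement that parents be listed before their children are assumptions made for the example, and X-chromosome kinship is not covered.

```python
def kinship_matrix(ids, fathers, mothers):
    """Autosomal kinship coefficients for a pedigree (illustrative sketch).

    ids, fathers and mothers are parallel lists; founders use None for
    unknown parents. Assumption: parents are listed before their children.
    """
    index = {pid: k for k, pid in enumerate(ids)}
    n = len(ids)
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        f = index[fathers[i]] if fathers[i] is not None else None
        m = index[mothers[i]] if mothers[i] is not None else None
        for parent in (f, m):
            if parent is not None and parent >= i:
                raise ValueError("parents must precede their children")
        # Self-kinship is (1 + F) / 2, where F = phi(father, mother) is the
        # inbreeding coefficient (taken as 0 when a parent is unknown).
        inbreeding = phi[f][m] if f is not None and m is not None else 0.0
        phi[i][i] = 0.5 * (1.0 + inbreeding)
        # Kinship with an earlier individual j is the average of the
        # parents' kinship with j.
        for j in range(i):
            value = 0.0
            if f is not None:
                value += 0.5 * phi[f][j]
            if m is not None:
                value += 0.5 * phi[m][j]
            phi[i][j] = phi[j][i] = value
    return phi
```

For example, with founders A and B, full siblings C and D, and E a child of C and D, the sketch gives kinship 0.25 between the siblings and a self-kinship of 0.625 for the inbred E.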
Estimation of multi-group count regression models (i.e., Poisson, negative binomial) with latent covariates. This package provides two extensions compared to ordinary count regression models based on a generalized linear model: First, measurement models for the predictors can be specified, allowing measurement error to be accounted for. Second, the count regression can be simultaneously estimated in multiple groups with stochastic group weights. The marginal maximum likelihood estimation is described in Kiefer & Mayer (2020) <doi:10.1080/00273171.2020.1751027>.
Simulate a (bivariate) multivariate renewal Hawkes (MRHawkes) self-exciting process, with given immigrant hazard rate functions and offspring density function. Calculate the likelihood of an MRHawkes process with given hazard rate functions and offspring density function for an (increasing) sequence of event times. Calculate the Rosenblatt residuals of the event times. Predict future event times based on observed event times up to a given time. For details see Stindl and Chen (2018) <doi:10.1016/j.csda.2018.01.021>.
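To show what "the likelihood for a sequence of event times" involves, here is a sketch for the much simpler ordinary univariate Hawkes process. It assumes a constant immigrant rate `mu` and an exponential offspring density `beta * exp(-beta * s)` scaled by a branching ratio `alpha`. This is not the renewal (MRHawkes) model, where the immigrant hazard resets after each immigrant, and the function name is hypothetical. The log-likelihood is the sum of log intensities at the events minus the integrated intensity (compensator) over `[0, T]`.

```python
import math


def hawkes_exp_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of an ordinary univariate Hawkes process on [0, T].

    Intensity: lambda(t) = mu + alpha * sum_{t_j < t} beta * exp(-beta * (t - t_j)).
    Illustrative sketch only; not the renewal Hawkes model.
    """
    loglik = 0.0
    excitation = 0.0  # sum over earlier events of beta * exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if t > T:
            raise ValueError("event times must lie in [0, T]")
        if prev is not None:
            if t <= prev:
                raise ValueError("event times must be strictly increasing")
            # Recursive update: decay the old excitation and add the previous event.
            excitation = (excitation + beta) * math.exp(-beta * (t - prev))
        loglik += math.log(mu + alpha * excitation)
        prev = t
    # Compensator: integral of lambda(t) over [0, T].
    compensator = mu * T + alpha * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return loglik - compensator
```

The recursive update keeps the cost linear in the number of events, which is a common design choice for exponential kernels.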
This package provides an end-to-end workflow for integrative analysis of two omics layers using sparse canonical correlation analysis (sCCA), including sample alignment, feature selection, network edge construction, and visualization of gene-metabolite relationships. The underlying methods are based on penalized matrix decomposition and sparse CCA (Witten, Tibshirani and Hastie (2009) <doi:10.1093/biostatistics/kxp008>), with design principles inspired by multivariate integrative frameworks such as mixOmics (Rohart et al. (2017) <doi:10.1371/journal.pcbi.1005752>).
This package provides tools for analyzing data generated from conjoint survey experiments, a method widely used in the social sciences for studying multidimensional preferences. The package implements estimation of marginal means (MMs) and average marginal component effects (AMCEs), with corrections for measurement error. Methods include profile-level and choice-level estimators, bias correction using intra-respondent reliability (IRR), and visualization utilities. For details on the methodology, see Clayton, Horiuchi, Kaufman, King, and Komisarchik (2025) <https://gking.harvard.edu/conjointE>.
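A marginal mean (MM) for an attribute level is, in its basic uncorrected form, the share of profiles showing that level that were chosen. The sketch below computes only this naive profile-level quantity. It does not apply the intra-respondent reliability (IRR) correction described above, and the data layout (a list of dicts with a 0/1 `"chosen"` key) and the function name are assumptions.

```python
from collections import defaultdict


def marginal_means(profiles, attribute):
    """Uncorrected marginal means: mean of the 0/1 'chosen' outcome
    among profiles displaying each level of `attribute` (sketch only)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for profile in profiles:
        level = profile[attribute]
        totals[level] += profile["chosen"]
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


profiles = [
    {"party": "A", "chosen": 1}, {"party": "B", "chosen": 0},
    {"party": "A", "chosen": 0}, {"party": "B", "chosen": 1},
    {"party": "A", "chosen": 1}, {"party": "B", "chosen": 0},
]
mm = marginal_means(profiles, "party")  # A: 2/3, B: 1/3
```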
This package implements the American Heart Association Predicting Risk of cardiovascular disease EVENTs (PREVENT) equations from Khan SS, Matsushita K, Sang Y, and colleagues (2023) <doi:10.1161/CIRCULATIONAHA.123.067626>, with optional comparison to their de facto predecessor, the Pooled Cohort Equations from the American Heart Association and American College of Cardiology (2013) <doi:10.1161/01.cir.0000437741.48606.98>, and to the revision of the Pooled Cohort Equations from Yadlowsky and colleagues (2018) <doi:10.7326/M17-3011>.
Monitoring reporting rates of subject-level clinical events (e.g. adverse events, protocol deviations) reported by clinical trial sites is an important aspect of a risk-based quality monitoring strategy. Sites that are under-reporting or over-reporting events can be detected using bootstrap simulations during which patients are redistributed between sites. Site-specific distributions of event reporting rates are generated and used to assign probabilities to the observed reporting rates. See Koneswarakantha (2024) <doi:10.1007/s43441-024-00631-8>.
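The bootstrap idea above can be sketched in a simplified form. Here each site's rate is the mean number of events per patient, and each simulated site is built by drawing the same number of patients with replacement from the pooled trial. These choices are assumptions: the published method may, for example, account for patient exposure time, and the function name is hypothetical. A small probability flags a site whose observed rate is unusually low (possible under-reporting).

```python
import random
from statistics import mean


def site_reporting_probabilities(events_by_site, n_sim=1000, seed=1):
    """Estimate P(simulated rate <= observed rate) for each site.

    events_by_site maps a site to a list of per-patient event counts.
    Simplified sketch: patients are resampled with replacement from the
    pooled trial population.
    """
    rng = random.Random(seed)
    pool = [e for events in events_by_site.values() for e in events]
    result = {}
    for site, events in events_by_site.items():
        observed = mean(events)
        at_or_below = 0
        for _ in range(n_sim):
            simulated = mean(rng.choices(pool, k=len(events)))
            if simulated <= observed:
                at_or_below += 1
        result[site] = at_or_below / n_sim
    return result
```

With one site reporting no events and two sites reporting two events per patient, the first site should receive a probability close to 0.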
Computes phylogenetic distances between any two taxa using hierarchical lineage data retrieved from The Taxonomicon <http://taxonomicon.taxonomy.nl>, a comprehensive curated classification of all life based on Systema Naturae 2000 (Brands, 1989 <http://taxonomicon.taxonomy.nl>). Given any two taxon names, retrieves their full lineages, identifies the most recent common ancestor (MRCA), and computes a dissimilarity index based on the depth of the most recent common ancestor. Supports individual distance queries, pairwise distance matrices, clade filtering, and lineage utilities.
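The MRCA step can be illustrated on lineages given as lists ordered from the root. The MRCA is the last element of the longest shared prefix. The dissimilarity formula used here, `1 - depth(MRCA) / max(lineage length)`, is a hypothetical choice for illustration and is not necessarily the package's index. The lineages in the usage example are made up and shortened, not retrieved from The Taxonomicon.

```python
def mrca_and_dissimilarity(lineage_a, lineage_b):
    """Return (MRCA, dissimilarity) for two root-to-tip lineages.

    Hypothetical index: 1 - (number of shared ranks) / (longer lineage length),
    so identical lineages give 0 and deeper MRCAs give smaller values.
    """
    shared = 0
    for x, y in zip(lineage_a, lineage_b):
        if x != y:
            break
        shared += 1
    if shared == 0:
        raise ValueError("lineages share no common root")
    mrca = lineage_a[shared - 1]
    dissimilarity = 1 - shared / max(len(lineage_a), len(lineage_b))
    return mrca, dissimilarity


# Made-up, shortened lineages for illustration only.
human = ["Life", "Eukaryota", "Animalia", "Chordata", "Mammalia", "Primates", "Homo"]
dog = ["Life", "Eukaryota", "Animalia", "Chordata", "Mammalia", "Carnivora", "Canis"]
mrca, d = mrca_and_dissimilarity(human, dog)  # MRCA is "Mammalia", d = 2/7
```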
This package aims to analyse count-based methylation data on predefined genomic regions, such as those obtained by targeted sequencing, and thus to identify differentially methylated regions (DMRs) that are associated with phenotypes or traits. The method is built on a rich, flexible model that allows the effects of multiple covariates on methylation levels to vary smoothly along genomic regions. At the same time, the method accounts for sequencing errors and can adjust for variability in cell type mixture.
Automatic model selection for structural time series decomposition into trend, cycle, and seasonal components, with optional structural interpolation, using the Kalman filter. For details see Koopman, Siem Jan and Marius Ooms (2012) "Forecasting Economic Time Series Using Unobserved Components Time Series Models" <doi:10.1093/oxfordhb/9780195398649.013.0006>; and Kim, Chang-Jin and Charles R. Nelson (1999) "State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications" <doi:10.7551/mitpress/6444.001.0001> <http://econ.korea.ac.kr/~cjkim/>.
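As a minimal illustration of Kalman filtering for structural models, the sketch below filters the simplest unobserved-components model, the local level (random walk plus noise) model. It is not the package's model selection procedure. The noise variances are assumed known, the large initial variance `p0` is an approximate diffuse prior, and missing observations (`None`) simply skip the update step. A smoother would be needed to refine estimates at gaps, which is what interpolation would normally use.

```python
def local_level_filter(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    Returns the filtered level estimates E[mu_t | y_1..y_t].
    sigma_eps2 / sigma_eta2 are the observation / level noise variances.
    """
    filtered = []
    a, p = a0, p0  # predicted level and its variance
    for obs in y:
        if obs is None:
            # Missing observation: no update, keep the prediction.
            a_f, p_f = a, p
        else:
            f = p + sigma_eps2           # prediction error variance
            k = p / f                    # Kalman gain
            a_f = a + k * (obs - a)      # updated level
            p_f = p * (1.0 - k)          # updated variance
        filtered.append(a_f)
        # Prediction step for the random-walk level.
        a, p = a_f, p_f + sigma_eta2
    return filtered


levels = local_level_filter([5.0, 5.0, None, 5.0], sigma_eps2=1.0, sigma_eta2=0.1)
```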
We developed this program to help researchers acquire disease-related genes quickly and comprehensively, so as to understand the mechanisms of disease. The data are integrated from three public databases: 'eDGAR', 'DrugBank' and 'MalaCards'. 'eDGAR' is a comprehensive database containing data on the relationships between diseases and genes. 'DrugBank' contains information on 13443 drugs and 5157 targets. 'MalaCards' integrates human disease information, including disease-related genes.
Provides early termination phase II trial designs with a decreasingly informative prior (DIP) or a regular Bayesian prior chosen by the user. The program can determine the minimum planned sample size necessary to achieve the user-specified admissible designs. It can also perform power and expected sample size calculations for the tests in early termination phase II trials. See Wang C and Sabo RT (2022) <doi:10.18203/2349-3259.ijct20221110>; Sabo RT (2014) <doi:10.1080/10543406.2014.888441>.
BEAST2 (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAUti 2 (which is part of 'BEAST2') is a GUI tool that allows users to specify the many possible setups and generates the XML file BEAST2 needs to run. This package provides a way to create BEAST2 input files without active user input, using R function calls instead.
This package provides functions to perform the following analyses: i) inferring epistasis from RNAi double knockdown data; ii) identifying gene pairs of multiple mutation patterns; iii) assessing association between gene pairs and survival; and iv) calculating the small-worldness of a graph (e.g., a gene interaction network). Data and analyses are described in Wang, X., Fu, A. Q., McNerney, M. and White, K. P. (2014). Widespread genetic epistasis among breast cancer genes. Nature Communications, 5, 4828. <doi:10.1038/ncomms5828>.
This package provides various functions for reading and preparing the Panel Study of Income Dynamics (PSID) for longitudinal analysis, including functions that read the PSID's fixed width format files directly into R, rename all of the PSID's longitudinal variables so that recurring variables have consistent names across years, simplify assembling longitudinal datasets from cross sections of the PSID Family Files, and export the resulting PSID files into file formats common among other statistical programming languages ('SAS', 'STATA', and 'SPSS').
Simultaneously estimates sparse regression coefficients and response network structure in multivariate models with missing data. Unlike traditional approaches requiring imputation, handles missingness natively through unbiased estimating equations (MCAR/MAR compatible). Employs dual L1 regularization with automated selection via cross-validation or information criteria. Includes parallel computation, warm starts, adaptive grids, publication-ready visualizations, and prediction methods. Ideal for genomics, neuroimaging, and multi-trait studies with incomplete high-dimensional outcomes. See Zeng et al. (2025) <doi:10.48550/arXiv.2507.05990>.
This R package computes the between-cases AUC estimate and the corresponding variance and covariance estimates for nested data. It also includes a function to simulate nested data. The between-cases AUC estimate is used to evaluate a reader's diagnostic performance in clinical tasks with nested data. For more details on these methods, see H Du, S Wen, Y Guo, F Jin, BD Gallas (2022) <doi:10.1177/09622802221111539>.
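The idea of a "between-cases" AUC can be sketched as a Mann-Whitney style estimate that compares diseased and non-diseased units only when they come from different cases (patients), so within-case pairs are skipped. This reading of the definition is an assumption made for illustration; see the cited paper for the exact estimator. The sketch does not compute the variance or covariance estimates, and the `(case_id, score, is_diseased)` input layout is hypothetical.

```python
def between_case_auc(units):
    """Mann-Whitney AUC over (diseased, non-diseased) unit pairs from
    different cases. `units` is an iterable of (case_id, score, is_diseased).
    Ties in score count as 1/2. Illustrative sketch only."""
    positives = [(c, s) for c, s, diseased in units if diseased]
    negatives = [(c, s) for c, s, diseased in units if not diseased]
    total = 0.0
    n_pairs = 0
    for case_p, score_p in positives:
        for case_n, score_n in negatives:
            if case_p == case_n:
                continue  # skip within-case pairs
            n_pairs += 1
            if score_p > score_n:
                total += 1.0
            elif score_p == score_n:
                total += 0.5
    if n_pairs == 0:
        raise ValueError("no between-case pairs available")
    return total / n_pairs


units = [(1, 0.9, True), (1, 0.2, False), (2, 0.4, True), (2, 0.6, False)]
auc = between_case_auc(units)  # 1.0; the within-case pair (0.4 vs 0.6) is ignored
```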
Creation of linkage maps in polyploid species from marker dosage scores of an F1 cross from two heterozygous parents. Currently works for outcrossing diploid, autotriploid, autotetraploid and autohexaploid species, as well as segmental allotetraploids. Methods are described in Bourke et al. (2018) <doi:10.1093/bioinformatics/bty371>. Since version 1.1.0, both discrete and probabilistic genotypes are acceptable input; for more details on the latter see Liao et al. (2021) <doi:10.1007/s00122-021-03834-x>.
This package contains functions to fit proportional hazards (PH) models to partly interval-censored (PIC) data (Pan et al. (2020) <doi:10.1177/0962280220921552>), PH models with spatial frailty to spatially dependent PIC data (Pan and Cai (2021) <doi:10.1080/03610918.2020.1839497>), and mixed effects PH models to clustered PIC data. Each random intercept/random effect can follow either a normal prior or a Dirichlet process mixture prior. It also includes the corresponding functions for general interval-censored data.
Various functions for discrete time survival analysis and longitudinal analysis. SIMEX method for correcting bias due to errors-in-variables in a mixed effects model. Asymptotic mean and variance of different proportional hazards test statistics under different ties methods, given two survival curves and censoring distributions. Score test and Wald test for regression analysis of grouped survival data. Calculation of survival curves for events defined by the response variable in a mixed effects model crossing a threshold, with or without confirmation.
The tcplfit2 R package performs basic concentration-response curve fitting. The original tcplFit() function in the tcpl R package performed basic concentration-response curve fitting to 3 models. With tcplfit2, the core tcpl concentration-response functionality has been expanded to process diverse high-throughput screen (HTS) data generated at the US Environmental Protection Agency, including targeted ToxCast, high-throughput transcriptomics (HTTr) and high-throughput phenotypic profiling (HTPP). tcplfit2 can be used independently to support analysis for diverse chemical screening efforts.
Implementation of four extensions of the Zipf distribution: the Marshall-Olkin Extended Zipf (MOEZipf) (Pérez-Casany, M., & Casellas, A. (2013) <arXiv:1304.4540>), the Zipf-Poisson Extreme (Zipf-PE), the Zipf-Poisson Stopped Sum (Zipf-PSS) and the Zipf-Polylog distributions. On a log-log scale, the first two extensions allow for top-concavity and top-convexity, while the third allows only for top-concavity. All the extensions maintain the linearity associated with the Zipf model in the tail.
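The MOEZipf probability mass function can be derived by applying the Marshall-Olkin transform to the Zipf survival function: with `u(x) = zeta(alpha, x) / zeta(alpha)` the Zipf probability of a value at least `x`, the transformed survival is `beta * u / (1 - (1 - beta) * u)`, and differencing consecutive survival values gives the pmf below. This is a self-contained sketch written for illustration (parameter names `alpha > 1` and `beta > 0` are our assumption), with the Hurwitz zeta function approximated by a truncated sum plus an Euler-Maclaurin tail correction. With `beta = 1` it reduces to the ordinary Zipf pmf `x^(-alpha) / zeta(alpha)`.

```python
def hurwitz_zeta(s, q, n_terms=50):
    """Approximate zeta(s, q) = sum_{k>=0} (q + k)^(-s) for s > 1
    using a truncated sum plus an Euler-Maclaurin tail correction."""
    head = sum((q + k) ** (-s) for k in range(n_terms))
    t = q + n_terms
    tail = t ** (1 - s) / (s - 1) + 0.5 * t ** (-s) + s * t ** (-s - 1) / 12
    return head + tail


def moezipf_pmf(x, alpha, beta):
    """P(X = x), x = 1, 2, ..., for a Marshall-Olkin extended Zipf variable
    (sketch derived from the Marshall-Olkin transform of the Zipf survival)."""
    z = hurwitz_zeta(alpha, 1)
    d_x = z - (1 - beta) * hurwitz_zeta(alpha, x)
    d_next = z - (1 - beta) * hurwitz_zeta(alpha, x + 1)
    return beta * z * x ** (-alpha) / (d_x * d_next)
```

The pmf telescopes, so summing it over `x = 1..K` gives one minus the transformed survival beyond `K`, which is a convenient numerical check.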
This package is a rasterization preprocessing framework that aggregates cellular information into spatial pixels to reduce the resources required for spatial omics data analysis. SEraster reduces the number of points in spatial omics datasets for downstream analysis through rasterization, in which single-cell gene expression or cell-type labels are aggregated into equally sized pixels at a user-defined resolution. SEraster can be combined with other packages to conduct downstream analyses of spatial omics datasets, such as detecting spatially variable genes.
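Rasterization as described above amounts to binning cell coordinates into a square grid and aggregating the values in each bin. The sketch below handles one numeric value per cell (for example, a single gene's expression) on a grid anchored at the origin. The grid anchoring, the `mean`/`sum` options and the function name are assumptions for illustration, not SEraster's interface, and cell-type labels would need per-type counts instead.

```python
import math
from collections import defaultdict


def rasterize(cells, resolution, aggregate="mean"):
    """Aggregate (x, y, value) cells into square pixels of side `resolution`.

    Pixels are keyed by integer grid indices (floor(x / res), floor(y / res)).
    """
    if aggregate not in ("mean", "sum"):
        raise ValueError("aggregate must be 'mean' or 'sum'")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in cells:
        key = (math.floor(x / resolution), math.floor(y / resolution))
        sums[key] += value
        counts[key] += 1
    if aggregate == "sum":
        return dict(sums)
    return {key: sums[key] / counts[key] for key in sums}


cells = [(0.5, 0.5, 2.0), (1.5, 0.2, 4.0), (12.0, 3.0, 10.0)]
pixels = rasterize(cells, resolution=10.0)  # {(0, 0): 3.0, (1, 0): 10.0}
```

Coarser resolutions merge more cells per pixel, which is the trade-off between spatial detail and computational cost that the framework exposes to the user.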
This package lets you compute the median ranking according to Kemeny's axiomatic approach. Rankings may or may not contain ties, and may be complete or incomplete. The package contains both recently proposed branch-and-bound algorithms and heuristic solutions. The search space for the solution can either be restricted to the set of permutations or extended to all rankings with ties. The package also provides some useful utilities for dealing with preference rankings, including the element-weight Kemeny distance and correlation coefficient.
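The Kemeny distance between two rankings can be computed from their pairwise score matrices: each ranking assigns +1, -1 or 0 to every ordered pair of items (preferred, dispreferred, tied), and the distance is half the sum of absolute differences. Below, a ranking is a vector of positions (tied items share a position). The median search is a brute-force illustration over permutations only, feasible for a handful of items. It does not implement the package's branch-and-bound or heuristic algorithms, the search over rankings with ties, or the element-weight variant, and the function names are hypothetical.

```python
from itertools import permutations


def _sign(v):
    return (v > 0) - (v < 0)


def kemeny_distance(r1, r2):
    """Kemeny distance between two rankings given as position vectors
    (r[i] is the position of item i; equal positions mean a tie)."""
    n = len(r1)
    total = 0
    for i in range(n):
        for j in range(n):
            a = _sign(r1[j] - r1[i])  # +1 if item i is preferred to item j
            b = _sign(r2[j] - r2[i])
            total += abs(a - b)
    return total // 2  # each unordered pair contributes an even amount


def median_ranking(rankings):
    """Brute-force Kemeny median restricted to permutations (tiny n only)."""
    n = len(rankings[0])
    return min(
        permutations(range(1, n + 1)),
        key=lambda cand: sum(kemeny_distance(cand, r) for r in rankings),
    )
```

For linear orders this distance is twice the number of discordant pairs, so fully reversing three items gives 6.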