Node centrality measures for temporal networks. Available measures are temporal degree centrality, temporal closeness centrality and temporal betweenness centrality defined by Kim and Anderson (2012) <doi:10.1103/PhysRevE.85.026107>. Applying the REN algorithm by Hanke and Foraita (2017) <doi:10.1186/s12859-017-1677-x> when calculating the centrality measures keeps the computational running time linear in the number of graph snapshots. Further, all methods can run in parallel up to the number of nodes in the network.
HGC (short for Hierarchical Graph-based Clustering) is an R package for conducting hierarchical clustering on large-scale single-cell RNA-seq (scRNA-seq) data. The key idea is to construct a dendrogram of cells on their shared nearest neighbor (SNN) graph. HGC provides functions for building graphs and for conducting hierarchical clustering on the graph. Users with an older R version can visit https://github.com/XuegongLab/HGC/tree/HGC4oldRVersion to get a build of HGC for R 3.6.
This package contains a robust set of tools for constructing deep neural networks that are highly adaptable through user-defined loss functions and probability models. It includes several practical applications, such as the deepAFT model, which uses a deep neural network to enhance the accelerated failure time (AFT) model for survival data. Another example is the deepGLM model, which applies a deep neural network to the generalized linear model (glm), accommodating data with continuous, categorical and Poisson distributions.
In the context of data quality assessment, this package provides a number of functions for evaluating data quality across various dimensions, including completeness, plausibility, concordance, conformance, currency, timeliness, and correctness. It has been developed based on two well-known frameworks for data quality assessment: Michael G. Kahn (2016) <doi:10.13063/2327-9214.1244> and Nicole G. Weiskopf (2017) <doi:10.5334/egems.218>. Using this package, users can evaluate the quality of their datasets, provided that corresponding metadata are available.
Runs a Shiny App on the local machine for basic statistical and graphical analyses. The point-and-click interface of the Shiny App produces the same analysis outputs (e.g., plots and tables) more quickly than typing the required code in R, especially for users without much coding experience or expertise. Examples of possible analyses include tabulating descriptive statistics for a variable, creating histograms by experimental groups, and creating a scatter plot and calculating the correlation between two variables.
This package provides a framework to build and evaluate diagnosis or prognosis models using stacking, voting, and bagging ensemble techniques with various base learners. The package also includes tools for visualization and interpretation of models. The development version of the package is available on GitHub at <https://github.com/xiaojie0519/E2E>. The methods are based on the foundational work of Breiman (1996) <doi:10.1007/BF00058655> on bagging and Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> on stacking.
Access to the datasets and many of the functions used in "Statistics Using R: An Integrative Approach". These datasets include a subset of the National Education Longitudinal Study, the Framingham Heart Study, as well as several simulated datasets used in the examples throughout the textbook. The functions included in the package reproduce some of the functionality of Stata that is not directly available in R. The package also contains a tutorial on basic data frame management, including how to handle missing data.
R-dsb improves protein expression analysis in droplet-based single-cell studies. The package specifically addresses noise in raw protein UMI counts from methods like CITE-seq. It identifies and removes two main sources of noise—protein-specific noise from unbound antibodies and droplet/cell-specific noise. The package is applicable to various methods, including CITE-seq, REAP-seq, ASAP-seq, TEA-seq, and Mission Bioplatform data. Check the vignette for tutorials on integrating dsb with Seurat and Bioconductor, and using dsb in Python.
This package provides functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data, in particular representation, manipulation and simulation of multistate data - the Lexis suite of functions, which includes interfaces to the mstate, etm and cmprsk packages. It also contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.
APL is a package developed for computation of Association Plots (AP), a method for visualization and analysis of single cell transcriptomics data. The main focus of APL is the identification of genes characteristic of individual clusters of cells from input data. The package performs correspondence analysis (CA) and allows users to identify cluster-specific genes using Association Plots. Additionally, APL computes cluster-specificity scores for all genes, which makes it possible to rank the genes by their specificity for a selected cell cluster of interest.
This package provides a complete toolkit for connecting R environments with Large Language Models (LLMs). Provides utilities for describing R objects, package documentation, and workspace state in plain text formats optimized for LLM consumption. Supports multiple workflows: interactive copy-paste to external chat interfaces, programmatic tool registration with ellmer chat clients, batteries-included chat applications via shinychat, and exposure to external coding agents through the Model Context Protocol. Project configuration files enable stable, repeatable conversations with project-specific context and preferred LLM settings.
The Bayesian Federated Inference ('BFI') method combines inference results obtained from local data sets in the separate centers. In this version of the package, the BFI methodology is programmed for linear, logistic and survival regression models. For GLMs, see Jonker, Pazira and Coolen (2024) <doi:10.1002/sim.10072>; for survival models, see Pazira, Massa, Weijers, Coolen and Jonker (2025) <doi:10.48550/arXiv.2404.17464>; and for heterogeneous populations, see Jonker, Pazira and Coolen (2025) <doi:10.1017/rsm.2025.6>.
This package provides functions to compute the values of different modifications of the Rand and Wallace indices. The indices are used to measure the stability or similarity of two partitions obtained on two different sets of units with a non-empty intersection. Depending on the selected index, splitting and merging of clusters can have a different effect on the value of the indices. The indices are proposed in Cugmas and Ferligoj (2018) <http://ibmi.mf.uni-lj.si/mz/2018/no-1/Cugmas2018.pdf>.
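As a point of reference for the indices described above, the following minimal sketch computes the classic (unmodified) Rand index, restricted to the units that appear in both partitions. It is not the package's implementation and does not reproduce the modified indices of Cugmas and Ferligoj (2018); the function name `rand_index` and the example data are hypothetical.

```python
from itertools import combinations


def rand_index(part_a, part_b):
    """Classic Rand index over the units present in both partitions.

    part_a, part_b: dicts mapping a unit identifier to its cluster label.
    Assumption: units found in only one partition are simply ignored.
    """
    common = sorted(set(part_a) & set(part_b))
    if len(common) < 2:
        raise ValueError("need at least two shared units")
    agree = 0
    total = 0
    for u, v in combinations(common, 2):
        same_a = part_a[u] == part_a[v]  # pair clustered together in A?
        same_b = part_b[u] == part_b[v]  # pair clustered together in B?
        agree += same_a == same_b        # both together or both apart
        total += 1
    return agree / total


# Hypothetical partitions on overlapping sets of units.
first = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 3}
second = {"b": "x", "c": "x", "d": "y", "e": "y", "f": "z"}
ri = rand_index(first, second)
print(ri)
```

Only the shared units b, c, d and e enter the calculation here; how units outside the intersection should influence the value is exactly where the modified indices differ from this baseline.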
This package contains functions for calculating the Federal Highway Administration (FHWA) Transportation Performance Management (TPM) performance measures. Currently, the package provides methods for the System Reliability and Freight (PM3) performance measures calculated from travel time data provided by The National Performance Management Research Data Set (NPMRDS), including Level of Travel Time Reliability (LOTTR), Truck Travel Time Reliability (TTTR), and Peak Hour Excessive Delay (PHED) metric scores for calculating statewide reliability performance measures. Implements <https://www.fhwa.dot.gov/tpm/guidance/pm3_hpms.pdf>.
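The reliability metrics named above are ratios of travel time percentiles: LOTTR compares the 80th percentile travel time with the 50th, and TTTR compares the 95th percentile truck travel time with the 50th. The sketch below illustrates that arithmetic only. The nearest-rank percentile rule, the rounding to two decimals, the function names, and the sample travel times are assumptions made for illustration, and the package and the FHWA guidance may prescribe a different percentile method and reporting periods.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100 (assumed rule)."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceiling of pct * n / 100 with integers
    return ordered[rank - 1]


def reliability_ratio(travel_times, upper_pct):
    """Ratio of an upper percentile travel time to the 50th percentile."""
    upper = nearest_rank_percentile(travel_times, upper_pct)
    normal = nearest_rank_percentile(travel_times, 50)
    return round(upper / normal, 2)


# Hypothetical travel times (seconds) for one segment and one reporting period.
times = [60, 62, 58, 65, 70, 61, 90, 63, 59, 64]
lottr = reliability_ratio(times, 80)  # 80th / 50th percentile
tttr = reliability_ratio(times, 95)   # 95th / 50th percentile
print(lottr, tttr)
```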
This package provides a model for the growth of self-limiting populations using three, four, or five parameter functions, which have wide applications in a variety of fields. In dynamical modeling, the dependent variable could be the population size at time x, where x is the independent variable. In the analysis of quantitative polymerase chain reaction (qPCR), the dependent variable would be the fluorescence intensity and the independent variable the cycle number. In that case, the package calculates the TWW cycle threshold.
Suite of tropical geometric tools for use in machine learning applications. These methods are summarized in the following references: Yoshida et al. (2022) <doi:10.2140/astat.2023.14.37>, Barnhill et al. (2023) <doi:10.48550/arXiv.2303.02539>, Barnhill and Yoshida (2023) <doi:10.3390/math11153433>, Aliatimis et al. (2023) <doi:10.1007/s11538-024-01327-8>, Yoshida et al. (2022) <doi:10.1109/TCBB.2024.3420815>, and Yoshida et al. (2019) <doi:10.1007/s11538-018-0493-4>.
Computes the exact observation weights for the Kalman filter and smoother, based on the method described in Koopman and Harvey (2003) <www.sciencedirect.com/science/article/pii/S0165188902000611>. The package supports in-depth exploration of state-space models, enabling researchers and practitioners to extract meaningful insights from time series data. This functionality is especially valuable in dynamic factor models, where the computed weights can be used to decompose the contributions of individual variables to the latent factors. See the README file for examples.
This package provides a general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census, including SF1 (Decennial short-form), SF3 (Decennial long-form), and the American Community Survey (ACS). Confidence intervals provided with ACS data are converted to standard errors to be bundled with estimates in complex acs objects. The package provides new methods to conduct standard operations on acs objects and present/plot data in statistically appropriate ways.
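The conversion of ACS confidence intervals to standard errors mentioned above can be sketched as follows. The sketch assumes the published figure is a 90 percent margin of error, so the standard error is the margin divided by 1.645, and it combines estimates with the usual approximation that ignores covariance. The function names and the numbers are hypothetical, and this is not the package's own code.

```python
Z_90 = 1.645  # critical value for a 90 percent margin of error (assumed level)


def moe_to_se(moe, z=Z_90):
    """Convert a published margin of error to a standard error."""
    return moe / z


def se_of_sum(standard_errors):
    """Approximate standard error of a sum of estimates.

    Assumption: the estimates are treated as uncorrelated, so the squared
    standard errors simply add.
    """
    return sum(se * se for se in standard_errors) ** 0.5


# Hypothetical estimates with their published margins of error.
tracts = [(1200, 329.0), (850, 164.5)]
ses = [moe_to_se(moe) for _, moe in tracts]
total_estimate = sum(est for est, _ in tracts)
total_se = se_of_sum(ses)
print(total_estimate, round(total_se, 1))
```

Keeping the standard error next to each estimate, rather than the published interval, is what makes sums and other derived quantities straightforward to propagate.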
The LPE library is used for significance analysis of microarray data with a small number of replicates. It uses resampling-based FDR adjustment and gives less conservative results than the traditional BH or BY procedures. Accepted input is raw data in txt format from MAS4, MAS5 or dChip. Data can also be supplied after normalization. The LPE library is primarily used for analyzing data between two conditions. To use it for paired data, see the LPEP library. For using LPE in multiple conditions, use the HEM library.
This package provides a framework, described in Elder et al. (2022) <arXiv:2203.01897>, for testing the multivariate point null hypothesis. After the user selects a parameter of interest and defines the assumed data generating mechanism, this information should be encoded in functions for the parameter estimator and its corresponding influence curve. Some parameter and data generating mechanism combinations have codings in this package, and are explained in detail in the article.
This code provides a method to fit the hidden compact representation model as well as to identify the causal direction on discrete data. We implement an effective solution to recover the hidden compact representation under the likelihood framework. For a description of some of our methods, please see "Causal Discovery from Discrete Data using Hidden Compact Representation" (NIPS 2018) by Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang and Zhifeng Hao (2018) <https://nips.cc/Conferences/2018/Schedule?showEvent=11274>.
Variable selection techniques are essential tools for model selection and estimation in high-dimensional statistical models. Through this publicly available package, we provide a unified environment to carry out variable selection using iterative sure independence screening (SIS) (Fan and Lv (2008)<doi:10.1111/j.1467-9868.2008.00674.x>) and all of its variants in generalized linear models (Fan and Song (2009)<doi:10.1214/10-AOS798>) and the Cox proportional hazards model (Fan, Feng and Wu (2010)<doi:10.1214/10-IMSCOLL606>).
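The screening idea behind SIS can be illustrated with a minimal sketch: rank each predictor by the absolute value of its marginal correlation with the response and keep the top d. This covers only the basic, non-iterative screening step for a continuous response, not the iterative procedure or the GLM and Cox variants the package provides. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation (assumes neither input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sis_screen(columns, y, d):
    """Keep the d predictors with the largest absolute marginal correlation.

    columns: dict mapping a predictor name to its list of observed values.
    """
    scores = {name: abs(pearson(col, y)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:d]


# Hypothetical data: x1 and x2 track the response, x3 is mostly noise.
y = [1, 2, 3, 4, 5, 6]
columns = {
    "x1": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "x2": [6, 5, 4, 3, 2, 1],
    "x3": [1, -1, 1, -1, 1, -1],
}
kept = sis_screen(columns, y, d=2)
print(kept)
```

In high-dimensional settings the appeal of this step is cost: each predictor is scored on its own, so the screening scales linearly in the number of predictors before any joint model is fitted.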
The Uniform Error Index is the weighted average of different error measures. It uses the output of different error functions and gives more robust and stable error values. This package has been developed to compute the Uniform Error Index from ten different loss functions, such as Error Square, Square of Square Error, Quasi Likelihood Error, LogR-Square, Absolute Error, and Absolute Square Error. The weights are determined using the Principal Component Analysis (PCA) algorithm of Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
This package provides functions to assist in the processing and exploration of data from environmental monitoring programs. The package name stands for "water quality" and reflects the original focus on time series data for physical and chemical properties of water, as well as the biota. Intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should be useful for analysis of similar-frequency time series regardless of the subject matter.