Traditional phasing programs are limited to diploid organisms. Our method modifies the Li and Stephens algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haplotype searches in a multiple infection setting. This package is primarily developed as part of the Pf3k project, which is a global collaboration using the latest sequencing technologies to provide a high-resolution view of natural variation in the malaria parasite Plasmodium falciparum. Parasite DNA is extracted from patient blood samples, which often contain more than one parasite strain in unknown proportions. This package is used for deconvoluting mixed haplotypes and reporting the mixture proportions for each sample.
Testing and documenting code that communicates with remote databases can be painful. Although the interaction with R is usually relatively simple (e.g. data frames passed to and from a database), such code relies on a separate service and the data stored there, so testing it can be difficult to set up, unsustainable in a continuous integration environment, or impossible without replicating an entire production cluster. This package addresses that by allowing you to make recordings of your database interactions and then play them back while testing (or in other contexts), all without needing to spin up or have access to the database your code would typically connect to.
The R4EPIs project <https://r4epi.github.io/sitrep/> seeks to provide a set of standardized tools for analysis of outbreak and survey data in humanitarian aid settings. This package currently provides standardized data dictionaries from Medecins Sans Frontieres Operational Centre Amsterdam for outbreak scenarios (Acute Jaundice Syndrome, Cholera, Diphtheria, Measles, Meningitis) and surveys (Retrospective mortality and access to care, Malnutrition, Vaccination coverage and Event Based Surveillance), as described at <https://scienceportal.msf.org/assets/standardised-mortality-surveys>. In addition, a data generator from these dictionaries is provided. It is also possible to read in any Open Data Kit format data dictionary.
Computes a series of indices commonly used in the fields of economic geography, economic complexity, and evolutionary economics to describe the location, distribution, spatial organization, structure, and complexity of economic activities. Functions include basic spatial indicators such as the location quotient, the Krugman specialization index, the Herfindahl or the Shannon entropy indices but also more advanced functions to compute different forms of normalized relatedness between economic activities or network-based measures of economic complexity. Most of the functions use matrix calculus and are based on bipartite (incidence) matrices consisting of region - industry pairs. These are described in Balland (2017) <http://econ.geo.uu.nl/peeg/peeg1709.pdf>.
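As an illustration of the simplest indicator mentioned above, the following minimal sketch computes the location quotient from a region-by-industry matrix using its textbook definition: the share of an industry in a region divided by the share of that industry overall. The function name `location_quotient` and the example figures are hypothetical and are not taken from the package; the sketch assumes every row and column total is non-zero.

```python
def location_quotient(matrix):
    """Return the location quotient for each region-industry pair.

    matrix[r][i] is the activity (e.g. employment) of industry i in region r.
    Assumes all row totals and column totals are non-zero.
    """
    row_totals = [sum(row) for row in matrix]          # total activity per region
    col_totals = [sum(col) for col in zip(*matrix)]    # total activity per industry
    grand_total = sum(row_totals)

    lq = []
    for r, row in enumerate(matrix):
        lq.append([
            # regional share of industry i divided by its overall share
            (x / row_totals[r]) / (col_totals[i] / grand_total)
            for i, x in enumerate(row)
        ])
    return lq


# Hypothetical data: two regions (rows) and two industries (columns).
employment = [
    [10, 30],
    [40, 20],
]
lq = location_quotient(employment)
```

A value above 1 indicates that the region is relatively specialized in that industry; here the first region is over-represented in the second industry (1.5) and under-represented in the first (0.5).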
This package provides a suite of convenient tools for social network analysis geared toward students, entry-level users, and non-expert practitioners. 'ideanet' features unique functions for the processing and measurement of sociocentric and egocentric network data. These functions automatically generate node- and system-level measures commonly used in the analysis of these types of networks. Outputs from these functions maximize the ability of novice users to employ network measurements in further analyses while making all users less prone to common data analytic errors. Additionally, 'ideanet' features an R Shiny graphic user interface that allows novices to explore network data with minimal need for coding.
Automatic disaggregation of small-area population estimates by demographic groups (e.g., age, sex, race, marital status, educational level), along with estimates of uncertainty, using advanced Bayesian statistical modelling approaches based on integrated nested Laplace approximation (INLA), Rue et al. (2009) <doi:10.1111/j.1467-9868.2008.00700.x>, and stochastic partial differential equation (SPDE) methods, Lindgren et al. (2011) <doi:10.1111/j.1467-9868.2011.00777.x>. The package implements hierarchical Bayesian modeling frameworks for small area estimation as described in Leasure et al. (2020) <doi:10.1073/pnas.1913050117> and Nnanatu et al. (2025) <doi:10.1038/s41467-025-59862-4>.
This package performs predictions of totals and weighted sums, or finite population block kriging, on spatial data using the methods in Ver Hoef (2008) <doi:10.1007/s10651-007-0035-y>. The primary outputs are an estimate of the total, mean, or weighted sum in the region, an estimated prediction variance, and a plot of the predicted and observed values. This is useful primarily to users with ecological data that are counts or densities measured on some sites in a finite area of interest. Spatial prediction for the total count or average density in the entire region can then be done using the functions in this package.
The package offers functions for analyzing and interactively exploring large-scale single-cell RNA-seq datasets. pagoda2 primarily performs normalization and differential gene expression analysis, with an interactive application for exploring single-cell RNA-seq datasets. It performs basic tasks such as cell size normalization and gene variance normalization, and can be used to identify subpopulations and run differential expression within individual samples. pagoda2 was written to rapidly process modern large-scale scRNAseq datasets of approximately 1e6 cells. The companion web application allows users to explore which gene expression patterns form the different subpopulations within their data. The package also serves as the primary method for preprocessing data for conos.
For a balanced design of experiments, this package calculates the sample size required to detect a given standardized effect size at a specified significance level. This package also provides three graphs: detectable standardized effect size vs power, sample size vs detectable standardized effect size, and sample size vs power, which show the mutual relationship between the sample size, power and the detectable standardized effect size. The detailed procedure is described in R. V. Lenth (2006-9) <https://homepage.divms.uiowa.edu/~rlenth/Power/>, Y. B. Lim (1998), M. A. Kastenbaum, D. G. Hoel and K. O. Bowman (1970) <doi:10.2307/2334851>, and Douglas C. Montgomery (2013, ISBN: 0849323312).
Estimates the time-varying reproduction number, rate of spread, and doubling time using a renewal equation approach combined with Bayesian inference via Stan. Supports Gaussian process and random walk priors for modelling changes in transmission over time. Accounts for delays between infection and observation (incubation period, reporting delays), right-truncation in recent data, day-of-week effects, and observation overdispersion. Can estimate relationships between primary and secondary outcomes (e.g., cases to hospitalisations or deaths) and forecast both. Runs across multiple regions in parallel. Based on Abbott et al. (2020) <doi:10.12688/wellcomeopenres.16006.1> and Gostic et al. (2020) <doi:10.1101/2020.06.18.20134858>.
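To make the renewal equation idea concrete, the sketch below computes the expected number of new infections as the reproduction number times past infections weighted by a generation interval distribution, and converts a growth rate into a doubling time via ln(2)/r. It is a deterministic illustration only: the Bayesian inference, delays, truncation and overdispersion described above are not modelled, and the names `simulate_renewal`, `doubling_time` and `generation_pmf` are hypothetical rather than part of the package.

```python
import math


def simulate_renewal(initial_infections, r_t, generation_pmf):
    """Project expected infections forward with a discrete renewal equation.

    initial_infections: seed infections, oldest first.
    r_t: reproduction number for each projected time step.
    generation_pmf: generation_pmf[s - 1] is the probability that the
        generation interval equals s time steps.
    """
    infections = list(initial_infections)
    for r in r_t:
        # Infectious pressure: past infections weighted by the generation interval.
        pressure = sum(
            w * infections[-s]
            for s, w in enumerate(generation_pmf, start=1)
            if s <= len(infections)
        )
        infections.append(r * pressure)
    return infections


def doubling_time(growth_rate):
    """Doubling time implied by an exponential growth rate per time step."""
    return math.log(2) / growth_rate


# With all generation-interval mass at lag 1 and R = 2, infections double each step.
trajectory = simulate_renewal([100.0], [2.0, 2.0, 2.0], [1.0])
```

With a growth rate of ln(2) per time step, `doubling_time` returns one time step, which matches the doubling seen in `trajectory`.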
Meta-analysis of generalized additive models and generalized additive mixed models. A typical use case is when data cannot be shared across locations, and an overall meta-analytic fit is sought. metagam provides functionality for removing individual participant data from models computed using the mgcv and gamm4 packages such that the model objects can be shared without exposing individual data. Furthermore, methods for meta-analysing these fits are provided. The implemented methods are described in Sorensen et al. (2020), <doi:10.1016/j.neuroimage.2020.117416>, extending previous works by Schwartz and Zanobetti (2000) and Crippa et al. (2018) <doi:10.6000/1929-6029.2018.07.02.1>.
We implement functions allowing for mediation analysis to be performed in cases where the mediator is a count variable with excess zeroes. First, a function is provided allowing users to perform analysis for zero-inflated count variables using the marginalized zero-inflated Poisson (MZIP) model (Long et al. 2014 <DOI:10.1002/sim.6293>). Using the counterfactual approach to mediation and MZIP, we can obtain natural direct and indirect effects for the overall population. Using the delta method, variance estimation can be performed instantaneously. Alternatively, bootstrap standard errors can be used. We also provide functions for cases with exposure-mediator interactions, with a four-way decomposition of the total effect.
Nonparametric survival function estimates and semiparametric regression for multivariate failure time data with right-censoring. For nonparametric survival function estimates, the Volterra, Dabrowska, and Prentice-Cai estimates for bivariate failure time data may be computed, as well as the Dabrowska estimate for trivariate failure time data. Bivariate marginal hazard rate regression can be fitted for bivariate failure time data. Functions are also provided to compute (bootstrap) confidence intervals and plot the estimates of the bivariate survival function. For details, see "The Statistical Analysis of Multivariate Failure Time Data: A Marginal Modeling Approach", Prentice, R., Zhao, S. (2019, ISBN: 978-1-4822-5657-4), CRC Press.
This package provides methods and tools for forecasting univariate time series using the NARFIMA (Neural AutoRegressive Fractionally Integrated Moving Average) model. It combines neural networks with fractional differencing to capture both nonlinear patterns and long-term dependencies. The NARFIMA model supports seasonal adjustment, Box-Cox transformations, optional exogenous variables, and the computation of prediction intervals. In addition to the NARFIMA model, this package provides alternative forecasting models including NARIMA (Neural ARIMA), NBSTS (Neural Bayesian Structural Time Series), and NNaive (Neural Naive) for performance comparison across different modeling approaches. The methods are based on algorithms introduced by Chakraborty et al. (2025) <doi:10.48550/arXiv.2509.06697>.
An implementation of the sample size computation method for network models proposed by Constantin et al. (2023) <doi:10.1037/met0000555>. The implementation takes the form of a three-step recursive algorithm designed to find an optimal sample size given a model specification and a performance measure of interest. It starts with a Monte Carlo simulation step for computing the performance measure and a statistic at various sample sizes selected from an initial sample size range. It continues with a monotone curve-fitting step for interpolating the statistic across the entire sample size range. The final step employs stratified bootstrapping to quantify the uncertainty around the fitted curve.
It is a versatile tool for predicting time series data using Long Short-Term Memory (LSTM) models. It is specifically designed to handle time series with an exogenous variable, allowing users to denote whether data was available for a particular period or not. The package encompasses various functionalities, including hyperparameter tuning, custom loss function support, model evaluation, and one-step-ahead forecasting. With an emphasis on ease of use and flexibility, it empowers users to explore, evaluate, and deploy LSTM models for accurate time series predictions and forecasting in diverse applications. More details can be found in Garai and Paul (2023) <doi:10.1016/j.iswa.2023.200202>.
TEMPoral TEnsor Decomposition (TEMPTED) is a dimension reduction method for multivariate longitudinal data with varying temporal sampling. It formats the data into a temporal tensor and decomposes it into a summation of low-dimensional components, each consisting of a subject loading vector, a feature loading vector, and a continuous temporal loading function. These loadings provide a low-dimensional representation of subjects or samples and can be used to identify features associated with clusters of subjects or samples. TEMPTED provides the flexibility of allowing subjects to have different temporal sampling, so time points do not need to be binned, and missing time points do not need to be imputed.
R7RS-small Scheme library for reading and writing the RSV (Rows of String Values) data format, a very simple binary format for storing tables of strings. It is a competitor to formats such as CSV (Comma Separated Values) and TSV (Tab Separated Values). Its main benefit is that the strings are represented as Unicode encoded as UTF-8, and the value and row separators are byte values that are never used in UTF-8, so the strings do not need any error-prone escaping and thus can be written and read verbatim.
The format is specified in https://github.com/Stenway/RSV-Specification and demonstrated in https://www.youtube.com/watch?v=tb_70o6ohMA.
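The escaping-free design can be sketched in a few lines. The example below is written in Python rather than Scheme and is not the library's API; the function names are hypothetical. It assumes the byte values 0xFF (value terminator), 0xFE (null value) and 0xFD (row terminator), which should be checked against the linked specification; the null marker in particular is not mentioned in the description above.

```python
# Assumed byte values; none of them can occur in valid UTF-8 text.
VALUE_END = b"\xff"   # terminates every value
NULL_VALUE = b"\xfe"  # stands in for a missing (null) value
ROW_END = b"\xfd"     # terminates every row


def encode_rsv(rows):
    """Encode a list of rows (lists of str or None) as RSV bytes."""
    out = bytearray()
    for row in rows:
        for value in row:
            # Strings are written verbatim as UTF-8, with no escaping.
            out += NULL_VALUE if value is None else value.encode("utf-8")
            out += VALUE_END
        out += ROW_END
    return bytes(out)


def decode_rsv(data):
    """Decode RSV bytes back into a list of rows."""
    if data and not data.endswith(ROW_END):
        raise ValueError("incomplete RSV document: missing row terminator")
    rows = []
    # Every row ends with ROW_END, so the final split piece is always empty.
    for raw_row in data.split(ROW_END)[:-1]:
        if raw_row and not raw_row.endswith(VALUE_END):
            raise ValueError("incomplete RSV row: missing value terminator")
        values = raw_row.split(VALUE_END)[:-1]
        rows.append(
            [None if v == NULL_VALUE else v.decode("utf-8") for v in values]
        )
    return rows


table = [["Hello", "🌎"], [], [None, ""]]
encoded = encode_rsv(table)
decoded = decode_rsv(encoded)
```

Because the terminator bytes never appear inside UTF-8 text, decoding is a plain split on those bytes, and empty strings, empty rows and null values all round-trip without special cases.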
This package provides tools to estimate tail area-based false discovery rates as well as local false discovery rates for a variety of null models (p-values, z-scores, correlation coefficients, t-scores). The proportion of null values and the parameters of the null distribution are adaptively estimated from the data. In addition, the package contains functions for non-parametric density estimation (Grenander estimator), for monotone regression (isotonic regression and antitonic regression with weights), for computing the greatest convex minorant (GCM) and the least concave majorant (LCM), for the half-normal and correlation distributions, and for computing empirical higher criticism (HC) scores and the corresponding decision threshold.
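For readers unfamiliar with monotone regression, the following minimal sketch shows weighted isotonic regression using the standard pool-adjacent-violators algorithm, with antitonic regression obtained by negation. It illustrates the general technique only and is not the package's implementation; the function names are hypothetical and all weights are assumed to be strictly positive.

```python
def isotonic_regression(y, w=None):
    """Weighted least-squares fit of a non-decreasing sequence to y."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while they violate the non-decreasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def antitonic_regression(y, w=None):
    """Weighted least-squares fit of a non-increasing sequence to y."""
    return [-v for v in isotonic_regression([-v for v in y], w)]


increasing_fit = isotonic_regression([1, 3, 2, 4])
decreasing_fit = antitonic_regression([4, 2, 3, 1])
```

In both examples the two out-of-order middle values are pooled into their mean, 2.5, while the outer values are left unchanged.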
The empirical cumulative average deviation function introduced by the author is utilized to develop both Ad- and Ud-plots. The Ad-plot can identify symmetry, skewness, and outliers of the data distribution, including anomalies. The Ud-plot, created by slightly modifying the Ad-plot, is exceptional in assessing normality, outperforming the normal QQ-plot, the normal PP-plot, and their derivations. The d-value, which quantifies the degree of proximity between the Ud-plot and the graph of the estimated normal density function, helps guide decisions on whether normality can be confirmed. A full description of this methodology can be found in the article by Wijesuriya (2025) <doi:10.1080/03610926.2024.2440583>.
This package provides a tool that implements the clustering algorithms from mothur (Schloss PD et al. (2009) <doi:10.1128/AEM.01541-09>). clustur makes use of the cluster() and make.shared() commands from mothur. Our cluster() function has five different algorithms implemented: OptiClust, furthest, nearest, average, and weighted. OptiClust is an optimized clustering method for Operational Taxonomic Units, described in Westcott SL, Schloss PD (2017) <doi:10.1128/mspheredirect.00073-17>. The make.shared() command is always applied at the end of the clustering command. This functionality allows us to generate clustering and abundance data efficiently.
This package implements multiple variants of the Information Bottleneck ('IB') method for clustering datasets containing continuous, categorical (nominal/ordinal) and mixed-type variables. The package provides deterministic, agglomerative, generalized, and standard IB clustering algorithms that preserve relevant information while forming interpretable clusters. The Deterministic Information Bottleneck is described in Costa et al. (2026) <doi:10.1016/j.patcog.2026.113580>. The standard IB method originates from Tishby et al. (2000) <doi:10.48550/arXiv.physics/0004057>, the agglomerative variant from Slonim and Tishby (1999) <https://papers.nips.cc/paper/1651-agglomerative-information-bottleneck>, and the generalized IB from Strouse and Schwab (2017) <doi:10.1162/NECO_a_00961>.
This package contains implementations of the integrative Cox model with uncertain event times proposed by Wang et al. (2020) <doi:10.1214/19-AOAS1287>, the regularized Cox cure rate model with uncertain event status proposed by Wang et al. (2023) <doi:10.1007/s12561-023-09374-w>, and other survival analysis routines, including the Cox cure rate model proposed by Kuk and Chen (1992) <doi:10.1093/biomet/79.3.531> via an EM algorithm proposed by Sy and Taylor (2000) <doi:10.1111/j.0006-341X.2000.00227.x>, and the regularized Cox cure rate model with elastic net penalty following Masud et al. (2018) <doi:10.1177/0962280216677748>.
This package contains functions that allow Bayesian meta-analysis (1) with binomial data, given as counts (y) and total counts (n), or (2) with user-supplied point estimates and associated variances. Case (1) provides an analysis based on the logit transformation of the sample proportion. This methodology is also appropriate for combining data from sample surveys and related sources. The functions can calculate the corresponding similarity matrix. More details can be found in Cahoy and Sedransk (2023), Cahoy and Sedransk (2022) <doi:10.1007/s42519-018-0027-2>, Evans and Sedransk (2001) <doi:10.1093/biomet/88.3.643>, and Malec and Sedransk (1992) <doi:10.1093/biomet/79.3.593>.