Trend filtering is a widely used nonparametric method for knot detection. This package provides an efficient solution for L0 trend filtering, avoiding the traditional methods of using Lagrange duality or Alternating Direction Method of Multipliers algorithms. It employ a splicing approach that minimizes L0-regularized sparse approximation by transforming the L0 trend filtering problem. The package excels in both efficiency and accuracy of trend estimation and changepoint detection in segmented functions. References: Wen et al. (2020) <doi:10.18637/jss.v094.i04>; Zhu et al. (2020)<doi:10.1073/pnas.2014241117>; Wen et al. (2023) <doi:10.1287/ijoc.2021.0313>.
Statnet is a collection of packages for statistical network analysis that are designed to work together because they share common data representations and API design. They provide an integrated set of tools for the representation, visualization, analysis, and simulation of many different forms of network data. This package is designed to make it easy to install and load the key statnet packages in a single step. Learn more about statnet at <http://www.statnet.org>. Tutorials for many packages can be found at <https://github.com/statnet/Workshops/wiki>. For an introduction to functions in this package, type help(package='statnet').
Plot official statistics time series conveniently: automatic legends, highlight windows, stacked bar chars with positive and negative contributions, sum-as-line option, two y-axes with automatic horizontal grids that fit both axes and other popular chart types. tstools comes with a plethora of defaults to let you plot without setting an abundance of parameters first, but gives you the flexibility to tweak the defaults. In addition to charts, tstools provides a super fast, data.table backed time series I/O that allows the user to export / import long format, wide format and transposed wide format data to various file types.
The barbieQ package provides a series of robust statistical tools for analysing barcode count data generated from cell clonal tracking (i.e., lineage tracing) experiments. In these experiments, an initial cell and its offspring collectively form a clone (i.e., lineage). A unique barcode sequence, incorporated into the DNA of the inital cell, is inherited within the clone. This one-to-one mapping of barcodes to clones enables clonal tracking of their behaviors. By counting barcodes, researchers can quantify the population abundance of individual clones under specific experimental perturbations. barbieQ supports barcode count data preprocessing, statistical testing, and visualization.
sRACIPE implements a randomization-based method for gene circuit modeling. It allows us to study the effect of both the gene expression noise and the parametric variation on any gene regulatory circuit (GRC) using only its topology, and simulates an ensemble of models with random kinetic parameters at multiple noise levels. Statistical analysis of the generated gene expressions reveals the basin of attraction and stability of various phenotypic states and their changes associated with intrinsic and extrinsic noises. sRACIPE provides a holistic picture to evaluate the effects of both the stochastic nature of cellular processes and the parametric variation.
Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include syuzhet (default) developed in the Nebraska Literary Lab, afinn developed by Finn Arup Nielsen, bing developed by Minqing Hu and Bing Liu, and nrc developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the get_sentiment function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.
R7RS-small Scheme library for reading and writing RSV data format, a very simple binary format for storing tables of strings. It is a competitor for CSV (Comma Separated Values) and TSV (Tab Separated Values). Its main benefit is that the strings are represented as Unicode encoded as UTF-8, and the value and row separators are byte values that are never used in UTF-8, so the strings do not need any error prone escaping and thus can be written and read verbatim.
The RSV format is specified in https://github.com/Stenway/RSV-Specification.
This is a analysis toolkit to streamline the analyses of minicircle sequence diversity in population-scale genome projects. rKOMICS is a user-friendly R package that has simple installation requirements and that is applicable to all 27 trypanosomatid genera. Once minicircle sequence alignments are generated, rKOMICS allows to examine, summarize and visualize minicircle sequence diversity within and between samples through the analyses of minicircle sequence clusters. We showcase the functionalities of the (r)KOMICS tool suite using a whole-genome sequencing dataset from a recently published study on the history of diversification of the Leishmania braziliensis species complex in Peru. Analyses of population diversity and structure highlighted differences in minicircle sequence richness and composition between Leishmania subspecies, and between subpopulations within subspecies. The rKOMICS package establishes a critical framework to manipulate, explore and extract biologically relevant information from mitochondrial minicircle assemblies in tens to hundreds of samples simultaneously and efficiently. This should facilitate research that aims to develop new molecular markers for identifying species-specific minicircles, or to study the ancestry of parasites for complementary insights into their evolutionary history. ***** !! WARNING: this package relies on dependencies from Bioconductor. For Mac users, this can generate errors when installing rKOMICS. Install Bioconductor and ComplexHeatmap at advance: install.packages("BiocManager"); BiocManager::install("ComplexHeatmap") *****.
This software solves an Advection Bi-Flux Diffusive Problem using the Finite Difference Method FDM. Vasconcellos, J.F.V., Marinho, G.M., Zanni, J.H., 2016, Numerical analysis of an anomalous diffusion with a bimodal flux distribution. <doi:10.1016/j.rimni.2016.05.001>. Silva, L.G., Knupp, D.C., Bevilacqua, L., Galeao, A.C.N.R., Silva Neto, A.J., 2014, Formulation and solution of an Inverse Anomalous Diffusion Problem with Stochastic Techniques. <doi:10.5902/2179460X13184>. In this version, it is possible to include a source as a function depending on space and time, that is, s(x,t).
Imports PxStat data in JSON-stat format and (optionally) reshapes it into wide format. The Central Statistics Office (CSO) is the national statistical institute of Ireland and PxStat is the CSOs online database of Official Statistics. This database contains current and historical data series compiled from CSO statistical releases and is accessed at <https://data.cso.ie>. The CSO PxStat Application Programming Interface (API), which is accessed in this package, provides access to PxStat data in JSON-stat format at <https://data.cso.ie>. This dissemination tool allows developers machine to machine access to CSO PxStat data.
Find, visualize and explore patterns of differential taxa in vegetation data (namely in a phytosociological table), using the Differential Value (DiffVal). Patterns are searched through mathematical optimization algorithms. Ultimately, Total Differential Value (TDV) optimization aims at obtaining classifications of vegetation data based on differential taxa, as in the traditional geobotanical approach (Monteiro-Henriques 2025, <doi:10.3897/VCS.140466>). The Gurobi optimizer, as well as the R package gurobi', can be installed from <https://www.gurobi.com/products/gurobi-optimizer/>. The useful vignette Gurobi Installation Guide, from package prioritizr', can be found here: <https://prioritizr.net/articles/gurobi_installation_guide.html>.
Compares the fit of alternative models of continuous trait differentiation between sister species and other paired lineages. Differences in trait means between two lineages arise as they diverge from a common ancestor, and alternative processes of evolutionary divergence are expected to leave unique signatures in the distribution of trait differentiation in datasets comprised of many lineage pairs. Models include approximations of divergent selection, drift, and stabilizing selection. A variety of model extensions facilitate the testing of process-to-pattern hypotheses. Users supply trait data and divergence times for each lineage pair. The fit of alternative models is compared in a likelihood framework.
Designed for genomic and proteomic data analysis, enabling unbiased PubMed searching, protein interaction network visualization, and comprehensive data summarization. This package aims to help users identify novel targets within their data sets based on protein network interactions and publication precedence of target's association with research context based on literature precedence. Methods in this package are described in detail in: Douglas (Year) <to-be-added DOI or link to the preprint>. Key functionalities of this package also leverage methodologies from previous works, such as: - Szklarczyk et al. (2023) <doi:10.1093/nar/gkac1000> - Winter (2017) <doi:10.32614/RJ-2017-066>.
This package implements fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang, Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points a magnitude faster than the vanilla Pruned Exact Linear Time(PELT). The package includes examples of linear regression, logistic regression, Poisson regression, penalized linear regression data, and whole lot more examples with custom cost function in case the user wants to use their own cost function.
Based on the work of Curi, Converse, Hajewski, and Oliveira (2019) <doi:10.1109/IJCNN.2019.8852333>. This package provides easy-to-use functions which create a variational autoencoder (VAE) to be used for parameter estimation in Item Response Theory (IRT) - namely the Multidimensional Logistic 2-Parameter (ML2P) model. To use a neural network as such, nontrivial modifications to the architecture must be made, such as restricting the nonzero weights in the decoder according to some binary matrix Q. The functions in this package allow for straight-forward construction, training, and evaluation so that minimal knowledge of tensorflow or keras is required.
When using pooled p-values to adjust for multiple testing, there is an inherent balance that must be struck between rejection based on weak evidence spread among many tests and strong evidence in a few, explored in Salahub and Olford (2023) <arXiv:2310.16600>. This package provides functionality to compute marginal and central rejection levels and the centrality quotient for p-value pooling functions and provides implementations of the chi-squared quantile pooled p-value (described in Salahub and Oldford (2023)) and a proposal from Heard and Rubin-Delanchy (2018) <doi:10.1093/biomet/asx076> to control the quotient's value.
Fits Bayesian hierarchical spatial and spatial-temporal process models for point-referenced Gaussian, Poisson, binomial, and binary data using stacking of predictive densities. It involves sampling from analytically available posterior distributions conditional upon candidate values of the spatial process parameters and, subsequently assimilate inference from these individual posterior distributions using Bayesian predictive stacking. Our algorithm is highly parallelizable and hence, much faster than traditional Markov chain Monte Carlo algorithms while delivering competitive predictive performance. See Zhang, Tang, and Banerjee (2025) <doi:10.1080/01621459.2025.2566449>, and, Pan, Zhang, Bradley, and Banerjee (2025) <doi:10.48550/arXiv.2406.04655> for details.
Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Some functions for drawing some special plots: The function bagplot plots a bagplot, faces plots chernoff faces, iconplot plots a representation of a frequency table or a data matrix, plothulls plots hulls of a bivariate data set, plotsummary plots a graphical summary of a data set, puticon adds icons to a plot, skyline.hist combines several histograms of a one dimensional data set in one plot, slider functions supports some interactive graphics, spin3R helps an inspection of a 3-dim point cloud, stem.leaf plots a stem and leaf plot, stem.leaf.backback plots back-to-back versions of stem and leaf plot.
This package implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to setup a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules.
Simulation of segments shared identical-by-descent (IBD) by pedigree members. Using sex specific recombination rates along the human genome (Halldorsson et al. (2019) <doi:10.1126/science.aau1043>), phased chromosomes are simulated for all pedigree members. Applications include calculation of realised relatedness coefficients and IBD segment distributions. ibdsim2 is part of the pedsuite collection of packages for pedigree analysis. A detailed presentation of the pedsuite', including a separate chapter on ibdsim2', is available in the book Pedigree analysis in R (Vigeland, 2021, ISBN:9780128244302). A Shiny app for visualising and comparing IBD distributions is available at <https://magnusdv.shinyapps.io/ibdsim2-shiny/>.
Social Relation Model (SRM) analyses for single or multiple round-robin groups are performed. These analyses are either based on one manifest variable, one latent construct measured by two manifest variables, two manifest variables and their bivariate relations, or two latent constructs each measured by two manifest variables. Within-group t-tests for variance components and covariances are provided for single groups. For multiple groups two types of significance tests are provided: between-groups t-tests (as in SOREMO) and enhanced standard errors based on Lashley and Bond (1997) <DOI:10.1037/1082-989X.2.3.278>. Handling for missing values is provided.
Model data with a suspected clustering structure (either in co-variate space, regression space or both) using a Bayesian product model with a logistic regression likelihood. Observations are represented graphically and clusters are formed through various edge removals or additions. Cluster quality is assessed through the log Bayesian evidence of the overall model, which is estimated using either a Sequential Monte Carlo sampler or a suitable transformation of the Bayesian Information Criterion as a fast approximation of the former. The internal Iterated Batch Importance Sampling scheme (Chopin (2002 <doi:10.1093/biomet/89.3.539>)) is made available as a free standing function.
Facilitates use and analysis of data about the armed conflict in Colombia resulting from the joint project between La Jurisdicción Especial para la Paz (JEP), La Comisión para el Esclarecimiento de la Verdad, la Convivencia y la No repetición (CEV), and the Human Rights Data Analysis Group (HRDAG). The data are 100 replicates from a multiple imputation through chained equations as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. With the replicates the user can examine four human rights violations that occurred in the Colombian conflict accounting for the impact of missing fields and fully missing observations.