Calculates total survey error (TSE) for a survey under multiple, different weighting schemes, using both scale-dependent and scale-independent metrics. The package works directly from the data set, with no hand calculations required: just upload a properly structured data set (see TESTWGT and its documentation), input the correct column names (see the function documentation), and run the functions. For more on TSE, see: Weisberg, Herbert (2005, ISBN:0-226-89128-3); Biemer, Paul (2010) <doi:10.1093/poq/nfq058>; Biemer, Paul et al. (2017, ISBN:9781119041672); etc.
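To make the distinction between the two kinds of metric concrete, here is a minimal Python sketch that compares weighted estimates against known population benchmarks. It is not the package's interface: the function names, the toy data, and the choice of mean absolute error (scale-dependent) and mean absolute percentage error (scale-independent) as the two metrics are illustrative assumptions, and the sketch covers only the error of point estimates, not a full TSE decomposition.

```python
def weighted_mean(values, weights):
    """Weighted mean of respondent `values` using survey `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def error_metrics(estimates, benchmarks):
    """Return (MAE, MAPE) of estimates against benchmark values.

    MAE is scale-dependent (it stays in each variable's own units), while
    MAPE is scale-independent (a percentage), so it can be averaged across
    variables measured on different scales."""
    abs_errors = [abs(e - b) for e, b in zip(estimates, benchmarks)]
    mae = sum(abs_errors) / len(abs_errors)
    mape = 100 * sum(err / abs(b) for err, b in zip(abs_errors, benchmarks)) / len(abs_errors)
    return mae, mape


def compare_schemes(data, benchmarks, schemes):
    """Compute (MAE, MAPE) for each weighting scheme.

    data: {variable: list of respondent values}
    benchmarks: {variable: known population value}
    schemes: {scheme name: list of respondent weights}"""
    variables = list(benchmarks)
    results = {}
    for name, weights in schemes.items():
        estimates = [weighted_mean(data[v], weights) for v in variables]
        results[name] = error_metrics(estimates, [benchmarks[v] for v in variables])
    return results


# Hypothetical toy data: four respondents, two benchmarked variables.
data = {"age": [25, 40, 60, 35], "employed": [1, 0, 1, 1]}
benchmarks = {"age": 45.0, "employed": 0.6}
schemes = {
    "unweighted": [1, 1, 1, 1],
    "poststrat": [0.8, 1.4, 1.3, 0.5],  # hypothetical adjusted weights
}
for name, (mae, mape) in compare_schemes(data, benchmarks, schemes).items():
    print(f"{name}: MAE = {mae:.3f}, MAPE = {mape:.1f}%")
```

Because the MAE here mixes years of age with an employment proportion, it is only meaningful variable by variable; the percentage-based MAPE is what allows comparison across variables on different scales.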
This package provides an algorithm to detect and characterize disturbances (start and end dates, intensity) that can occur at different hierarchical levels, by studying the dynamics of longitudinal observations at the unit and group levels with Nadaraya-Watson smoothing curves. It also provides a Shiny app for visualizing the observations and the detected disturbances. Finally, the package includes a data frame mimicking a pig farming system subjected to disturbances simulated according to Le et al. (2022) <doi:10.1016/j.animal.2022.100496>.
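The Nadaraya-Watson estimator at a point x0 is a kernel-weighted average of the observations, sum_i K((x_i - x0)/h) y_i / sum_i K((x_i - x0)/h), with bandwidth h. The Python sketch below implements it with a Gaussian kernel and then flags points that fall well below their smoothed curve. The flagging rule, the tolerance, and the toy feed-intake series are hypothetical illustrations, not the package's detection algorithm.

```python
import math


def nadaraya_watson(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def flag_drops(x, y, bandwidth, tolerance):
    """Return indices where y falls more than `tolerance` (relative)
    below its own smoothed curve. Hypothetical rule, for illustration."""
    flagged = []
    for i, xi in enumerate(x):
        smooth = nadaraya_watson(x, y, xi, bandwidth)
        if (y[i] - smooth) / smooth < -tolerance:
            flagged.append(i)
    return flagged


# Hypothetical daily feed intake of one animal, with a one-day drop on day 5.
days = list(range(11))
intake = [10.0] * 11
intake[5] = 5.0
print(flag_drops(days, intake, bandwidth=1.0, tolerance=0.1))  # [5]
```

The bandwidth controls the trade-off: a larger value smooths over short disturbances, while a smaller one lets the curve follow them and makes them harder to separate from the baseline.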
This package provides an implementation of multilayered visualizations for enhanced graphical representation of functional analysis data. It combines and integrates omics data derived from expression and functional annotation enrichment analyses. Its plotting functions have been developed with a hierarchical structure in mind: starting from a general overview that identifies the most enriched categories (modified bar plot, bubble plot) and moving to more detailed views that display different types of relevant information for the molecules in a given set of categories (circle plot, chord plot, cluster plot, Venn diagram, heatmap).
This package contains miscellaneous functions used to interpret, translate, factorize and negate Sum of Products expressions, for both binary and multi-value crisp sets, and to extract information (set names, set values) from those expressions. Other functions check whether values are possibly numeric (even if all numbers reside in a character vector), coerce them to numeric, or check whether the numbers are whole. It also offers, among many others, a highly flexible recoding routine and a more flexible alternative to the base function with().
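As a sketch of what negating a Sum of Products involves in the binary case, the Python code below applies De Morgan's laws (the negation of a sum is the product of the negated terms, and the negation of a product is the sum of its negated literals), distributes the result back into SOP form, drops contradictory products, and removes absorbed ones (X + X*Y = X). The dictionary representation of products and the function names are assumptions for illustration, and the result is not guaranteed to be a minimal expression.

```python
from itertools import product


def negate_sop(terms):
    """Negate a binary Sum of Products expression.

    terms: list of products, each a dict {condition: 1 or 0}, where 1 means
    presence and 0 means absence. Returns the negation as a list of products.
    """
    # De Morgan: ~(T1 + T2 + ...) = ~T1 * ~T2 * ..., and each ~Ti is the sum
    # of Ti's literals with their values flipped.
    negated_factors = [[(name, 1 - value) for name, value in term.items()] for term in terms]
    products = set()
    # Distribute the product of sums: pick one literal from every factor.
    for choice in product(*negated_factors):
        combined = {}
        for name, value in choice:
            if combined.get(name, value) != value:
                break  # A * ~A is always false, so drop this product
            combined[name] = value
        else:
            products.add(frozenset(combined.items()))
    # Absorption: X + X*Y = X, so drop any product that contains another one.
    kept = [p for p in products if not any(other < p for other in products)]
    return sorted((dict(p) for p in kept), key=lambda d: sorted(d.items()))


def to_string(terms):
    """Render a list of products as text, writing ~X for absence."""
    if not terms:
        return "0"  # an empty sum is always false
    return " + ".join(
        "*".join(name if value else "~" + name for name, value in sorted(t.items())) or "1"
        for t in terms
    )


# ~(A*B + C) = ~A*~C + ~B*~C
print(to_string(negate_sop([{"A": 1, "B": 1}, {"C": 1}])))
```

Multi-value crisp sets would need each literal to carry a set of admissible values rather than a single 0/1 flag, which this sketch does not attempt.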
We design algorithms with linear time complexity with respect to the dimension for three commonly studied correlation structures (exchangeable, decaying-product and K-dependent), and extend them to generate binary data with general non-negative correlation matrices in quadratic time. Jiang, W., Song, S., Hou, L. and Zhao, H. "A set of efficient methods to generate high-dimensional binary data with specified correlation structures." The American Statistician. See <doi:10.1080/00031305.2020.1816213> for a detailed presentation of the method.
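For the exchangeable case, one simple way to reach linear time is a shared latent draw: each variable copies a common Bernoulli(p) value with probability sqrt(rho) and is otherwise an independent Bernoulli(p) draw, so two variables are identical with probability rho and independent otherwise, which gives a pairwise correlation of exactly rho. The Python sketch below uses this textbook construction as an illustration; it is an assumption that it matches any of the algorithms in Jiang et al., and it only covers non-negative correlations.

```python
import random


def exchangeable_binary(n, p, rho, rng):
    """Draw n Bernoulli(p) variables with common pairwise correlation rho.

    With probability sqrt(rho), X_i copies a shared Bernoulli(p) draw Z;
    otherwise it takes its own independent Bernoulli(p) draw. Two variables
    both copy Z with probability rho, which gives corr(X_i, X_j) = rho.
    Requires 0 <= rho <= 1; runs in O(n)."""
    if not 0 <= rho <= 1:
        raise ValueError("this construction needs 0 <= rho <= 1")
    a = rho ** 0.5
    z = 1 if rng.random() < p else 0
    return [z if rng.random() < a else int(rng.random() < p) for _ in range(n)]


def sample_correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# 20,000 draws of a 50-dimensional vector with p = 0.3 and rho = 0.4.
rng = random.Random(2020)
draws = [exchangeable_binary(50, 0.3, 0.4, rng) for _ in range(20000)]
col0 = [d[0] for d in draws]
col1 = [d[1] for d in draws]
print(round(sample_correlation(col0, col1), 3))  # should be near 0.4
```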
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'WeightIt', 'MatchThem', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'cem', 'sbw', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.
This package provides a collection of functions that fit functional neural network models. In other words, it allows users to build deep learning models with either functional or scalar responses paired with functional and scalar covariates. We implement the theoretical discussion found in Thind, Multani and Cao (2020) <arXiv:2006.09590> through a main fitting and prediction function, along with a number of helper functions for cross-validation, tuning, and the display of estimated functional weights.
This package offers comprehensive tools for the analysis of functional time series data, focusing on white noise hypothesis testing and goodness-of-fit evaluation, alongside functions for simulating data and advanced visualization techniques such as 3D rainbow plots. These methods are described in Kokoszka, Rice, and Shang (2017) <doi:10.1016/j.jmva.2017.08.004>, Yeh, Rice, and Dubin (2023) <doi:10.1214/23-EJS2112>, Kim, Kokoszka, and Rice (2023) <doi:10.1214/23-ss143>, and Rice, Wirjanto, and Zhao (2020) <doi:10.1111/jtsa.12532>.
Imputing blockwise missing data by imprecise imputation, featuring a domain-based, a variable-wise, and a case-wise strategy. Furthermore, the estimation of lower and upper bounds for unconditional and conditional probabilities based on the obtained imprecise data is implemented. Additionally, two utility functions are supplied: one to check whether variables in a data set contain set-valued observations, and another to merge two already imprecisely imputed data sets. The method is described in a technical report by Endres, Fink and Augustin (2018, <doi:10.5282/ubm/epub.42423>).
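With set-valued observations, probabilities can only be bounded. A natural pair of bounds for an event A is the share of observations whose set lies entirely inside A (lower bound) and the share whose set intersects A (upper bound). The Python sketch below computes these unconditional bounds together with a set-valued check; the estimator, the function names, and the toy data are illustrative assumptions rather than the package's implementation, and conditional probabilities are not covered.

```python
def has_set_valued(column):
    """True if any observation in the column has more than one possible value."""
    return any(len(obs) > 1 for obs in column)


def probability_bounds(column, event):
    """Lower and upper probability of `event` from set-valued observations.

    Lower: observations certainly in the event (set contained in it).
    Upper: observations possibly in the event (set intersects it)."""
    event = set(event)
    n = len(column)
    lower = sum(1 for obs in column if set(obs) <= event) / n
    upper = sum(1 for obs in column if set(obs) & event) / n
    return lower, upper


# Hypothetical imprecisely imputed variable: precise values are singletons,
# imputed values are sets of candidate categories.
smoking = [{"never"}, {"never", "former"}, {"current"}, {"never", "former", "current"}]
print(has_set_valued(smoking))                 # True
print(probability_bounds(smoking, {"never"}))  # (0.25, 0.75)
```

The gap between the two bounds reflects how much the imputation left undetermined; with only singleton observations, both bounds collapse to the ordinary relative frequency.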
This package provides functions to perform all steps of genome-wide association meta-analysis for studying Genotype x Environment interactions, from collecting the data to the Manhattan plot. The procedure accounts for the potential correlation between studies. In addition to the fixed and random models, one can investigate the relationship between QTL effects and a qualitative or quantitative covariate via the test of contrast and the meta-regression, respectively. The methodology is described in De Walsche, A., et al. (2025) <doi:10.1371/journal.pgen.1011553>.
This package provides a system to increase the efficiency of dynamic web-scraping with RSelenium by leveraging parallel processing. You provide a function wrapper for your RSelenium scraping routine with a set of inputs, and parsel runs it in several browser instances. Chunked input processing, together with error catching and logging, ensures seamless execution and minimal data loss, even when unforeseen RSelenium errors occur. You can additionally build safe scraping functions with minimal coding by using constructor functions that act as wrappers around RSelenium methods.
PHATE is a tool for visualizing high-dimensional single-cell data with natural progressions or trajectories. PHATE uses a novel conceptual framework for learning and visualizing the manifold inherent to biological systems in which smooth transitions mark the progressions of cells from one state to another. To see how PHATE can be applied to single-cell RNA-seq datasets from hematopoietic stem cells, human embryonic stem cells, and bone marrow samples, check out our publication in Nature Biotechnology at <doi:10.1038/s41587-019-0336-3>.
Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessing diagnostic nucleotides, and calculating the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, which is often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al. (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.
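Threshold-based identification compares a query with reference barcodes using a genetic distance and a divergence cut-off. The Python sketch below uses the uncorrected p-distance and a 1% default threshold; the distance choice, the handling of gaps and ambiguous bases, the labels "ambiguous" and "no id", and the function names are assumptions for illustration, not the package's functions.

```python
def p_distance(seq1, seq2):
    """Uncorrected p-distance: share of differing sites between two aligned
    sequences, skipping sites with a gap ('-') or ambiguous base ('N')."""
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        differences += a != b
    return differences / compared if compared else float("nan")


def threshold_id(query, references, threshold=0.01):
    """Classify a query against (species, sequence) reference pairs.

    Returns the species name if every reference within `threshold` belongs
    to one species, "ambiguous" if several species match, and "no id" if
    none does."""
    matches = {sp for sp, seq in references if p_distance(query, seq) <= threshold}
    if len(matches) == 1:
        return matches.pop()
    return "ambiguous" if matches else "no id"


# Hypothetical aligned reference barcodes.
references = [("sp1", "ACGTACGTAC"), ("sp2", "ACGTTCGTAA")]
print(threshold_id("ACGTACGTAC", references))  # sp1
```

Optimizing the threshold would then mean repeating this classification over specimens of known identity for a range of cut-offs and keeping the one with the fewest misidentifications.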
Simulate complex data from a given directed acyclic graph and information about each individual node. Root nodes are simply sampled from the specified distribution. Child nodes are simulated according to one of many implemented regressions, such as logistic regression, linear regression, Poisson regression or any other function. Also includes a comprehensive framework for discrete-time simulation, and network-based simulation, which can generate even more complex longitudinal and dependent data. For more details, see Robin Denz and Nina Timmesfeld (2025) <doi:10.48550/arXiv.2506.01498>.
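Simulating from a DAG amounts to visiting nodes in topological order, drawing each root node from its distribution and each child node from a regression on its already-simulated parents. The Python sketch below does this with the standard library's graphlib (Python 3.9 or later); the node specification format, the helper names, and the example coefficients are hypothetical and do not reflect the package's interface.

```python
import math
import random
from graphlib import TopologicalSorter


def simulate_dag(n, nodes, rng):
    """Simulate n rows from a DAG.

    nodes: {name: (parents, sampler)}, where sampler(row, rng) returns a draw
    given the values already simulated for this row. Visiting nodes in
    topological order guarantees parents are drawn before their children."""
    graph = {name: parents for name, (parents, _sampler) in nodes.items()}
    order = list(TopologicalSorter(graph).static_order())
    rows = []
    for _ in range(n):
        row = {}
        for name in order:
            _parents, sampler = nodes[name]
            row[name] = sampler(row, rng)
        rows.append(row)
    return rows


def logistic_draw(linear_predictor, rng):
    """Bernoulli draw with success probability 1 / (1 + exp(-linear_predictor))."""
    return int(rng.random() < 1 / (1 + math.exp(-linear_predictor)))


# Hypothetical DAG: age -> death <- smoker, and age -> cost <- death.
nodes = {
    # Root nodes: drawn from their own distributions.
    "age": ((), lambda row, rng: rng.gauss(50, 10)),
    "smoker": ((), lambda row, rng: int(rng.random() < 0.3)),
    # Child node: logistic regression on its parents.
    "death": (("age", "smoker"),
              lambda row, rng: logistic_draw(-5 + 0.05 * row["age"] + row["smoker"], rng)),
    # Child node: linear regression with Gaussian noise.
    "cost": (("age", "death"),
             lambda row, rng: 100 + 2 * row["age"] + 50 * row["death"] + rng.gauss(0, 5)),
}
sim = simulate_dag(1000, nodes, random.Random(1))
print(sim[0])
```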
Time series outlier detection with a nonparametric test. This is a new outlier detection methodology (washer) that is efficient, saving time in both computation and implementation; adaptable, since it requires only general assumptions and works with very short time series; and reliable and effective, since it relies on a robust nonparametric test. Two approaches are available: single time series (a vector) and grouped time series (a data frame). For further information, see Andrea Venturini (2011), Statistica, Università di Bologna, Vol. 71, pp. 329-344. An informal explanation is available on R-bloggers.
This package provides tools to compute ordinal statistics and effect sizes as an alternative to mean comparison: Cliff's delta or success rate difference (SRD), Vargha and Delaney's A or the Area Under a Receiver Operating Characteristic Curve (AUC), the discrete type of McGraw & Wong's Common Language Effect Size (CLES) or Grissom & Kim's Probability of Superiority (PS), and the Number Needed to Treat (NNT) effect size. Moreover, comparisons to Cohen's d are offered based on Huberty & Lowman's Percentage of Group (Non-)Overlap considerations.
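The dominance statistics named above can all be computed from the pairwise comparisons between two groups. In the Python sketch below, Cliff's delta is (#(x > y) - #(x < y)) / (n_x n_y), the A statistic counts ties as one half, so that A = (delta + 1) / 2, and NNT is taken as 1 / SRD. That NNT convention and the function name are assumptions for illustration, and the sketch omits confidence intervals.

```python
def ordinal_effect_sizes(x, y):
    """Dominance-based effect sizes comparing group x with group y."""
    greater = less = ties = 0
    for xi in x:
        for yj in y:
            if xi > yj:
                greater += 1
            elif xi < yj:
                less += 1
            else:
                ties += 1
    pairs = len(x) * len(y)
    delta = (greater - less) / pairs    # Cliff's delta (SRD), in [-1, 1]
    a = (greater + 0.5 * ties) / pairs  # Vargha-Delaney A (AUC), in [0, 1]
    # Assumed convention: NNT = 1 / SRD (undefined when there is no effect).
    nnt = 1 / delta if delta != 0 else float("inf")
    return {"delta": delta, "A": a, "NNT": nnt}


print(ordinal_effect_sizes([1, 2, 3], [2, 2, 4]))
```

Because the double loop visits all n_x * n_y pairs, this direct version is quadratic; sorting one group and counting with binary search would be the usual speed-up for large samples.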
Indole-3-acetaldoxime (IAOx) is an early intermediate in the biosynthesis of a variety of indolic secondary metabolites, including the phytoanticipin indol-3-ylmethyl glucosinolate and the phytoalexin camalexin (3-thiazol-2'-yl-indole). Arabidopsis thaliana cyp79B2 cyp79B3 double knockout plants are completely impaired in the conversion of tryptophan to indole-3-acetaldoxime and no longer accumulate IAOx-derived metabolites. Consequently, comparative analysis of wild-type and cyp79B2 cyp79B3 plant lines has the potential to reveal the complete range of IAOx-derived indolic secondary metabolites.
SPICEY (SPecificity Index for Coding and Epigenetic activitY) is an R package designed to quantify cell-type specificity in single-cell transcriptomic and epigenomic data, particularly scRNA-seq and scATAC-seq. It introduces two complementary indices: the Gene Expression Tissue Specificity Index (GETSI) and the Regulatory Element Tissue Specificity Index (RETSI), both based on entropy to provide continuous, interpretable measures of specificity. By integrating gene expression and chromatin accessibility, SPICEY enables standardized analysis of cell-type-specific regulatory programs across diverse tissues and conditions.
Automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Using it to design your actual database is not advised.) Originally inspired by the AutoNormalize library for Python by Alteryx (<https://github.com/alteryx/autonormalize>), with various changes and improvements. Automatic discovery of functional or approximate dependencies, normalisation based on those, and plotting of the resulting "database" via 'Graphviz', with options to exclude some attributes at discovery time, or remove discovered dependencies at normalisation time.
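A functional dependency X -> Y holds when every combination of X values maps to a single Y value; an approximate dependency holds once a small share of rows is removed. The Python sketch below measures that share and performs one decomposition step of the kind used when normalising a table. The function names, the error measure, and the toy table are illustrative assumptions, not the package's API.

```python
from collections import Counter, defaultdict


def dependency_error(rows, lhs, rhs):
    """Share of rows to delete before lhs -> rhs holds exactly.

    0.0 means the functional dependency holds; a small positive value
    signals an approximate dependency."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        groups[key][tuple(row[c] for c in rhs)] += 1
    # Within each lhs group, keep only the most frequent rhs value.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return 1 - kept / len(rows)


def split_on(rows, lhs, rhs):
    """Decompose a relation on the dependency lhs -> rhs.

    Returns the new relation (distinct lhs + rhs tuples) and the original
    relation with the rhs columns removed; lhs remains as the link."""
    dependent = sorted({tuple(row[c] for c in lhs + rhs) for row in rows})
    remaining = [{c: v for c, v in row.items() if c not in rhs} for row in rows]
    return dependent, remaining


rows = [
    {"order": 1, "postcode": "10001", "city": "New York", "state": "NY"},
    {"order": 2, "postcode": "10001", "city": "New York", "state": "NY"},
    {"order": 3, "postcode": "10002", "city": "New York", "state": "NY"},
    {"order": 4, "postcode": "94103", "city": "San Francisco", "state": "CA"},
]
print(dependency_error(rows, ["postcode"], ["city"]))  # 0.0
print(dependency_error(rows, ["city"], ["postcode"]))  # 0.25
cities, orders = split_on(rows, ["city"], ["state"])
print(cities)  # [('New York', 'NY'), ('San Francisco', 'CA')]
```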
Generates Monte Carlo confidence intervals for standardized regression coefficients (beta) and other effect sizes, including multiple correlation, semipartial correlations, improvement in R-squared, squared partial correlations, and differences in standardized regression coefficients, for models fitted by lm(). betaMC combines ideas from Monte Carlo confidence intervals for the indirect effect (Pesigan and Cheung, 2024 <doi:10.3758/s13428-023-02114-4>) and the sampling covariance matrix of regression coefficients (Dudgeon, 2017 <doi:10.1007/s11336-017-9563-z>) to generate confidence intervals for effect sizes in regression.
This package provides a general framework using mixture Weibull distributions to accurately predict the duration of biomarker-guided trials while accounting for population heterogeneity. Extensive simulations are performed to evaluate the impact of population heterogeneity and of the dynamics of biomarker characteristics and disease on study duration. Several influential parameters, including median survival time, enrollment rate, biomarker prevalence and effect size, are identified. Efficiency gains of biomarker-guided trials can be quantitatively compared with the traditional all-comers design. For reference, see Zhang et al. (2024) <arXiv:2401.00540>.
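As a rough illustration of how a mixture Weibull model feeds into a duration prediction, the Python sketch below mixes biomarker-positive and biomarker-negative Weibull survival curves by prevalence, computes the population median survival, and finds the calendar time at which the expected number of events reaches a target. Uniform enrollment, no dropout, and all parameter values are simplifying assumptions; this is not the method of Zhang et al. (2024).

```python
import math


def weibull_surv(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def mixture_surv(t, prevalence, pos, neg):
    """Population survival when a fraction `prevalence` is biomarker-positive.

    pos and neg are (shape, scale) pairs for the two subpopulations."""
    return prevalence * weibull_surv(t, *pos) + (1 - prevalence) * weibull_surv(t, *neg)


def bisect_increasing(f, target, lo, hi, iterations=100):
    """Find t in [lo, hi] with f(t) = target for a nondecreasing f."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_events(T, rate, accrual, surv, steps=2000):
    """Expected events by calendar time T, assuming uniform enrollment at
    `rate` patients per time unit over [0, accrual] and no dropout."""
    window = min(T, accrual)
    if window <= 0:
        return 0.0
    du = window / steps
    # Midpoint rule for rate * integral over enrollment times u of (1 - S(T - u)).
    return rate * du * sum(1 - surv(T - (i + 0.5) * du) for i in range(steps))


def trial_duration(target_events, rate, accrual, surv, horizon=1000.0):
    """Calendar time at which the expected event count reaches the target."""
    if target_events >= rate * accrual:
        raise ValueError("target exceeds the number of enrolled patients")
    return bisect_increasing(lambda T: expected_events(T, rate, accrual, surv),
                             target_events, 0.0, horizon)


def surv(t):
    # Hypothetical: 40% biomarker-positive, with different Weibull parameters.
    return mixture_surv(t, 0.4, pos=(1.2, 20.0), neg=(1.0, 10.0))


median = bisect_increasing(lambda t: 1 - surv(t), 0.5, 0.0, 1e4)
duration = trial_duration(50, rate=10, accrual=12, surv=surv)
print(f"median survival: {median:.2f}, predicted duration: {duration:.2f}")
```

Changing the prevalence or the subgroup parameters shifts both the median and the predicted duration, which is the kind of sensitivity the description refers to.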
Set the R prompt dynamically, from a function. The package contains some examples that include various useful dynamic information in the prompt: the status of the last command (success or failure); the amount of memory allocated by the current R process; the name of the R package(s) loaded by 'pkgload' and/or 'devtools'; and various git information: the name of the active branch, whether it is dirty, and whether it needs pushes or pulls. You can also create your own prompt if you don't like the predefined examples.
This package creates a non-negative low-rank approximate factorization of a sparse counts matrix by maximizing Poisson likelihood with L1/L2 regularization (e.g. for implicit-feedback recommender systems or bag-of-words-based topic modeling) (Cortes, 2018 <arXiv:1811.01908>), which usually leads to very sparse user and item factors (over 90% zero-valued). It is similar to hierarchical Poisson factorization (HPF), but follows an optimization-based approach with regularization instead of a hierarchical prior, and is fit through gradient-based methods instead of variational inference.
Symbolic calculation and evaluation of multivariate polynomials with rational coefficients. This package is strongly inspired by the spray package. It provides a function to compute Gröbner bases (reference <doi:10.1007/978-3-319-16721-3>). It also includes some features for symmetric polynomials, such as the Hall inner product. The header file of the C++ code can be used by other packages. It provides the templated class Qspray, which can be used to represent and manipulate multivariate polynomials with other types of coefficients.